Data Collection and Use Policy

The JetBrains AI service can collect two types of data related to the usage of AI features: behavioral and detailed data. Both of these types of data collection are fully controlled by the user.

The data from the JetBrains AI service is sent to third-party language model providers (such as OpenAI), so it is also processed on those providers’ servers according to their policies. Neither the user nor JetBrains has control over this third-party data processing. JetBrains does not work with large language model (LLM) providers that use customer data for training models, but providers can store data for other purposes, such as abuse and misuse monitoring. Please check the list of engaged third-party language model providers and the documents describing how they handle the data here. JetBrains accesses OpenAI services through its API, so the data submitted through the JetBrains AI service is handled as “API Content” according to OpenAI’s terms of use.

Behavioral Data Collection

Behavioral data collection includes data such as:

  • Types of AI features used.

  • Rates of acceptance for suggestions from different AI features.

  • Performance data (such as the amount of time it took to generate AI suggestions).

  • User feedback on the quality of results produced by different AI features.

This type of data does not include any personally identifiable data or any source code files or fragments from the user’s project.

This data is used by various teams at JetBrains for analyzing product usage, improving product features, and training machine learning (ML) models that control the behavior of different product features (for example, controlling the automatic activation of ML features). It is not used for training ML models that generate code, text, or any other type of data from which outputs could be extracted.

Collection of this type of data is controlled by the standard data sharing settings (see the product documentation for details). It is disabled by default in EAP and release builds.

Detailed Data Collection

Detailed data collection covers the complete interactions with large language models: the full text of the inputs sent by the IDE to the large language model and of its responses, including source code snippets.

Collection of this type of data is controlled by the option Tools | AI Assistant | Data Sharing | Allow detailed data collection. It is disabled by default in both EAP and release builds. Detailed data collection is only performed when the user enables this option and gives explicit consent to the collection of detailed data.
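
As an illustration only, the opt-in rule described above can be sketched as a simple check. This is a minimal sketch and not JetBrains code: the class DataSharingSettings, its two fields, and the function may_collect_detailed_data are hypothetical names introduced here to model the rule that detailed data is collected only when the option is enabled and explicit consent has been given.

```python
from dataclasses import dataclass


@dataclass
class DataSharingSettings:
    """Hypothetical model of the settings described in this policy (not a JetBrains API)."""

    # Mirrors "Allow detailed data collection"; disabled by default.
    allow_detailed_data_collection: bool = False
    # Assumption: explicit consent is tracked as a separate flag.
    explicit_consent_given: bool = False


def may_collect_detailed_data(settings: DataSharingSettings) -> bool:
    """Return True only when the option is enabled AND explicit consent was given."""
    return settings.allow_detailed_data_collection and settings.explicit_consent_given
```

With the default settings the check returns False, which matches the statement that detailed data collection is disabled by default.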

Access to the collected data will be restricted to the teams at JetBrains that specifically work on large language model development and integration. This data will be analyzed to understand product usage and identify opportunities for improvement. It will not be used for training any ML models that generate code or text, or revealed in any form to any other users.

We will also implement a retention policy for this data; it will be stored only for a limited amount of time not exceeding one year.
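
For illustration, a retention limit of this kind could be checked as in the sketch below. It is not a description of how JetBrains enforces the policy: the names MAX_RETENTION and is_past_retention are hypothetical, and "one year" is modeled as 365 days as a simplifying assumption, since the policy only states an upper bound.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one year" is modeled as 365 days; the policy only gives an upper bound.
MAX_RETENTION = timedelta(days=365)


def is_past_retention(collected_at: datetime, now: datetime) -> bool:
    """Return True when a record is older than the maximum retention period."""
    return now - collected_at > MAX_RETENTION


# Example: a record collected on the policy's last-modified date.
collected = datetime(2023, 12, 5, tzinfo=timezone.utc)
still_within_limit = not is_past_retention(
    collected, datetime(2024, 6, 1, tzinfo=timezone.utc)
)
```

A record for which is_past_retention returns True would be due for deletion under such a limit.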

If the user does not opt in to detailed data collection, the inputs will be sent directly to the LLM provider and processed according to that provider’s data collection and use policy, and the outputs will be sent directly to the user’s IDE. The inputs and outputs will not be persistently stored on JetBrains servers.

Last modified: 05 December 2023