Data collection and use policy
This document describes how JetBrains handles data related to the use of the JetBrains AI service.
The JetBrains AI service can collect two types of data related to the usage of AI features:
Behavioral data
Detailed data
The user fully controls both types of data collection.
Data from the JetBrains AI service is sent to third-party large language model (LLM) providers, such as OpenAI. This means the data is also processed on those providers' servers, according to their own policies. Neither the user nor JetBrains has control over this third-party processing. JetBrains does not work with LLM providers that use customer data to train their models.
Please check the list of engaged third-party language model providers and the documents describing how they handle data here.
Behavioral data collection
Behavioral data collection includes data such as:
Types of AI features used.
Rates of acceptance for suggestions from different AI features.
Performance data (for example, the amount of time it takes to generate AI suggestions).
User feedback on the quality of results produced by different AI features.
Behavioral data does not include any personally identifiable information, or any source code files or fragments from the user's project.
Various teams at JetBrains use this data to analyze product usage, improve product features, and train machine learning (ML) models that control the behavior of product features (for example, the automatic activation of ML features). It is not used to train ML models that generate code, text, or any other type of data from which outputs could be extracted.
Collection of behavioral data is controlled by the standard data sharing settings (see the product documentation for details). It is enabled by default in EAP builds and disabled by default in release builds.
Detailed data collection
Detailed data collection includes complete data about interactions with large language models: the full text of the inputs the IDE sends to the large language model and of the model's responses, including source code snippets.
Access to the collected data will be restricted to the JetBrains teams that specifically work on large language model development and integration. These teams will analyze the data to understand product usage and identify opportunities for improvement. The data will not be used to train any ML models that generate code or text, nor will it be revealed in any form to other users.
We will also apply a retention policy to this data: it will be stored only for a limited time, not exceeding 30 days.
Detailed data collection is enabled only with the user's explicit approval and is controlled in the product settings.
If the user does not opt in to detailed data collection, the inputs are sent directly to the LLM provider and processed according to that provider's data collection and use policy, and the outputs are sent directly to the user's IDE. The inputs and outputs are not persistently stored on JetBrains servers.
For more information on zero-data retention (ZDR), see Data retention.