Methodology

This is a public report, and its contents may be used as long as the source is appropriately credited.

The number of respondents

More than 38,000 people participated in the Developer Ecosystem Survey 2022. To ensure we were working with the most representative sample possible, we cleaned the data through the process described below. As a result, the report is based on the input of 29,269 developers from 187 countries and regions, including two responses reportedly from Antarctica. The data was weighted according to several criteria, as described later in this section.

The data cleaning process

We used partial responses except in cases where the respondent left the survey before answering the questions about their primary programming languages. We also used a set of criteria to identify and exclude suspicious responses. Here are some of the indicators we checked for:

Surveys that were filled out too fast.
Surveys from identical IP addresses, as well as surveys with responses that were overwhelmingly similar. If two surveys with the same IP address were more than 75% identical, we kept the one that was more complete.
Surveys with conflicting answers, for example, “18–20 years old” and “more than 16 years of professional experience”.
Surveys with only a single option chosen for almost all the multiple-choice questions.
Surveys submitted from the same email address. In such cases, we kept the survey that was the most complete.
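
The same-IP rule from the list above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`id`, `ip`, `answers`), the function names, and the way similarity is measured (share of matching answers over all questions either respondent answered) are hypothetical, since the report does not say how "identical" was computed.

```python
from itertools import combinations

def similarity(a, b):
    """Share of questions (answered by either respondent) with the same answer."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    same = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return same / len(keys)

def drop_ip_duplicates(responses, threshold=0.75):
    """Drop the less complete survey of any same-IP pair above the threshold.

    `responses` is a list of dicts with hypothetical keys 'id', 'ip', 'answers'.
    """
    by_ip = {}
    for r in responses:
        by_ip.setdefault(r["ip"], []).append(r)

    dropped = set()
    for group in by_ip.values():
        for a, b in combinations(group, 2):
            if a["id"] in dropped or b["id"] in dropped:
                continue
            if similarity(a["answers"], b["answers"]) > threshold:
                # Completeness is approximated by the number of answered questions.
                loser = a if len(a["answers"]) < len(b["answers"]) else b
                dropped.add(loser["id"])
    return [r for r in responses if r["id"] not in dropped]
```

In this sketch, two surveys from different IP addresses are never compared, and a tie in completeness keeps the survey that appears first.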

Reducing the response burden

This year, the survey consisted of 527 questions. Though our goal was to cover as many research topics as possible, and despite the conditional logic applied to the questions, we still felt it was too long.

To shorten the survey and reduce its response burden, we took measures to randomize some of the questions:

We randomized 8 sections, of which each respondent saw only 2:
- Continuous Integration, issue tracking, and VCS
- DevOps and hosting
- Static analysis, open-source, etc.
- Education
- Cross-platform and microservices
- Communication tools
- Security
- Remote and collaborative development
We randomly showed the sections about the most popular languages, such as Java, JavaScript, SQL, Python, and GraphQL, to 50% of qualified respondents.
We also randomly hid the questions that didn’t have any logic or dependencies.
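
A minimal sketch of the randomization above, assuming each respondent's 2 sections are drawn uniformly without replacement and the language sections are shown with an independent 50% chance. The function names and the seed are hypothetical; the report does not describe the survey tool's actual mechanism.

```python
import random

OPTIONAL_SECTIONS = [
    "Continuous Integration, issue tracking, and VCS",
    "DevOps and hosting",
    "Static analysis, open-source, etc.",
    "Education",
    "Cross-platform and microservices",
    "Communication tools",
    "Security",
    "Remote and collaborative development",
]

def assign_sections(rng, shown=2):
    """Pick the optional sections one respondent sees (uniform, no repeats)."""
    return rng.sample(OPTIONAL_SECTIONS, shown)

def sees_language_section(rng, probability=0.5):
    """Decide whether a qualified respondent is shown a language section."""
    return rng.random() < probability

rng = random.Random(2022)  # fixed seed only to make the example repeatable
sections = assign_sections(rng)
```

Under this uniform assumption, each optional section reaches about 2 in 8 respondents, that is, roughly a quarter of the sample.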

Despite our measures to reduce the work required to complete the survey, respondents on average spent about 30–40 minutes filling it out, which we still think is too much. We are already thinking of ways to improve the experience next year.

Targeting our audience

To invite potential respondents to complete the survey, we used Twitter ads, Facebook ads, Instagram, Quora, and JetBrains’ own communication channels. We also posted links to the survey in some user groups and tech community channels, and we asked our respondents to share the survey with their peers.

Countries and regions

We collected sufficiently large samples from 14 countries: Argentina, Brazil, Canada, China, France, Germany, India, Japan, Mexico, South Korea, Spain, Turkey, the United Kingdom, and the United States.

This year, we avoided using paid ads to collect responses from Belarus, Russia, and Ukraine. The responses from Belarus were combined with those from the Eastern Europe, the Balkans, and the Caucasus region.

The remaining countries were distributed among 6 regions:

Africa, the Middle East, and Central Asia
Eastern Europe, the Balkans, and the Caucasus
Northern Europe and Benelux
Other European countries
Southeast Asia and Oceania, Australia, and New Zealand
Central and South America

For each geographical region (except for Canada and Japan), we collected at least 300 responses from external sources, such as ads.

Localization

To minimize any potential bias against respondents who don’t speak English, the survey was also available in 8 additional languages: Chinese, French, German, Japanese, Korean, Brazilian Portuguese, Spanish, and Turkish.

Sampling-bias reduction

The report is based on data weighted according to where the responses came from. As a baseline, we took responses collected from external sources that are less biased toward JetBrains users, such as paid ads on Twitter, Facebook, Instagram, and Quora, as well as respondents’ referrals. We took each respondent’s source into account individually to generate results based on the weighting procedures.

We performed three stages of weighting to get a less-biased picture of the worldwide developer population.

First weighting stage: adjusting for the populations of professional developers in each region

In the first stage, we assembled the responses collected while targeting different countries, and then we applied our estimations of the populations of professional developers in each country to these data.

First, we took the survey data on professional developers and working students that came from ads posted on various social networks in the 20 regions (the 14 countries and 6 regions listed above), along with the data that came from various peer referrals. Though we did not collect data for Russia and Ukraine this year, we included these two countries in the report and weighted them by using an approximation from last year's data. We reasoned that both countries have a significant number of developers, and removing them from the report could have unforeseen consequences.

Then we weighted the responses according to our estimated populations of professional developers in those 22 regions (the 20 regions plus Russia and Ukraine). This ensured that the distribution of the responses corresponded to the numbers of professional developers in each country or region.
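
The idea behind this stage can be sketched as a simple post-stratification: each respondent in a region gets a weight equal to the region's share of the developer population divided by its share of the sample. The function name and the numbers in the usage note are made up for illustration; the report does not publish its population estimates or the exact formula.

```python
def region_weights(sample_counts, population_estimates):
    """Per-respondent weight for each region.

    After weighting, each region's share of the weighted sample equals its
    share of the estimated developer population.
    """
    total_sample = sum(sample_counts.values())
    total_population = sum(population_estimates.values())
    weights = {}
    for region, n in sample_counts.items():
        population_share = population_estimates[region] / total_population
        sample_share = n / total_sample
        weights[region] = population_share / sample_share
    return weights
```

For example, with hypothetical counts `{"A": 300, "B": 700}` and population estimates `{"A": 600000, "B": 400000}`, respondents in region A are weighted up by a factor of 2 (60% of the population but 30% of the sample), while the total weighted sample size stays at 1,000.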

Second weighting stage: the proportions of currently employed and unemployed developers

In the second stage, we forced the proportion of students and unemployed respondents to be 17% in every country. We did this to maintain consistency with the previous year’s methodology, as that is the only estimate of their populations we have available.

By this point, we had a distribution of 14,330 responses from external sources weighted both by region and employment status.
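
The 17% adjustment in this stage can be sketched as a pair of multipliers applied within each country: one for students and unemployed respondents, one for everyone else. The function and argument names are hypothetical, and the sketch assumes the total weighted count per country is preserved, which the report does not state explicitly.

```python
def employment_adjustment(weighted_non_employed, weighted_employed, target_share=0.17):
    """Return (non_employed_factor, employed_factor) for one country.

    Inputs are the weighted respondent counts after the first stage.
    The factors rescale both groups so that students and unemployed
    respondents make up `target_share` of the same weighted total.
    """
    total = weighted_non_employed + weighted_employed
    non_employed_factor = target_share * total / weighted_non_employed
    employed_factor = (1 - target_share) * total / weighted_employed
    return non_employed_factor, employed_factor
```

With made-up inputs of 250 and 750, the first group is scaled by 0.68 (down from 25% to 17% of the total) and the second group is scaled up so the total remains 1,000.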

Third weighting stage: employment status, programming languages, and JetBrains product usage

The third stage was more involved, as it required solving a system of equations. We took those 14,330 weighted responses, and for the developers from each region, in addition to their employment status, we calculated the shares for each of the 30+ programming languages, as well as the shares for those who answered “I currently use JetBrains products” and “I have never heard of JetBrains or its products”. Those shares became constants in our equations.

The next step was to add two more groups of responses from other sources: JetBrains internal communication channels, such as JetBrains social-network accounts and our research panel, and social-network ad campaigns targeted at users of certain programming languages. This yielded 14,939 more responses, which we again weighted to keep all those shares the same.

Solving the system of 30+ linear equations and inequalities

We composed a system of 30+ linear equations and inequalities that described:

The weighting coefficients for the respondents (as a hypothetical example, Fiona from our sample represents on average 180 software developers from France).
The specific values of their responses (for example, Pierre uses C++, he is fully employed, and he has never heard of JetBrains).
The necessary ratios among the responses (for example, 27% of developers have used C++ in the past 12 months, and so on).

To solve this system of equations with the minimum variance of the weighting coefficients (which is important!), we used the dual method of Goldfarb and Idnani (1982, 1983), which allowed us to calculate the optimal individual weighting coefficients for the 29,269 total respondents.
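
To make the "minimum variance" idea concrete, here is a simplified sketch that handles equality constraints only. It is not the Goldfarb-Idnani dual method and it ignores the inequalities mentioned above; instead it uses the closed-form Lagrange-multiplier solution for minimizing the squared deviation of the weights from 1 subject to A w = b. All names are hypothetical, and the constraint rows are assumed to be linearly independent.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def min_variance_weights(A, b):
    """Minimize sum((w_i - 1)^2) subject to A w = b (equalities only).

    The solution has the form w = 1 + A^T lam, where (A A^T) lam = b - A 1.
    """
    m, n = len(A), len(A[0])
    # How far the unit weights are from satisfying each constraint.
    residual = [b[k] - sum(A[k]) for k in range(m)]
    # Gram matrix A A^T of the constraint rows.
    gram = [[sum(A[k][i] * A[l][i] for i in range(n)) for l in range(m)]
            for k in range(m)]
    lam = solve_linear(gram, residual)
    return [1.0 + sum(A[k][i] * lam[k] for k in range(m)) for i in range(n)]

# Toy example: 4 respondents, the first two use C++.
# Row 1 fixes the total weight at 4; row 2 forces a 27% weighted C++ share.
A = [[1, 1, 1, 1],
     [1, 1, 0, 0]]
b = [4.0, 0.27 * 4]
weights = min_variance_weights(A, b)  # approximately [0.54, 0.54, 1.46, 1.46]
```

Because the first constraint fixes the mean weight at 1, minimizing the squared deviation from 1 is the same as minimizing the variance of the weights. Without inequality constraints, nothing prevents a weight from becoming negative in less friendly examples, which is one reason a quadratic programming solver is the more appropriate tool for the full system.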

Lingering bias

Despite these measures, some bias is likely present, as JetBrains users might have been more willing on average to complete the survey.

Also, our community ecosystem is evolving, and there might be some data fluctuations despite our weighting stages and efforts. For instance, in 2021 there was a substantial increase in the number of PHP developers (specifically, Laravel developers) that we polled. The reason was that personal survey-sharing links were posted in some PHP communities, and the link to our blog post was also tweeted by Laravel’s Twitter account. This attracted a disproportionately high share of PHP and Laravel developers to our survey. We will improve our weighting algorithms to compensate for such spikes.

We will continue to update and improve our weighting methodology in the future. Stay tuned to see what we do for DevEco 2023!