2018

Data Science Survey

In spring 2018, JetBrains polled over 1,600 people involved in Data Science and based in the US, Europe, Japan, and China, in order to gain insight into how this industry sector is evolving. Here's what we learned.

Methodology

We distributed the survey via targeted ads on Facebook, Twitter, and LinkedIn. We screened out respondents who replied "I am not involved in data analysis." We collected 400 complete and valid responses from each of the US, Japan, and China. To represent Europe, we used quotas for select European countries to collect a further set of 400 responses.
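The screening and quota steps described above can be sketched in Python as follows. This is only an illustration, not the pipeline JetBrains actually used: the field names (`region`, `involvement`, `complete`), the `apply_screening_and_quotas` helper, and the sample records are all hypothetical. Only the screening answer and the quota of 400 come from the text.

```python
from collections import Counter

# Screening answer quoted in the methodology; respondents who chose it are excluded.
SCREEN_OUT = "I am not involved in data analysis"
# Quota stated in the methodology (assumed here to apply per region).
QUOTA = 400


def apply_screening_and_quotas(responses, quota=QUOTA):
    """Keep complete responses that pass screening, up to `quota` per region.

    `responses` is an iterable of dicts with the hypothetical keys
    'region', 'involvement', and 'complete'.
    """
    kept = []
    per_region = Counter()
    for r in responses:
        if not r["complete"]:
            continue  # drop incomplete responses
        if r["involvement"] == SCREEN_OUT:
            continue  # drop screened-out respondents
        if per_region[r["region"]] >= quota:
            continue  # this region's quota is already filled
        per_region[r["region"]] += 1
        kept.append(r)
    return kept


# Toy, made-up records purely to exercise the function.
sample = [
    {"region": "US", "involvement": "Professional", "complete": True},
    {"region": "US", "involvement": SCREEN_OUT, "complete": True},
    {"region": "Japan", "involvement": "Hobby", "complete": False},
    {"region": "China", "involvement": "Hobby", "complete": True},
]
kept = apply_screening_and_quotas(sample, quota=1)
print([r["region"] for r in kept])
```

A real quota scheme would likely also balance countries within Europe, which this sketch does not attempt.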

Some bias is likely present as JetBrains users may have been more willing on average to complete the survey.

The raw survey data are available for your perusal.

Key Takeaways

Primary language
Most respondents expect Python to remain the primary programming language in the field for the next 5 years.
R, Keras and Tableau
Data Science professionals tend to use R, Keras, and Tableau, while amateur data scientists are more likely to prefer Microsoft Azure ML.
1

Tasks

Types of activities

Number of answers: 1666

The above table is based on data from two questions, “Which of the following are areas of interest to you?” and “Which of the following are you involved in?”.

Those who are professionally involved in data analysis have to take care of different types of activities. Data processing and basic statistics are important to many people in the industry, while those who work with data as a hobby tend to focus more on the things they like and enjoy. Specifically, hobbyists tend to prefer working with data visualisation to working with statistics.

Natalia Vassilieva

Head of Software and AI, Hewlett Packard Labs

According to the very first table, a relatively small percentage of the respondents work on model deployment in production mode. This correlates with multiple market research studies which report that the majority of enterprises are just starting to explore machine learning and deep learning.

They have small teams working on PoCs*, and model deployment in production still needs to be addressed. But I expect this type of activity to become more and more visible within the next few years as more and more businesses proceed from PoCs to production deployments.

*PoCs: proofs of concept

2

Programming Languages and Tools

Main programming language for data analysis

Number of answers: 1522

Do you plan to adopt/migrate to other languages in the next 12 months?

Number of answers: 1522

15% of the data scientists surveyed plan to adopt or migrate to C++ in the next 12 months, probably for performance reasons.

In your opinion, what programming language will be most used for data analysis in the next 5 years?

Number of answers: 1522

Most respondents believe that Python will remain on top for the next 5 years.

Overall, respondents tend to pick the language they already use. Of those who don’t use the language they think will dominate, most want to start using it. Half of those who believe Kotlin will dominate are planning to adopt it in the near future.

Natalia Vassilieva

Head of Software and AI, Hewlett Packard Labs

No surprises with the programming languages. Traditional data scientists are some of the most likely to still use R, as there are plenty of statistics libraries for R. The new generation of data scientists are choosing Python.

When it comes to high-performance data analytics, I’d expect to see C/C++ in the picture. Currently, we are observing that many HPC techniques and tools are being adopted and re-used for high-performance data analytics and deep learning.

Kotlin adopters

7% of data analysts using a programming language want to adopt Kotlin for Data Science in the near future.

Which of the following best describes your job roles regardless of your position level?

Number of answers: 60

What programming languages do you regularly use for data analysis, if any?

Number of answers: 112

Kotlin Learning

Vitaly Khudobakhshov

Product Manager, JetBrains

Kotlin is a general-purpose language running on the Java virtual machine. It is concise and easily integrates with popular data processing frameworks such as Hadoop and Spark.

Kotlin is statically typed and uses type inference, which increases its reliability. These features all make Kotlin a handy instrument for data engineering and data science.

Thomas Nield has assembled a helpful collection of Kotlin resources for data science on his GitHub.

If you are new to Kotlin and are considering it for your next language, start by learning the basic syntax.

If you are already familiar with Java, you may want to have a play with Kotlin Koans.

3

Tools & Technologies & Editors

Big Data tools

Number of answers: 1477

One third of those who say they work with big data don’t use any big data tools. Conversely, a third of those who say they do not work with big data do use some big data tools. Still, this self-identification does correlate with formal factors such as tool usage.
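As a rough illustration of how such shares can be computed, here is a minimal Python sketch. The `share` helper and the record layout (a pair of booleans: works with big data, uses big data tools) are hypothetical, and the six records are made-up toy data chosen to mirror the one-third proportions above. They are not the survey responses.

```python
def share(records, works_with_big_data, uses_tools):
    """Fraction of one self-identified group that does (or does not) use tools.

    `records` is a list of (works_with_big_data, uses_big_data_tools) booleans.
    """
    group = [r for r in records if r[0] == works_with_big_data]
    if not group:
        return 0.0  # avoid dividing by zero for an empty group
    matching = [r for r in group if r[1] == uses_tools]
    return len(matching) / len(group)


# Toy records, NOT survey data: (works with big data, uses big data tools).
toy = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]

# Share of "big data" respondents who use no big data tools.
no_tools_among_big_data = share(toy, works_with_big_data=True, uses_tools=False)
# Share of "non big data" respondents who do use big data tools.
tools_among_others = share(toy, works_with_big_data=False, uses_tools=True)
print(no_tools_among_big_data, tools_among_others)
```

Each share is computed within its own group, so the two figures need not add up to any particular total.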

IDEs and Editors

Number of answers: 1522

Which tools do you use for data analysis, if any?

Number of answers: 1666

What deep learning libraries do you use, if any?

Number of answers: 1666

Which statistics packages do you use to analyze and visualize data, if any?

Number of answers: 1666

What operating systems do you use as your work environment for data analysis?

Number of answers: 1666

What do you use to perform computations?

Number of answers: 1666

78% of data science specialists perform computations on local machines.

Cloud services

Number of answers: 527

4

Non-programmers

We received 77 responses from people who don’t use any programming languages and aren’t about to adopt any (5% of all data analysts who responded).
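The 5% figure can be checked with a few lines of Python. The text does not say which total it was computed against, so this sketch tries two candidate denominators taken from the "Number of answers" counts elsewhere in the report. Treating either as the intended base is an assumption.

```python
non_programmers = 77  # responses reported in this section

# Candidate denominators from the report's "Number of answers" counts.
# Which one the 5% refers to is an assumption, not stated in the text.
bases = {
    "all respondents": 1666,
    "programming language questions": 1522,
}

# Percentage of each base that the 77 non-programmers represent.
shares = {label: 100 * non_programmers / base for label, base in bases.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

Both candidates round to 5%, so the reported figure is consistent with either base.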

Which statistics packages do you use to analyze and visualize data, if any?

Number of answers: 77

What is the industry you primarily analyze data for?

Number of answers: 43

These respondents use spreadsheet editors more often than average, and most of them work in non-IT industries. They also tend to use data analysis tools less often.

5

Manager’s expertise

What is your manager's level of expertise in data analysis?

Number of answers: 924

To what extent do you associate the following phrase with your manager: "My manager gives me realistic assignments that are relevant to my skills and responsibilities, with a clear and specific description of the requirements."

Number of answers: 918

Answers: 1 = not at all, 5 = a great deal

Correlation of Manager’s expertise and assignment score

Number of answers: 924

Answers: 1 = not at all, 5 = a great deal

6

Industry & Demographic

What is the industry you primarily analyze data for?

Number of answers: 1666

Which fields of IT do you primarily analyze data for?

Number of answers: 733

Which industry or industries do you primarily analyze data for?

Number of answers: 933

Demographics

Age range

Number of answers: 1666

What is your main employment status?

Number of answers: 1666

What is the highest level of education you have completed?

Number of answers: 1666

Work experience

Number of answers: 924

This question was directed to professionals, that is, people professionally involved in data science or data analysis and working full-time or part-time.

Work environment and employment

Position level

Number of answers: 1014

Company size

Number of answers: 1086

Number of data analysts in company

Number of answers: 1666

Job Role

Number of answers: 917

JetBrains products for data science and big data

PyCharm Professional Edition is a Python IDE that enables data scientists and web developers to become far more productive.

It offers in-depth Python code analysis and integrates with various libraries, frameworks, and tools. PyCharm's scientific tools are designed specifically with professional data analysts in mind and include a scientific development mode, integration with conda, code cells, Jupyter Notebook support, and much more. There is first-class support available for SQL databases as well.

Datalore is an intelligent web application for data analysis and visualization for Python, with built-in tools and libraries for machine learning all included.

The smart Python code editor helps users write better code with suggestions, autocompletion, and syntax highlighting. Incremental recalculation tracks dependencies between computations, so users don’t have to work out which parts of the code were affected by recent edits. Users also get access to extended data storage and high-performance computational resources (including GPU instances) for an enhanced exploration experience.

Thank you for your time!
We hope you found our report useful.

If you have any questions or suggestions,
please contact us at surveys@jetbrains.com.