In spring 2018, JetBrains polled over 1,600 people involved in Data Science and based in the US, Europe, Japan, and China, in order to gain insight into how this industry sector is evolving. Here's what we learned.
We distributed the survey via targeted ads on Facebook, Twitter, and LinkedIn. We screened out respondents who replied "I am not involved in data analysis." We collected 400 complete and valid responses from each of the US, Japan, and China. To represent Europe, we used quotas for select European countries to collect a further 400 responses.
Some bias is likely present as JetBrains users may have been more willing on average to complete the survey.
The raw survey data are available for your perusal.
The above table is based on data from two questions, “Which of the following are areas of interest to you?” and “Which of the following are you involved in?”.
Those who are professionally involved in data analysis have to handle many different types of activities. Data processing and basic statistics are important to many people in the industry, while those who work with data as a hobby tend to focus more on the things they like and enjoy. Specifically, hobbyists tend to prefer data visualization to statistics.
According to the very first table, a relatively small percentage of respondents work on model deployment in production. This is consistent with multiple market research studies reporting that the majority of enterprises are just starting to explore machine learning and deep learning.
They have small teams working on proofs of concept (PoCs), and model deployment in production still needs to be addressed. But I expect this type of activity to become more visible over the next few years as more businesses move from PoCs to production deployments.
15% of data scientists plan to adopt or migrate to C++ in the next 12 months, probably for performance reasons.
Most respondents believe that Python will remain on top for the next 5 years.
Overall, people tend to choose the language they use. Of those who don’t use a language they think will dominate, most want to start using it. Half of those who believe Kotlin will dominate are planning to adopt it in the near future.
No surprises with the programming languages. Traditional data scientists are among the most likely to still use R, since there are plenty of statistics libraries for it. The new generation of data scientists is choosing Python.
When it comes to high-performance data analytics, I’d expect to see C/C++ in the picture. Currently, we are observing that many HPC techniques and tools are being adopted and re-used for high-performance data analytics and deep learning.
7% of data analysts using a programming language want to adopt Kotlin for Data Science in the near future.
Product Manager, JetBrains
Kotlin is a general-purpose language running on the Java virtual machine. It is concise and easily integrates with popular data processing frameworks such as Hadoop and Spark.
Kotlin is statically typed and uses type inference, which increases its reliability. These features all make Kotlin a handy instrument for data engineering and data science.
One third of those who say they work with big data don’t use any big data tools. Conversely, a third of those who do NOT work with big data DO use some big data tools. Still, this self-identification does correlate with formal factors.
78% of data science specialists perform computations on local machines.
We received 77 responses from people who don’t use any programming languages and aren’t about to adopt any (5% of all data analysts who responded).
These respondents use spreadsheet editors more often than average, and most of them work in non-IT industries. They also tend to use data analysis tools less often.
1 = not at all
5 = a great deal
This question was directed to professionals, that is, people professionally involved in data science or data analysis and working full-time or part-time.
Work environment and employment
PyCharm Professional Edition is a Python IDE that helps data scientists and web developers become far more productive.
It offers in-depth Python code analysis and integrates with various libraries, frameworks, and tools. PyCharm's scientific tools are designed specifically with professional data analysts in mind and include a scientific development mode, integration with conda, code cells, Jupyter Notebook support, and much more. There is first-class support available for SQL databases as well.
Datalore is an intelligent web application for data analysis and visualization in Python, with built-in tools and libraries for machine learning.
The smart Python code editor helps users write better code with suggestions, autocompletion, and syntax highlighting. Incremental recalculation tracks the dependencies between computations, so users don’t have to work out which parts of the code were affected by recent edits. Users also get access to extended data storage and high-performance computational resources (including GPU instances) for an enhanced exploration experience.
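To make the idea of incremental recalculation concrete, here is a minimal Python sketch. It is not Datalore's implementation; the `Notebook` class, its method names, and the example cells are all hypothetical, and the sketch assumes the dependencies between cells contain no cycles. It only illustrates the general technique: when one computation is edited, that computation and everything downstream of it are marked stale, and only stale computations are rerun.

```python
from typing import Callable, Dict, List, Sequence, Set


class Notebook:
    """Toy model (hypothetical) of cells that depend on other cells' results."""

    def __init__(self) -> None:
        self.funcs: Dict[str, Callable[..., object]] = {}
        self.deps: Dict[str, List[str]] = {}
        self.values: Dict[str, object] = {}
        self.dirty: Set[str] = set()
        self.runs: List[str] = []  # log of cells that were actually recomputed

    def define(self, name: str, func: Callable[..., object],
               deps: Sequence[str] = ()) -> None:
        """Add or edit a cell; it and everything downstream become stale."""
        self.funcs[name] = func
        self.deps[name] = list(deps)
        self._mark_dirty(name)

    def _mark_dirty(self, name: str) -> None:
        # Assumes an acyclic dependency graph.
        self.dirty.add(name)
        for other, other_deps in self.deps.items():
            if name in other_deps and other not in self.dirty:
                self._mark_dirty(other)

    def get(self, name: str) -> object:
        """Return a cell's value, recomputing only if the cell is stale."""
        if name in self.dirty:
            args = [self.get(d) for d in self.deps[name]]
            self.values[name] = self.funcs[name](*args)
            self.dirty.discard(name)
            self.runs.append(name)
        return self.values[name]


nb = Notebook()
nb.define("raw", lambda: [3, 1, 2])
nb.define("sorted", lambda raw: sorted(raw), deps=["raw"])
nb.define("total", lambda raw: sum(raw), deps=["raw"])
nb.define("top", lambda s: s[-1], deps=["sorted"])

# First evaluation computes every cell once.
nb.get("top")
nb.get("total")
nb.runs.clear()

# Editing "sorted" invalidates "sorted" and "top", but not "raw" or "total".
nb.define("sorted", lambda raw: sorted(raw, reverse=True), deps=["raw"])
top = nb.get("top")
total = nb.get("total")
print(nb.runs)
```

In this sketch, `nb.runs` records which cells were rerun after the edit, so it shows that unaffected cells such as `raw` and `total` are served from their cached values.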
If you have any questions or suggestions,
please contact us at email@example.com.