Ezzat Alhalabi
Kaggle 2021 ML & Data Science Survey
25,973 respondents · EDA
% of group selecting each option · multiple choice
Total Respondents: 25,973 (global survey responses)
Industry Professionals: ~74% (~19,200 working professionals)
Students: ~26% (~6,750 students surveyed)
Top Country: India (~25% of all respondents)

Respondent Demographics

Age: The 25–29 bracket is the largest group overall. Students skew toward 18–21; professionals skew toward 25–34.
Country: India leads at ~25%, followed by the United States at ~15%, which likely reflects the geographic mix of Kaggle's user base.
Gender: ~83% of respondents identify as male, a marked gender imbalance in the survey sample.
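
The demographic shares above come from single-choice questions: each respondent picks one answer, so the shares for a question sum to 100%. A minimal sketch of that tally in plain Python follows. The `share_by_answer` helper and the toy answers are hypothetical illustrations, not the survey data or the notebook's actual code.

```python
from collections import Counter


def share_by_answer(answers):
    """Return {answer: percent of respondents}, largest share first."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {
        answer: round(100 * n / total, 1)
        for answer, n in counts.most_common()
    }


# Toy single-choice answers (hypothetical, NOT survey data).
toy_country = ["India", "India", "USA", "Japan", "India", "USA", "Brazil", "Other"]
shares = share_by_answer(toy_country)
print(shares)
```

Because every respondent contributes exactly one answer, the denominator is simply the number of answers.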

Figure (Q1): Age Distribution. Age group of all survey respondents.

Figure (Q3): Top 5 Countries by Respondents. Share of total respondents (%) by country of residence.

Figure (Q2): Gender Breakdown. Gender identity across all respondents.

Figure (Q4): Education Level, Students vs Professionals. Highest level of formal education (% of group). Series: Students, Professionals.

Languages & Development Environments

Python is used by ~90% of both groups, making it the common language of ML and Data Science among respondents.
SQL adoption gap: Professionals use SQL considerably more (~58%) than students (~35%), a gap of about 23 percentage points that likely reflects production data workflows.
VS Code (Visual Studio Code) nearly matches Jupyter Notebook usage among professionals.
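
The figures above are "% of group selecting each option" on multiple-choice questions. Each respondent can tick several options, so the denominator is the number of respondents in the group, not the number of selections, and the percentages within a group can sum to more than 100%. A minimal sketch under those assumptions follows. The `pct_of_group` helper, the `segment` and `languages` field names, and the toy rows are all hypothetical, not the survey's actual schema or data.

```python
from collections import Counter, defaultdict


def pct_of_group(rows, group_key, choices_key):
    """Percent of each group's respondents who selected each option."""
    group_sizes = Counter()
    selections = defaultdict(Counter)
    for row in rows:
        group = row[group_key]
        group_sizes[group] += 1
        # A set guards against counting one respondent twice per option.
        for option in set(row[choices_key]):
            selections[group][option] += 1
    return {
        group: {
            option: round(100 * n / group_sizes[group], 1)
            for option, n in selections[group].most_common()
        }
        for group in group_sizes
    }


# Toy multi-select answers (hypothetical, NOT survey data).
toy_rows = [
    {"segment": "Students", "languages": ["Python"]},
    {"segment": "Students", "languages": ["Python", "SQL"]},
    {"segment": "Professionals", "languages": ["Python", "SQL"]},
    {"segment": "Professionals", "languages": ["SQL", "R"]},
    {"segment": "Professionals", "languages": ["Python", "SQL", "R"]},
    {"segment": "Professionals", "languages": ["Python"]},
]
result = pct_of_group(toy_rows, "segment", "languages")
print(result)
```

Dividing by group size rather than by total selections keeps the two groups comparable even when one group tends to tick more options per respondent.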

Figure (Q7): Programming Languages Used Regularly. Percentage of each group selecting each language (multiple choice, top 8). Series: Students, Professionals.

Figure (Q9): IDEs Used Regularly. Integrated development environments (top 6). Series: Students, Professionals.

Figure (Q10): Hosted Notebook Platforms. Notebook products used on a regular basis (top 5). Series: Students, Professionals.

Machine Learning Frameworks & Algorithms

scikit-learn tops both groups and is the standard starting point for classical ML across experience levels.
XGBoost gap: Professionals favour XGBoost (~38%) far more than students (~22%), a gap of about 16 percentage points that likely reflects production modelling demands.
Linear / Logistic Regression remains the most-used algorithm in both groups, suggesting that simple baselines still carry much day-to-day work.
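
A gap between two group shares can be read in two ways: as a difference in percentage points or as a ratio between the shares. The sketch below applies both readings to the approximate SQL and XGBoost figures quoted in this summary. The `point_gap` helper is hypothetical, and the inputs are the rounded values from the text, not exact survey counts.

```python
def point_gap(professional_pct, student_pct):
    """Return the gap in percentage points and the ratio of the two shares."""
    gap = professional_pct - student_pct
    ratio = professional_pct / student_pct
    return gap, ratio


# Approximate shares quoted in the summary: (professionals %, students %).
quoted = {"SQL": (58.0, 35.0), "XGBoost": (38.0, 22.0)}

gaps = {name: point_gap(pro, stu) for name, (pro, stu) in quoted.items()}
for name, (gap, ratio) in gaps.items():
    print(f"{name}: {gap:.0f} points, {ratio:.2f}x")
```

On these rounded inputs the SQL gap is larger in points, while the XGBoost gap is slightly larger in relative terms, which is why it helps to state which reading is meant.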

Figure (Q16): ML Frameworks Used Regularly. Top 6 frameworks by group (% selecting). Series: Students, Professionals.

Figure (Q14): Visualization Libraries. Data visualization tools used regularly (top 5). Series: Students, Professionals.

Figure (Q17): ML Algorithms Used Regularly. Algorithm usage by group, % selecting each (top 6). Series: Students, Professionals.

Cloud Platforms & Databases

AWS is the cloud platform respondents most want to learn over the next two years, with strong interest from both students and professionals.
MySQL leads database usage in both groups, especially among students learning relational data fundamentals.
PostgreSQL leans professional: it is used more where data engineering and production pipelines are likely involved.

Figure (Q27): Cloud Platform Interest, Next 2 Years. Platforms respondents hope to become more familiar with (top 5). Series: Students, Professionals.

Figure (Q32): Databases & Big Data Products. Products used on a regular basis (top 6). Series: Students, Professionals.

Figure (Q29): Cloud Infrastructure Products, Next 2 Years. Specific cloud products students and professionals plan to learn. Series: Students, Professionals.

Business Intelligence & ML Management Tools

Tableau and Power BI are the two BI tools respondents most want to learn, with strong interest in both groups.
MLflow leads interest in ML experiment tracking among professionals seeking reproducible workflows.
Amazon SageMaker tops interest in managed ML platforms, consistent with AWS's strong position in production ML.

Figure (Q34): Business Intelligence Tools. BI tools respondents hope to learn in next 2 years (top 5). Series: Students, Professionals.

Figure (Q31): Managed ML Platforms. Managed ML products to learn in next 2 years (top 5). Series: Students, Professionals.

Figure (Q38): ML Experiment Tracking Tools. Tools respondents plan to adopt in next 2 years (top 5). Series: Students, Professionals.

Learning Platforms & Media Sources

Coursera is the most-used platform in both groups and is particularly popular with students seeking structured learning paths.
Udemy shows stronger adoption among professionals, who tend to prefer flexible, on-demand skill-building.
Blogs (Towards Data Science, Analytics Vidhya) and YouTube are the top two media sources for staying current in the field.

Figure (Q40): Learning Platforms. Platforms used to begin or complete Data Science courses (top 6). Series: Students, Professionals.

Figure (Q42): Favourite Media Sources. Sources for ML & Data Science news and content (top 5). Series: Students, Professionals.