Our data science team is always learning and experimenting, which keeps them closely connected to research in this area around the world.
Data Science through the looking glass and what we found there
- BY Madhura Jayaratne
- DATE: April 14, 2020
Why we love it
The authors, from Microsoft, perform one of the largest analyses of Data Science projects to date, focusing on information that is useful to Data Science solution builders and practitioners alike. They analyse publicly accessible Python notebooks on GitHub and Machine Learning pipelines from a corporate Machine Learning platform, AnonSys. Some of the findings are not very surprising, such as the 4-fold growth in the number of notebooks from 2017 to 2019 and Python emerging as the de facto standard for Data Science. Others, however, are quite interesting.
What I learnt from it
The most interesting findings include:
1) “Big” (i.e., most used) libraries are becoming “bigger”, consolidating well in the DS field.
2) Deep Learning is becoming more popular, yet accounts for less than 20% of DS today.
3) An analysis of the top libraries and top transformers used in Data Science pipelines shows how text, a source of unstructured data, is being tapped into.
Above all, it is fascinating to see how Data Science/Machine Learning is becoming a ubiquitous technology.
Why it’s a must
The paper uncovers the current state of the Data Science/Machine Learning field and a number of its trends. These trends give practitioners a good indication of which technologies and libraries are worth investing their time in.
Who should pay attention