ML for Data Scientists Workshop
This course is aimed at practitioners who want to analyze data with machine learning algorithms.
The four-week workshop follows a "flipped classroom" format: participants study the theory and solve the exercises at their own pace, and the results are then discussed in the group sessions.
Learning goals
- Understand the fundamentals of machine learning.
- Get an overview of the most important machine learning algorithms and what to consider when using them.
- Gain first practical experience in applying machine learning algorithms by solving a variety of exercises.
- Acquire basic knowledge of the central tools of the Python data science ecosystem (NumPy, SciPy, matplotlib, pandas, scikit-learn, PyTorch/Keras).
Prerequisites
- Good command of the English language (all materials are in English)
- Math skills at university entrance level (especially linear algebra, i.e., matrices and vectors, as well as functions)
- Good programming skills (ideally using Python)
- Intrinsically motivated to learn new things
- Desirable: First practical experience analyzing data
- Technical requirements: Python installation on your own computer or access to a cloud environment where you can run Jupyter notebooks (e.g., Google Colab)
- Complete the Preparation part before the first group session (about 20min; see below)
The course consists of six group sessions, where we discuss questions and results, and a self-study part before each group session, where you read through the chapters of the book, take notes in the workbook, test yourself with short quizzes, and work through the programming exercises. Your answers to the questions in the workbook are the basis for the group discussions.
Important: While this course is meant to be taken alongside your regular job, you should block at least six hours in your calendar for each self-study part to make sure you have enough time to work through all of the materials and come prepared to the group discussions. Please complete the Preparation part before the first group session!
The group sessions take place remotely via Microsoft Teams, Google Meet, Zoom, Slack, or similar. Please join the calls with your camera turned on so the sessions feel a bit more personal.
How to get the most out of this course
Think of the questions in the workbook as questions an interested colleague might ask you after the course. Following the Feynman Technique, try to explain what you've learned in your own words, which is the easiest way to identify any gaps in your knowledge.
Don't try to memorize facts; instead, connect them with what you already know and make sure you understand the "why" behind the answers.
And if anything seems confusing, please make a note of it and ask in the next meeting!
Preparation before the course (~20min)
- Read the first chapter of the book: "Introduction"
- Answer the first questions in the workbook [docx]
1st Group Session (~1h):
- Discuss the first workbook questions from the preparation part
- Assignment of topics for the ML use case spotlight presentations
Self-Study Part 1: What is ML?
- Read the chapter: "The Basics"
- Again, don't forget to take notes in the workbook [docx]
- Answer Quiz 1 (also available as a pdf in case you can't access Google Forms)
- Answer Quiz 2 [pdf]
- Read the chapter: "ML with Python"
- Complete the Python tutorial
- Answer Quiz 3 [pdf]
- Prepare a 90-second Spotlight presentation [de] for one of the given ML use cases
- Read the chapter: "Data Analysis & Preprocessing"
2nd Group Session (~3h):
- Discuss workbook questions
- Spotlight presentations
Self-Study Part 2: Your first algorithms
- Read the chapter: "Unsupervised Learning"
- Work through Notebook 1: visualize text (after the section on dimensionality reduction)
- Work through Notebook 2: image quantization (after the section on clustering)
- Read the chapter: "Supervised Learning Basics"
- Answer Quiz 4 [pdf]
3rd Group Session (~3h):
- Discuss workbook questions & notebooks
Self-Study Part 3: Avoiding common pitfalls
- Read the chapter: "Supervised Learning Models"
- In parallel, work through the corresponding sections of Notebook 3: supervised comparison
- Read the chapter: "Avoiding Common Pitfalls"
- Work through Notebook 4: analyze toy dataset
- Case Study! Work through Notebook 5: quality prediction. Plan at least 3 hours for this! And feel free to work in pairs, especially if you're still new to Python.
- Schedule at least one check-in session with me (after working on the case study for about 1-2 hours) to discuss your progress and make sure you're on the right track
4th Group Session (~3h):
- Discuss workbook questions
- 2-3 people present their case study results
Self-Study Part 4: Advanced topics
- Read the chapter "Advanced Topics" up to and including the section "Information Retrieval (Similarity Search)", and refresh your memory of TF-IDF feature vectors and cosine similarity (see: Data Preprocessing: Working with Text Data)
- Work through Notebook 6: information retrieval
- Read the section: "Deep Learning"
- Work through Notebook 7: MNIST with torch (recommended) or MNIST with keras (in case others in your organization are working with TensorFlow)
- Read the last sections of the chapter "Advanced Topics": "Time Series Forecasting", "Recommender Systems (Pairwise Data)", and "Reinforcement Learning"
- Work through Notebook 8: RL gridmove
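For the TF-IDF and cosine similarity refresher in Self-Study Part 4, here is a minimal sketch of similarity search using scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The toy documents and the query are made up for illustration, and this is not necessarily how Notebook 6 approaches the task.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus and query (not taken from the course materials)
documents = [
    "machine learning with python",
    "deep learning for image recognition",
    "cooking pasta with tomato sauce",
]
query = "learning python"

# Learn the vocabulary and IDF weights from the documents,
# then map the query into the same TF-IDF feature space
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Cosine similarity between the query and every document (one row, n_docs columns)
scores = cosine_similarity(query_vector, doc_vectors)[0]

# Rank the documents from most to least similar to the query
ranking = scores.argsort()[::-1]
for idx in ranking:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```

Because `TfidfVectorizer` L2-normalizes each vector by default, the cosine similarity here reduces to a dot product; documents that share no terms with the query get a score of 0.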
5th Group Session (~3h):
- Discuss workbook questions & exercise from Notebook 6
Self-Study Part 5: Conclusion
- Answer Quiz 5 [pdf]
- Read the chapter: "Conclusion"
- Complete the exercise: "Your next ML Project" [de] (aim for a 5-minute presentation)
- Please fill out the Feedback Survey to help me further improve this course! :-)
6th Group Session (~3h):
- Discuss workbook questions
- ML project idea presentations