Kung fu - Machine Learning (FREE version)
Section outline
- Lesson Goal: Introduce what machine learning is and how it differs from traditional programming, including key concepts and real-world examples (medical diagnosis and a green screen effect) that illustrate how we tell a computer what we want through data and objectives.
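To make the contrast concrete, here is a minimal sketch (assuming scikit-learn; the risk-screening task and every number in it are made up) of a hand-written rule versus a rule learned from examples:

```python
# Traditional programming: we write the decision rule ourselves.
def at_risk_by_hand(age, blood_pressure):
    return age > 50 and blood_pressure > 140

# Machine learning: we supply examples (data) and an objective
# (classify them correctly), and the algorithm finds the rule.
from sklearn.tree import DecisionTreeClassifier

X = [[30, 120], [45, 130], [60, 150], [70, 160], [55, 145], [25, 110]]
y = [0, 0, 1, 1, 1, 0]  # 1 = at risk, 0 = not at risk (invented labels)

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[65, 155]]))  # the learned rule applied to a new case
```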
- Lesson Goal: Ensure students can access and run machine learning code without installing software on their own computers, using Jupyter notebooks and Google Colab. This lesson covers why Python is the go-to language for ML, what notebooks are, how to use Colab in a browser, and how this cloud setup provides more computing power with zero setup headaches.
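A typical first cell in such a notebook might look like this (the leading `!` is the notebook shell escape; most ML libraries come preinstalled on Colab, so this is mainly a sanity check):

```python
# Install (or confirm) a library, then verify the cloud environment works.
!pip install scikit-learn --quiet

import sys
import sklearn
print(sys.version)          # the Python version the cloud machine runs
print(sklearn.__version__)  # confirms the library imports correctly
```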
- Lesson Goal: Introduce decision trees as a simple yet powerful machine learning method that learns logical if-then rules from data. Students will see how a decision tree can learn explicit decision rules (like a flowchart) to solve problems, through examples like a spelling rule and a medical prediction. They will also understand how decision trees learn (splitting criteria) and their pros and cons (interpretable but can overfit).
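A minimal sketch of the idea (assuming scikit-learn; the tiny medical dataset is invented): train a tree, then print the if-then rules it learned.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: [age, has_fever] -> 1 if we should predict illness.
X = [[25, 0], [30, 1], [60, 1], [70, 0], [40, 1], [20, 0]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# export_text renders the learned tree as a readable flowchart of rules.
print(export_text(tree, feature_names=["age", "has_fever"]))
```

Limiting `max_depth` here is also one simple guard against the overfitting problem covered later in the course.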
- Lesson Goal: Demystify what’s happening inside a trained neural network (“open the black box”) by walking through a simple neural network step-by-step. This lesson also revisits the green screen example from Lesson 1, but now using a neural network approach to actually implement the effect, illustrating how the network’s internals operate. The aim is to give students an intuition for how data moves through a network and how we might interpret or debug a network.
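A sketch of that walkthrough (all weights are invented for illustration): one pixel flows through a tiny 3-2-1 network that scores how likely it is to be green-screen background.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One pixel as (r, g, b) in [0, 1]; all weights below are made up.
x = np.array([0.1, 0.9, 0.2])            # a fairly green pixel
W1 = np.array([[-1.0, 2.0, -1.0],
               [ 0.5, -0.5, 0.5]])       # layer 1: 2 hidden units
b1 = np.array([0.0, 0.1])
W2 = np.array([3.0, -1.0])               # layer 2: 1 output unit
b2 = -1.0

h = relu(W1 @ x + b1)      # hidden layer: weighted sums, then nonlinearity
p = sigmoid(W2 @ h + b2)   # output: probability-like "is background" score
print(h, p)                # printing intermediates is how we open the box
```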
- Lesson Goal: Introduce Bayesian reasoning in machine learning, focusing on the Naive Bayes classifier for predicting probabilities (like spam detection). Students will learn Bayes’ theorem conceptually, see how Naive Bayes makes simplifying independence assumptions, and understand how it uses evidence (features) to update probability beliefs. The spam filtering example is used to make it concrete. The lesson emphasizes the “effect to cause” thinking (looking at evidence to infer the cause) that defines Bayesian models.
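The core update can be computed by hand; a sketch with invented numbers for the word “free”:

```python
# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.4                # prior belief: 40% of all mail is spam
p_word_given_spam = 0.30    # "free" appears in 30% of spam...
p_word_given_ham = 0.02     # ...but only 2% of legitimate mail

p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)    # ~0.91: seeing "free" raises our belief in spam
```

Naive Bayes scales this to whole messages by multiplying per-word likelihoods together, which is exactly the “naive” independence assumption.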
- Overview: Genetic Algorithms (GAs) are optimization methods that evolve solutions by mimicking natural selection. This lesson teaches how GAs work and why they’re useful for creating rules or designs that aren’t easily found by conventional programming.
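A minimal GA sketch (assuming the classic OneMax toy objective, where fitness is simply the number of 1-bits in a binary genome):

```python
import random

GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.02

def fitness(genome):
    return sum(genome)  # objective: maximize the count of 1s

def mutate(genome):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
       for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP_SIZE // 2]       # selection: fitter half survives
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children             # next generation

print(fitness(max(pop, key=fitness)))    # best score found (max possible: 20)
```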
- Overview: This lesson introduces the k-Nearest Neighbors (KNN) algorithm, a simple yet powerful method that makes predictions based on similarity to known examples. Students learn how “show me your neighbors, and I’ll tell you who you are” works in practice, including how to choose the number of neighbors and measure similarity.
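The whole algorithm fits in a few lines; a from-scratch sketch with invented 2-D points and Euclidean distance as the similarity measure:

```python
import math
from collections import Counter

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((4.5, 4.0), "B"), ((4.2, 4.4), "B")]

def predict(point, k=3):
    # Sort known examples by distance, take the k closest, majority-vote.
    nearest = sorted(train, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = [label for _, label in nearest]
    return Counter(votes).most_common(1)[0][0]

print(predict((1.1, 0.9)))  # "A": its neighbors tell us who it is
```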
- Overview: Overfitting is one of the biggest hazards in machine learning. This lesson teaches what overfitting is, how it differs from underfitting, why it happens, and how to detect and prevent it. Students will learn through analogies (like studying for a test by memorization vs understanding) to grasp why a model that performs too well on training data can actually fail in the real world.
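The standard way to catch it is to hold out data the model never sees during training, as in this sketch (assuming scikit-learn; synthetic data with deliberate label noise):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2,
                           random_state=0)  # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
print("train accuracy:", model.score(X_tr, y_tr))  # ~1.0: looks perfect...
print("test accuracy: ", model.score(X_te, y_te))  # much lower: memorization
```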
- Overview: Beyond overfitting, many practical pitfalls can trip you up when applying ML in the real world. This lesson covers a variety of common mistakes and issues: using bad data, evaluation errors (like testing incorrectly), distribution changes when deploying, ethical biases, and blindly trusting models. By recognizing these pitfalls, students will learn to avoid them and build more reliable ML systems.
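One evaluation error is worth seeing in code: “leakage”, where preprocessing peeks at the test data. A sketch with purely random features, where honest accuracy should hover near 50%:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))     # pure noise: there is nothing to learn
y = rng.integers(0, 2, size=100)

# WRONG: selecting features on ALL the data before cross-validation.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
print(cross_val_score(LogisticRegression(max_iter=1000),
                      X_leaky, y).mean())  # suspiciously far above chance

# RIGHT: selection happens inside each training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y).mean())  # back near 0.5, as it should be
```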
- Overview: This lesson explores unsupervised learning (finding patterns without labels) through clustering, and then introduces semi-supervised learning, which bridges supervised and unsupervised methods. Students will see how algorithms like K-Means form clusters, why choosing the number of clusters is tricky, and how unlabeled data combined with a bit of labeled data can improve learning.
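A minimal clustering sketch (assuming scikit-learn; synthetic “blob” data where we happen to know k = 3 is right):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)  # labels discarded
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # the 3 centers K-Means discovered on its own
print(km.labels_[:10])      # cluster assignment for the first 10 points
```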
- Overview: Recommendation systems help suggest products, movies, or content to users. This lesson frames recommenders through the lens of the three main machine learning paradigms: supervised learning (predicting ratings or preferences), unsupervised learning (finding similarities, e.g., collaborative filtering), and reinforcement learning (learning by trial and reward). Students will understand how each approach contributes to making smart recommendations.
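The “finding similarities” flavor can be sketched in a few lines: a made-up user-by-movie ratings matrix, cosine similarity between users, and a similarity-weighted suggestion.

```python
import numpy as np

# Rows = users, columns = movies; 0 means "not rated yet" (invented data).
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0                                   # recommend for user 0
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])
sims[target] = 0.0                           # ignore self-similarity
scores = sims @ R                            # similarity-weighted ratings
scores[R[target] > 0] = -np.inf              # skip movies already seen
print(int(np.argmax(scores)))                # index of the movie to suggest
```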
- Lesson Goal: Introduce inverse reinforcement learning (IRL), in which an AI learns what to optimize by observing human behavior, and explain why this approach is key for aligning AI with human goals.
- Lesson Goal: Explain the difference between correlation and causation, why traditional ML struggles with causal relationships, and introduce tools of causal inference (experiments and causal models) that are increasingly being integrated with machine learning to create AI that truly understands cause and effect.
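A simulation makes the distinction tangible (all numbers invented): a confounder Z drives both X and Y, so they correlate strongly even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=10_000)              # confounder, e.g. hot weather
x = 2 * z + rng.normal(size=10_000)      # e.g. ice-cream sales
y = 3 * z + rng.normal(size=10_000)      # e.g. drowning incidents

print(np.corrcoef(x, y)[0, 1])           # ~0.85: strongly correlated
# Adjusting for the confounder (removing Z's contribution) reveals
# there is no direct relationship left between X and Y:
print(np.corrcoef(x - 2 * z, y - 3 * z)[0, 1])  # ~0
```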
- Lesson Goal: Explore the counterintuitive phenomenon in modern ML where using massively more parameters than data (over-parameterization) can actually improve performance. Students will learn what over-parameterization means, why traditionally it was feared due to overfitting, and how deep learning defied expectations through phenomena like double descent, leading to new understanding of generalization.
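Double descent can even be reproduced in miniature with plain linear regression; a sketch (setup invented: 5 informative features, minimum-norm least-squares fits, n = 50 training points) where test error spikes at the interpolation threshold and then falls again:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20
w5 = np.full(5, 0.3)  # only the first 5 features carry signal

def avg_test_mse(p):
    errs = []
    for _ in range(trials):
        X_full = rng.normal(size=(n, 500))
        y = X_full[:, :5] @ w5 + 0.5 * rng.normal(size=n)
        w_hat = np.linalg.pinv(X_full[:, :p]) @ y    # min-norm least squares
        X_test = rng.normal(size=(2000, 500))
        y_test = X_test[:, :5] @ w5                  # noiseless test targets
        errs.append(np.mean((X_test[:, :p] @ w_hat - y_test) ** 2))
    return np.mean(errs)

for p in [2, 5, 25, 45, 50, 60, 100, 200]:   # number of features the model uses
    print(f"p={p:3d}  test MSE={avg_test_mse(p):.2f}")
# Error dips, spikes violently near p = n (the interpolation threshold),
# then descends again as the model becomes heavily over-parameterized.
```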
- Lesson Goal: Examine the privacy challenges posed by machine learning (use of personal data, risk of leaking sensitive information), show why naive data anonymization falls short, and explore key protective techniques such as differential privacy and federated learning, empowering students to build and use AI systems that respect user privacy.
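Differential privacy’s core trick fits in one function: answer a query with calibrated noise. A sketch of the standard Laplace mechanism for a counting query (whose sensitivity is 1, since one person changes a count by at most 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_count(true_count, epsilon, sensitivity=1.0):
    """Laplace mechanism: adds noise scaled to sensitivity / epsilon."""
    return true_count + rng.laplace(scale=sensitivity / epsilon)

exact = 1234  # e.g. number of patients with some condition (made up)
print(private_count(exact, epsilon=0.1))  # noisier answer, stronger privacy
print(private_count(exact, epsilon=5.0))  # closer to truth, weaker privacy
```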
- Lesson Goal: Summarize and reinforce the end-to-end process of solving problems with machine learning, from problem definition and data gathering to model deployment and maintenance, giving students a roadmap to follow for their own projects and an appreciation of how all the pieces from previous lessons fit into a coherent workflow. This ties technical skills together with project management and strategy, essential for “mastery.”
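That workflow can be previewed in one compact sketch (dataset and model choices are placeholders, not recommendations): gather data, split, build a preprocessing-plus-model pipeline, evaluate, and save the artifact for deployment.

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)                       # 1) data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)  # 2) split

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)                                             # 3) train
print("held-out accuracy:", pipe.score(X_te, y_te))              # 4) evaluate

joblib.dump(pipe, "model.joblib")                                # 5) ship
# In production you would reload it with joblib.load and monitor for drift.
```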
