Episode summary

In Episode 25 of '100 Days of Data,' Jonas and Amy unpack the fundamentals of unsupervised learning, a branch of machine learning where algorithms find structure in unlabeled data. They explore key techniques like clustering and dimensionality reduction, illustrating their real-world applications in customer segmentation, fraud detection, and genetic research. With relatable examples from retail, finance, and healthcare, the hosts show how unsupervised learning empowers AI to uncover patterns that humans may overlook. They also discuss common algorithms such as k-means and PCA, along with the importance of domain expertise in interpreting results. The episode emphasizes how unsupervised learning's exploratory nature opens new doors for businesses dealing with complex, unlabeled datasets.

Episode transcript

JONAS: Welcome to Episode 25 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I, Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: Imagine a machine that can find patterns and structure in data without being told what to look for — discovering hidden groups or themes all on its own. That's the power of unsupervised learning.
AMY: Yeah, it’s like when you get a big pile of puzzle pieces, but no picture on the box. The machine has to figure out how they fit together without any hints.
JONAS: Exactly. Unsupervised learning is a type of machine learning where the model learns from data that is not labeled. In contrast to supervised learning, where we provide the correct answers, here the system has to identify structure, patterns, or features on its own.
AMY: So, no teacher, no correct answer key — the algorithm behaves like an explorer, trying to make sense of the world in its own way. That’s pretty exciting from a business standpoint because sometimes you don’t have labeled data.
JONAS: Right, and this ability to find structure in raw data has been around for decades. It underpins many AI breakthroughs, especially in areas where labeling is expensive or impossible. Two of the most common approaches in unsupervised learning are clustering and dimensionality reduction.
AMY: Let’s break those down. Clustering is about grouping things that are alike, right? Like putting customers into buckets based on buying behavior.
JONAS: Yes. Clustering algorithms try to divide data points into groups or clusters, such that points in the same cluster are more similar to each other than to those in other clusters. Think of it as finding natural groupings.
AMY: I remember working with a retail client who wanted to segment their customers but had no idea what categories would emerge. Using a clustering approach, we identified groups like bargain hunters, premium buyers, and occasional visitors—all without predefining these categories.
JONAS: That’s a fantastic example. There are a variety of clustering techniques—k-means, hierarchical clustering, DBSCAN—each has its own way of measuring similarity and grouping.
AMY: K-means is probably the most popular, right? It’s like picking a number of clusters, k, then assigning points to the nearest cluster center, and iterating until things settle down.
JONAS: Correct. It works well when you have an idea of how many clusters to expect. But sometimes you don’t know that upfront. Then hierarchical clustering, which builds a tree of clusters, or density-based methods like DBSCAN, which find clusters of any shape based on how densely packed points are, are valuable.
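The assign-then-update loop Amy describes can be sketched in a few lines of Python. The tiny 2-D dataset is made up for illustration, and the naive first-k-points initialization is a simplification; real implementations use smarter seeding such as k-means++.

```python
def kmeans(points, k, iters=100):
    """Minimal k-means sketch: assign each point to the nearest center,
    recompute centers as cluster means, repeat until assignments settle."""
    # Naive initialization: just take the first k points as centers.
    centers = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coords) / len(coords) for coords in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so we've converged
            break
        centers = new_centers
    return centers, clusters

# Two well-separated groups of 2-D points.
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (7.8, 8.2), (8.1, 7.9)]
centers, clusters = kmeans(pts, k=2)
```

Even with a poor starting guess, the loop quickly recovers the two natural groups, which is the "settling down" behavior Amy mentions.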
AMY: In automotive, clustering helps detect different driving patterns from sensor data. Instead of labeling every trip, unsupervised approaches can reveal groups like aggressive drivers, cautious commuters, or eco-friendly ones. This insight helps insurers create better risk models.
JONAS: Exactly. Clustering uncovers meaningful groups that can drive business actions—from marketing personalization to fraud detection.
AMY: So, how about dimensionality reduction? It sounds fancy but I’ve seen it used a lot for making complex data easier to work with.
JONAS: That’s a good summation. When we have data with many features — say hundreds or thousands — it can become unwieldy. Dimensionality reduction transforms the data into a lower-dimensional space while preserving the most important information.
AMY: It’s like squeezing a big file into a smaller folder without losing the key parts.
JONAS: Precisely. Principal Component Analysis, or PCA, is one of the oldest and most well-known methods for this. It finds new axes — called principal components — that capture the maximum variance in the data.
AMY: We used PCA for a healthcare client analyzing genetic data. They had thousands of gene expressions per patient and needed to simplify this massive dataset to find patterns linked to diseases.
JONAS: Reducing dimensionality also helps with visualization. When you can plot data in two or three dimensions instead of hundreds, it's easier to spot trends or clusters.
AMY: Definitely. And from a practical standpoint, it also improves model efficiency and sometimes accuracy—too many features can confuse algorithms or cause overfitting.
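The variance-maximizing projection Jonas describes can be sketched with NumPy's singular value decomposition; the three-feature toy dataset below is invented for illustration, with the second feature nearly redundant with the first.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA sketch: project centered data onto the directions of maximum
    variance. Returns the projection and the fraction of total variance
    each retained component explains."""
    Xc = X - X.mean(axis=0)  # center each feature at zero
    # SVD of the centered data yields the principal axes (rows of Vt).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()  # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Toy data: 3 features, but feature 2 is almost a copy of feature 1,
# so one direction carries nearly all the variance.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=100),
    rng.normal(scale=0.1, size=100),
])
Z, ratio = pca(X, n_components=2)
```

Here the first principal component captures almost all the variance, which is exactly the redundancy PCA is meant to squeeze out before plotting or modeling.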
JONAS: One interesting point is that unsupervised learning can also serve as a preprocessing step to enhance supervised models, combining the best of both worlds.
AMY: I’ve seen that too. For example, in finance, clustering can identify groups of transactions that are unusual, which then feed into a fraud detection system that’s supervised.
JONAS: Another exciting area is anomaly detection, which is often framed as unsupervised learning. Here, the model learns the normal data pattern and flags anything that stands out as unusual.
AMY: Yeah, that’s super relevant for cybersecurity. Systems can automatically spot hacking attempts or network intrusions without having a labeled dataset of attacks.
JONAS: However, unsupervised learning comes with challenges: because it’s exploratory, results can be less predictable or harder to interpret compared to supervised learning.
AMY: Right, and it requires a bit more skill to tune and validate. You often need domain knowledge to make sense of what the algorithm finds. In practice, collaboration between data scientists and business experts is critical.
JONAS: That's a key insight. While unsupervised learning unlocks hidden patterns, the human in the loop is essential to apply those insights meaningfully.
AMY: So for business leaders wanting to leverage unsupervised learning, the first step is to identify problems where structure or segmentation is unknown but valuable—like customer segmentation, fraud detection, or anomaly detection.
JONAS: And keep in mind that unsupervised learning can be applied to many data types: images, text, sensor data, or even raw logs.
AMY: I also advise clients to pair unsupervised learning with visualization tools. Seeing clusters or reduced dimensions on a plot often sparks new ideas and strategic responses.
JONAS: To sum up, unsupervised learning is about machines teaching themselves to find meaningful patterns and structure in unlabeled data. It’s exploratory by nature and a powerful way to unlock insights when labels are unavailable or expensive to get.
AMY: And on the ground, it drives real business impact—from better customer understanding and risk management to operational efficiencies and new product ideas.
JONAS: Key takeaway: Unsupervised learning empowers AI to see what humans might miss by discovering hidden structure in data without explicit labels.
AMY: And practically speaking, it’s your friend when you don’t have all the answers upfront and want the data itself to guide you—whether it’s segmenting customers or detecting anomalies.
JONAS: Next time, we'll dive into Reinforcement Learning — where machines learn by trial and error to achieve specific goals. It’s a fascinating step beyond unsupervised and supervised learning.
AMY: If you're enjoying this, please like or rate us five stars in your podcast app. Leave comments or questions; we might feature them in upcoming episodes.
AMY: Until tomorrow — stay curious, stay data-driven.

Next up

In the next episode, Jonas and Amy explore how machines learn by doing in the exciting world of Reinforcement Learning.