Episode summary

In Episode 32 of '100 Days of Data,' Jonas and Amy compare model training to an athlete’s workout routine, emphasizing the critical roles of training, validation, and testing datasets in building reliable AI models. They unpack the function of each data split: the training set teaches the model, the validation set fine-tunes it, and the testing set evaluates its real-world performance. Drawing from industry examples in healthcare, finance, retail, and automotive, they illustrate how improper use—or neglect—of these splits can lead to misleading results and failed deployments. The conversation also introduces techniques like cross-validation to handle small datasets and discusses the importance of transparency and documentation to gain stakeholder trust. This episode bridges foundational AI concepts with practical implementation, empowering listeners to build smarter, more trustworthy models.

Episode transcript

JONAS: Welcome to Episode 32 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I'm Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: Let’s start with our hook for today—think of model training and testing as a workout plan for algorithms.
AMY: I love that. Just like a marathon runner doesn’t show up on race day without training, AI models need careful exercise—and testing—before they can perform in the real world.
JONAS: Exactly. At the heart of AI is the concept of a model that learns from data. But that learning has to be structured. We split data into parts—training, validation, and testing—to guide and check the model’s progress.
AMY: Right—and that split is crucial. Without it, you risk having an AI that thinks it’s a genius in the gym but fails miserably in the race. I’ve seen this happen in retail projects where models perform great on paper but flop when deployed.
JONAS: Let’s break down the three sets. First, the training set—that’s the data the algorithm uses to learn patterns. Think of it as the coach’s manual, showing the model what to look for.
AMY: In practice, say a healthcare company wants to predict patient readmissions. They feed past patient records—the training data—into the model so it can find the link between symptoms, treatments, and outcomes.
JONAS: Precisely. But training data alone isn’t enough. We also need a validation set. This data is separate from training and helps tune the model’s settings or hyperparameters.
AMY: This part’s like a dress rehearsal before the big show. I recall helping a financial firm choose the right balance between model complexity and speed. They tested different setups on a validation set to avoid the trap of overfitting, which we’ll dive into next episode.
JONAS: Yes, the validation set helps keep the model from learning the training data too well and capturing noise instead of meaningful patterns. It provides feedback during development.
AMY: And then, the testing set is the final exam. Data the model has never seen before. It acts like a fresh crowd watching at the marathon, measuring real performance.
JONAS: Exactly. It gives an unbiased estimate of how well the model generalizes. Without test data, we can’t trust that the model will perform in the wild.
AMY: In a recent auto industry project, a colleague's team split their data 70% training, 15% validation, and 15% testing. When they skipped evaluating on that held-out test set, their model seemed fantastic internally but failed with new sensor data once deployed.
JONAS: That illustrates why the testing set is sacred. It must remain untouched during training and tuning—only used at the very end.
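For readers who want to see what such a split looks like in code, here is a minimal sketch of a 70/15/15 split in Python, assuming scikit-learn. The dataset is synthetic and the variable names are stand-ins, not anything from the projects mentioned in the episode.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in practice X and y would be your own features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# First carve off 30% of the rows to be shared by validation and testing.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Split that 30% in half: 15% validation and 15% testing overall.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```

Splitting in two passes like this is a common convenience: the first call carves off the 30% holdout, and the second divides that holdout evenly between validation and testing.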
AMY: But sometimes, especially with small datasets, splitting like this is tricky. I’ve seen teams struggle because their testing set ended up too small, giving unreliable results.
JONAS: A common workaround is cross-validation. Instead of one fixed split, the data is partitioned multiple ways, and models are trained and tested repeatedly to get a more robust average performance.
AMY: Brilliant approach. I’ve used cross-validation in consulting with startups that have dozens, not thousands, of records. It’s a neat trick to make the most of scarce data.
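As a rough illustration of the cross-validation idea Jonas and Amy describe, the sketch below again assumes scikit-learn and a small synthetic dataset; logistic regression stands in for whatever model you are actually evaluating.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# A deliberately small, synthetic dataset; the point is the repeated rotation of splits.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# 5-fold cross-validation: each fold takes one turn as the held-out set,
# and the model is retrained from scratch on the remaining folds each time.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores)         # one accuracy score per fold
print(scores.mean())  # a more robust average estimate of performance
```

Because every record gets used for both training and evaluation across the folds, the averaged score is less dependent on any single lucky or unlucky split, which is exactly why the technique helps when data is scarce.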
JONAS: I'd add that the process also helps identify when a model starts to memorize instead of learning, a sign of overfitting.
AMY: And just to clarify for listeners, overfitting is when a model performs well on training data but poorly on unseen data, right?
JONAS: Exactly. It’s like memorizing answers to a quiz instead of understanding the subject. The model fails to generalize.
AMY: Which is why training, validation, and testing splits—and the right balance—are critical. In finance, for example, models need to recognize patterns that hold over time, not just quirks of last month’s market.
JONAS: A historical perspective might help here. Early AI systems didn’t always reserve data for testing. Developers would tune models on all available data, leading to overly optimistic results.
AMY: I can see how tempting that would be. When you’re under pressure to deliver, using all your data to make the model look good might seem smart—but it’s risky.
JONAS: Indeed. The concept of training-validation-testing came from the AI community’s need to measure real-world readiness. It’s a quality gate.
AMY: From my side, I’ve observed that many non-technical business leaders are surprised by the iteration this requires. They expect a model to work perfectly the first time.
JONAS: Models need multiple rounds of tuning and evaluation. The validation set helps decide which version is best before final testing.
AMY: For example, in retail recommendation engines, a company might adjust how much the model weighs recent purchases versus older data based on validation feedback.
JONAS: In that scenario, the validation set helps you answer: Are we capturing the customer’s current preferences or outdated habits?
AMY: And after all that tuning, the test set confirms if your tweak actually improved things for new users.
JONAS: It’s worth emphasizing that the test set is never reused, so think of it as your model’s final report card.
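To make this tune-on-validation, confirm-on-test workflow concrete, here is a minimal sketch under the same assumptions as the earlier snippets: scikit-learn, synthetic data, and logistic regression with a few hypothetical candidate settings rather than anything recommended in the episode.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 70/15/15 as in the earlier sketch.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=1)

# Try a few candidate settings and let the validation set pick the winner.
best_score, best_model = -1.0, None
for c in [0.01, 0.1, 1.0, 10.0]:  # hypothetical values for the regularization strength C
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_model = score, model

# The untouched test set is consulted exactly once, at the very end.
print("validation accuracy of chosen model:", round(best_score, 3))
print("final test accuracy:", round(best_model.score(X_test, y_test), 3))
```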
AMY: Something else I’ve found important is documenting these splits and results. It builds trust in model performance among stakeholders.
JONAS: Absolutely. Transparency about training, validation, and testing encourages confidence and helps diagnose issues if performance changes after deployment.
AMY: I recently helped a healthcare provider who was skeptical of AI predictions. When we showed clear validation and testing results aligned with their clinical insights, they got on board much quicker.
JONAS: One final note: the exact split percentages vary with dataset size and the project, but the principle stays the same. Keep the sets distinct so you can train and evaluate properly.
AMY: And remember, in the business world, this process isn’t just theory—it impacts how successful your AI solution will be in the real market.
JONAS: To wrap up, the training set educates the model, the validation set tunes it, and the testing set assesses its true capability.
AMY: And from what I see, treating these sets seriously saves companies time, money, and reputations.
JONAS: Key takeaway for our listeners—think of training, validation, and testing as the essential checkpoints on an algorithm's fitness journey.
AMY: And from a practical side, insist on clear data splits and thorough testing before trusting an AI model in your projects.
JONAS: Next episode, we’ll explore overfitting and underfitting—two common pitfalls when training models that try too hard or not enough.
AMY: If you're enjoying this, please like or rate us five stars in your podcast app. We’d love to hear your questions or comments too, and who knows—they might be featured in a future episode.
JONAS: Thanks for being with us today.
AMY: Until tomorrow — stay curious, stay data-driven.

Next up

Next time, Jonas and Amy tackle overfitting and underfitting—how to spot when your model’s trying too hard or not hard enough.