Episode summary

In Episode 15 of '100 Days of Data,' Jonas and Amy dive into the world of Exploratory Data Analysis (EDA), the essential first step in any data project. With a mix of real-world examples and practical tools, they explore how EDA helps uncover hidden patterns, anomalies, and trends in messy datasets. From visualizing engine failure spikes in the automotive industry to identifying unusual spending patterns in finance, the hosts demonstrate the power of combining statistics with storytelling. They also discuss tools ranging from Excel and Python’s Pandas to no-code platforms like Tableau, making EDA accessible to both analysts and business users. Whether it’s missing values or subtle correlations, this episode emphasizes that knowing your data well is key to building trustworthy AI systems and driving strategic decisions.

Episode transcript

JONAS: Welcome to Episode 15 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I'm Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: Let’s start with the fun part today: discovering patterns in messy data.
AMY: That’s right! Exploratory Data Analysis is like being a detective in a chaotic crime scene—except the clues are numbers and charts.
JONAS: Well put, Amy. Exploratory Data Analysis, or EDA for short, is the process where you dive into raw data to understand what’s there before applying any complex algorithms. It’s about making sense of the data through summary statistics and visualizations.
AMY: And it’s often the very first step in any data project. I’ve seen companies rush into building AI models, only to get stuck because they didn’t really know what was in their data. EDA helps avoid that by shining a light on hidden patterns or weird anomalies early on.
JONAS: Exactly. Historically, EDA was popularized in the 1970s by John Tukey, a statistician who argued that before jumping into hypothesis testing, we should play with data—exploring it freely and visually to discover the unexpected.
AMY: That’s a valuable perspective. In the real world, data rarely arrives neat and tidy. For example, in retail sales data, values might be missing for certain products, or records might be duplicated for certain dates. EDA helps spot these issues quickly.
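A minimal sketch of the duplicate check Amy describes, assuming pandas and a small hypothetical sales table (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical retail sales records; columns and values are invented for illustration.
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "product": ["A", "A", "A", "B"],
    "units": [5, 5, 3, 7],
})

# Show every row that has an exact copy elsewhere in the table.
duplicated_rows = df[df.duplicated(keep=False)]

# Keep only the first occurrence of each repeated row.
deduplicated = df.drop_duplicates()

print(duplicated_rows)
print(len(df), "rows before,", len(deduplicated), "rows after")
```

Whether an exact repeat is really an error depends on the data: two identical sales on the same day can be legitimate, so it is usually worth inspecting the flagged rows before dropping them.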
JONAS: Right, so EDA involves several key techniques: calculating summary statistics like means, medians, and variances, checking distributions, identifying correlations between variables, and of course, visualizing the data with charts and plots.
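The techniques Jonas lists can be sketched in a few lines of pandas. This is an illustration only, and the temperature and ice cream figures are invented:

```python
import pandas as pd

# Hypothetical daily observations; columns and values are invented for illustration.
df = pd.DataFrame({
    "temperature_c": [18, 21, 24, 27, 30, 33],
    "ice_cream_sales": [110, 135, 160, 190, 230, 260],
})

# Summary statistics: mean, median, and variance for each column.
summary = df.agg(["mean", "median", "var"])

# Distribution check: count, quartiles, min, and max in one table.
distribution = df.describe()

# Pairwise Pearson correlation between the numeric columns.
correlation = df.corr()

print(summary)
print(distribution)
print(correlation)
```

A strong correlation in the last table is only a prompt for further questions. It does not by itself say that one variable drives the other.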
AMY: Visuals really bring data to life. I remember working with an automotive company trying to reduce warranty claims. By plotting failure rates over time, we saw an unexpected spike in a particular batch of engines. Without that visualization, it would have been buried in the data.
JONAS: That’s a perfect example of detecting anomalies—data points that don’t fit the usual pattern. Spotting these early allows teams to ask why they exist. Are they errors, or do they reveal important insights?
AMY: Sometimes those anomalies tell the real story. In healthcare, for instance, while reviewing patient data, an anomaly might reveal a rare side effect or a misreported symptom. EDA flags these outliers so doctors or analysts know where to dig deeper.
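One common rule of thumb for flagging such outliers is Tukey's fences, which marks points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below assumes pandas and invented weekly failure counts. It is not necessarily the method used in the projects the hosts mention:

```python
import pandas as pd

# Hypothetical weekly failure counts; the values are invented for illustration.
failures = pd.Series([12, 14, 11, 13, 15, 12, 48, 13, 14, 12], name="failures")

# Tukey's fences: flag points more than 1.5 * IQR beyond the quartiles.
q1 = failures.quantile(0.25)
q3 = failures.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = failures[(failures < lower) | (failures > upper)]
print(outliers)
```

The rule only says which points are unusual. Deciding whether a flagged point is a data error or a real event still takes domain knowledge.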
JONAS: When we talk about patterns, we usually mean recurring themes or relationships in the data. For example, you might see that as temperature increases, sales of ice cream also go up. These correlations guide further analysis.
AMY: But we always have to be careful—correlation doesn’t mean causation, as you often remind me. Just because two things happen together doesn’t mean one causes the other. I once saw a dataset where ice cream sales and shark attacks both increased at the same time — but it was really summer driving both.
JONAS: Absolutely. EDA helps us generate hypotheses like “is A related to B?” but doesn’t confirm causality on its own. It’s a foundation for deeper modeling and testing.
AMY: Speaking of foundation, what tools do you usually recommend for EDA? Because for business folks listening, the idea might be intimidating but there are user-friendly ways to do this.
JONAS: Great question! Many people start with spreadsheet software like Excel for basic exploration—calculating averages, filtering data, and making simple charts. For more advanced exploration, Python libraries like Pandas and visualization tools like Matplotlib or Seaborn are popular. But there are also platforms like Tableau or Power BI that let you visually explore data interactively without coding.
AMY: That matches what I see on client projects — non-technical business analysts often use Tableau dashboards to spot trends or anomalies before involving data scientists. One client in finance used dashboards to detect unusual transaction patterns that hinted at fraud, thanks to EDA principles embedded in the design.
JONAS: And the process often includes iterating between looking at statistics and visualizations. For example, you might start by calculating the average sales per month, then create a time-series plot, and notice a seasonal pattern or a sudden dip.
AMY: You highlight it, then the team asks: What happened in that month? Was there a marketing campaign, a supply chain disruption, or a data issue? This conversation leads to better decisions.
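The loop Jonas and Amy describe, from monthly averages to a highlighted dip, might look like this in pandas. The daily sales series is synthetic, with a dip deliberately planted in April:

```python
import pandas as pd

# Hypothetical daily sales; dates and amounts are invented for illustration.
dates = pd.date_range("2023-01-01", "2023-06-30", freq="D")
sales = pd.Series(100.0, index=dates)
sales[sales.index.month == 4] = 60.0  # simulate a sudden dip in April

# Average sales per calendar month.
monthly_avg = sales.groupby(sales.index.to_period("M")).mean()

# Month-over-month relative change makes a sudden dip stand out.
pct_change = monthly_avg.pct_change()
dip_month = pct_change.idxmin()

print(monthly_avg)
print("Largest drop:", dip_month)
```

If Matplotlib is installed, `monthly_avg.plot()` would draw the time-series view. The numeric change column is a complement to the chart, not a replacement for looking at it.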
JONAS: Another key part of EDA is identifying missing data. Missing values can be random or systematic, and understanding their pattern helps to decide how to handle them later in the modeling process.
AMY: I’ve seen missing data handled differently depending on the business impact. In one healthcare project, missing patient data was critical and required follow-up, while in a retail analysis, missing entries for some products were just dropped because they weren’t top sellers.
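A first look at missing data can be as simple as counting gaps per column and then per group. The sketch below assumes pandas and NumPy and uses an invented patient table. Grouping by clinic is one crude way to see whether gaps look systematic rather than random:

```python
import numpy as np
import pandas as pd

# Hypothetical patient records; columns and values are invented for illustration.
df = pd.DataFrame({
    "clinic": ["A", "A", "B", "B", "B", "C"],
    "age": [34, 51, np.nan, 47, np.nan, 29],
    "blood_pressure": [120, np.nan, np.nan, 135, np.nan, 118],
})

# Share of missing values in each column.
missing_share = df.isna().mean()

# Share of missing blood pressure readings per clinic.
missing_by_clinic = df["blood_pressure"].isna().groupby(df["clinic"]).mean()

print(missing_share)
print(missing_by_clinic)
```

If one group accounts for most of the gaps, as clinic B does here, the missingness is probably tied to how the data was collected, which affects whether dropping or filling those rows is reasonable.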
JONAS: You bring up an important point—EDA isn’t just about statistics and charts; it’s about context. Understanding the domain behind the data is crucial. The numbers tell part of the story, but the story needs background knowledge.
AMY: Exactly. Numbers without context can mislead. For instance, a drop in user engagement may look bad, but if it coincides with a major app update, it might be temporary or expected. EDA helps uncover such nuances before jumping to conclusions.
JONAS: So, to sum up, EDA helps us get to know our data intimately. It’s like a health checkup: identifying strengths, weaknesses, surprises, or outright issues that could impact downstream analysis or AI models.
AMY: And from a practical perspective, spending time on EDA pays off big. It saves headaches later on and leads to better, more reliable insights and decisions. I always advise teams to dedicate at least 20-30% of their time to exploratory analysis.
JONAS: Before we wrap, Amy, do you have a favorite example that illustrates the power of EDA?
AMY: Absolutely. I worked with a retail chain analyzing customer purchase behaviors. EDA revealed a subtle but consistent drop in sales in certain stores right after paydays, which was puzzling until we overlaid local event calendars and saw those stores were near venues with big weekend events. It turned out customers shopped less there because they were out at events. This insight helped shift marketing strategy to target those times better.
JONAS: That’s a fantastic example of how visualizing and understanding data uncovers real-world phenomena. It reminds us that the story behind data is rarely straightforward.
AMY: For sure. Alright, time for our key takeaways?
JONAS: Here’s mine: Exploratory Data Analysis is the crucial first step in any data project. It transforms messy, raw data into well-understood information by revealing patterns, anomalies, and relationships through statistics and visualization.
AMY: And my takeaway: EDA saves businesses time and money by uncovering hidden insights and data issues early, helping teams make informed decisions and avoid costly mistakes down the road.
JONAS: Next time, we’re going deeper into visualization itself—how to present data clearly and powerfully. So get ready to see numbers come alive in charts and graphs.
AMY: Yep, we’ll talk about design principles, choosing the right chart for your data, and how visuals can drive action in your organization.
JONAS: If you're enjoying this, please like or rate us five stars in your podcast app. We’d love to hear your comments or questions about today’s episode—maybe we’ll feature them in a future show.
AMY: Thanks for listening.

Until tomorrow — stay curious, stay data-driven.

Next up

Next episode, Jonas and Amy dive deep into data visualization—how to tell powerful stories through charts and graphs.