Episode summary

In Episode 3 of '100 Days of Data,' Jonas and Amy dive into the origins of data, exploring how people, machines, and sensors form the core sources fueling modern AI systems. From human-generated clicks and voice commands to machine logs and real-time sensor readings in IoT devices, the hosts explain how each type contributes uniquely to data-driven decision-making across industries. Real-world examples—from connected cars and smart healthcare to online retail—showcase the power and challenges of sourcing trustworthy, high-volume data. The episode also emphasizes the importance of data quality, infrastructure, and understanding the broader digital ecosystem to make effective use of AI. Whether you're a data professional or just getting started, this episode lays a crucial foundation for building smarter systems by starting at the source.

Episode transcript

JONAS: Welcome to Episode 3 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I'm Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: Where does all the world’s data actually come from? It seems like every second, we create massive amounts of information—but tracing its origins reveals a fascinating landscape.
AMY: Exactly. Data isn’t just numbers floating in the cloud. It’s rooted in real-world sources—everything from sensors on machines to people interacting with technology. Understanding these sources is the first step in unlocking data’s true power for business.
JONAS: Let’s start with the basics. At its core, data originates from three main categories: people, machines, and sensors. People generate data when they use devices, fill out forms, or create content. Machines generate data through operations, status updates, and automated logs. And sensors produce data by detecting physical conditions like temperature or motion.
AMY: That’s a great structure. In my consulting work, I see that companies often overlook how critical these sources are. Take the automotive industry: modern cars can be packed with sensors monitoring everything from tire pressure to engine performance. That sensor data allows companies to predict maintenance needs and avoid breakdowns—turning raw numbers into a better customer experience.
JONAS: Absolutely. Sensors are a specialized type of data source often linked to what’s called the Internet of Things, or IoT. This involves physical devices connected to the internet, all continuously generating data. Think smart thermostats, wearable fitness trackers, or even industrial robots on a factory floor.
AMY: IoT has been a game-changer in healthcare, too. I worked with a hospital implementing wearable devices for chronic disease management. Patients wore devices that tracked heart rates and blood sugar levels in real time. That data fed into AI models predicting potential health risks. It’s living proof that data from sensors isn’t just abstract numbers—it directly impacts lives.
JONAS: That’s an important point about the immediacy and usefulness of sensor data. Meanwhile, data from people, also called human-generated data, covers a huge range. It might be obvious when someone fills out a survey, but it also includes subtler things like click patterns on a website or voice commands to a virtual assistant.
AMY: And that’s where the complexity really shows up for businesses. Human-generated data can be messy and unpredictable. Unlike sensors that produce structured numerical input, people’s behavior is varied and sometimes contradictory. But here’s the kicker: mastering this data is key to personalization. Retailers use it all the time—think about how Amazon suggests products based on your browsing and purchase history.
JONAS: The unpredictability of human data partly explains why machine learning models rely heavily on large datasets to find patterns, especially for applications like recommendation engines.
AMY: Right, and on the flip side, machine-generated data from devices like servers, ATMs, or factory robots is often highly structured but voluminous. In finance, for example, every transaction, balance update, or fraud alert generates machine data that feeds into risk models. Banks depend on this constant stream to keep operations smooth and secure.
JONAS: If we look historically, you could say the amount and variety of data sources have exploded alongside technological progress. Early computing focused mostly on manual data entry, but today’s digital environment creates data through countless touchpoints.
AMY: True—for businesses, that signals a shift from having a snapshot of their operations to a continuous video. For instance, in retail, stores used to only know sales data after the fact. Now, with IoT sensors, cameras, and customer apps, they capture foot traffic, shelf interactions, and real-time inventory levels. This rich data ecosystem allows immediate decision-making.
JONAS: Let’s talk a bit about the challenges in sourcing this data. One major issue is quality. Sensors can fail or report incorrect readings, machine logs might be incomplete, and human data can be biased or inaccurate. From a theoretical perspective, this affects data validity and the reliability of any AI that consumes it.
AMY: Yes, I can’t stress enough how often I’ve seen projects struggle because of poor data sourcing or quality. For example, a healthcare client tried to use wearables for patient monitoring, but inconsistent device syncing led to gaps in data. That made it hard to trust the AI predictions. So, knowing where your data comes from—and its limits—is crucial before jumping into modeling.
JONAS: And related to that is the question of scale and storage. As these data streams grow, organizations need infrastructure that can handle continuous ingestion from millions of sensors or user interactions.
AMY: It’s something companies in automotive know well. Connected cars generate gigabytes of data per day per vehicle. Storing and processing that quickly enough to provide predictive insights—for maintenance or driver safety—is a massive technical feat.
JONAS: Another source we should mention is public data—records, social media, satellite imagery, and more. While it’s not always linked to direct sensors or people’s devices, it contributes enormously to modern AI initiatives.
AMY: That’s especially important in industries like finance or marketing, where external data can supplement a company’s internal datasets. For instance, weather data can affect retail demand forecasts, and social sentiment data can guide brand strategies.
JONAS: So, tying this all together, sources of data are varied but fall into three big buckets: people-generated, machine-generated, and sensor-generated data. Each comes with its own characteristics, challenges, and business implications.
AMY: Exactly. And for any company looking to embrace AI, understanding these sources is step one. Because your AI’s insights will only ever be as good as the data it’s built on.
JONAS: Before we wrap up, it’s worth noting that the term “Internet of Things” represents the modern ecosystem that actively ties these data sources together. It’s not just sensors or devices in isolation, but a network that creates a continuous feedback loop.
AMY: And that feedback loop is where we see transformation. Whether it’s smart factories adjusting machines based on sensor input or healthcare providers responding to patient data in real time—they all trace back to understanding where your data is coming from.
JONAS: To hit the key takeaway: knowing your data’s origin, whether sensors, people, or machines, is foundational for building effective AI systems. It influences everything from data quality to model design and, ultimately, business impact.
AMY: And from my side, I’d say always think about the source in practical terms: What devices, systems, or behaviors generate your data? Are you capturing it reliably? Can you trust it? Getting this right opens the door to real-world AI success.
JONAS: Next time, we’ll take a journey through the history of data—looking at how data collection and use evolved over time to shape today’s AI landscape.
AMY: I can’t wait. It’s always fascinating to see how past innovations inform what we do now.
JONAS: If you're enjoying this, please like or rate us five stars in your podcast app. We’d also love to hear your questions or comments on today’s episode—some might even be featured down the line.
AMY: Until tomorrow — stay curious, stay data-driven.
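
Companion sketch

The syncing gaps Amy describes in the wearable example can be checked for before any modeling starts. The snippet below is a minimal sketch, not something taken from the episode: the function name find_gaps, the one-reading-per-minute device, and the two-minute tolerance are all illustrative assumptions. It sorts reading timestamps and reports every stretch where consecutive readings are further apart than the allowed interval.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval):
    """Return (start, end) pairs where consecutive readings are
    further apart than max_interval. Hypothetical helper for illustration."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each reading with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later))
    return gaps


# Made-up reading times: the device is assumed to report once a minute,
# but nothing arrives between 08:03 and 08:09.
start = datetime(2024, 1, 1, 8, 0)
readings = [start + timedelta(minutes=m) for m in (0, 1, 2, 3, 9, 10, 11)]

gaps = find_gaps(readings, max_interval=timedelta(minutes=2))
for gap_start, gap_end in gaps:
    print(f"No data between {gap_start:%H:%M} and {gap_end:%H:%M}")
```

A report like this does not repair the missing data, but it makes the limits of a source visible, which is the point both hosts stress: know where the data comes from and how far it can be trusted.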

Next up

In the next episode, Jonas and Amy trace the fascinating history of data collection and how it shaped today's AI landscape.