100 Days of Data - Episode 7

Episode summary

In Episode 7 of '100 Days of Data,' Jonas and Amy explore the evolution of data storage — from floppy disks to modern data lakes — and how different storage methods shape AI capabilities. They explain foundational concepts like files, relational databases, data warehouses, and data lakes, using real-world examples from industries like healthcare and automotive. The episode also highlights the critical role of cloud storage and integrated platforms in making data more accessible, scalable, and ready for innovation. Listeners will gain a clearer understanding of how each storage type serves specific business needs, and why choosing the right approach is crucial for data-driven success.

Episode transcript

JONAS: Welcome to Episode 7 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I'm Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: From floppy disks to data lakes, today we’re diving into the world of data storage — the places where data lives before it fuels AI.
AMY: Yeah, it’s wild to think how far we’ve come. From storing just a few megabytes on a floppy disk to handling petabytes in the cloud. But what really goes on behind the scenes? That’s what we’ll unpack.
JONAS: To start, let’s think about what data storage actually is. At its simplest, it’s any method or technology used to keep digital information so it’s available when needed. Early on, this was just files on physical media like tapes or disks.
AMY: Right, and in the early days of companies using data, that might have been spreadsheets or text files saved here and there. But as data grew, businesses needed more structured ways to organize and retrieve information efficiently.
JONAS: Exactly. That’s where databases come in. A database is an organized collection of data, designed so you can store, search, and update information quickly. The most widely used kind is the relational database, which grew out of a model introduced in the 1970s. It organizes data into tables with rows and columns.
AMY: Like a fancy spreadsheet on steroids! In retail, for example, a relational database might hold customer information, product details, and sales transactions all linked together. It’s what made managing large volumes of business data practical.
JONAS: Yes, and the structure ensures consistency and easy querying. You use a language called SQL (Structured Query Language) to interact with relational databases. Think of it like a formal way to ask questions of your data: “Show me all sales last quarter,” or “Find customers from New York.”
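To make the two questions Jonas mentions concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are illustrative assumptions, not something from the episode, and "last quarter" is approximated by a fixed date range.

```python
import sqlite3

# In-memory database; every table, column, and row here is a made-up example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL,
        sold_on TEXT  -- ISO date string, so text comparison matches date order
    );
    INSERT INTO customers VALUES (1, 'Ada', 'New York'), (2, 'Ben', 'Boston');
    INSERT INTO sales VALUES
        (1, 1, 120.0, '2024-02-10'),
        (2, 2, 80.0, '2024-05-03'),
        (3, 1, 45.5, '2024-03-21');
""")


def customers_from(city):
    """'Find customers from New York' as a parameterized query."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


def sales_between(start, end):
    """'Show me all sales last quarter': sales with start <= sold_on < end."""
    rows = conn.execute(
        "SELECT id, amount FROM sales "
        "WHERE sold_on >= ? AND sold_on < ? ORDER BY id",
        (start, end),
    )
    return rows.fetchall()


print(customers_from("New York"))
print(sales_between("2024-01-01", "2024-04-01"))
```

The customer_id column is what links a sale back to a customer, which is the "all linked together" idea Amy describes for retail data.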
AMY: That’s still the backbone for many companies today. But as more varied and massive data started pouring in — say, from sensors on cars or social media feeds — the traditional relational databases sometimes struggled.
JONAS: Right, which led to the rise of data warehouses. These are systems designed specifically for analysis and reporting. Instead of managing day-to-day operations, they combine data from different sources to give a big picture, often to inform business decisions.
AMY: I worked with a healthcare provider recently who used a data warehouse to pull patient records, treatment outcomes, and insurance data together. This helped their analysts spot trends and improve care protocols — something their transactional databases weren’t built for.
JONAS: Data warehouses typically structure data in a way optimized for queries, often using a format called a star or snowflake schema, grouping related information for quick retrieval. But even warehouses can hit limits when the data becomes extremely diverse or unstructured.
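As a rough illustration of the star schema Jonas describes, the sketch below puts one central fact table next to two small dimension tables and runs a typical reporting query across them. It uses Python's built-in sqlite3 module as a stand-in for a warehouse engine; the names fact_sales, dim_product, and dim_date and all the values are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per product or date.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, quarter TEXT);

    -- Fact table: one row per sale, pointing at the dimensions (the "star").
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
    INSERT INTO dim_date VALUES (10, 'Q1'), (11, 'Q2');
    INSERT INTO fact_sales VALUES
        (1, 10, 100.0), (1, 11, 50.0), (2, 10, 30.0), (2, 10, 20.0);
""")


def revenue_by_category_and_quarter():
    """Join the fact table to its dimensions and aggregate revenue."""
    query = """
        SELECT p.category, d.quarter, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """
    return conn.execute(query).fetchall()


for row in revenue_by_category_and_quarter():
    print(row)
```

A snowflake schema would go one step further and split the dimensions themselves into sub-tables, for example moving category into its own table referenced by dim_product.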
AMY: That’s where data lakes enter the scene. Unlike databases or warehouses, a data lake stores raw data in its native format — whether it’s text documents, images, videos, or logs. It’s like dumping everything into a big pool — hence the name — without forcing it into a strict structure first.
JONAS: Exactly. This gives extreme flexibility and scale. AI and machine learning thrive on data lakes because they can sift through unprocessed or semi-structured data to find patterns that rigid databases might miss.
AMY: For instance, in the automotive industry, manufacturers gather mountains of sensor data from connected vehicles. They store it in data lakes so AI models can analyze driving patterns, detect anomalies, or optimize maintenance schedules.
JONAS: The downside is that a data lake without proper management can become a “data swamp” — where data is hard to find or trust. So governance, metadata, and cataloging are critical to keep data lakes useful.
AMY: Governance is often the trickiest part in the real world. I’ve seen companies struggle to make sense of their lakes because there was no clear inventory or ownership. But when done right, data lakes enable agile exploration, rapid experimentation, and innovation.
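The contrast between a lake and a swamp can be sketched in a few lines: files are stored unchanged in their native format, and every file gets a catalog entry saying what it is, where it lives, and who owns it. This is a toy model that uses a temporary local folder in place of real lake storage; the ingest and find helpers, the catalog fields, and the team names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_root, catalog, name, raw_bytes, fmt, owner):
    """Store raw bytes unchanged and record a catalog entry for them."""
    path = Path(lake_root) / fmt / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw_bytes)  # no parsing or reshaping on the way in
    entry = {
        "name": name,
        "format": fmt,
        "owner": owner,
        "path": str(path),
        "size_bytes": len(raw_bytes),
    }
    catalog.append(entry)
    return entry


def find(catalog, **criteria):
    """Return catalog entries whose fields match all the given criteria."""
    return [
        entry
        for entry in catalog
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


# Toy "lake": a temporary directory standing in for real lake storage.
lake_root = tempfile.mkdtemp(prefix="toy_lake_")
catalog = []

trip = json.dumps({"speed_kmh": [50, 52, 49]}).encode()
ingest(lake_root, catalog, "trip_001.json", trip, "json", "telematics-team")
ingest(lake_root, catalog, "engine.log", b"2024-01-01 WARN temp high\n",
       "log", "diagnostics-team")

print(find(catalog, owner="telematics-team"))
```

Without the catalog list, the files would still be on disk, but nobody could answer "what do we have and who is responsible for it", which is the inventory and ownership problem Amy describes.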
JONAS: To sum up, think of data storage evolving like different sorts of containers. Files are simple boxes holding isolated items. Databases are organized shelves with labeled compartments. Warehouses are huge libraries curated for reference. And data lakes are vast reservoirs storing everything in its raw form, ready for discovery.
AMY: Nice analogy! And the choice depends on business needs. If you want reliable daily operations and transactions, relational databases work best. For strategic insights and reporting across sources, data warehouses shine. But if you want flexibility to explore diverse data types at scale, data lakes are the way to go.
JONAS: There’s also cloud storage to mention — services by Amazon, Microsoft, and Google have revolutionized data storage accessibility and scalability. You don’t need to buy physical servers anymore. Instead, you pay for what you use, and the cloud providers handle backups, security, and updates.
AMY: Cloud storage is huge in my consulting work. One finance client moved their on-premises data warehouse to the cloud, cutting costs by 40% and improving query performance. Plus, they gained easy integration with AI tools and services.
JONAS: The cloud also blurs the lines between storage types. For example, many platforms offer integrated solutions — combining data lakes, warehouses, and databases — making it easier to move data between them without friction.
AMY: And that integration helps businesses become more data-driven. A retailer can combine customer transactions (from a warehouse) with clickstream data and social media sentiment (from a lake) to personalize marketing campaigns.
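Amy's retailer example can be sketched as a small join between structured and raw data. Here the warehouse side is a list of already-clean transaction rows and the lake side is raw clickstream text in JSON Lines form; both datasets, the field names, and the build_profiles helper are assumptions made up for illustration.

```python
import json
from collections import defaultdict

# Structured rows, as they might come out of a warehouse query.
warehouse_transactions = [
    {"customer_id": "c1", "amount": 40.0},
    {"customer_id": "c1", "amount": 60.0},
    {"customer_id": "c2", "amount": 15.0},
]

# Raw clickstream events, as they might sit in a lake (one JSON object per line).
raw_clickstream = "\n".join([
    '{"customer_id": "c1", "page": "/shoes"}',
    '{"customer_id": "c2", "page": "/hats"}',
    '{"customer_id": "c1", "page": "/shoes"}',
    '{"customer_id": "c1", "page": "/bags"}',
])


def build_profiles(transactions, clickstream_text):
    """Combine total spend and most-viewed page per customer."""
    spend = defaultdict(float)
    views = defaultdict(lambda: defaultdict(int))

    for row in transactions:
        spend[row["customer_id"]] += row["amount"]

    # The raw events are only parsed here, at read time.
    for line in clickstream_text.splitlines():
        event = json.loads(line)
        views[event["customer_id"]][event["page"]] += 1

    profiles = {}
    for customer_id in set(spend) | set(views):
        pages = views[customer_id]
        profiles[customer_id] = {
            "total_spend": spend[customer_id],
            "top_page": max(pages, key=pages.get) if pages else None,
        }
    return profiles


profiles = build_profiles(warehouse_transactions, raw_clickstream)
print(profiles["c1"])
```

A marketing team could use a profile like this to decide which product category to feature for each customer; on an integrated cloud platform the same join would typically be expressed as a query rather than hand-written code.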
JONAS: To wrap up the technical basics, it’s critical to understand the “fit for purpose” aspect. No single storage method fits all scenarios. Each has trade-offs between structure, flexibility, scalability, and cost.
AMY: Definitely. And practical considerations like compliance, latency, and user skills play big roles. You might need highly controlled databases for sensitive healthcare data, but open data lakes for exploratory AI projects.
JONAS: So, what’s our key takeaway for today? Amy, want to start?
AMY: Sure! I’d say, knowing the types of data storage helps you ask the right questions in your business — like, “Is our data easy to access? Can we combine different datasets? Do we have the flexibility to innovate?” Picking the right storage strategy lays the foundation for AI success.
JONAS: Great summary. From my side, I’d add that understanding the evolution from files to databases, warehouses, and lakes offers insight into how data supports AI and analytics. Each storage form builds on the last, adding new capabilities to handle more volume and complexity.
AMY: And next time, we’ll dive into data formats — the languages your data speaks, from CSV and JSON to images and audio. It’s the next step after knowing where data lives.
JONAS: If you're enjoying this, please like or rate us five stars in your podcast app. We’d also love to hear your questions or comments — some might show up in future episodes.
AMY: Until tomorrow — stay curious, stay data-driven.

Next up

Next episode, Jonas and Amy decode the languages data speaks — from CSV files to audio formats — with a deep dive into data types and formats.

Written by:

Amy & Jonas
