Episode summary

Episode 29 of '100 Days of Data' dives into the world of computer vision—how machines learn to 'see' and interpret images like humans. Jonas and Amy explore the fundamentals of transforming pixels into patterns using convolutional neural networks (CNNs), discuss real-world industry use cases, and highlight the importance of robust training data. They also touch on emerging models like Vision Transformers, the difference between recognition and detection, and the challenges of adversarial examples. With insights from manufacturing, retail, and healthcare, this episode offers both technical foundations and business applications for leaders and practitioners looking to harness the power of visual AI.

Episode transcript

JONAS: Welcome to Episode 29 of 100 Days of Data. I'm Jonas, an AI professor here to explore the foundations of data in AI with you.
AMY: And I, Amy, an AI consultant, excited to bring these concepts to life with stories and practical insights. Glad you're joining us.
JONAS: Teaching machines to see like us — that’s what computer vision is all about.
AMY: That’s right, Jonas. It sounds almost like science fiction, but computer vision is one of the fastest-growing areas in AI and it’s changing how businesses operate every day.
JONAS: Let’s start with the basics. Computer vision is a field of artificial intelligence that enables machines to interpret and make sense of visual information from the world — like images or videos — much like humans do.
AMY: In practice, that means a machine can look at a photo or video and tell you what it sees — whether that’s recognizing a face, identifying a car, or even detecting defects on a factory line.
JONAS: The foundation of all this starts with images. In AI, an image is essentially a grid of tiny squares called pixels, each with values representing color or brightness.
AMY: That’s something people don’t always realize — to a computer, an image is nothing more than a huge table of numbers. So the challenge is teaching the AI how to interpret those numbers as meaningful objects.
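
To make that concrete, here is a minimal NumPy sketch (with a made-up 4x4 grayscale "image") of the table of numbers a computer actually works with:

```python
import numpy as np

# A tiny 4x4 grayscale "image": each value is a pixel's brightness (0-255).
image = np.array([
    [  0,  50, 200, 255],
    [ 10,  60, 210, 250],
    [  5,  55, 205, 245],
    [  0,  45, 195, 255],
], dtype=np.uint8)

print(image.shape)   # (4, 4): height x width
print(image[0, 2])   # 200: the pixel in row 0, column 2
# A color image would be a (height, width, 3) array,
# with one value each for the red, green, and blue channels.
```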
JONAS: Exactly. Early computer vision systems relied on hand-engineered features like edges, shapes, or colors, but that approach was limited in accuracy and flexibility. The real breakthrough came with deep learning and a special type of neural network called the convolutional neural network, or CNN.
AMY: I see CNNs mentioned everywhere in projects I consult on. Can you unpack what a convolution actually means in this context?
JONAS: Sure. The analogy I like to use is looking through a sliding window. Imagine you look at one small patch of an image at a time — say a 3x3 set of pixels — and you apply a small filter that can highlight features like edges or textures in that patch.
JONAS: This sliding window moves across the entire image, generating a map of where certain features pop up. By stacking many layers of these convolutions, the neural network learns increasingly complex patterns — from simple edges to full objects.
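
Here is a rough sketch of that sliding-window idea in plain NumPy, using a classic hand-crafted 3x3 edge filter (a Sobel kernel). In a real CNN the framework implements this far more efficiently, and the filter values are learned from data rather than chosen by hand:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image and record each response."""
    kh, kw = kernel.shape
    h, w = image.shape
    # Output shrinks because the window must fit inside the image ("valid" mode).
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # one filter response per position
    return out

# A hand-crafted 3x3 filter that responds strongly to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

image = np.random.rand(8, 8)      # stand-in for a tiny grayscale image
feature_map = convolve2d(image, sobel_x)
print(feature_map.shape)          # (6, 6): a map of where the feature appears
```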
AMY: That makes sense. In business, this means the AI can learn to identify parts of objects—like wheels on a car or eyes on a face—and eventually recognize the entire object.
AMY: I once worked with an automotive manufacturer who used computer vision on their assembly line. The camera system scanned every car door to check if handles and locks were installed correctly. Because the AI could ‘see’ these small details quickly, it saved hours of manual inspection and significantly reduced the number of defects that made it to the next stage.
JONAS: That’s a great example of real-world impact. Another interesting concept here is the idea of training versus inference. The AI first learns from thousands or even millions of labeled images — that’s training.
JONAS: During training, the CNN adjusts millions of internal parameters so it can accurately recognize patterns. Then, during inference, we feed it new images and it makes predictions based on what it has learned.
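
A minimal sketch of that two-phase workflow, using PyTorch with random stand-in data (a real project would train on labeled images over many epochs):

```python
import torch
import torch.nn as nn

# A toy CNN: one convolution layer, then a classifier over 10 classes.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # learnable 3x3 filters
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# --- Training: adjust parameters to fit labeled examples ---
images = torch.randn(32, 1, 28, 28)      # stand-in batch of grayscale images
labels = torch.randint(0, 10, (32,))     # stand-in labels
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()                      # compute gradients
    optimizer.step()                     # nudge the parameters

# --- Inference: frozen parameters, new image in, prediction out ---
model.eval()
with torch.no_grad():
    new_image = torch.randn(1, 1, 28, 28)
    prediction = model(new_image).argmax(dim=1)
    print(prediction.item())             # predicted class index
```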
AMY: And this training stage is where the business challenge often lies. Getting enough labeled images can be tough and costly, especially in specialized industries like healthcare.
AMY: For instance, I worked on a project with a medical imaging company that wanted to detect early signs of diabetic retinopathy. Collecting thousands of annotated eye scans was a huge hurdle. They partnered with hospitals and used expert ophthalmologists to label the images — a time-consuming but crucial step.
JONAS: That points to another interesting question: why do CNNs work so well? Their architecture is loosely inspired by how our visual cortex processes information.
JONAS: Instead of focusing on the entire image at once, the network processes small regions first and then combines those parts into wholes. It’s hierarchical, similar to how humans recognize things — first edges, then shapes, then objects.
AMY: Which explains why CNNs have revolutionized computer vision. But this field is moving fast. New techniques like transformers and self-supervised learning are shaking things up.
JONAS: Absolutely. Transformers started in language models but have recently been applied to images. Systems like Vision Transformers, or ViTs, split an image into patches and treat each patch like a word in a sentence, which lets them relate all parts of the image at once and often improves performance.
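
The difference shows up right at the input. A rough NumPy sketch of the ViT patching step:

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened patch 'tokens', as a ViT does."""
    h, w, c = image.shape
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = image[i:i + patch_size, j:j + patch_size, :]
            patches.append(patch.reshape(-1))  # flatten each patch into a vector
    return np.stack(patches)

image = np.random.rand(224, 224, 3)            # standard ViT input size
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768): 14x14 patches, each a 16*16*3 = 768-dim token
# These tokens then pass through a learned linear projection and a
# standard transformer encoder, with self-attention across all patches.
```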
AMY: From a consultant’s perspective, the variety of tools and models means that companies need to choose solutions carefully. Sometimes CNNs are perfect, but other times newer models offer better accuracy or efficiency.
AMY: For example, a retail chain I worked with used computer vision powered by CNNs to monitor shelf stock in their stores. Cameras would scan shelves and detect empty spots, triggering restocking notifications.
AMY: However, as their needs grew — they wanted more detailed product recognition and faster processing — we evaluated newer models that gave a speed boost and better differentiation between similar products.
JONAS: It’s a good reminder that success in computer vision isn’t just about technology, but about aligning AI capabilities with business goals and constraints.
JONAS: Let's also touch on recognition versus detection — terms often used interchangeably but with distinct meanings.
JONAS: Recognition means identifying what an object is in an image, like spotting a cat. Detection goes further and locates where the cat is within the image, often by drawing a bounding box around it.
AMY: That distinction is key in practical applications. For instance, in autonomous vehicles, detection allows the car to know precisely where pedestrians or other vehicles are to navigate safely.
AMY: Whereas in retail, recognition might be used to identify products in a photo to manage inventory.
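
In code, the practical difference is simply the shape of the output. The structures below are hypothetical, just to illustrate what each task returns:

```python
# Recognition (image classification): one label for the whole image.
recognition_result = "cat"

# Detection: what AND where -- each object gets a label, a confidence score,
# and a bounding box in pixel coordinates (x_min, y_min, x_max, y_max).
detection_results = [
    {"label": "cat",        "score": 0.97, "box": (34, 50, 210, 300)},
    {"label": "pedestrian", "score": 0.88, "box": (400, 20, 470, 280)},
]

for obj in detection_results:
    x0, y0, x1, y1 = obj["box"]
    print(f"{obj['label']} ({obj['score']:.0%}) at box {(x0, y0)}-{(x1, y1)}")
```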
JONAS: Another key aspect to understand is how computer vision models handle variability — different lighting, angles, or occlusions. Traditional programming struggled here, but learning-based models handle this through exposure to diverse training data.
AMY: I love that point because it links back to business data strategy. The more diverse and representative your training dataset, the better your model performs in the real world.
AMY: That’s why companies invest in data collection and augmentation — techniques that artificially expand datasets by tweaking images with rotations, color changes, or crops.
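
A minimal NumPy sketch of such tweaks (libraries like torchvision or Albumentations provide richer, battle-tested versions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Return a randomly tweaked copy of an (H, W, C) image with values in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                             # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))             # random 90-degree rotation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 1)      # brightness jitter
    # Random crop back to a fixed size (here 24x24 from a 32x32 image).
    i = rng.integers(0, out.shape[0] - 24)
    j = rng.integers(0, out.shape[1] - 24)
    return out[i:i + 24, j:j + 24, :]

image = rng.random((32, 32, 3))                       # stand-in training image
augmented_batch = [augment(image) for _ in range(8)]  # 8 "new" variants of one image
```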
JONAS: Speaking of twists, one fascinating side effect of computer vision models is adversarial examples — tiny changes in an image that can fool a model into misclassifying it.
JONAS: For example, adding subtle noise to a stop sign might make a model ‘see’ it as a speed limit sign. This highlights the need for robust model design and testing.
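
One of the simplest attacks, the fast gradient sign method (FGSM), shows how this works: it nudges every pixel a tiny step in whichever direction most increases the model's error. A minimal sketch with a stand-in PyTorch classifier:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 1, 28, 28, requires_grad=True)  # the image we will perturb
true_label = torch.tensor([3])

# Compute the gradient of the loss with respect to the input pixels.
loss = loss_fn(model(image), true_label)
loss.backward()

# FGSM: step each pixel slightly in the direction that INCREASES the loss.
epsilon = 0.03  # small enough to be nearly invisible to a human
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print(model(image).argmax(dim=1).item())         # original prediction
print(model(adversarial).argmax(dim=1).item())   # may now flip to a wrong class
```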
AMY: True, and that’s where practical experience matters. In industries like autonomous driving or security, mistakes can cost lives, so testing models in diverse, real-world settings is essential.
AMY: I’ve seen startups spend months refining their computer vision systems through real-world trials before scaling.
JONAS: To sum up, computer vision is about teaching machines to interpret visual data by converting images into numerical formats, then using layers of convolution to detect patterns.
JONAS: With large, labeled datasets, these models learn to recognize and detect objects accurately, enabling countless applications from manufacturing to healthcare.
AMY: And in the field, it’s about applying the right model, building diverse datasets, testing rigorously, and aligning AI with business needs — whether that’s reducing defects, improving safety, or enhancing customer experience.
JONAS: Key takeaway from me — computer vision transforms raw pixels into actionable insights by mimicking human sight through layered pattern detection.
AMY: And mine — in business, computer vision isn’t magic but a powerful tool when paired with quality data, smart model choice, and clear goals.
JONAS: Next episode, we’ll explore the fascinating battle between AI and traditional programming — when to use one, the other, or both.
AMY: If you're enjoying this, please like or rate us five stars in your podcast app. We’d love to hear your comments or questions about computer vision — you might even get featured in a future episode.
AMY: Until tomorrow — stay curious, stay data-driven.

Next up

Next episode, Jonas and Amy compare AI with traditional programming—discover when each approach makes sense.