XII UNIT 3 HOW CAN MACHINES SEE?
The answer is: with the help of Computer Vision.
Computer Vision (CV) helps machines "see" and understand images, just as humans do with their eyes. But instead of retinas and optic nerves, machines use:
- Cameras 🎥 → To capture images.
- Data & Algorithms 💻 → To analyze and interpret the images.
- Deep Learning Models 🤖 → To recognize patterns, objects, and actions.
CV allows machines to make decisions based on images, like:
✅ Identifying objects (cars, faces, animals).
✅ Inspecting products for defects in industries.
✅ Detecting diseases from X-rays or MRIs.
✅ Assisting self-driving cars in understanding the road.
Since AI can analyze thousands of images faster than humans, it often performs better in tasks like facial recognition and object detection.
How Digital Images Work
- A digital image is just a picture stored on a computer, made up of tiny squares called pixels (short for "picture elements").
- Each pixel has a color assigned to it, helping to build the full image.
Interpreting Digital Images in Computers
🔹 When a computer processes an image, it doesn’t "see" like humans do; it reads pixels as numbers!
- A black-and-white (grayscale) image is stored using numbers ranging from 0 to 255:
  - 0 = Black
  - 255 = White
  - Values in between represent shades of gray.
Imagine a chessboard 🏁:
- The black squares = pixels with a value of 0 (black).
- The white squares = pixels with a value of 255 (white).
For color images, each pixel is stored using three values (Red, Green, Blue = RGB). This combination helps AI recognize different colors.
💡 The more pixels an image has, the clearer it looks (higher resolution). Fewer pixels make the image blurry or pixelated.
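The chessboard analogy and RGB pixels above can be sketched in a few lines of Python (NumPy is a common choice here; the values are purely illustrative):

```python
import numpy as np

# A tiny 4x4 grayscale "chessboard": 0 = black, 255 = white.
chessboard = np.array([
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
], dtype=np.uint8)

# A color pixel is three values: (Red, Green, Blue), each 0-255.
red_pixel = np.array([255, 0, 0], dtype=np.uint8)       # pure red
gray_pixel = np.array([128, 128, 128], dtype=np.uint8)  # mid gray

print(chessboard.shape)  # (4, 4): 16 pixels in total
print(chessboard[0, 0])  # 0   -> a black square
print(chessboard[0, 1])  # 255 -> a white square
```

A real photo works exactly the same way, just with millions of pixels instead of sixteen.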
5 Stages of Computer Vision
Computer vision helps computers understand images. It follows five main stages:
1. Image Acquisition (Capturing the Image)
This is the first stage, where digital images or videos are collected. These images can come from:
- Digital cameras 📷
- Scanners 📄
- Design software 🎨
In medicine, special devices like MRI and CT scans help capture detailed images of tissues inside the body.
2. Preprocessing (Cleaning the Image)
Before using an image for AI tasks, we clean and improve it.
Some common steps:
- Noise Reduction: Removes blurry spots and distractions.
- Normalization: Adjusts brightness and contrast so images look similar.
- Resizing/Cropping: Makes all images the same size for easy analysis.
- Histogram Equalization: Makes dark areas clearer and bright areas less intense.
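Two of the preprocessing steps above, normalization and histogram equalization, can be sketched with plain NumPy (the tiny synthetic image is invented for the demo; real code would typically use a library such as OpenCV):

```python
import numpy as np

# A synthetic dark, low-contrast 6x6 grayscale image (values 40..75).
img = np.arange(40, 76).reshape(6, 6).astype(np.uint8)

# Normalization: stretch pixel values to the full 0-255 range
# so images taken in different lighting look comparable.
lo, hi = int(img.min()), int(img.max())
normalized = ((img - lo) / (hi - lo) * 255).astype(np.uint8)

# Histogram equalization: build a lookup table from the cumulative
# histogram so frequent brightness values get spread apart.
hist = np.bincount(img.ravel(), minlength=256)
cdf = hist.cumsum()
cdf_masked = np.ma.masked_equal(cdf, 0)
cdf_scaled = (cdf_masked - cdf_masked.min()) * 255 / (cdf_masked.max() - cdf_masked.min())
lut = np.ma.filled(cdf_scaled, 0).astype(np.uint8)
equalized = lut[img]

print(normalized.min(), normalized.max())  # 0 255
```

After normalization the darkest pixel becomes 0 and the brightest becomes 255, which is exactly the "adjust brightness and contrast" idea described above.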
3. Feature Extraction (Finding Important Parts)
The AI picks out useful details from the cleaned image:
- Edge detection: Finds the outlines of objects.
- Corner detection: Spots sharp bends in shapes.
- Texture analysis: Checks for patterns like roughness or smoothness.
- Color-based features: Uses colors to separate different objects.
Instead of humans picking features manually, deep learning (CNNs) helps AI learn automatically which features matter.
4. Detection & Segmentation
There are two main tasks:
1. Single Object Tasks (Recognizing One Object)
Sometimes, an image has only one important object to analyze. AI focuses on that one object and can perform two kinds of tasks:
(i) Classification: AI identifies what the object is.
- Example: Looking at a picture of an animal and deciding it's a "dog."
- Methods used: K-Nearest Neighbors (KNN) (for labeled data), K-Means Clustering (for grouping objects without labels).
(ii) Classification + Localization: AI not only identifies the object but also marks its location within the image.
- Example: AI recognizes a "dog" and draws a box around it.
- The box is called a bounding box, which highlights where the object is.
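The K-Nearest Neighbors idea mentioned above can be shown with a toy classifier. The feature values and labels here are invented purely for illustration (imagine each row summarizing one image as two numbers):

```python
import numpy as np
from collections import Counter

# Made-up labeled features: (average brightness, edge count) per image.
features = np.array([[200, 10], [190, 12], [50, 80], [60, 75], [55, 90]])
labels = ["cat", "cat", "dog", "dog", "dog"]

def knn_classify(sample, features, labels, k=3):
    """Label a sample by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(features - sample, axis=1)  # distance to each example
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(np.array([58, 82]), features, labels))  # dog
print(knn_classify(np.array([195, 11]), features, labels))  # cat
```

The new sample is simply given the label that most of its nearest labeled neighbors share; no model training is needed, which is why KNN is often the first classifier taught.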
2. Multiple Object Tasks (Recognizing Many Objects)
This step helps AI find several objects inside an image instead of just one. There are two key techniques:
(i) Object Detection: AI scans the image to find multiple objects and classifies them. AI places bounding boxes around each detected object.
- Example: Looking at a street scene and identifying cars, pedestrians, and traffic signs.
- Some common AI methods for object detection:
- R-CNN (Region-Based Convolutional Neural Network) → Divides the image into sections and searches for objects.
- YOLO (You Only Look Once) → A fast method that looks at the whole image at once.
- SSD (Single Shot Detector) → Quickly detects objects in real-time.
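Detectors such as YOLO and SSD score their bounding boxes with a standard overlap measure called Intersection over Union (IoU). A minimal sketch, with boxes written as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # 0 when boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # two partially overlapping boxes
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))      # identical boxes -> 1.0
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap; detectors typically count a prediction as correct when its IoU with the true box passes a threshold such as 0.5.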
(ii) Image Segmentation: AI labels the image at the pixel level, outlining the exact shape of each object instead of just drawing a box around it.
- Example: A medical scan where AI highlights only the tumor itself instead of marking a whole area.
- Example: AI precisely separates buildings, cars, and trees in a city image.
Types of Image Segmentation
1. Semantic Segmentation: AI groups similar objects together; objects belonging to the same class are not differentiated. For example, the pixels of several animals may all be labeled under the class "animal" without identifying which animal each one is.
- Limitation: It doesn’t separate individual objects.
2. Instance Segmentation: AI separates each object individually; all the objects in the image are differentiated even if they belong to the same class. For example, each animal gets its own separate pixel mask even though they all belong to the same class.
- Used in advanced tasks where identifying individual objects matters.
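The difference between the two segmentation types can be shown with two tiny label masks (the scene is invented: two "animals" on a background, encoded as integer ids per pixel):

```python
import numpy as np

# Semantic segmentation: every animal pixel gets the SAME class id (1).
semantic = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each animal gets its OWN id (1 and 2),
# even though both belong to the same "animal" class.
instance = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

print(np.unique(semantic))  # [0 1]   -> one class, animals not separated
print(np.unique(instance))  # [0 1 2] -> two distinct animal instances
```

Counting the distinct ids makes the limitation concrete: the semantic mask cannot tell you how many animals there are, while the instance mask can.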
Importance
- AI can recognize objects in real-world environments (self-driving cars, security cameras, medical scans).
- Helps in medical diagnosis (spotting diseases in X-rays).
5. High-Level Processing (Understanding the Image)
Now the AI interprets the image to make decisions, like:
- Recognizing objects 🚗
- Understanding scenes 🌆
- Analyzing images in medical scans, self-driving cars, and security cameras.
Challenges of Computer Vision
1. Understanding Images: Computer vision doesn’t just need to see objects; it needs to understand them. If it can’t understand what is happening in an image, it cannot give correct answers.
2. Taking Good Pictures: Computer vision needs clear images. But things like bad lighting, different angles, and objects blocking each other make it hard to get good pictures. Without good images, the computer makes mistakes.
3. Privacy Problems: Using cameras and facial recognition can invade people’s privacy. Many people worry about how their pictures are used, so these technologies need rules and careful use.
Computer vision has grown a lot over the years. It started with simple tasks like reading basic images, but now it can understand pictures and videos almost like humans. This big progress happened because of powerful deep learning technology and large amounts of training data.
In the future, computer vision will become even more important. It can help in many areas, such as:
- Healthcare: giving personalized and faster medical diagnoses
- AR (Augmented Reality): creating more realistic and interactive experiences
- Daily Life: improving safety, transportation, education, and more