Unit 5-Computer Vision
What is Computer Vision?
-
A domain of AI that enables machines to “see” like humans using images or visual data.
-
Machines can capture, process, analyze, and interpret visual information.
How Humans See
-
Eye → captures visuals
-
Brain → interprets
How Machines See
-
Camera / Sensor → captures image
-
Computer Vision Algorithms → analyze & interpret
Quick Overview of Computer Vision
Computer Vision = extracting useful information from:
-
Images
-
Videos
-
Text that appears in images (for example, signs or scanned pages)
-
Visual signals
It helps machines understand visuals just like humans do.
Relationship: AI → Computer Vision → Deep Learning
-
Artificial Intelligence (AI): Aims to make computers perform tasks that normally need human intelligence.
-
Computer Vision (CV): Enables computers to see.
-
Deep Learning (DL): Helps CV models learn automatically from large image datasets.
Computer Vision vs Image Processing
Computer Vision
-
Focus: Understanding images.
-
What it does: Extracts meaningful information to make decisions.
-
Examples:
-
Object detection
-
Face recognition
-
Handwriting recognition
Image Processing
-
Focus: Improving images.
-
What it does: Enhances or prepares images for further tasks.
-
Examples:
-
Resizing
-
Adjusting brightness
-
Changing color tones
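The image-processing operations listed above can be sketched on a tiny grayscale image stored as a list of rows. This is only an illustration: the function names `adjust_brightness` and `resize_half` are made up for this example, and a real project would normally rely on an image library instead.

```python
def adjust_brightness(image, offset):
    """Add offset to every pixel, keeping each value inside 0-255."""
    return [[max(0, min(255, p + offset)) for p in row] for row in image]


def resize_half(image):
    """Shrink the image by keeping every second row and every second column."""
    return [row[::2] for row in image[::2]]


# A hypothetical 4 x 4 grayscale image (values are pixel intensities).
image = [
    [10, 50, 90, 130],
    [20, 60, 100, 140],
    [30, 70, 110, 150],
    [200, 220, 240, 250],
]

brighter = adjust_brightness(image, 40)
smaller = resize_half(image)

print(brighter[3])  # last row after brightening; values above 255 are clipped
print(smaller)      # the 2 x 2 version of the image
```

Neither function tries to understand what the image shows. Both only change how it looks or how large it is, which is the sense in which image processing prepares images for later tasks.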
Relationship
-
Image Processing is commonly treated as a subset of Computer Vision, because CV systems often use it as a preparation step before analysis.
-
Computer Vision is the bigger field.
Applications of Computer Vision
1. Facial Recognition
-
Used in smart homes, smart cities, and security systems.
-
Helps recognize guests, maintain visitor logs, and manage access.
-
Schools use it for automatic attendance.
-
CV identifies and matches facial features.
2. Face Filters (Instagram, Snapchat)
-
Apps detect facial movements and landmarks.
-
Filters (dog ears, glasses, beauty filters, etc.) are placed accurately on the face.
-
Computer Vision tracks nose, eyes, lips, etc., in real time.
3. Google’s Search by Image
-
Instead of typing text, you upload an image to search.
-
CV compares features of the image with millions of images in Google’s database.
-
Helps identify objects, locations, people, products, etc.
4. Computer Vision in Retail
a) Customer Tracking
-
CV tracks customer movement inside stores.
-
Helps understand walking paths and behavior.
b) Inventory Management
-
Analyzes camera footage to estimate stock levels.
-
Detects empty shelves, misplaced items, and stocking errors.
-
Suggests better product placement for improved sales.
5. Self-Driving Cars
-
Computer Vision is a core technology.
-
Used for:
-
Object detection (cars, humans, traffic lights)
-
Lane detection
-
Route navigation
-
Environment monitoring
-
Enables hands-free or autonomous driving.
6. Medical Imaging
-
Helps doctors see, analyze, and interpret medical images.
-
Can convert 2D scans, such as CT and MRI slices, into 3D models.
-
Provides detailed views of organs, helping diagnosis and treatment planning.
-
Acts as an assistant to medical professionals.
7. Google Translate App (Live Translation)
-
Point your camera at foreign text → instantly translated.
-
Uses:
-
OCR (Optical Character Recognition) to read text.
-
Augmented Reality to display translated text on the screen.
-
Very useful for travel and communication.
Computer Vision Tasks
Computer Vision applications work by performing certain key tasks to extract information from images. These tasks help in prediction, analysis, and understanding of visuals.
1. Image Classification
Definition
-
Assigning a single label/category to an entire image.
-
Only identifies what object is present, not where it is in the image.
Key point
-
Works with one object per image (most basic CV task).
Examples
-
Cat vs Dog
-
Identifying handwritten digits
-
Classifying a fruit as apple/mango/orange
2. Classification + Localization
Definition
-
Performs two tasks together:
-
Classification – What object is present
-
Localization – Where it is located in the image
Key point
-
Works for a single object in an image.
-
Uses bounding boxes to show location.
Example
-
Detecting and marking where a football is in an image.
3. Object Detection
Definition
-
Identifies multiple objects in an image and their locations.
-
Detects all instances of objects like cars, people, animals, etc.
How it works
-
Uses features extracted from the image together with learning algorithms trained to recognize objects.
Examples
-
Detecting pedestrians and vehicles in self-driving cars
-
Automated car parking systems
-
Face detection in camera apps
Key point
-
Gives bounding boxes for all detected objects.
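The difference between the first three tasks shows up in the shape of their output. The sketch below uses hand-written results rather than a real model. The `Detection` class, the `(x_min, y_min, x_max, y_max)` box format, and all coordinates are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels; an assumed format


def box_area(box):
    """Area of a bounding box in pixels."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


# Image classification: one label for the whole image.
classification = "football"

# Classification + localization: one label plus one bounding box.
localized = Detection("football", (40, 60, 90, 110))

# Object detection: one label and one bounding box per object found.
detections = [
    Detection("car", (10, 80, 120, 160)),
    Detection("person", (130, 40, 160, 150)),
    Detection("person", (170, 45, 200, 155)),
]

people = [d for d in detections if d.label == "person"]
print(len(detections), "objects found,", len(people), "of them people")
print("football box area in pixels:", box_area(localized.box))
```

Classification returns a single value, localization adds one box, and detection returns a list that can hold several objects of the same category.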
4. Instance Segmentation
Definition
-
The most detailed of the four tasks described here.
-
Detects each object, assigns it a category, and also labels each pixel belonging to that object.
Key point
-
Separates individual objects, even of the same type.
Example
-
Identifying each person in a crowd separately.
-
For two dogs in an image, it labels Dog 1 and Dog 2 pixel-wise.
Output
-
Creates segments/regions for different objects in the image.
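An instance segmentation result can be pictured as a grid the same size as the image, where each cell stores the ID of the object that pixel belongs to. The mask below is hand-written for illustration (0 = background, 1 = Dog 1, 2 = Dog 2). It is not the output of a real model, and the names `instance_mask` and `names` are hypothetical.

```python
from collections import Counter

# One ID per pixel: 0 is background, 1 and 2 are two separate dogs.
instance_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
names = {0: "background", 1: "Dog 1", 2: "Dog 2"}

# Count how many pixels belong to each instance.
pixel_counts = Counter(pid for row in instance_mask for pid in row)

for pid in sorted(pixel_counts):
    print(names[pid], "covers", pixel_counts[pid], "pixels")
```

Both dogs share the category "dog", yet each keeps its own ID, which is what separates instance segmentation from a set of bounding boxes.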
Basics of Images
Images that we see on mobiles or computers are stored in a digital format.
1. Pixels
-
Pixel = picture element (smallest unit of an image).
-
An image is made of thousands or millions of pixels arranged in rows and columns.
-
When zoomed in, an image looks like many coloured or gray squares (pixels).
-
More pixels generally means a clearer and sharper image.
2. Resolution
Resolution refers to the number of pixels in an image.
Two ways to express resolution:
-
Width × Height
-
Example: 1280 × 1024 means:
-
1280 pixels across
-
1024 pixels vertically
-
Megapixels (MP)
-
1 megapixel = 1 million pixels
-
Example:
-
1280 × 1024 = 1,310,720 pixels
-
≈ 1.31 megapixels
-
Higher resolution = more detail.
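The megapixel arithmetic above can be checked with a few lines of Python. The helper name `megapixels` is made up for this example, and the second resolution (1920 × 1080) is an extra illustration that does not appear in the notes.

```python
def megapixels(width, height):
    """Return the resolution in megapixels (1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


total_pixels = 1280 * 1024
print(total_pixels)                        # 1310720
print(round(megapixels(1280, 1024), 2))    # 1.31
print(round(megapixels(1920, 1080), 2))    # 2.07
```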
3. Pixel Value
Each pixel has a value that describes:
-
Brightness
-
Color
Most common format:
-
8-bit pixel → values range from 0 to 255
Meaning:
-
0 = black
-
255 = white
-
Middle values = gray shades (for grayscale images)
Why 0–255?
-
Each pixel uses 1 byte = 8 bits
-
Each bit can be 0 or 1
-
Total combinations = 2⁸ = 256 values → 0 to 255
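The same counting argument works for any number of bits. The short sketch below (with a hypothetical helper `value_range`) computes how many values a pixel can take and prints the smallest and largest 8-bit values in binary.

```python
def value_range(bits):
    """Return (number of values, smallest value, largest value) for a bit depth."""
    levels = 2 ** bits
    return levels, 0, levels - 1


print(value_range(8))       # (256, 0, 255)
print(format(0, "08b"))     # 00000000 -> all bits off, value 0
print(format(255, "08b"))   # 11111111 -> all bits on, value 255
```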
4. Grayscale Images
-
Contain shades of gray (no color).
-
0 = black, 255 = white, values in between = gray.
-
Each pixel requires 1 byte.
-
Stored as a 2D array (Height × Width).
Example:
-
A pixel in a grayscale image appears black, white, or a shade of gray depending on its intensity value.
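A grayscale image can be modelled as a list of rows, which makes the Height × Width layout and the 1-byte-per-pixel storage easy to see. The pixel values and the helper `describe` below are invented for illustration.

```python
# A hypothetical 2 x 3 grayscale image: 2 rows (height), 3 columns (width).
image = [
    [0, 64, 128],
    [64, 128, 255],
]

height = len(image)
width = len(image[0])
size_in_bytes = height * width  # 1 byte per pixel


def describe(value):
    """Name the shade a pixel value represents."""
    if value == 0:
        return "black"
    if value == 255:
        return "white"
    return "gray"


print(height, width, size_in_bytes)
print([describe(v) for v in image[1]])
```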
5. RGB Images
RGB = Red, Green, Blue
These are the three primary colours of light; mixing them in different amounts produces the other colours shown on a screen.
How RGB images are stored:
-
Stored using three channels:
-
R channel
-
G channel
-
B channel
-
Each pixel has three values (R, G, B) between 0 and 255.
-
All three channels combine to form the final color image.
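To see how three channels combine, the sketch below builds a 2 × 2 colour image from separate R, G, and B grids. All values are made up for illustration, and storing each pixel as an `(R, G, B)` tuple is just one convenient layout, not the only one in use.

```python
# Three hypothetical 2 x 2 channels, each holding values from 0 to 255.
red_channel = [[255, 0], [0, 255]]
green_channel = [[0, 255], [0, 255]]
blue_channel = [[0, 0], [255, 255]]

# Combine the channels so that every pixel becomes an (R, G, B) triple.
rgb_image = [
    [(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
    for r_row, g_row, b_row in zip(red_channel, green_channel, blue_channel)
]

height = len(rgb_image)
width = len(rgb_image[0])
size_in_bytes = height * width * 3  # 1 byte per channel, 3 channels per pixel

print(rgb_image[0])  # top row: a pure red pixel and a pure green pixel
print(rgb_image[1])  # bottom row: a pure blue pixel and a white pixel
print(size_in_bytes)
```

A colour image therefore needs three times the storage of a grayscale image with the same resolution.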