Unit 5-Computer Vision


What is Computer Vision?

  • A domain of AI that enables machines to “see” like humans using images or visual data.

  • Machines can capture, process, analyze, and interpret visual information.

How Humans See

  • Eye → captures visuals

  • Brain → interprets

How Machines See

  • Camera / Sensor → captures image

  • Computer Vision Algorithms → analyze & interpret

 Quick Overview of Computer Vision

Computer Vision = extracting useful information from:

  • Images

  • Videos

  • Text appearing inside images (read via OCR)

  • Visual signals

It helps machines understand visuals just like humans do.

Relationship: AI → Computer Vision → Deep Learning

  • Artificial Intelligence (AI): Makes computers think intelligently.

  • Computer Vision (CV): Enables computers to see.

  • Deep Learning (DL): Helps CV models learn automatically from large image datasets.

Computer Vision vs Image Processing

Computer Vision

  • Focus: Understanding images.

  • What it does: Extracts meaningful information to make decisions.

  • Examples:

    • Object detection

    • Face recognition

    • Handwriting recognition

Image Processing

  • Focus: Improving images.

  • What it does: Enhances or prepares images for further tasks.

  • Examples:

    • Resizing

    • Adjusting brightness

    • Changing color tones

Relationship

  • Image Processing is a subset of Computer Vision.

  • Computer Vision is the bigger field.

Applications of Computer Vision

    1. Facial Recognition

    • Used in smart homes, smart cities, and security systems.

    • Helps recognize guests, maintain visitor logs, and manage access.

    • Schools use it for automatic attendance.

    • CV identifies and matches facial features.

    2. Face Filters (Instagram, Snapchat)

    • Apps detect facial movements and landmarks.

    • Filters (dog ears, glasses, beauty filters, etc.) are placed accurately on the face.

    • Computer Vision tracks nose, eyes, lips, etc., in real time.

    3. Google’s Search by Image

    • Instead of typing text, you upload an image to search.

    • CV compares features of the image with millions of images in Google’s database.

    • Helps identify objects, locations, people, products, etc.

    4. Computer Vision in Retail

    a) Customer Tracking

    • CV tracks customer movement inside stores.

    • Helps understand walking paths and behavior.

    b) Inventory Management

    • Analyzes camera footage to estimate stock levels.

    • Detects empty shelves, misplaced items, and stocking errors.

    • Suggests better product placement for improved sales.

    5. Self-Driving Cars

    • Computer Vision is the core technology.

    • Used for:

      • Object detection (cars, humans, traffic lights)

      • Lane detection

      • Route navigation

      • Environment monitoring

    • Enables hands-free or autonomous driving.

    6. Medical Imaging

    • Helps doctors see, analyze, and interpret medical images.

    • Converts 2D scans like CT, MRI into 3D models.

    • Provides detailed views of organs, helping diagnosis and treatment planning.

    • Acts as an assistant to medical professionals.

    7. Google Translate App (Live Translation)

    • Point your camera at foreign text → instantly translated.

    • Uses:

      • OCR (Optical Character Recognition) to read text.

      • Augmented Reality to display translated text on the screen.

    • Very useful for travel and communication.

Computer Vision Tasks



Computer Vision applications work by performing certain key tasks to extract information from images. These tasks help in prediction, analysis, and understanding of visuals.

1. Image Classification

Definition

  • Assigning a single label/category to an entire image.

  • Only identifies what object is present.

Key point

  • Works with one object per image (most basic CV task).

Examples

  • Cat vs Dog

  • Identifying handwritten digits

  • Classifying a fruit as apple/mango/orange
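The idea of assigning a single label to an entire image can be sketched with a toy rule. Real classifiers learn from large datasets; the brightness threshold below is purely illustrative, not an actual classification method:

```python
import numpy as np

def classify_brightness(image: np.ndarray) -> str:
    """Toy classifier: one label for the whole image, based on mean pixel value."""
    return "bright" if image.mean() >= 128 else "dark"

# Two tiny 4x4 grayscale images, filled with a single intensity each.
dark_img = np.full((4, 4), 30, dtype=np.uint8)
bright_img = np.full((4, 4), 200, dtype=np.uint8)

print(classify_brightness(dark_img))    # dark
print(classify_brightness(bright_img))  # bright
```

The point to notice is the output shape: one label per image, with no information about where anything is.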

2. Classification + Localization

Definition

  • Performs two tasks together:

    1. Classification → What object is present

    2. Localization → Where it is located in the image

Key point

  • Works for a single object in an image.

  • Uses bounding boxes to show location.

Example

  • Detecting and marking where a football is in an image.

 3. Object Detection

Definition

  • Identifies multiple objects in an image and their locations.

  • Detects all instances of objects like cars, people, animals, etc.

How it works

  • Uses features + learning algorithms.

Examples

  • Detecting pedestrians and vehicles in self-driving cars

  • Automated car parking systems

  • Face detection in camera apps

Key point

  • Gives bounding boxes for all detected objects.
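A detector's output is typically a list of labelled bounding boxes with confidence scores. A minimal sketch in plain Python — the labels, scores, and coordinates below are made-up example values, not real detector output:

```python
# Each detection: (label, confidence, bounding box as x, y, width, height).
detections = [
    ("car",        0.97, (34, 120, 200, 80)),
    ("pedestrian", 0.88, (300, 90, 40, 110)),
    ("car",        0.75, (420, 130, 180, 70)),
]

def keep_confident(dets, threshold=0.8):
    """Keep only detections at or above a confidence threshold."""
    return [d for d in dets if d[1] >= threshold]

for label, score, (x, y, w, h) in keep_confident(detections):
    print(f"{label}: {score:.0%} at box ({x}, {y}, {w}, {h})")
```

Unlike classification, every object gets its own entry, so one image can yield many boxes.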

4. Instance Segmentation

Definition

  • The most detailed CV task.

  • Detects each object, assigns it a category, and also labels each pixel belonging to that object.

Key point

  • Separates individual objects, even of the same type.

Example

  • Identifying each person in a crowd separately.

  • For two dogs in an image, it labels Dog 1 and Dog 2 pixel-wise.

Output

  • Creates segments/regions for different objects in the image.
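Pixel-wise instance labels can be pictured as a small NumPy mask in which each value is an instance ID. The tiny mask below is a made-up example for the two-dogs case (0 = background, 1 = Dog 1, 2 = Dog 2):

```python
import numpy as np

# Instance mask: each pixel stores the ID of the object it belongs to.
mask = np.array([
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
], dtype=np.uint8)

for instance_id in (1, 2):
    area = int((mask == instance_id).sum())
    print(f"Dog {instance_id}: {area} pixels")
```

Because every pixel carries an instance ID, the two dogs stay separate even though they share the same category.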



Basics of Images

Images that we see on mobiles or computers are stored in a digital format. 

1. Pixels

  • Pixel = picture element (smallest unit of an image).

  • An image is made of thousands or millions of pixels arranged in rows and columns.

  • When zoomed in, an image looks like many coloured or gray squares (pixels).

  • More pixels = clearer and sharper image.

2. Resolution

Resolution refers to the number of pixels in an image.

Two ways to express resolution:

  1. Width × Height

    • Example: 1280 × 1024 means:

      • 1280 pixels across

      • 1024 pixels vertically

  2. Megapixels (MP)

    • 1 megapixel = 1 million pixels

    • Example:

      • 1280 × 1024 = 1,310,720 pixels

      • 1.31 megapixels

Higher resolution = more detail.
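The megapixel arithmetic above can be checked with a few lines of Python:

```python
# Convert an image resolution (width x height) to megapixels.
def megapixels(width: int, height: int) -> float:
    return width * height / 1_000_000  # 1 megapixel = 1,000,000 pixels

print(1280 * 1024)                        # 1310720 total pixels
print(round(megapixels(1280, 1024), 2))   # 1.31
```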

3. Pixel Value

Each pixel has a value that describes:

  • Brightness

  • Color

Most common format:

  • 8-bit pixel → values range from 0 to 255

Meaning:

  • 0 = black

  • 255 = white

  • Middle values = gray shades (for grayscale images)

Why 0–255?

  • Pixel uses 1 byte = 8 bits

  • Each bit can be 0 or 1

  • Total combinations = 2⁸ = 256 values → 0 to 255
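The count of 2⁸ = 256 values can be verified directly:

```python
bits = 8
values = 2 ** bits   # each extra bit doubles the number of combinations
print(values)        # 256
print(values - 1)    # 255 -> the highest value, since counting starts at 0
```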

4. Grayscale Images

  • Contain shades of gray (no color).

  • 0 = black, 255 = white, values in between = gray.

  • Each pixel requires 1 byte.

  • Stored as a 2D array (Height × Width).

Example:

  • A grayscale image might look black, white, or gray depending on pixel intensity.
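A grayscale image is just a 2D array of bytes, as described above. A minimal NumPy sketch (the pixel values are arbitrary examples):

```python
import numpy as np

# A tiny 3x4 grayscale image: one 2D array of 8-bit values (0 = black, 255 = white).
gray = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [  0, 128, 128, 255],
], dtype=np.uint8)

print(gray.shape)    # (3, 4) -> Height x Width
print(gray.nbytes)   # 12 -> one byte per pixel
```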

 5. RGB Images

RGB = Red, Green, Blue
These are the three additive primary colours of light, which combine to make all other colours.

How RGB images are stored:

  • Stored using three channels:

    1. R channel

    2. G channel

    3. B channel

  • Each pixel has three values (R, G, B) between 0 and 255.

  • All three channels combine to form the final color image.
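The three-channel layout can be sketched with NumPy. Here is a tiny 2 × 2 image, with pixel values chosen purely for illustration:

```python
import numpy as np

h, w = 2, 2
rgb = np.zeros((h, w, 3), dtype=np.uint8)  # three channels: R, G, B
rgb[0, 0] = [255, 0, 0]      # pure red pixel
rgb[0, 1] = [0, 255, 0]      # pure green pixel
rgb[1, 0] = [0, 0, 255]      # pure blue pixel
rgb[1, 1] = [255, 255, 255]  # white = all three channels at maximum

print(rgb.shape)    # (2, 2, 3) -> Height x Width x Channels
print(rgb.nbytes)   # 12 -> three bytes per pixel
```

Compare this with the grayscale case: the same number of pixels now needs three times the storage, one byte per channel.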

