AI Basics — Lesson 6: Computer Vision

What is Computer Vision?

Computer Vision (CV) enables computers to extract meaning from visual data such as photos, video frames, and camera streams. Typical goals include recognizing objects, locating them, and understanding scenes well enough to support navigation or other decisions.

Core Tasks

Image Classification: what’s in the image?
Object Detection: where are the objects? (bounding boxes)
Semantic Segmentation: pixel-wise labeling (road, sky, person)
Instance Segmentation: separate each object instance
OCR: read text from images/scans
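
One way to see how these tasks differ is to look at the shape of what each one returns. The sketch below is illustrative only: the class names, the box convention (x_min, y_min, x_max, y_max), and the tiny 2x3 "image" are all assumptions made up for this example, not the output format of any particular library.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; an assumed convention for this sketch.
Box = Tuple[int, int, int, int]

@dataclass
class Classification:
    label: str                 # one label for the whole image

@dataclass
class Detection:
    label: str
    box: Box                   # where the object is

@dataclass
class Instance:
    label: str
    mask: List[List[bool]]     # True where this particular object covers a pixel

# A made-up 2x3 image showing two people with a strip of road between them.
classification = Classification(label="street")
detections = [Detection("person", (0, 0, 1, 2)), Detection("person", (2, 0, 3, 2))]

# Semantic segmentation: one class name per pixel; both people share one label.
semantic = [["person", "road", "person"],
            ["person", "road", "person"]]

# Instance segmentation: each person gets a separate mask.
instances = [
    Instance("person", [[True, False, False], [True, False, False]]),
    Instance("person", [[False, False, True], [False, False, True]]),
]
```

Note that the semantic result cannot tell the two people apart, while the instance result can.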

Key Techniques

CNNs: Convolutional Neural Networks for feature extraction
Data Augmentation: flips/crops/brightness to improve generalization
Transfer Learning: fine-tune pretrained nets (ResNet, MobileNet)
Detection Models: YOLO, SSD, Faster R-CNN
Vision Transformers (ViT): attention-based models for images
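
To make "feature extraction" a little more concrete, here is a minimal sketch of the sliding-window operation at the heart of a CNN layer, written in plain Python. It assumes no padding and a stride of 1, and it uses a hand-picked kernel; a real CNN would learn its kernel weights during training and would rely on an optimized library instead of nested loops. (Strictly speaking this is cross-correlation, which is what deep learning frameworks usually mean by "convolution".)

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A 4x4 image with a dark left half and a bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked vertical-edge kernel; a trained CNN would learn such weights.
kernel = [
    [-1, 1],
    [-1, 1],
]

edges = conv2d(image, kernel)
# edges == [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The output is large only in the middle column, where the window straddles the dark-to-bright boundary, so this kernel acts as a simple edge feature.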

Typical CV Pipeline

1) Collect & Label

Gather images; annotate classes, boxes, or masks. Ensure variety (angles, lighting).

2) Preprocess

Resize and normalize the images, then split them into train/val/test sets. Apply augmentations to the training set only.
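
The preprocessing step could be sketched roughly as follows, using nested lists as stand-ins for grayscale images. The 70/15/15 split ratios, the fixed seed, and the helper names are assumptions for illustration; resizing is left out because it normally comes from an image library.

```python
import random

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255 for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right (a common augmentation)."""
    return [row[::-1] for row in image]

def split_dataset(items, train=0.7, val=0.15, seed=0):
    """Shuffle once, then cut into train/val/test; the ratios are assumptions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Twenty tiny fake grayscale "images" standing in for a real dataset.
dataset = [[[i, 255 - i], [0, 255]] for i in range(20)]
train_set, val_set, test_set = split_dataset([normalize(img) for img in dataset])

# Augment the training split only, so val/test stay representative.
train_set = train_set + [horizontal_flip(img) for img in train_set]
```

Splitting before augmenting matters: if flipped copies of one image ended up in both the training and test sets, the test score would look better than it should.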

3) Train

Start with a pretrained backbone; monitor training and validation loss/accuracy to catch overfitting early.
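
One common way to act on those training curves is early stopping: halt when the validation loss stops improving. The sketch below is one possible version, not the only way to limit overfitting. The function name, the patience value, and the loss numbers are invented for the example.

```python
def best_epoch_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: they improve, then rise as the model overfits.
val_losses = [0.90, 0.62, 0.48, 0.51, 0.55, 0.60]
best_epoch, stopped_epoch = best_epoch_with_early_stopping(val_losses)
# best_epoch == 2, stopped_epoch == 4
```

In practice you would keep the weights saved at the best epoch, not those from the epoch where training stopped.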

4) Evaluate

Use metrics suited to the task: accuracy (classification), mAP (detection), IoU (segmentation).
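
IoU (intersection over union) is simple enough to compute by hand: the overlap between prediction and ground truth divided by their combined area. It also underlies mAP, where a predicted box typically counts as correct only if its IoU with a true box passes a threshold. The sketch below assumes boxes are (x_min, y_min, x_max, y_max) tuples and masks are same-sized grids of 0/1; the function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def mask_iou(pred, truth):
    """IoU of two same-sized binary masks (lists of rows of 0/1)."""
    intersection = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return intersection / union if union > 0 else 0.0

# Two 4x4 boxes overlapping in a 2x2 corner: IoU = 4 / (16 + 16 - 4) = 1/7.
overlap_score = box_iou((0, 0, 4, 4), (2, 2, 6, 6))

# One shared pixel out of three covered pixels: IoU = 1/3.
mask_score = mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap at all.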

5) Deploy

Export an optimized model (ONNX/TF Lite) and measure latency on the target device.
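
A latency check can be as simple as timing repeated calls on the device itself. The sketch below is a minimal version under stated assumptions: `fake_predict` is a hypothetical stand-in for whatever inference call your exported model exposes, and the warmup and run counts are arbitrary.

```python
import statistics
import time

def measure_latency_ms(predict, sample, warmup=3, runs=20):
    """Time `predict(sample)` and return (median_ms, worst_ms)."""
    for _ in range(warmup):              # warm caches before timing
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings), max(timings)

def fake_predict(image):
    """Hypothetical stand-in for the exported model's inference call."""
    return sum(sum(row) for row in image)

sample_image = [[0.5] * 64 for _ in range(64)]
median_ms, worst_ms = measure_latency_ms(fake_predict, sample_image)
```

Reporting the worst case alongside the median is a deliberate choice here: a camera pipeline that is usually fast but occasionally stalls can still drop frames.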

6) Monitor

Watch for data drift; retrain when performance drops.
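
As a rough illustration of "retrain when performance drops", the sketch below tracks accuracy over a sliding window of recent predictions and raises a flag when it falls below a threshold. It assumes that true labels eventually become available for at least some production predictions, which is not always the case; the class name, window size, and threshold are made up for this example.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent labeled predictions."""

    def __init__(self, window=100, threshold=0.9):
        self.results = deque(maxlen=window)   # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else None

    def needs_retraining(self):
        # Only decide once the window is full, to avoid noisy early alarms.
        if len(self.results) < self.results.maxlen:
            return False
        return self.accuracy() < self.threshold

monitor = AccuracyMonitor(window=5, threshold=0.8)
for predicted, actual in [("cat", "cat"), ("dog", "dog"), ("cat", "dog"),
                          ("dog", "cat"), ("cat", "cat")]:
    monitor.record(predicted, actual)

retrain = monitor.needs_retraining()   # 3 of 5 correct, below the 0.8 threshold
```

When labels are not available, teams often watch input statistics (brightness, resolution, class mix of predictions) as a proxy for drift instead.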

Tip: with a small dataset, lean on transfer learning to get strong results quickly.

Challenges

Lighting, occlusion, motion blur
Domain shift between training and real world
Bias in datasets (lack of diversity)
Privacy & safety considerations for camera data

Applications

Quality inspection in factories
Medical imaging support
Autonomous driving assistance
Retail analytics & shelf scanning
Document scanning & OCR
