Lesson 6: Computer Vision

Teaching machines to “see” and understand images & video

What is Computer Vision?

Computer Vision (CV) enables computers to extract meaning from visual data such as photos, video frames, and camera streams. Typical goals include recognizing objects, locating them, and interpreting whole scenes to support navigation or decision-making.

Core Tasks

  • Image Classification: what’s in the image?
  • Object Detection: where are the objects? (bounding boxes)
  • Semantic Segmentation: pixel-wise labeling (road, sky, person)
  • Instance Segmentation: pixel-wise masks that separate each object instance (two people get two masks)
  • OCR: read text from images/scans
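To make the differences between these outputs concrete, here is a minimal sketch of what each task might return for one image. The `Box` dataclass, the label names, and the coordinates are hypothetical placeholders, not the output format of any particular library.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detection record: class label, corners in pixels, confidence."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


# Classification: a single label for the whole image.
classification = "person"

# Object detection: one labeled box per object found.
detections = [
    Box("person", 12, 30, 60, 140, 0.91),
    Box("person", 70, 28, 118, 135, 0.88),
]

# Semantic segmentation: one class label per pixel (a tiny 2x4 image);
# both people share the same "person" label.
semantic_mask = [
    ["sky", "sky", "sky", "sky"],
    ["person", "road", "person", "road"],
]

# Instance segmentation: one instance id per pixel (0 = background),
# so the two people receive different ids.
instance_mask = [
    [0, 0, 0, 0],
    [1, 0, 2, 0],
]


def count_instances(mask):
    """Number of distinct non-background ids in an instance mask."""
    return len({v for row in mask for v in row if v != 0})


print(count_instances(instance_mask))  # 2
```

The semantic mask can tell you that "person" pixels exist but not how many people there are; the instance mask answers that directly.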

Key Techniques

  • CNNs: Convolutional Neural Networks for feature extraction
  • Data Augmentation: flips/crops/brightness to improve generalization
  • Transfer Learning: fine-tune pretrained nets (ResNet, MobileNet)
  • Detection Models: YOLO, SSD, Faster R-CNN
  • Vision Transformers (ViT): attention-based models for images
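The core operation inside a CNN layer is sliding a small kernel over the image and taking a weighted sum at each position. The sketch below implements it in plain Python with stride 1 and no padding. Like the conv layers in common deep-learning libraries, it computes cross-correlation (the kernel is not flipped). The vertical-edge kernel is hand-picked for illustration; in a trained CNN the kernel weights are learned from data.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: stride 1, no padding, as in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j).
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Non-linearity typically applied after the convolution in a CNN layer."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Bright left part, dark right part: a vertical edge between columns 2 and 3.
image = [
    [9, 9, 9, 0, 0],
    [9, 9, 9, 0, 0],
    [9, 9, 9, 0, 0],
]

# Responds positively to bright-to-dark transitions from left to right.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(relu(conv2d(image, kernel)))  # [[0.0, 27.0, 27.0]]
```

Only the windows that straddle the edge produce a response, which is the sense in which early CNN layers act as feature extractors.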

Typical CV Pipeline

1) Collect & Label

Gather images; annotate classes, boxes, or masks. Ensure variety (angles, lighting).
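Some labeling mistakes can be caught automatically before training. A minimal sketch, assuming a hypothetical annotation format of `(label, x1, y1, x2, y2)` tuples in pixel coordinates: it flags boxes that are inverted or fall outside the image, and counts boxes per class so under-represented classes stand out. The same counting idea applies to capture conditions (angle, lighting) if those are recorded as metadata.

```python
from collections import Counter


def invalid_boxes(annotations, width, height):
    """Indices of boxes that are empty, inverted, or extend past the image border."""
    bad = []
    for i, (_label, x1, y1, x2, y2) in enumerate(annotations):
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            bad.append(i)
    return bad


def class_counts(annotations):
    """Number of annotated boxes per class label."""
    return Counter(label for label, *_ in annotations)


# Hypothetical annotations for one 640x480 image.
annotations = [
    ("cat", 10, 10, 50, 50),    # valid
    ("dog", 40, 40, 30, 60),    # x2 < x1: inverted box
    ("cat", 0, 0, 700, 10),     # extends past the right border
]

print(invalid_boxes(annotations, width=640, height=480))  # [1, 2]
print(class_counts(annotations))  # Counter({'cat': 2, 'dog': 1})
```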

2) Preprocess

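Resize and normalize images, and split them into train/val/test sets. Apply augmentations to the training set only, so validation and test results reflect unaltered data.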
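A minimal plain-Python sketch of these steps for grayscale images stored as nested lists. Real pipelines usually rely on an image library; the nearest-neighbor resize, the 70/15/15 split, the 50% flip probability, and the mean/std of 0.5 are illustrative assumptions, not recommended values. The random flip is meant to be applied only when sampling training images.

```python
import random


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [[image[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def normalize(image, mean=0.5, std=0.5):
    """Map 0-255 pixels to [0, 1], then standardize with an assumed mean/std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def split(items, train=0.7, val=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def augment(image, rng):
    """Training-only augmentation: horizontal flip with probability 0.5."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


img = [[0, 255], [255, 0]]
print(resize_nearest(img, 4, 4)[0])  # [0, 0, 255, 255]
print(normalize(img))                # [[-1.0, 1.0], [1.0, -1.0]]
train, val, test = split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the shuffle seed keeps the split reproducible, so the test set stays the same across experiments.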

3) Train

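Start with a pretrained backbone; track loss/accuracy on both the training and validation sets, and stop or add regularization when validation performance stops improving while training performance keeps rising (a sign of overfitting).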
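One common way to act on the validation curve is early stopping: remember the epoch with the best validation loss and stop once it has not improved for a set number of epochs (the "patience"). A minimal sketch; the patience value and the loss numbers below are made up for illustration.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to beat the best value by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop.
    return best_epoch, len(val_losses) - 1


# Validation loss falls, then rises as the model starts to overfit.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.66, 0.70]
print(early_stopping(losses))  # (2, 5): keep epoch 2's weights, stop after epoch 5
```

In a real training loop the same check runs after each epoch instead of on a precomputed list, and the weights saved at the best epoch are the ones you keep.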

4) Evaluate

Use task-appropriate metrics: accuracy (classification), mAP (detection), IoU (segmentation).
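IoU (intersection over union) measures overlap between a prediction and the ground truth: the area they share divided by the area they cover together. Below is a minimal sketch for axis-aligned boxes given as `(x1, y1, x2, y2)` and for binary masks, plus plain accuracy for classification. Detection mAP builds on box IoU: a predicted box typically counts as a true positive only if its IoU with a matching ground-truth box reaches a threshold such as 0.5. The full mAP computation (per-class precision-recall curves) is omitted here.

```python
def accuracy(predicted, actual):
    """Fraction of predicted class labels that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def box_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def mask_iou(pred, target):
    """IoU of two same-shaped binary masks (nested lists of 0/1)."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Convention chosen here: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


print(accuracy(["cat", "dog", "cat", "cat"], ["cat", "cat", "cat", "cat"]))  # 0.75
print(round(box_iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143 (overlap 1 / union 7)
print(round(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]), 3))  # 0.333
```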

5) Deploy

Export optimized models (ONNX/TF Lite) and measure latency on the target device.
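When measuring latency, it helps to run a few warm-up calls first (early calls often include one-time setup costs) and to report a high percentile alongside the median, since occasional slow frames matter for real-time use. A minimal timing harness; `predict` stands for whatever callable runs your exported model (for example, a wrapper around an ONNX Runtime `InferenceSession.run` call), and `fake_model` below is a hypothetical stand-in.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=5, runs=50):
    """Time `predict(sample)`; return median and 95th-percentile latency in ms."""
    for _ in range(warmup):
        predict(sample)  # discard warm-up calls (lazy init, caches)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings),
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
        "p95_ms": statistics.quantiles(timings, n=20)[-1],
    }


def fake_model(image):
    """Hypothetical stand-in for an exported model: sums every pixel."""
    return sum(sum(row) for row in image)


frame = [[128] * 224 for _ in range(224)]
print(measure_latency_ms(fake_model, frame))
```

Numbers from a development laptop say little about a phone or embedded board, which is why the timing should run on the target device itself.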

6) Monitor

Watch for data drift (live images that differ from the training data); retrain when performance drops.
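Drift can sometimes be spotted before labeled feedback arrives by tracking simple statistics of incoming images. A minimal sketch that compares the mean brightness of recent live images against a reference set from training time, using a z-score on the mean. The brightness feature, the threshold of 3, and the example numbers are illustrative assumptions; a real monitor would likely track more than one signal.

```python
import math
import statistics


def mean_brightness(image):
    """Average pixel value of a grayscale image stored as nested lists."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def drift_alert(reference, live, threshold=3.0):
    """Flag drift when the live mean is more than `threshold` standard errors
    away from the reference mean. Returns (alert, z_score)."""
    ref_mean = statistics.fmean(reference)
    ref_std = max(statistics.stdev(reference), 1e-9)  # guard: constant reference
    std_error = ref_std / math.sqrt(len(live))
    z = abs(statistics.fmean(live) - ref_mean) / std_error
    return z > threshold, z


# Per-image brightness values (made up): training-time reference vs. a darker camera.
reference = [100, 102, 98, 101, 99, 100, 103, 97]
live = [60, 62, 58, 61]
print(drift_alert(reference, live))  # (True, 39.75)
```

In use, `reference` and `live` would be built with `mean_brightness` over stored training images and over a recent window of camera frames.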

Tip: with a small dataset, lean on transfer learning to get strong results quickly.

Challenges

  • Lighting, occlusion, motion blur
  • Domain shift between training and real world
  • Bias in datasets (lack of diversity)
  • Privacy & safety considerations for camera data

Applications

  • Quality inspection in factories
  • Medical imaging support
  • Autonomous driving assistance
  • Retail analytics & shelf scanning
  • Document scanning & OCR

Quick Quiz

1) Object detection outputs…

2) A common way to boost performance with few images is…

3) Which metric is standard for detection models?
