01

Deep Learning Fundamentals

Understand how neural networks learn and predict

02

Computer Vision with CNNs

Seeing the world through convolutional neural networks

03

Generative Models

Creating new data with Variational Autoencoders

04

Natural Language Processing

Teaching machines to understand text

05

Multi-Modal AI

Connecting vision and language together

📓

Hands-on Notebooks

Jupyter notebooks with runnable code examples

L1

Regression to Deep Learning Demo

Generate a spiral dataset, train a small PyTorch network with ReLU/Sigmoid, and visualize decision regions.

  • Spiral Dataset
  • PyTorch MLP
  • Decision Boundaries

L2

MNIST with Fully-Connected Network

Classify MNIST digits with an MLP, reaching ~98% accuracy, with live loss visualization and a confusion matrix.

  • MNIST
  • MLP Classifier
  • Confusion Matrix
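
The confusion matrix at the end is simple to compute by hand. A plain-NumPy sketch of what it counts (sklearn's `confusion_matrix` does the same thing):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] counts samples whose true label is i and prediction is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels for illustration
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions sit on the diagonal
print(accuracy)  # 0.8
```

Off-diagonal cells show which digits get confused with which, something a single accuracy number hides.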

L3

MNIST with CNN

Train a CNN on MNIST for ~99% accuracy. Visualize learned convolution filters and activation maps.

  • CNN Architecture
  • Filter Visualization
  • Activation Maps
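
Underneath `nn.Conv2d`, each learned filter is just a small kernel slid across the image. An illustrative NumPy version of the valid, stride-1 case:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel
    (what deep-learning convolutions compute, minus padding/stride options)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a step image
image = np.zeros((5, 5)); image[:, 2:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
response = conv2d(image, sobel_x)
print(response.shape)  # (3, 3); strongest response where the edge sits
```

Strictly this is cross-correlation; deep-learning libraries skip the kernel flip of textbook convolution.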

L4

Data Augmentation (CIFAR-10)

Compare training with and without augmentation using FastAI and xResNet18 on CIFAR-10.

  • FastAI
  • Augmentation Pipeline
  • Transfer Learning
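
FastAI handles augmentation through its own transform pipeline; the idea behind two classic CIFAR-10 transforms can be sketched directly in NumPy (the function name and padding value are illustrative):

```python
import numpy as np

def random_flip_and_crop(img, pad=4, rng=None):
    """Classic CIFAR-style augmentation: random horizontal flip,
    then a random crop from a zero-padded image."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    if rng.random() < 0.5:
        img = img[:, ::-1]                      # horizontal flip
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    top = rng.integers(0, 2 * pad + 1)          # random crop offsets
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
aug = random_flip_and_crop(img, rng=np.random.default_rng(0))
print(aug.shape)  # (32, 32, 3) — same shape, shifted/mirrored content
```

Because each epoch sees a slightly different version of every image, the network memorizes less and generalizes more.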

L6

Word Embeddings

From tokenization to Word2Vec magic. Explore word analogies, Skip-Gram, and CBOW.

  • Tokenization
  • nn.Embedding
  • Word2Vec Analogies
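
The classic analogy trick, sketched with tiny hand-built vectors (real Word2Vec or GloVe vectors have hundreds of dimensions; these are invented for illustration):

```python
import numpy as np

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by nearest neighbour to (b - a + c)."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):   # exclude the query words themselves
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender"
emb = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}
print(analogy("man", "woman", "king", emb))  # queen
```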

L6

Bag of Embeddings Classifier

A simpler alternative: average pretrained GloVe embeddings and classify with a linear layer.

  • GloVe Embeddings
  • Bag of Embeddings
  • Topic Classification
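
The whole model fits in a few lines: average the token vectors, then apply a linear layer. A toy NumPy sketch (the vectors and weights here are made up for illustration, not real GloVe values):

```python
import numpy as np

def bag_of_embeddings(tokens, emb, dim=3):
    """Average the embeddings of known tokens into one document vector."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

emb = {"stocks": np.array([1.0, 0.0, 0.0]),
       "market": np.array([0.9, 0.1, 0.0]),
       "goal":   np.array([0.0, 1.0, 0.0]),
       "match":  np.array([0.1, 0.9, 0.0])}

doc = bag_of_embeddings(["stocks", "market", "unknownword"], emb)
# The classifier is then just W @ doc + b; a fixed toy W here:
W = np.array([[1.0, 0.0, 0.0],   # row 0: finance
              [0.0, 1.0, 0.0]])  # row 1: sports
scores = W @ doc
print(scores.argmax())  # 0 (finance)
```

Averaging throws away word order, which is exactly the trade-off this notebook examines against sequence models.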

L7

Transformers & Attention

Self-attention from scratch, positional encoding, multi-head attention. Use BERT & GPT-2 via Hugging Face.

  • Self-Attention
  • BERT & GPT-2
  • Hugging Face
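
The "from scratch" part is smaller than it sounds: single-head scaled dot-product attention is one equation, softmax(Q K^T / sqrt(d_k)) V. A NumPy sketch with illustrative shapes:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (stabilised by subtracting the row max)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8); each row of weights sums to 1
```

Multi-head attention simply runs several of these in parallel with smaller d_k and concatenates the results.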

L8

CLIP Basics

Zero-shot image classification, prompt engineering, image-text retrieval, and contrastive learning with CLIP.

  • Zero-Shot Classification
  • Prompt Engineering
  • Image-Text Retrieval
  • InfoNCE Loss
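
Contrastive learning with InfoNCE treats each matched image-text pair as the positive among a batch of negatives. A NumPy sketch of the symmetric loss (shapes are illustrative; 0.07 is the commonly cited initial temperature in CLIP):

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image/text pairs:
    each image should score highest against its own caption, and vice versa."""
    # L2-normalise, as CLIP does before the dot product
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(img))                # diagonal = matched pairs
    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
img_emb = rng.standard_normal((4, 16))
txt_emb = img_emb + 0.01 * rng.standard_normal((4, 16))  # nearly aligned pairs
loss = info_nce_loss(img_emb, txt_emb)  # small when pairs are well aligned
```

Zero-shot classification reuses the same machinery at inference time: the "captions" are prompts like "a photo of a dog", and the highest-scoring prompt wins.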

L8

Vision-Language Pipelines

Image captioning and VQA with BLIP, plus zero-shot classification with Hugging Face pipelines.

  • BLIP Captioning
  • Visual Question Answering
  • Hugging Face Pipelines

L8

Caption Generator with CLIP

Train an LSTM caption decoder using frozen CLIP embeddings. Learn the "Show and Tell" architecture.

  • CLIP Encoder
  • LSTM Decoder
  • Greedy Decoding
  • Caption Training
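
Greedy decoding itself is model-agnostic: repeatedly ask for the most probable next word until the end token appears. A sketch with a stand-in for the CLIP-conditioned LSTM (the toy model is invented for illustration):

```python
def greedy_decode(step_fn, start_token="<sos>", end_token="<eos>", max_len=20):
    """Greedy caption decoding: at each step feed the sequence so far
    and take the single most probable next word."""
    tokens = [start_token]
    for _ in range(max_len):
        next_token = step_fn(tokens)   # model's top-1 prediction
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens[1:]

# Toy "model": a canned caption, standing in for CLIP features -> LSTM -> vocab
caption = ["a", "dog", "on", "grass", "<eos>"]
def toy_step(tokens):
    return caption[len(tokens) - 1]

print(greedy_decode(toy_step))  # ['a', 'dog', 'on', 'grass']
```

Beam search replaces the top-1 pick with the k best partial sequences, usually at a noticeable quality gain for captions.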

🚀

Projects

Larger hands-on projects to apply your skills

VAE

Face Autoencoder

Train a Variational Autoencoder on LFW faces, analyze the latent space with PCA, and explore an interactive Gradio app that lets you manipulate facial features with sliders.

  • VAE
  • LFW Dataset
  • Latent PCA
  • Gradio App
  • Face Generation
View Project →
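
Two pieces of VAE machinery the project leans on, sketched in NumPy: the reparameterization trick, and the closed-form KL term for a diagonal Gaussian posterior (shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    which keeps sampling differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) per sample, closed form for a diagonal Gaussian."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)

mu = np.zeros((2, 8)); log_var = np.zeros((2, 8))   # posterior already N(0, I)
z = reparameterize(mu, log_var)
print(z.shape, kl_divergence(mu, log_var))  # (2, 8), KL = 0 when q = N(0, I)
```

The training loss is reconstruction error plus this KL term, which pulls the latent space toward a standard normal; that regularity is what makes the slider-based face manipulation in the Gradio app work.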