-
Deeply Learning 4: LoRA (Low-Rank Adaptation) - Math, Intuition and Implementation from Scratch
Full fine-tuning requires storing and updating a complete copy of every model parameter for each downstream task. LoRA sidesteps this by injecting small, trainable low-rank matrices while freezing the original weights. In this post, we derive the math behind LoRA, understand why it works through intrinsic dimensionality, implement a LoRATrainer from scratch in PyTorch, fine-tune GPT-2 on SST-2 sentiment classification, compare it against full fine-tuning, and merge the LoRA weights back for zero-cost inference.
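The core idea from the abstract can be sketched in a few lines: keep the frozen weight `W`, and add a trainable update `B A` scaled by `alpha/r`, with `B` initialized to zero so training starts from the pretrained model exactly. This is a minimal numpy sketch under those assumptions; the function name and shapes are illustrative, not the post's actual `LoRATrainer` API.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass of a LoRA-adapted linear layer.

    x: (batch, d_in), W: (d_out, d_in) frozen pretrained weight,
    A: (r, d_in), B: (d_out, r) trainable low-rank factors.
    y = x W^T + (alpha / r) * x A^T B^T
    """
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W = rng.standard_normal((3, 4))
A = rng.standard_normal((2, 4))   # A gets a random init
B = np.zeros((3, 2))              # B starts at zero: adapter is a no-op at step 0
y = lora_forward(x, W, A, B, alpha=4, r=2)
```

Because `B` is zero at initialization, `y` equals the frozen layer's output `x @ W.T`, which is exactly why LoRA training starts from the pretrained model's behavior. Merging for inference is just `W + (alpha / r) * B @ A`.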
-
DeeplyGrad Part 3: Building a Transformer from Scratch
In Parts 1 and 2 we built an autograd engine and trained an MLP on MNIST. In this post, we build a decoder-only transformer (GPT-style) from scratch using those primitives. We implement RoPE, layer normalization, multi-head causal self-attention, and the feed-forward network, then train the full model on Tiny Shakespeare to generate text.
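Of the components listed above, causal self-attention is the one that defines the decoder-only architecture: each token may only attend to itself and earlier positions. A single-head numpy sketch (illustrative only; the post builds this with the DeeplyGrad autograd primitives, not numpy):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention.

    x: (T, d_model); Wq, Wk, Wv: (d_model, d_head).
    Positions above the diagonal are masked so token t only sees tokens <= t.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly upper triangle
    scores[mask] = -np.inf                            # future positions get zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # row-wise softmax
    return w @ v
```

A useful sanity check: the first token can only attend to itself, so its output must equal its own value vector.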
-
DeeplyGrad Part 2: Teaching Our Autograd to Classify MNIST Digits
In Part 1 of the series, we built a `Tensor` class with full backpropagation support. In this post, we add everything needed to train a real neural network: a PyTorch-like `Module` system for layers, activations, and losses. Then we put everything together to train an MLP classifier on the classic MNIST digits with high accuracy.
-
DeeplyGrad: Building an Autograd Engine from Scratch with CuPy - Part 1: Tensor and Backpropagation
In this series, we build a complete autograd engine from scratch and then use it to build a transformer. No PyTorch or JAX, just pure Python and CuPy.
-
Deeply Learning 3: Mean Squared Error Loss Implementation from Scratch
Deeply learning one concept at a time. In this post, we will implement mean squared error loss from scratch.
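For reference, the whole concept fits in a few lines: the loss is the mean of squared differences, and its gradient with respect to the predictions is what the backward pass propagates. A numpy sketch (the post itself works through the derivation; these names are illustrative):

```python
import numpy as np

def mse_loss(pred, target):
    """L = (1/N) * sum((pred - target)^2), averaged over all elements."""
    return np.mean((pred - target) ** 2)

def mse_grad(pred, target):
    """dL/dpred = 2 * (pred - target) / N, matching the loss above."""
    return 2.0 * (pred - target) / pred.size
```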
-
Deeply Learning 2: Cross Entropy Loss Implementation from Scratch
Deeply learning one concept at a time. In this post, we implement cross entropy loss from scratch.
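The key implementation detail is numerical stability: subtracting the row-wise max before exponentiating avoids overflow in the softmax. A numpy sketch of the standard log-softmax formulation (illustrative; the post derives and implements this in full):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target classes.

    logits: (batch, num_classes) raw scores; targets: (batch,) class indices.
    """
    z = logits - logits.max(axis=1, keepdims=True)              # stability shift
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return -log_probs[np.arange(len(targets)), targets].mean()
```

Sanity check: with uniform logits over C classes, every class has probability 1/C, so the loss is log(C).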
-
Deeply Learning 1: Dropout Implementation from Scratch
Deeply learning one concept at a time. In this post, we implement simple dropout from scratch.
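The standard "inverted dropout" trick is to scale surviving activations by 1/(1-p) at training time so that inference needs no change at all. A numpy sketch under that convention (illustrative; the post walks through it step by step):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p during training,
    scale survivors by 1/(1-p) so expected activations match inference."""
    if not training or p == 0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)
```

At inference (`training=False`) the input passes through untouched, which is exactly what the 1/(1-p) scaling buys us.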
-
Introduction to Mechanistic Interpretability, Superposition and Sparse Autoencoders
In this post, we will explore the concepts of Superposition and Sparse Autoencoders in the context of mechanistic interpretability. We'll build a sparse autoencoder from scratch.
-
Representation Engineering - I: Steering Language Models With Activation Engineering
Implement Activation Addition (ActAdd) and Contrastive Activation Addition (CAA) to steer language models at inference time without training. Learn how adding vectors to the residual stream changes behavior, with practical code implementations and analysis of both methods.
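At their core, both methods reduce to simple vector arithmetic on the residual stream: ActAdd adds a scaled steering vector at a chosen layer, and CAA derives that vector from the difference of mean activations on contrastive prompt sets. A numpy sketch of just that arithmetic (function names are illustrative; the post hooks these into a real model's forward pass):

```python
import numpy as np

def act_add(h, steering_vec, coeff):
    """ActAdd: add a scaled steering vector to every position's
    residual-stream activation. h: (seq_len, d_model), steering_vec: (d_model,)."""
    return h + coeff * steering_vec

def contrastive_vector(h_pos, h_neg):
    """CAA: steering direction = mean activation on positive prompts
    minus mean activation on negative prompts. Each input: (n_prompts, d_model)."""
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)
```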
-
Building GPT-2 From Scratch: Mechanistic Interpretability View
In this post, we're going to build GPT-2 from the ground up, implementing every component ourselves and understanding exactly how this remarkable architecture works.