-
Deeply Learning 4: LoRA (Low-Rank Adaptation) - Math, Intuition and Implementation from Scratch
Full fine-tuning requires storing and updating a complete copy of every model parameter for each downstream task. LoRA sidesteps this by injecting small, trainable low-rank matrices while freezing the original weights. In this post, we derive the math behind LoRA, understand why it works through intrinsic dimensionality, implement a LoRATrainer from scratch in PyTorch, fine-tune GPT-2 on SST-2 sentiment classification, compare it against full fine-tuning, and merge the LoRA weights back for zero-cost inference.
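The core idea from the abstract can be sketched in a few lines: keep the frozen weight `W`, and add a trainable update `B A` scaled by `alpha/r`, with `B` initialized to zero so training starts from the pretrained model exactly. This is a minimal numpy sketch under those assumptions; the function name and shapes are illustrative, not the post's actual `LoRATrainer` API.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass of a LoRA-adapted linear layer.

    x: (batch, d_in), W: (d_out, d_in) frozen pretrained weight,
    A: (r, d_in), B: (d_out, r) trainable low-rank factors.
    y = x W^T + (alpha / r) * x A^T B^T
    """
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W = rng.standard_normal((3, 4))
A = rng.standard_normal((2, 4))   # A gets a random init
B = np.zeros((3, 2))              # B starts at zero: adapter is a no-op at step 0
y = lora_forward(x, W, A, B, alpha=4, r=2)
```

Because `B` is zero at initialization, `y` equals the frozen layer's output `x @ W.T`, which is exactly why LoRA training starts from the pretrained model's behavior. Merging for inference is just `W + (alpha / r) * B @ A`.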
-
DeeplyGrad Part 3: Building a Transformer from Scratch
In Parts 1 and 2 we built an autograd engine and trained an MLP on MNIST. In this post, we build a decoder-only transformer (GPT-style) from scratch using those primitives. We implement RoPE, layer normalization, multi-head causal self-attention, and the feed-forward network, then train the full model on Tiny Shakespeare to generate text.
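Of the components listed above, causal self-attention is the one that defines the decoder-only architecture: each token may only attend to itself and earlier positions. A single-head numpy sketch (illustrative only; the post builds this with the DeeplyGrad autograd primitives, not numpy):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention.

    x: (T, d_model); Wq, Wk, Wv: (d_model, d_head).
    Positions above the diagonal are masked so token t only sees tokens <= t.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly upper triangle
    scores[mask] = -np.inf                            # future positions get zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # row-wise softmax
    return w @ v
```

A useful sanity check: the first token can only attend to itself, so its output must equal its own value vector.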
-
DeeplyGrad Part 2: Teaching Our Autograd to Classify MNIST Digits
In Part 1 of the series, we built a `Tensor` class with full backpropagation support. In this post, we add everything needed to train a real neural network: a PyTorch-like `Module` system for layers, activations, and losses. Then we put everything together to train an MLP classifier on the classic MNIST digits with high accuracy.
-
DeeplyGrad: Building an Autograd Engine from Scratch with CuPy - Part 1: Tensor and Backpropagation
In this series, we build a complete autograd engine from scratch and then use it to build a transformer. No PyTorch or JAX, just pure Python and CuPy.
-
Deeply Learning 3: Mean Squared Error Loss Implementation from Scratch
Deeply learning one concept at a time. In this post, we will implement mean squared error loss from scratch.
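For reference, the whole concept fits in a few lines: the loss is the mean of squared differences, and its gradient with respect to the predictions is what the backward pass propagates. A numpy sketch (the post itself works through the derivation; these names are illustrative):

```python
import numpy as np

def mse_loss(pred, target):
    """L = (1/N) * sum((pred - target)^2), averaged over all elements."""
    return np.mean((pred - target) ** 2)

def mse_grad(pred, target):
    """dL/dpred = 2 * (pred - target) / N, matching the loss above."""
    return 2.0 * (pred - target) / pred.size
```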
-
Deeply Learning 2: Cross Entropy Loss Implementation from Scratch
Deeply learning one concept at a time. In this post, we implement cross entropy loss from scratch.
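The key implementation detail is numerical stability: subtracting the row-wise max before exponentiating avoids overflow in the softmax. A numpy sketch of the standard log-softmax formulation (illustrative; the post derives and implements this in full):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target classes.

    logits: (batch, num_classes) raw scores; targets: (batch,) class indices.
    """
    z = logits - logits.max(axis=1, keepdims=True)              # stability shift
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return -log_probs[np.arange(len(targets)), targets].mean()
```

Sanity check: with uniform logits over C classes, every class has probability 1/C, so the loss is log(C).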
-
Deeply Learning 1: Dropout Implementation from Scratch
Deeply learning one concept at a time. In this post, we implement simple dropout from scratch.
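The standard "inverted dropout" trick is to scale surviving activations by 1/(1-p) at training time so that inference needs no change at all. A numpy sketch under that convention (illustrative; the post walks through it step by step):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p during training,
    scale survivors by 1/(1-p) so expected activations match inference."""
    if not training or p == 0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)
```

At inference (`training=False`) the input passes through untouched, which is exactly what the 1/(1-p) scaling buys us.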
-
Introduction to Mechanistic Interpretability, Superposition and Sparse Autoencoders
In this post, we will explore the concepts of Superposition and Sparse Autoencoders in the context of mechanistic interpretability. We'll build a sparse autoencoder from scratch.
-
Representation Engineering - I: Steering Language Models With Activation Engineering
Implement Activation Addition (ActAdd) and Contrastive Activation Addition (CAA) to steer language models at inference time without training. Learn how adding vectors to the residual stream changes behavior, with practical code implementations and analysis of both methods.
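At their core, both methods reduce to simple vector arithmetic on the residual stream: ActAdd adds a scaled steering vector at a chosen layer, and CAA derives that vector from the difference of mean activations on contrastive prompt sets. A numpy sketch of just that arithmetic (function names are illustrative; the post hooks these into a real model's forward pass):

```python
import numpy as np

def act_add(h, steering_vec, coeff):
    """ActAdd: add a scaled steering vector to every position's
    residual-stream activation. h: (seq_len, d_model), steering_vec: (d_model,)."""
    return h + coeff * steering_vec

def contrastive_vector(h_pos, h_neg):
    """CAA: steering direction = mean activation on positive prompts
    minus mean activation on negative prompts. Each input: (n_prompts, d_model)."""
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)
```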
-
Building GPT-2 From Scratch: Mechanistic Interpretability View
In this post, we're going to build GPT-2 from the ground up, implementing every component ourselves and understanding exactly how this remarkable architecture works.