Deeply Learning

Notes on Deep Learning Systems and AI Research

Deeply Learning 3: Mean Squared Error Loss Implementation from Scratch

Deeply learning one concept at a time. In this post, we will implement mean squared error loss from scratch.

5 min read · February 18, 2026

2026 · Deep Learning, Mean Squared Error, PyTorch, Loss Functions, Neural Networks · Deeply Learning, Neural Networks

Deeply Learning 2: Cross Entropy Loss Implementation from scratch

Deeply learning one concept at a time. In this post, we implement cross entropy loss from scratch.

5 min read · February 17, 2026

2026 · Deep Learning, Cross Entropy, PyTorch, Loss Functions, Neural Networks · Deeply Learning, Neural Networks

Deeply Learning 1: Dropout Implementation from scratch

Deeply learning one concept at a time. In this post, we implement simple dropout from scratch.

4 min read · February 17, 2026

2026 · Deep Learning, Dropout, PyTorch, Regularization, Neural Networks · Deeply Learning, Neural Networks

Introduction to Mechanistic Interpretability, Superposition and Sparse Autoencoders

In this post, we will explore the concepts of Superposition and Sparse Autoencoders in the context of mechanistic interpretability. We'll build a spar...

66 min read · December 25, 2025

2025 · AI Safety, Alignment, Mechanistic Interpretability, Transformers, Superposition, Sparse Autoencoders, SAE, Natural Language Processing, NLP · AI Safety, Alignment, Mechanistic Interpretability, Transformers, Superposition, Sparse Autoencoders, SAE, Natural Language Processing, NLP

Representation Engineering - I: Steering Language Models With Activation Engineering

Implement Activation Addition (ActAdd) and Contrastive Activation Addition (CAA) to steer language models at inference time without training. Learn how adding vectors to the residual stream changes behavior, with practical code implementations and analysis of both methods.

37 min read · December 20, 2025

2025 · Activation Engineering, Representation Engineering, Alignment, AI Safety, Controllability, Large Language Models, LLMs, Transformers, Natural Language Processing, NLP, RePE · AI Safety, Alignment, Representation Engineering, Activation Engineering, Controllability, Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

Building GPT-2 From Scratch: Mechanistic Interpretability View

In this post, we're going to build GPT-2 from the ground up, implementing every component ourselves and understanding exactly how this remarkable architecture works.

38 min read · December 19, 2025

2025 · LLM, Transformers, Mechanistic Interpretability, Natural Language Processing, NLP, Large Language Models, LLMs, Attention, Residual Streams, AI Safety · Large Language Model, Transformers, Mechanistic Interpretability, Natural Language Processing, AI Safety

Visualizing Attention: See what an LLM sees.

Learn how attention mechanisms work in transformers by visualizing what LLMs see when processing text. Discover how attention connects semantically related tokens (like Paris → French), understand the Query-Key-Value framework, and explore how different attention heads specialize in syntax, semantics, and coreference.

26 min read · December 19, 2025

2025 · Attention, Transformers, Large Language Models, LLMs, Natural Language Processing, NLP · Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

Supervised Finetuning in LLM training workflow

Learn how supervised fine-tuning (SFT) fits into the LLM training pipeline. This post explains the three-step process (pretraining → SFT → alignment), demonstrates SFT implementation with a practical example, and shows how fine-tuning transforms a base model into a task-specific assistant.

38 min read · December 18, 2025

2025 · Supervised Fine-tuning, SFT, LLM, Large Language Models, LLMs, Transformers, Natural Language Processing, NLP · Natural Language Processing, NLP, Large Language Models, LLMs, Transformers

From Words to Meaning: Implementing Word2Vec from Scratch

Word embeddings are one of the most transformative developments in Natural Language Processing (NLP). They solve a fundamental problem: how can we rep...

168 min read · December 17, 2025

2025 · Word Embeddings, Word2Vec, Embeddings, Embedding Models, Natural Language Processing, NLP, Large Language Models, Deep Learning, Neural Networks · Natural Language Processing

Primer on Large Language Model (LLM) Inference Optimizations: 3. Model Architecture Optimizations

Exploring model architecture optimizations for Large Language Model (LLM) inference, focusing on Grouped-Query Attention (GQA) and Mixture of Experts (MoE).

11 min read · November 15, 2024

2024 · LLM, Inference Optimization, Transformer, Attention Mechanism, Multi-Head Attention, K-V Caching, Memory Calculation, Optimization Metrics, Optimization Techniques, Mixture of Experts, Group Query Attention, GQA, MoE, AI Accelerators, Hardware Acceleration, Model Architecture Optimizations, Natural Language Processing, NLP, Large Language Models, LLMs, Transformers · Large Language Model, Inference Optimization, Natural Language Processing