Ravi's Blog

Representation Engineering - I: Steering Language Models With Activation Engineering

Implement Activation Addition (ActAdd) and Contrastive Activation Addition (CAA) to steer language models at inference time, without any training. Learn how adding steering vectors to the residual stream changes model behavior, with practical code implementations and an analysis of both methods.
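Both methods rely on the same arithmetic on the residual stream. The toy NumPy sketch below previews it. Random arrays stand in for a model's residual-stream activations at one layer. The function names (`actadd_vector`, `caa_vector`, `steer`) are hypothetical. Real implementations hook into a specific model's forward pass instead of operating on standalone arrays.

```python
import numpy as np

def actadd_vector(h_pos: np.ndarray, h_neg: np.ndarray) -> np.ndarray:
    """ActAdd-style steering vector (sketch): the difference between the
    residual-stream activations of ONE contrast pair of prompts at a chosen
    layer. Assumes both prompts are padded to the same length: (seq, d_model)."""
    return h_pos - h_neg

def caa_vector(h_pos: np.ndarray, h_neg: np.ndarray) -> np.ndarray:
    """CAA-style steering vector (sketch): the mean difference over MANY
    contrastive pairs, one activation per pair: (n_pairs, d_model) -> (d_model,)."""
    return (h_pos - h_neg).mean(axis=0)

def steer(resid: np.ndarray, vec: np.ndarray, coeff: float) -> np.ndarray:
    """Add a scaled steering vector to a residual stream of shape (seq, d_model).
    A 1-D vector is broadcast to every position. A 2-D (per-position) vector
    is added to the leading positions only and must not be longer than resid."""
    out = resid.copy()
    if vec.ndim == 1:
        out += coeff * vec
    else:
        out[: vec.shape[0]] += coeff * vec
    return out

# Usage with fake activations (d_model = 8).
rng = np.random.default_rng(0)
resid = rng.normal(size=(5, 8))                      # residual stream for 5 tokens
v_act = actadd_vector(rng.normal(size=(2, 8)), rng.normal(size=(2, 8)))
v_caa = caa_vector(rng.normal(size=(10, 8)), rng.normal(size=(10, 8)))
steered_act = steer(resid, v_act, coeff=4.0)         # only the first 2 positions change
steered_caa = steer(resid, v_caa, coeff=1.0)         # every position shifts by v_caa
```

ActAdd builds its vector from a single prompt pair, so in this sketch the vector is a per-position array. CAA averages over many pairs into a single `d_model` vector that can be added at every position. This is a simplification: the original CAA setup applies it only to positions after the prompt. `coeff` sets the steering strength and would normally be tuned together with the layer choice.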