Hardware Acceleration 1
Primer on Large Language Model (LLM) Inference Optimizations: 3. Model Architecture Optimizations
Nov 15, 2024