2025
Pole Vaulting the Memory Wall (at speed): finetuning LLMs at scale
Attention Mechanisms - tracking the evolution + pair programming in PyTorch
Speculative Decoding: 2x to 4x speedup of LLMs without quality loss