Tags
3 pages
Transformers
Pole Vaulting the Memory Wall (at speed): finetuning LLMs at scale
Attention Mechanisms - tracking the evolution + pair programming in PyTorch
Speculative Decoding: 2x to 4x speedup of LLMs without quality loss