Tags
2 pages
Transformers
Attention Mechanisms: tracking the evolution + pair programming in PyTorch
Speculative Decoding: 2x to 4x speedup of LLMs without quality loss