Speculative Decoding: 2x to 4x speedup of LLMs without quality loss
10 min read

Machine Learning · NLP · LLM · Speculative Decoding · Transformers · Inference Optimization