DSpark: Speculative decoding accelerates LLM inference [pdf]

(github.com)

793 points | by aurenvale 4 days ago

388 comments