Cache-aware prefill–decode disaggregation for 40% faster LLM serving

(together.ai)

1 point | by roody_wurlitzer 8 hours ago

No comments yet.