This is absolutely insane speed, plus they claim 20x lower hardware cost than Nvidia. If they manage to scale to a 10x larger LLM with a 100,000-token context, they will be the new Nvidia.
This is crazy! These chips could make high-reasoning models run so fast that they could generate lots of solution variants and automatically pick the best one (rough sketch below). Or you could have a smart chip in your home lab and run local models fast, without needing a lot of expensive hardware or electricity.
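For concreteness, the "generate many, pick the best" idea is basically best-of-N sampling, which only becomes practical when inference is cheap and fast. This is just a toy sketch: `generate` and `score` are hypothetical placeholders for a fast inference endpoint and some verifier (a reward model, unit tests, etc.), not a real API.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in for a call to a fast inference endpoint.
    return f"candidate answer {random.random():.3f} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a verifier/reward model or a test harness.
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # Sample n candidate solutions and keep the highest-scoring one.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Prove that the sum of two even numbers is even."))
```

The point is that n is limited mostly by tokens per second and cost per token, which is exactly what this kind of hardware changes.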
Saw this on /r/localllama
It's an LLM ASIC that runs a single LLM model at ridiculous speeds. It's a demonstration chip that runs Llama-3-8B at the moment, but they're working on scaling it to larger models. I think it has very big implications for how AI will look a few years from now. IMO the crucial question is whether they will get hard-limited by model size, similarly to Cerebras.
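The Cerebras comparison is about the weights having to fit in fast on-chip or near-chip memory. I don't know this chip's actual capacity, so the numbers below are just generic arithmetic about weight storage at different precisions, to show why "10x larger model" is the hard part:

```python
# Back-of-envelope only: memory needed just to hold the weights.
# Model sizes and precisions are illustrative, not specs of this chip.
def weight_gib(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * (bits_per_param / 8) / 2**30

for name, params in [("Llama-3-8B", 8), ("70B-class", 70), ("400B-class", 400)]:
    row = ", ".join(f"{bits}-bit: {weight_gib(params, bits):.0f} GiB" for bits in (16, 8, 4))
    print(f"{name:>11}  {row}")
```

Roughly 15 GiB for an 8B model at 16-bit versus ~130 GiB for a 70B model, so every 10x in model size means 10x more weight memory the chip has to hold close to the compute.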