> we somehow observe behaviors that resemble reasoning, abstraction, and even creativity.
Puleeze. You type a search into Google and it returns a news article. Did you observe Google being creative?
> where exactly does the apparent intelligence come from?
The user's gullibility.
Hmm, good point, but Google only retrieves information from the web, while LLMs generate new continuations from learned distributions. I would say the interesting question is why modeling reasoning traces at scale produces reasoning-like behavior at all.
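As a toy illustration of that distinction, here is a minimal sketch; every name, document, and probability in it is invented purely for illustration and stands in for no real system:

```python
# Toy contrast between retrieval and generation. All names, data, and
# probabilities here are made up for illustration.
import random

corpus = {"llm reasoning": "Stored article: 'Do LLMs reason?' ..."}  # a tiny "search index"

def retrieve(query):
    # Search-engine style: return existing text, unchanged.
    return corpus.get(query, "no result")

# A made-up learned distribution P(next token | previous two tokens).
next_token_probs = {
    ("the", "model"): {"predicts": 0.6, "reasons": 0.3, "parrots": 0.1},
}

def generate(context, steps=3):
    # LLM style: emit new tokens one at a time by sampling from the
    # learned distribution, conditioned on what has been emitted so far.
    out = list(context)
    for _ in range(steps):
        dist = next_token_probs.get(tuple(out[-2:]), {"<eos>": 1.0})
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == "<eos>":
            break
        out.append(token)
    return " ".join(out)

print(retrieve("llm reasoning"))   # returns stored text verbatim
print(generate(["the", "model"]))  # emits a continuation that need not exist anywhere verbatim
```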
"Continuations"? Simple stochastic parrotry.
> I would say the interesting question is why modeling reasoning traces at scale produces reasoning-like behavior at all.
It doesn't. It's simply anthropomorphisation. The only reasoning you see is that of the sources, parroted by the bot.
Dogs have intelligence, bees have intelligence, even slime molds, some would argue. They are all different, yet still recognized as intelligent. For you, why is AI different?
Why is your fridge, toaster or keyfob different? Same applies.
The confusion arises because the question begs an arbitrary distinction: a token considered in isolation from all the state in the model and its progression in response to a prompt.
They work because, while the process generates one token at a time, each token has a location in an N-dimensional matrix of overlaid state networks spanning all the tokens in the training data, the tokens in the given prompt, and the sequence of tokens emitted so far for this prompt (a rough code sketch follows this comment).
As an analogy, an image on the screen is emitted a pixel at a time, but each pixel's state is coded as part of a network in a matrix that includes all the residual state from the point of image capture.
And just as with an image on your screen, the computer has no "ideas" about the contents of the presentation, although other subsystems may use mathematical approaches to select, categorize, and filter images.
The common perception that AI is thinking is purely a matter of appearances and idiomatic terminology.
As to why we tend to be troubled by the resemblance of AI behavior to thought or creativity, yet not at all troubled by the entire worlds that appear inside our TV sets: that is a matter of surprise and of conditioning to the medium.
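For what it's worth, here is a rough sketch of that "one token at a time, conditioned on everything so far" loop, assuming the Hugging Face transformers library with GPT-2 as a stand-in model; it is not the specific system anyone above is describing, just an illustration of the mechanism:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Why does next-token prediction look like reasoning?"
ids = tok(prompt, return_tensors="pt").input_ids      # the prompt's tokens

for _ in range(20):
    logits = model(ids).logits[0, -1]                 # scores over the whole vocabulary,
                                                      # computed from prompt + tokens emitted so far
    probs = torch.softmax(logits, dim=-1)             # the learned next-token distribution
    next_id = torch.multinomial(probs, 1)             # sample one token from it
    ids = torch.cat([ids, next_id.unsqueeze(0)], dim=-1)  # feed it back in as context

print(tok.decode(ids[0]))
```

Each pass recomputes the distribution from the full context, which is why the "just one token" framing understates how much state every step actually conditions on.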
Yes, I do admit that I simplified the mechanism in the article, but my question is why the scale of next-token prediction yields reasoning-like behavior.