Some more discussion on announcement post: https://www.anthropic.com/news/claude-opus-4-7 (https://news.ycombinator.com/item?id=47793411)
Comments moved thither. Thanks!
Might be better to update the URL to this, actually: https://www.anthropic.com/news/claude-opus-4-7
How should one compare benchmark results?
For example, SWE-bench Pro improved ~11% compared with Opus 4.6. Should one interpret that as 4.7 being able to solve more difficult problems, or as 11% fewer hallucinations?
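Part of the ambiguity is that "improved ~11%" can mean either +11 percentage points or an 11% relative improvement over the old score, and the two readings give different numbers. A quick illustration (the scores here are made up, not actual benchmark results):

```python
# Illustration only: made-up scores, not actual SWE-bench Pro numbers.
old_score = 0.55  # hypothetical Opus 4.6 solve rate

# Reading 1: +11 percentage points
new_abs = old_score + 0.11      # 0.66

# Reading 2: +11% relative improvement
new_rel = old_score * 1.11      # 0.6105

print(new_abs, new_rel)
```

Without knowing which convention the announcement uses, the same "~11%" headline is consistent with two different solve rates.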
Not related to this release, but is anyone aware of what's happening with Deepseek? The usual cascade of synced releases has been lacking this frontier lab whale for a while now.
> Not related to this release, but is anyone aware of what's happening with Deepseek?
Given that no-one is talking about DeepSeek, I assume it is coming this month.
They are still releasing research papers, and that is what really matters, not the .1-increment model releases made to massage benchmarks or generate hype.
There's been months of "DeepSeek v4 next week!" rumours and none have panned out.
They're either stuck/dead or they're sitting on something really fantastic that they only want to release once they've perfected it.
My realistic side suspects the former; my optimistic side hopes for the latter.
In the meantime, GLM 5.1 is actually really good.
I tried to find API pricing for GLM 5.1 but couldn't find any on the homepage. How are you using it?
Per-token via DeepInfra, who hosts it as one of their models.
https://deepinfra.com/zai-org/GLM-5.1
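For anyone curious what "per-token via DeepInfra" looks like in practice, here is a minimal sketch. The endpoint URL is DeepInfra's OpenAI-compatible chat-completions route, and the model id is taken from the page linked above; treat both as assumptions to verify against their docs rather than a definitive integration.

```python
# Hedged sketch: per-token usage of GLM-5.1 via DeepInfra's
# OpenAI-compatible API. Endpoint URL and model id are assumptions
# based on the model page linked above.
import json
import os
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"  # assumed route
MODEL_ID = "zai-org/GLM-5.1"  # from the model page URL


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }


def complete(prompt: str) -> str:
    """Send the request; expects DEEPINFRA_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since the API is OpenAI-compatible, the official `openai` client should also work by pointing `base_url` at DeepInfra and billing is per input/output token rather than a flat subscription.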
Curious how people are evaluating real-world gains with this version.
Are you seeing meaningful improvements in reasoning reliability, or mostly incremental quality changes compared to previous releases?
Quite a big improvement in coding benchmarks, doesn’t seem like progress is plateauing as some people predicted.
Ah, here we go again.