2 points | by latexr 5 hours ago
3 comments
At the risk of stating the obvious, DUH.
There was a relatively comprehensive article on benchmarking LLMs at chess that measured even the SOTA models at a mediocre ~1000 Elo, compared to Carlsen, who is rated ~2850.
https://maxim-saplin.github.io/llm_chess
He went on to ask ChatGPT for feedback on his performance.
A master asking a beginner for feedback? I guess he was just curious if the evaluation would be as inept as the play.
OK, now let's see how Stockfish does at Python coding.