sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

Chris Vitalos

Frontier LLMs have progressed with reasoning, but they still cannot beat a chess engine at the game.

I tested Gemini 2.0 Flash Thinking and DeepSeek R1. The system prompt stated that they should play White and I would play Black. I started gnuchess, fed it each White move from the LLM, then fed gnuchess' response back to the LLM.

After a while both LLMs struggled to maintain board state: Gemini before the 10th move and DeepSeek around the 15th move. They then started making illegal moves.

@chrisvitalos thanks for testing!

I wonder how good it would be if you gave it the whole board state after every move

@bazkie

I am willing to bet that helping the LLM keep track of board state will result in faster response times. Gemini and DeepSeek both spent considerable time reconstructing state at every move.

And it is reasonable to say that if they can track board state, they should not make so many mistakes, i.e. illegal moves.
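
For anyone who wants to try the "whole board state after every move" idea, here is a minimal sketch of how the relay could keep the state itself and hand it to the model each turn. Everything in it is an assumption for illustration, not what was used in the test above: the helper names (`new_board`, `apply_move`, `fen_placement`, `state_prompt`) are hypothetical, moves are assumed to arrive in coordinate notation such as `e2e4`, and the code does no legality checking and ignores castling and en passant.

```python
# Minimal board tracker (illustrative sketch, standard library only).
# Assumptions: moves use coordinate notation ("e2e4", or "e7e8q" for a
# promotion); no legality checks; castling and en passant are not handled.

START_ROWS = [
    "rnbqkbnr",  # rank 8 (Black pieces are lowercase)
    "pppppppp",  # rank 7
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",  # rank 2
    "RNBQKBNR",  # rank 1 (White pieces are uppercase)
]


def new_board():
    """Return the starting position as an 8x8 list of characters."""
    return [list(row) for row in START_ROWS]


def _square(name):
    """Convert a square name like 'e2' into (row, col) list indices."""
    col = ord(name[0]) - ord("a")
    row = 8 - int(name[1])
    if not (0 <= row < 8 and 0 <= col < 8):
        raise ValueError(f"bad square: {name}")
    return row, col


def apply_move(board, move):
    """Move the piece on the source square to the destination square."""
    src_row, src_col = _square(move[0:2])
    dst_row, dst_col = _square(move[2:4])
    piece = board[src_row][src_col]
    if piece == ".":
        raise ValueError(f"no piece on {move[0:2]}")
    if len(move) == 5:
        # Promotion: keep the colour of the moving pawn.
        piece = move[4].upper() if piece.isupper() else move[4].lower()
    board[src_row][src_col] = "."
    board[dst_row][dst_col] = piece


def fen_placement(board):
    """Render the piece-placement field of a FEN string."""
    rows = []
    for row in board:
        out, empty = "", 0
        for square in row:
            if square == ".":
                empty += 1
                continue
            if empty:
                out += str(empty)
                empty = 0
            out += square
        if empty:
            out += str(empty)
        rows.append(out)
    return "/".join(rows)


def state_prompt(board, moves):
    """Build the text sent to the LLM before it picks its next move."""
    return (
        "Moves so far: " + " ".join(moves) + "\n"
        "Current position (FEN piece placement): " + fen_placement(board)
    )


board = new_board()
history = ["e2e4", "e7e5", "g1f3"]
for mv in history:
    apply_move(board, mv)
print(state_prompt(board, history))
```

With something like this, the model would no longer need to replay the whole game in its head before each move; it only has to read the position. A real experiment would probably want a full chess library for legality checks rather than this bare tracker.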