Frontier #LLMs have gotten better at reasoning, but they still cannot beat #GNU chess.
I tested #Gemini 2.0 Flash Thinking and #DeepSeek R1. The system prompt told each model to play White while I played Black. I started gnuchess, fed it each White move from the LLM, then fed gnuchess' reply back to the LLM.
After a while both LLMs struggled to keep track of the board state: Gemini before the 10th move and DeepSeek around the 15th. They then started making illegal moves.
@chrisvitalos thanks for testing!
I wonder how good it would be if you gave it the whole board state after every move.
I am willing to bet that helping the LLM keep track of board state will result in faster response times. Gemini and DeepSeek both spent considerable time reconstructing the state at every move.
And it is reasonable to expect that if they can track board state, they will make fewer mistakes, i.e. illegal moves.
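
For anyone who wants to try that, here is a rough sketch of what "give it the whole board state after every move" could look like. It is not what was used in the test above. It assumes moves are relayed in coordinate notation like e2e4, and the function names (apply_move, placement, state_prompt) are made up for illustration. It only tracks where the pieces are and does not check legality; that part would still be left to gnuchess.

```python
# Minimal board tracker (hypothetical helper, stdlib only).
# Assumption: moves arrive as coordinate strings such as "e2e4" or "e7e8q".
# It does NOT validate moves; it only mirrors them onto a board.

START = [
    list("rnbqkbnr"),
    list("pppppppp"),
    *[[None] * 8 for _ in range(4)],
    list("PPPPPPPP"),
    list("RNBQKBNR"),
]


def new_board():
    """Return a fresh copy of the starting position (row 0 is rank 8)."""
    return [row[:] for row in START]


def square(name):
    """Convert a square like 'e2' to (row, col), with row 0 being rank 8."""
    return 8 - int(name[1]), ord(name[0]) - ord("a")


def apply_move(board, move):
    """Apply one coordinate move to the board in place."""
    (r1, c1), (r2, c2) = square(move[:2]), square(move[2:4])
    piece = board[r1][c1]
    if piece is None:
        raise ValueError(f"no piece on {move[:2]}")
    # En passant: a pawn moving diagonally onto an empty square
    # captures the pawn standing beside its starting square.
    if piece in "Pp" and c1 != c2 and board[r2][c2] is None:
        board[r1][c2] = None
    # Castling: the king moves two files, so the rook has to move too.
    if piece in "Kk" and abs(c2 - c1) == 2:
        rook_from, rook_to = (7, 5) if c2 > c1 else (0, 3)
        board[r1][rook_to] = board[r1][rook_from]
        board[r1][rook_from] = None
    board[r1][c1] = None
    # Promotion: a fifth character names the new piece.
    if len(move) == 5:
        piece = move[4].upper() if piece.isupper() else move[4].lower()
    board[r2][c2] = piece


def placement(board):
    """Return the piece-placement field of a FEN string."""
    rows = []
    for row in board:
        out, empty = "", 0
        for sq in row:
            if sq is None:
                empty += 1
                continue
            if empty:
                out += str(empty)
                empty = 0
            out += sq
        if empty:
            out += str(empty)
        rows.append(out)
    return "/".join(rows)


def state_prompt(moves):
    """Build the text that would be sent to the LLM before its next move."""
    board = new_board()
    for m in moves:
        apply_move(board, m)
    side = "White" if len(moves) % 2 == 0 else "Black"
    return f"Position (FEN piece placement): {placement(board)}\nSide to move: {side}"


if __name__ == "__main__":
    print(state_prompt(["e2e4", "e7e5", "g1f3"]))
```

The idea would be to prepend the output of state_prompt to each request, so the model reads the position instead of replaying the whole move list in its reasoning. Whether that actually speeds up responses or cuts illegal moves is the open question from this thread.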