I’m eager to read an ICLR paper discussing whether an LLM trained to "play" Othello learned a model of the board, not just surface statistics. But I’m immediately wary, based on a blog post by the lead author, who wrote: "the recent increase in model and data size has brought about qualitatively new behaviors such as writing basic code[1] or solving logic puzzles[2]."

The co-host of Mystery AI Hype Theatre 3000, @emilymbender, reminds us to "always read the footnotes" — these ones have issues!

First, here is the blog post: thegradient.pub/othello/

Footnote [1] is a tweet in praise of GitHub Copilot, which was trained on a large corpus of code. When Copilot writes code, this is not a “qualitatively new behaviour”; this is a model doing exactly what it was trained to do.

Footnote [2] is more puzzling. To support the claim that LLMs exhibit the “qualitatively new behaviour [of] solving logic puzzles”, the author cites a tweet by @dfeldman — which shows an LLM *failing* to do logic. The tweet asks: "Can GPT-3 solve simple logic puzzles?" and shows a series of GPT-3 prompt completions, beginning with:
Prompt: "Q: Alice is shorter than Bob. Bob is taller than Charlie. Is Alice shorter than Charlie? A:"
Completion: "Yes, Alice is shorter than Charlie."
(The correct answer is “it cannot be determined”: the premises place both Alice and Charlie below Bob, but say nothing about how the two compare to each other.)

I’m left to wonder: when the author of the Othello blog post cited this tweet, did they realise it contradicted their point? Or, when they saw the rhetorical question (“Can GPT-3 solve simple logic puzzles?”), did they just skim the screenshots, see that they looked vaguely like a chatbot solving a logic puzzle, and assume that the answer to the question was “yes”?

Either way, these opening footnotes smack of something — whether it’s sloppiness or credulousness, I’m not sure it matters!

@dfeldman’s tweet — from June 2022 — left me curious whether the current version of ChatGPT (3.5) would also fail to complete the syllogism. It did! So did Anthropic's Claude.
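
If you want to re-run the check yourself, here is a minimal sketch against the OpenAI chat completions API (Python SDK v1.x). The model name and the exact prompt wording are my own assumptions, not taken from the tweet:

```python
# Minimal sketch: re-run the Alice/Bob/Charlie question against a chat model.
# Assumes the OpenAI Python SDK (v1.x) and OPENAI_API_KEY set in the environment;
# "gpt-3.5-turbo" is my assumption, not necessarily the model behind the tweet.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Q: Alice is shorter than Bob. Bob is taller than Charlie. "
    "Is Alice shorter than Charlie? A:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep the completion as deterministic as possible
)

print(response.choices[0].message.content)
# The correct answer is "it cannot be determined"; a reply like
# "Yes, Alice is shorter than Charlie" is the failure described above.
```

Swapping in another model name lets you point the same prompt at whichever completion endpoint you want to test.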

So, can text-completion systems solve logic puzzles? Well, solving a syllogism is a prerequisite to solving a logic puzzle. ChatGPT cannot solve a syllogism. Therefore... 🤗