Leshem Choshen

Opposite scaling law: smaller models are better at detecting machine-generated text

Everyone (outside ...) is afraid people will use GPT to cheat, which pushes for detection methods

arxiv.org/abs/2305.09859

First, the problem: given a text, you want to know whether a human wrote it. If you've been around NLP lately, I'm sure a teacher, sister, nephew, etc. has called to say they suspect someone handed them GPT-written text.
Problem: how can you tell?
The approach:
Randomly replace words
Then see how much that changes the sentence's probability/likelihood

presented in
arxiv.org/abs/2301.11305
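The perturbation idea above can be sketched with a toy stand-in for the LM. Here a unigram table of per-word log-probabilities replaces a real language model, and every name (`log_likelihood`, `detection_score`, the word lists) is made up for illustration; the real method uses an actual LM to score the text and its perturbed variants:

```python
import random

def log_likelihood(text, logprobs, oov=-10.0):
    # Toy unigram "LM": sum of per-word log-probabilities.
    # A real detector would score the text with an actual language model.
    return sum(logprobs.get(w, oov) for w in text.split())

def perturb(text, vocab, frac, rng):
    # Randomly replace a fraction of the words, as in the approach above.
    words = text.split()
    n = max(1, int(len(words) * frac))
    for i in rng.sample(range(len(words)), n):
        words[i] = rng.choice(vocab)
    return " ".join(words)

def detection_score(text, logprobs, vocab, n_perturb=20, frac=0.5, seed=0):
    # Machine text sits near a likelihood peak, so random perturbations
    # should lower its score more than they would for human text.
    rng = random.Random(seed)
    base = log_likelihood(text, logprobs)
    perturbed = [log_likelihood(perturb(text, vocab, frac, rng), logprobs)
                 for _ in range(n_perturb)]
    return base - sum(perturbed) / len(perturbed)
```

A high score (big average drop under perturbation) suggests machine text; a score near or below zero suggests human text. The thresholding and the mask-filling model used to make perturbations are where the real method does the heavy lifting.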

The idea behind it:
Machine text fits the model's expectations uniquely well
Human texts are just arbitrarily good sentences

Therefore, any change to a machine-generated text has a larger effect on its probability

What this paper found:

The smaller the LM you pick, the better it is at detecting the LLM's text, even when training data, architecture, model family, etc. are held constant.