Leshem Choshen

Parallel generation from autoregressive LMs
Para ll el!
Well, not exactly: use a fast LM to propose the next words first
arxiv.org/abs/2302.01318

The story is very simple
Autoregressive models predict the next word given the previous ones, one word at a time, which is annoying and, with a strong model, slow
Instead, they propose to use a fast model to predict the next few words
Then check, for all of those words at once, whether the strong model agrees with them

For a bit more detail:
q - strong model
p - poor model
p generates words x1 .. xn
q then calculates their probabilities (did I say in parallel?)
We accept each word if q gives it a high enough probability relative to p (eq in fig)
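
A sketch of the acceptance rule in LaTeX, assuming the standard speculative sampling form and the notation above (q strong, p poor). The shorthand x_{<i} for "all words before x_i" is an assumption added here for illustration:

```latex
\Pr[\text{accept } x_i] \;=\; \min\!\left(1,\; \frac{q(x_i \mid x_{<i})}{p(x_i \mid x_{<i})}\right)
```

So a word that q likes at least as much as p is always kept, and a word that p overrates is kept only part of the time.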

What if we reject?
We just pick another word, sampled from a different distribution
and, while it is not explicit, I guess we lose the poor model's predictions after that point?
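
The accept/reject loop can be sketched in a few lines of Python. This is a minimal sketch, not the paper's code: the function name `speculative_step` and its arguments are hypothetical, the per-position distributions are passed in as plain lists (in practice q would compute them in one parallel pass), and the resampling distribution max(0, q - p) is an assumption based on my reading of the paper.

```python
import random


def speculative_step(q_probs, p_probs, draft_tokens, rng):
    """Verify drafted tokens against the strong model (hypothetical helper).

    q_probs[i], p_probs[i]: lists of probabilities over the vocabulary at
    position i, from the strong model q and the poor model p.
    draft_tokens[i]: the token p proposed at position i (so p assigns it
    nonzero probability).
    """
    output = []
    for i, tok in enumerate(draft_tokens):
        q, p = q_probs[i], p_probs[i]
        # Keep the drafted token with probability min(1, q(x) / p(x)).
        if rng.random() < min(1.0, q[tok] / p[tok]):
            output.append(tok)
            continue
        # Rejected: resample from max(0, q - p). rng.choices accepts
        # unnormalised weights, so no explicit renormalisation is needed.
        residual = [max(0.0, qi - pi) for qi, pi in zip(q, p)]
        output.append(rng.choices(range(len(q)), weights=residual)[0])
        # Assumption: the remaining drafted tokens are discarded.
        break
    return output
```

Called as `speculative_step(q_probs, p_probs, draft_tokens, random.Random(0))`, it returns the kept prefix of the draft, plus one resampled token if a rejection happened.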

Great speedups, simple and clean.

If you noticed, they chose a rather odd way of re-choosing a word when the models disagree. This is contrastive sampling, right?
It is assumed to be better, but if it is not, there is no reason not to sample in any other way at this point. Or am I missing something?