Parallel generation from autoregressive LMs
Para ll el!
Well, not exactly: use a fast LM to propose the next words first
https://arxiv.org/abs/2302.01318
#NLProc #Generation #inference
#DeepMind
The story is very simple
Autoregressive models predict each word given the previous ones, which is annoying and, with a strong model, slow
Instead, they propose to use a fast model to predict the next words
Then check, in parallel, whether the strong model agrees with all of those words
For a bit more detail:
q - the strong (target) model
p - the poor (draft) model
p generates n draft tokens x1..xn
q then calculates their probabilities (did I say in parallel?)
We accept each token if q also gives it high enough probability (eq in fig)
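The acceptance rule from the paper is a modified rejection sampling test: a drafted token x̃ at position t is kept with probability

```latex
\Pr[\text{accept } \tilde{x}] \;=\; \min\!\left(1,\; \frac{q(\tilde{x} \mid x_{<t})}{p(\tilde{x} \mid x_{<t})}\right)
```

so tokens the strong model likes at least as much as the draft did are always kept, and overconfident draft tokens are kept only proportionally.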
What if we reject?
We sample a replacement token from an adjusted distribution (the normalized positive part of q - p)
and the poor model's remaining predictions for that round get thrown away
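Putting the whole round together, here is a minimal sketch with toy stand-in distributions (the model functions, vocab size, and k are my assumptions for illustration, not the paper's setup); with a real transformer, step 2 is a single parallel forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8  # toy vocabulary size (an assumption for illustration)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy stand-ins for the two models: any prefix -> distribution function works.
def p_draft(prefix):   # fast "poor" model p
    return softmax(np.cos(np.arange(V) + len(prefix)))

def q_target(prefix):  # strong model q
    return softmax(np.sin(np.arange(V) + 2 * len(prefix)))

def speculative_step(prefix, k=4):
    """One round of speculative sampling: p drafts k tokens, q verifies."""
    # 1) The draft model proposes k tokens autoregressively (cheap).
    draft = []
    for _ in range(k):
        draft.append(int(rng.choice(V, p=p_draft(prefix + draft))))

    # 2) The target model scores every drafted position; with a real
    #    transformer this is one parallel forward pass over prefix + draft.
    accepted = []
    for i, x in enumerate(draft):
        ctx = prefix + draft[:i]
        p, q = p_draft(ctx), q_target(ctx)
        if rng.random() < min(1.0, q[x] / p[x]):  # accept with prob min(1, q/p)
            accepted.append(x)
        else:
            # 3) On rejection: resample from the residual (q - p)+, renormalized,
            #    and throw away the rest of the draft.
            residual = np.maximum(q - p, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(V, p=residual)))
            return prefix + accepted

    # 4) All k drafts accepted: q samples one bonus token for free.
    accepted.append(int(rng.choice(V, p=q_target(prefix + accepted))))
    return prefix + accepted
```

Each call extends the prefix by between 1 and k+1 tokens, which is exactly where the speedup comes from: the strong model runs once per round instead of once per token.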
Great speedups, simple and clean.
If you noticed, they chose a rather particular way of rechoosing on disagreement, the normalized difference between q and p; this looks a lot like contrastive decoding, right?
That particular choice is what makes the overall output provably match sampling from q; if you do not care about exactness, though, nothing stops you from resampling some other way at this point, or am I missing something
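A quick check of why the residual choice gives exactly q (my derivation, following the paper's rejection-sampling argument):

```latex
\Pr[X = x]
  = \underbrace{p(x)\min\!\left(1, \tfrac{q(x)}{p(x)}\right)}_{\text{accepted}}
  + \underbrace{\Pr[\text{reject}]\cdot
      \frac{\big(q(x) - p(x)\big)_+}{\sum_y \big(q(y) - p(y)\big)_+}}_{\text{resampled}}
```

Since $\Pr[\text{reject}] = 1 - \sum_y \min\big(p(y), q(y)\big) = \sum_y \big(q(y) - p(y)\big)_+$, the two terms sum to $\min\big(p(x), q(x)\big) + \big(q(x) - p(x)\big)_+ = q(x)$.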
Apparently something quite similar was proposed back in November; getting hot in here
https://arxiv.org/abs/2211.17192