Ondřej Cífka

We present Context Length Probing, an embarrassingly simple, model-agnostic explanation technique for causal (GPT-like) language models.

The idea is simply to check how predictions change as the left-hand context is extended token by token. This allows assigning "differential importance scores" to contexts as shown in the video.
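A minimal sketch of the idea: extend the left-hand context token by token and score each context token by how much it changes the prediction. The function names and the toy scorer below are illustrative stand-ins, not the paper's implementation; `logprob_fn` plays the role of a real causal LM returning log P(target | context).

```python
def differential_scores(tokens, target, logprob_fn):
    """Score the context token c positions back as the change in
    log P(target | context) when that token enters the context
    (longer context minus shorter context)."""
    scores = []
    prev = logprob_fn([], target)  # start from the empty context
    # Extend the context token by token, from nearest to farthest.
    for c in range(1, len(tokens) + 1):
        cur = logprob_fn(tokens[-c:], target)
        scores.append(cur - prev)
        prev = cur
    return scores  # scores[c - 1] belongs to the token c positions back


# Toy stand-in "model": seeing "Paris" makes the target much more likely.
def toy_logprob(context, target):
    return -5.0 + (3.0 if "Paris" in context else 0.0)


scores = differential_scores(["Paris", "is", "a", "city"], "capital", toy_logprob)
# Only the step where "Paris" enters the context gets a large score.
```

With a real model, a large positive score marks a context token that noticeably improved the prediction.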

Paper: arxiv.org/abs/2212.14815
Code: github.com/cifkao/context-prob
Demo: cifkao.github.io/context-probi

🧵1/4

In this plot, we show on an example how two different metrics (LM loss and a metric based on KL divergence) change as the context length increases (from right to left). Some context tokens cause abrupt changes, and we suggest interpreting these as tokens that bring important information not already covered by shorter contexts. 🧵2/4
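A rough sketch of a KL-based metric (see the paper for the exact definition; the distributions and the divergence orientation here are illustrative assumptions): compare the model's next-token distribution at two context lengths, so a large divergence means adding one more context token changed the prediction a lot.

```python
import math


def kl_div(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token distributions at context lengths c-1 and c.
p_short = [0.5, 0.3, 0.2]  # with the shorter context
p_long = [0.1, 0.1, 0.8]   # after adding one more context token

# The divergence is large when the extra token shifts the prediction;
# it is zero when the distributions agree (the token added no information).
score = kl_div(p_long, p_short)
```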

The technique works with any causal LM, as long as it was trained to accept arbitrary text fragments (not necessarily starting at a sentence or document boundary), which happens to be how large GPT-like models are usually trained.

The main trick is in realizing that the necessary probabilities can be computed efficiently by running the model along a sliding window. 🧵3/4

Specifically, to compute the output distributions for all positions in a text of length N and all context lengths up to a max length C, we just need to run inference along a sliding window of length C, i.e. do N forward passes on segments of length ≤C. (see the illustration in my previous post)
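The bookkeeping behind the sliding window can be sketched as follows. Each forward pass on a segment starting at position `start` yields, for every position inside the segment, a prediction conditioned on exactly the context since `start`, so the N passes together cover every (position, context length ≤ C) pair. The `toy_model` here is a hypothetical stand-in that just returns each position's context, to make the indexing visible.

```python
def sliding_window_outputs(tokens, C, model):
    """Collect model outputs for every position n and context length c <= C,
    using one forward pass per segment start (N passes in total)."""
    N = len(tokens)
    outputs = {}  # (n, c) -> prediction for token n given its c preceding tokens
    for start in range(N):  # N forward passes on segments of length <= C
        segment = tokens[start:start + C]
        preds = model(segment)  # one prediction per position in the segment
        for i, pred in enumerate(preds):
            n, c = start + i + 1, i + 1  # prediction for position n, context length c
            outputs[(n, c)] = pred
    return outputs


# Toy "model": for each position, return the context it conditioned on.
def toy_model(segment):
    return [tuple(segment[:i + 1]) for i in range(len(segment))]


tokens = list("abcdef")
outputs = sliding_window_outputs(tokens, 3, toy_model)
```

Every (n, c) pair with c ≤ min(n, C) is produced exactly once, which is what makes the cost the same N forward passes as plain left-to-right processing.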

Notice that this is a lot like generating a new sequence from the model (the naïve way)! 🧵4/4