Sanket Vaibhav Mehta

New preprint 🚨

DSI++: Updating Transformer Memory with New Documents

Q: "Can you add new documents to DSI??" was the big question many people had when DSI first came out.
A: Turns out you actually can!

arxiv.org/abs/2212.09744

(1/n)

arXiv.org | DSI++: Updating Transformer Memory with New Documents

Abstract: Differentiable Search Indices (DSIs) encode a corpus of documents in model parameters and use the same model to answer user queries directly. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting significantly. Concretely, it improves the average Hits@10 by +21.1% over competitive baselines for NQ and requires 6 times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
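To make the DSI setup concrete, here is a minimal sketch (not the paper's code) of the two roles a single seq2seq model plays: indexing trains it to emit a docid string given document text, and retrieval decodes a docid directly from the user query. The T5 checkpoint and function names below are placeholders.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def indexing_loss(doc_text, docid):
    # Indexing: teach the model to emit the docid given the document text.
    inputs = tokenizer(doc_text, return_tensors="pt", truncation=True)
    labels = tokenizer(docid, return_tensors="pt").input_ids
    return model(**inputs, labels=labels).loss

def retrieve(query):
    # Retrieval: the same model decodes a docid string for a user query.
    inputs = tokenizer(query, return_tensors="pt", truncation=True)
    out = model.generate(**inputs, max_length=16)
    return tokenizer.decode(out[0], skip_special_tokens=True)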

Q: Why DSI++?
A: Deploying the DSI model in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model

(3/n)

Q: What is DSI++?
A: DSI++ (DSI + new documents) 👉 a step towards incrementally indexing new documents in the DSI model while remaining computationally efficient and maintaining the ability to answer user queries related to both previously and newly indexed documents

(4/n)

Q: What are the challenges for enabling DSI++?
A: (a) Catastrophic forgetting during continual indexing 👉 a common phenomenon in neural networks wherein learning new documents interferes with previously memorized ones

(5/n)

…(b) Implicit forgetting during memorization 👉 we observe that a significant number of documents (~88%) experience forgetting events (when the prediction for an individual document flips from the correct docid to an incorrect one) after they have been memorized

(6/n)
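For intuition, here is a small sketch of how such forgetting events can be counted (illustrative bookkeeping, not our evaluation code): per document, record whether the predicted docid is correct at each checkpoint and count correct-to-incorrect flips.

from typing import Dict, List

def count_forgetting_events(history: Dict[str, List[bool]]) -> Dict[str, int]:
    # history maps docid -> per-checkpoint correctness (True = correct docid predicted).
    events = {}
    for docid, correct in history.items():
        events[docid] = sum(1 for prev, curr in zip(correct, correct[1:]) if prev and not curr)
    return events

# Example: doc_1 is memorized, later forgotten, then re-learned -> 1 forgetting event.
history = {"doc_1": [True, True, False, True], "doc_2": [True, True, True]}
events = count_forgetting_events(history)
forgotten = sum(1 for n in events.values() if n > 0)
print(f"{forgotten}/{len(history)} documents experienced at least one forgetting event")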

Q: How severe is the forgetting of the originally indexed documents? How does the updated DSI model perform on newly indexed documents? How do different docid representation strategies affect forgetting? How does the DSI model scale affect forgetting?
A: Systematic study 👇

(7/n)

Q: How to reduce forgetting during memorization?
A: Flatter minima have been shown to reduce forgetting...we explicitly optimize for flatter minima using the Sharpness-Aware Minimization (SAM) procedure...see our results 👇
SAM paper 👉 openreview.net/forum?id=6Tm1mp

(8/n)
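For reference, a bare-bones sketch of one SAM update in PyTorch (illustrative only; model, loss_fn, batch, and base_opt are assumed to exist and are not our exact training setup): perturb the weights along the gradient direction within radius rho, take the gradient at that "sharper" point, then apply it at the original weights.

import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    # 1) Gradient at the current weights w.
    loss = loss_fn(model, batch)
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2) + 1e-12

    # 2) Ascend to the nearby high-loss point: w + rho * g / ||g||.
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 3) Gradient at the perturbed weights w + eps ...
    loss_fn(model, batch).backward()

    # 4) ... then undo the perturbation and update w with that gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()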

Q: How to alleviate forgetting during continual indexing?
A: Generative memory 👉 a parametric model that generates pseudo-queries for documents…we use it for sparse experience replay (ER) over already indexed documents...it also enables continual semi-supervised learning for new documents...👇

(9/n)
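A toy sketch of the generative-memory + sparse ER recipe (the helper names and replay ratio are assumptions, not our exact setup): generate pseudo-queries for already-indexed documents, then mix a small fraction of those (pseudo-query, docid) pairs into each continual-indexing batch over the new corpus.

import random

def build_generative_memory(query_gen, indexed_docs):
    # indexed_docs: list of (docid, doc_text); query_gen: any doc -> pseudo-query model.
    return [(query_gen(text), docid) for docid, text in indexed_docs]

def continual_indexing_batches(new_docs, memory, batch_size=32, replay_ratio=0.25):
    # new_docs: list of (doc_text, docid) pairs for the incoming corpus.
    # Mix mostly-new indexing examples with a sparse sample of replayed pairs.
    n_replay = int(batch_size * replay_ratio)
    n_new = batch_size - n_replay
    while True:
        batch = random.sample(new_docs, min(n_new, len(new_docs)))
        batch += random.sample(memory, min(n_replay, len(memory)))
        random.shuffle(batch)
        yield batch  # each element is (input_text, target_docid)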

Q: Does our proposed approach for DSI++ generalize to different datasets?
A: We show convincing results across two DSI++ benchmarks, constructed from publicly available datasets – Natural Questions (NQ) 👆 and MS MARCO 👇

(10/n)

Q: Want to know more?
A: See our paper for the effectiveness of the generative memory at corpus scale (8.9M MS MARCO passages), the effect of ER sparsity on forgetting, and an analysis of incremental index construction time

DSI++ paper 👉 arxiv.org/abs/2212.09744

(11/n)

I am deeply grateful to my incredible co-authors for their invaluable assistance!

(12/n)