Hey!
Happy for my first toot to be about our work on combining static analysis with Large Language Models (LLMs) to reduce hallucinations in generated code.
A thread:
LLMs often hallucinate incorrect names, especially in private codebases.
We introduce Monitor-Guided Decoding (MGD): static analysis guides the LM during decoding, so it generates compilable code with correct symbol names more reliably! (Rough sketch below.)
Work w/ Aditya Kanade, Navin Goyal, Shuvendu Lahiri and Sriram Rajamani at Microsoft Research
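To give a flavor of the idea, here is a minimal Python sketch of monitor-guided decoding, assuming a HuggingFace-style model/tokenizer. The monitor object and its allowed_token_ids/observe methods are hypothetical stand-ins, not the actual repo API:

    import torch

    def mgd_generate(model, tokenizer, prompt, monitor, max_new_tokens=64):
        # Greedy decoding; the monitor may restrict the vocabulary at each step.
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        for _ in range(max_new_tokens):
            logits = model(input_ids=ids).logits[0, -1]  # next-token scores
            # Ask the static-analysis monitor which token ids are currently
            # legal (e.g. type-correct symbol names); None means no constraint.
            allowed = monitor.allowed_token_ids(tokenizer, ids)
            if allowed is not None:
                mask = torch.full_like(logits, float("-inf"))
                mask[allowed] = 0.0
                logits = logits + mask  # invalid names are masked to -inf
            next_id = logits.argmax().item()
            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
            monitor.observe(tokenizer.decode([next_id]))  # sync monitor state
        return tokenizer.decode(ids[0], skip_special_tokens=True)

In the actual system the monitor is backed by static analysis via a language server, which constrains decoding at points like dereferences, where only certain symbol names are valid. See the paper and repo for the real interface.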
MGD consistently improves compilation rate and ground truth match across LM sizes.
Interestingly, smaller LMs with MGD can surpass larger LMs: SantaCoder-1.1B achieves a better compilation rate and next-identifier match than the much larger text-davinci-003.
MGD complements existing code generation techniques, such as retrieval-augmented generation, static-analysis-based prompting, architecture changes, and fill-in-the-middle decoding (see Sections 4.2 & 4.3). Plus, no LM fine-tuning is required!
To evaluate MGD, we curate two datasets:
PragmaticCode: Real-world open-source projects, complete with development environments & dependencies.
DotPrompts: Method-level completion tasks, derived from the projects in PragmaticCode.
Code & dataset at: https://aka.ms/monitors4codegen
Learn more about our work in the paper: https://arxiv.org/abs/2306.10763