Hey!
Happy for my first toot to be about our work on combining static analysis with Large Language Models (LLMs) to reduce hallucinations in generated code.
A thread:
LLMs often hallucinate incorrect names, especially in private codebases.
We introduce Monitor-Guided Decoding (MGD): static analysis guides the LM during decoding, so it generates compilable code with correct symbol names more reliably! (Rough sketch below.)
Work w/ Aditya Kanade, Navin Goyal, Shuvendu Lahiri and Sriram Rajamani at Microsoft Research
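To give a flavor of the idea, here is a minimal Python sketch of monitor-guided decoding, assuming a HuggingFace-style model/tokenizer. The monitor object and its allowed_token_ids/observe methods are hypothetical stand-ins, not the actual repo API:

    import torch

    def mgd_generate(model, tokenizer, prompt, monitor, max_new_tokens=64):
        # Greedy decoding; the monitor may restrict the vocabulary at each step.
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        for _ in range(max_new_tokens):
            logits = model(input_ids=ids).logits[0, -1]  # next-token scores
            # Ask the static-analysis monitor which token ids are currently
            # legal (e.g. type-correct symbol names); None means no constraint.
            allowed = monitor.allowed_token_ids(tokenizer, ids)
            if allowed is not None:
                mask = torch.full_like(logits, float("-inf"))
                mask[allowed] = 0.0
                logits = logits + mask  # invalid names are masked to -inf
            next_id = logits.argmax().item()
            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
            monitor.observe(tokenizer.decode([next_id]))  # sync monitor state
        return tokenizer.decode(ids[0], skip_special_tokens=True)

In the actual system the monitor is backed by static analysis via a language server, which constrains decoding at points like dereferences, where only certain symbol names are valid. See the paper and repo for the real interface.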
MGD consistently improves compilation rate and ground truth match across LM sizes.
Interestingly, smaller LMs with MGD can surpass larger LMs: SantaCoder-1.1B achieves a better compilation rate and next-identifier match than the much larger text-davinci-003.
MGD complements existing code generation techniques, such as retrieval-augmented generation, static-analysis-based prompting, architecture changes, and fill-in-the-middle decoding (see Sections 4.2 & 4.3). Plus, no LM fine-tuning is required!
To evaluate MGD, we curate two datasets:
PragmaticCode: Real-world open-source projects, complete with development environments & dependencies.
DotPrompts: Method-level completion tasks, derived from the projects in PragmaticCode.
Code & dataset at: https://aka.ms/monitors4codegen
Learn more about our work in the paper: https://arxiv.org/abs/2306.10763