Hey!

Happy for my first toot to be about our work on combining static analysis with large language models (LLMs) to reduce hallucinations in generated code

A thread:
LLMs often hallucinate incorrect names, especially in private codebases.
We introduce Monitor-Guided Decoding (MGD), which guides LMs to generate compilable code with correct symbol names more reliably!
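The core idea can be sketched as constrained decoding: a static-analysis "monitor" supplies the set of names that are valid at the current position, and the LM's next-token choice is masked to stay consistent with that set. This is a minimal toy sketch, not the paper's implementation; the function name, greedy selection, and the hard-coded vocabulary are all illustrative assumptions.

```python
def monitor_guided_step(logits, vocab, valid_names):
    """Greedy decoding step restricted by a monitor.

    Only tokens that are a prefix of some statically-valid identifier
    (as reported by the monitor, e.g. a language server) keep their
    score; everything else is masked out before picking the argmax.
    """
    allowed = {
        i for i, tok in enumerate(vocab)
        if any(name.startswith(tok) for name in valid_names)
    }
    masked = [score if i in allowed else float("-inf")
              for i, score in enumerate(logits)]
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy example: the model's unconstrained favorite is the hallucinated
# member "clear", but the monitor says only "close" and "clone" exist
# on the receiver's type, so decoding is steered to "close".
vocab = ["clear", "close", "clone", "cl"]
logits = [2.0, 1.5, 1.0, 0.5]
best = monitor_guided_step(logits, vocab, ["close", "clone"])
print(vocab[best])  # -> close
```

In the real system the monitor runs repository-level static analysis (type information, symbol tables) rather than a fixed name list, and the masking applies over the tokenizer's full vocabulary at each decoding step.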

Work w/ Aditya Kanade, Navin Goyal, Shuvendu Lahiri and Sriram Rajamani at Microsoft Research

Paper: arxiv.org/abs/2306.10763

MGD consistently improves compilation rate and ground truth match across LM sizes.
Interestingly, smaller LMs with MGD can surpass larger LMs: SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003📈

MGD complements existing code generation techniques, such as retrieval-augmented, static analysis-based prompting, architecture changes, and fill-in-the-middle decoding (see sections 4.2 & 4.3). Plus, no LM fine-tuning is required!🌟

To evaluate MGD, we curate two datasets:
PragmaticCode: Real-world open-source projects, complete with development environments & dependencies.
DotPrompts: Method-level completion tasks, derived from prompts in PragmaticCode📊
Code & dataset at: aka.ms/monitors4codegen

Lakshya A Agrawal

Learn more about our work in the paper: arxiv.org/abs/2306.10763 📄

arXiv.org — Guiding Language Models of Code with Global Context using Monitors

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding. We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On models of varying parameter scale, by monitoring for type-consistent object dereferences, MGD consistently improves compilation rates and agreement with ground truth. Further, LMs with fewer parameters, when augmented with MGD, can outperform larger LMs. With MGD, SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003 model. We also conduct a generalizability study to evaluate the ability of MGD to generalize to multiple programming languages (Java, C# and Rust), coding scenarios (e.g., correct number of arguments to method calls), and to enforce richer semantic constraints (e.g., stateful API protocols). Our data and implementation are available at https://github.com/microsoft/monitors4codegen .
#arxiv #cscl #cl