BigCode

Large Language Models (LLMs) are fast becoming an essential tool for all fields of AI research. One striking feature of these large pre-trained models is that they can be adapted to a wide variety of language tasks, often with very little in-domain data.
BigCode is focused on developing state-of-the-art LLMs for code. Code LLMs enable the completion and synthesis of code, both from other code snippets and natural language descriptions, and work across a wide range of domains, tasks, and programming languages. These models can, for example, assist professional and citizen developers with coding new applications.
BigCode invites AI researchers to collaborate on the following topics:
1. A representative evaluation suite for code LLMs, covering a diverse set of tasks and programming languages
2. Responsible data governance and development for code LLMs
3. Faster training and inference methods for LLMs
The project is inspired by BigScience, an open scientific collaboration that culminated in July 2022 with the release of BLOOM, the world's largest open multilingual language model. BigCode aims to follow, as well as establish, responsible AI practices for training and sharing large language models. We welcome contributions from AI researchers and strive for openness and transparency throughout the LLM development process.