Announcing BigCode for the responsible development of large language models

September 26, 2022

We’re excited to announce the BigCode project, led by ServiceNow Research and Hugging Face. In the spirit of the BigScience initiative,¹ we aim to develop state-of-the-art large language models (LLMs) for code in an open and responsible way.

Code LLMs enable the completion and synthesis of code from both other code and natural-language descriptions, and they work across a wide range of domains, tasks, and programming languages. These models can assist professional and citizen developers with coding new applications.

BigCode invites AI researchers to collaborate on the following topics: 

  • A representative evaluation suite for code LLMs covering a diverse set of tasks and programming languages

  • Responsible development and governance of data sets for code LLMs

  • Faster training and inference methods for LLMs

The first goal of BigCode is to develop and release a data set large enough to train a state-of-the-art language model for code. We’ll ensure that only files from repositories with permissive licenses go into the data set.
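As a rough illustration of what a permissive-license filter could look like, here is a minimal Python sketch. It is not BigCode's actual pipeline: the `SourceFile` record, the `PERMISSIVE_LICENSES` allowlist, and the function names are all hypothetical, and the sketch assumes each file already carries a license identifier detected for its repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical allowlist of SPDX-style identifiers (an assumption for this
# sketch; the project's real list of accepted licenses may differ).
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc"}


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical record for one file collected from a repository."""
    repo: str
    path: str
    license_id: Optional[str]  # None when no license was detected


def is_permissive(license_id: Optional[str]) -> bool:
    """Return True only for licenses on the allowlist; unknown means excluded."""
    if license_id is None:
        return False
    return license_id.strip().lower() in PERMISSIVE_LICENSES


def filter_permissive(files: Iterable[SourceFile]) -> List[SourceFile]:
    """Keep only files whose repository license is on the allowlist."""
    return [f for f in files if is_permissive(f.license_id)]


if __name__ == "__main__":
    candidates = [
        SourceFile("org/a", "main.py", "MIT"),
        SourceFile("org/b", "lib.rs", "GPL-3.0"),
        SourceFile("org/c", "util.go", None),
    ]
    for kept in filter_permissive(candidates):
        print(kept.repo, kept.path)
```

Treating a missing license as non-permissive is a deliberately conservative choice in this sketch: a file is included only when there is positive evidence that its repository allows reuse.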

With that data set, we’ll train a 15-billion-parameter language model for code using ServiceNow’s in-house GPU cluster. With an adapted version of Megatron-LM, we’ll train the LLM on the distributed infrastructure.

Once the model is trained, we’ll evaluate its capabilities. While there are numerous benchmarks available for natural language processing (NLP), the landscape of benchmarks suited for code is much sparser. We’ll strive to make evaluation easier and broader so that we can learn more about the model’s capabilities.

Academic research usually stops after evaluation, but this is where the work for practical applications starts. Inference speed is crucial for applications such as autocompletion, so we’re interested in architectural changes and post-training optimization tools that make inference faster.

We’ll follow, as well as establish, responsible AI practices to train and share LLMs. We’ll uphold the principles of openness and transparency in the LLM development process. Experiments can be expensive and take a long time to run, so we’ll share the scientific plan with participants to solicit feedback before we execute it.

AI practitioners from diverse backgrounds are invited to join the BigCode project. The invitation is open to those who have a professional AI research background and can commit time to the project.

In general, we expect applicants to be affiliated with a research organization (either in academia or industry) and work on the technical/ethical/legal aspects of LLMs for coding applications.

Learn more about the project on the official website, and join the conversation on Twitter @BigCodeProject.

¹ The BigScience initiative is a scientific collaboration that culminated in July 2022 with the release of BLOOM, the world’s largest open multilingual language model.

© 2022 ServiceNow, Inc. All rights reserved. ServiceNow, the ServiceNow logo, Now, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries. Other company names, product names, and logos may be trademarks of the respective companies with which they are associated.
