PICARD algorithm achieves state-of-the-art performance for text-to-SQL

PICARD integration with beam search for text-to-SQL

This article introduces the current state-of-the-art research in text-to-SQL and is based on the following paper:

Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

If you'd like to explore the supporting code, visit https://github.com/ElementAI/picard.

What is text-to-SQL and why does it matter?

Digital transformation has resulted in an explosion of data, with most of this data stored in relational databases. In enterprises, these databases constitute a gold mine of untapped knowledge, but retrieving the correct data typically requires knowledge of how to formulate queries in Structured Query Language (SQL). Writing valid SQL queries is too advanced for most knowledge workers; thus, the value of the data remains untapped.

What is the average life expectancy in the countries where English is not the official language?

Natural language interfaces, such as conversational agents and search engines, allow people to access and interact with data directly by expressing queries in natural language without needing to know how to write the query in SQL.

Benefits of a high-performing text-to-SQL solution

Text-to-SQL interfaces can help employees and customers search and analyze structured data stored in one or more traditional relational databases. One of the appeals of text-to-SQL technology is that the system can also handle questions that are not as common, such as those for which traditional automation and user interfaces do not always scale. Many use cases span the entire business.

ServiceNow offers several user-friendly ways for users to interact with myriad enterprise systems. These include Virtual Agent (an enterprise chatbot solution) and search functionality. Once equipped with a robust text-to-SQL system, these interfaces could enable a company’s users to make powerful and complex information requests in plain language (English and others) and receive relevant information directly within those interfaces.

ServiceNow also enables users to continuously monitor situations by creating persistent dashboards and periodic reports based on data retrieval. This is currently done by pointing and clicking through the database structure and/or inputting some code. A natural language interface could enable a user to simply say what data they want to see in the dashboard or report (and perhaps where it should be placed or even how it should be visualized), and it would be set up automatically.

Here are a few examples of user requests that could benefit from text-to-SQL support:

A sales manager for a virtual reality (VR) software company asks a chatbot for a report that lists the top 10% most active users in New York City who have signed up for a free trial to manage their personalized avatars and digitized voice profiles and who have started related discussions in the online support forums, grouped by positive and negative sentiment and sorted from negative to positive sentiment score in each group.
A customer browsing a virtual pet store in the Metaverse using a VR headset asks the virtual agent, “What products are new and trending to train large dogs over 50 kilograms to walk with a leash without pulling?”
A marketing analyst uses a chatbot to ask, “How many customers engaged with a virtual agent before they signed up with a credit card in the last month, and what percentage of those converted to the premium service with annual renewal?”
An IT director wants to create a list of all employees in London who received both a company laptop and tablet in the last three years, with results grouped by whomever the employee reports to and sorted alphabetically by last name, then first name.
A customer service analyst wants to search for customers in Montréal who purchased snowshoes last winter, left a positive online review of the store of at least 4 stars, and complimented the shopping experience.

Introducing PICARD

Researchers from ServiceNow have recently developed a state-of-the-art algorithm for constrained decoding called PICARD. When we combine this algorithm with language models, we get a new method to automatically convert text into properly constructed SQL queries.

The system can work on databases it has never seen before. It relies on the structure and the content of the database as well as the knowledge encapsulated in the T5 language model. It is significantly more accurate than the prior state of the art, as evidenced by ServiceNow’s top rankings in the Spider and CoSQL challenges.

PICARD integration with beam search: The algorithm stops when all the hypotheses are terminated or when the maximum number of tokens has been reached.

What are Spider and CoSQL?

Researchers at Yale University developed the Spider and CoSQL challenges to compare research teams’ results fairly, on a uniform set of text-to-SQL tasks.

Spider consists of 200 databases with multiple tables spanning 138 domains, 10,181 questions in natural language about the content of those databases, and 5,693 corresponding complex SQL queries. CoSQL is the dialogue version of the Spider challenge. It consists of more than 30,000 turns (dialogue steps) and more than 10,000 annotated SQL queries.

How did PICARD fare in the challenges?

When combined with a large language model called T5-3B, PICARD produced a test accuracy of 75.1% on the Spider challenge using database values, well ahead of the next four contenders, whose scores ranged from 68% to 71%. This means that using PICARD, three out of four user questions were answered correctly on unfamiliar databases without any additional training examples.

On the CoSQL dialogue state tracking challenge, PICARD obtained accuracy scores of 54.6% and 23.7%, respectively, on the question match and interaction match metrics, while the next contender scored 51.6% and 21.2% on those same metrics. PICARD thus established a new standard for those tasks.

Going beyond with new dialogue interfaces

Translating phrases to database queries is a great first step, but it’s also a stepping stone to more innovations. There are many possibilities.

We’ve already mentioned using natural language to specify how data should be formatted or visualized. A system might be able to infer the formats that are most likely to be appropriate based on the type of data (geographic information, time series, etc.).

One common problem in an enterprise is that people might not know which database(s) contains the information they need. An enterprise AI system might be able to identify which database is relevant to a given request before writing the appropriate query.

CoSQL is a dialogue state tracking task, and breakthroughs are already being made in creating dialogue interfaces in which a user can iteratively refine their query. This would enable a user to construct even more complex queries and inspect results along the way. In this conversation, intermediate queries could be given shorthand names for future use.

Another idea is to allow natural language queries to be used as is, in lieu of code, in scripts. This would result in more readable code without sacrificing functionality.

Although natural language is the best way to specify some things, sometimes it’s easier to point at things. Users can get where they want to go most efficiently using interfaces that combine natural language with pointing and clicking, making the best use of both modalities.

Low-code/no-code text-to-code possibilities

Text-to-SQL is a relatively specific example of a general class of tasks called program synthesis. Research is underway on how to produce other kinds of code—be it JavaScript, Python, or Java—from natural language, describing what the code needs to do and including a few examples. This has the potential to result in powerful low-code/no-code platforms that enable non-programmers to produce working apps.

There is also potential to accelerate programmer work when writing code that is pattern-based or relatively routine (e.g., fetch this from that database, do this transform, export the result over there). Code-generating auto-complete functionality based on the last several lines of code written is one way to get there.

There’s no reason why AI systems cannot go in the other direction, from code to natural language. An example of this is automatic documentation and generation of comments that make code more approachable.

Finally, a fundamental question arises in this emerging context of human-machine interaction that involves higher-order thinking: How can human and machine verify that they have understood each other? One possibility is to run the system back and forth—from text to code and then back to text—which amounts to a rephrasing of the original request by the machine.

Learn more

This is an exciting area of research that will impact the future of work and is already making inroads. Get in touch through our GitHub project if you would like to learn more.