SantaCoder: don't reach for the stars!
The BigCode project is an open scientific collaboration working on the responsible development of large language models for code. This …
Leveraging Human Preferences to Master Poetry
Large language models have been fine-tuned to learn poetry via supervised learning on a dataset containing relevant examples. However, …
In-Context Learning for Text Classification with Many Labels
In-context learning (ICL) using large language models for tasks with many labels is challenging due to the limited context window, …
On the Compositional Generalization Gap of In-Context Learning
Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization …
Attention for Compositional Modularity
Modularity and compositionality are promising inductive biases for addressing longstanding problems in machine learning such as better …
A General Purpose Neural Architecture for Geospatial Systems
Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to …
Breadth-First Pipeline Parallelism
We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data …
Can large language models build causal graphs?
Building causal graphs can be a laborious process. To ensure all relevant causal pathways have been captured, researchers often have to …
Choreographer: Learning and Adapting Skills in Imagination
Unsupervised skill learning aims to learn a rich repertoire of behaviors without external supervision, providing artificial agents with …
Constraining Low-level Representations to Define Effective Confidence Scores
Neural networks are known to fail with high confidence, especially for data that somehow differs from the training distribution. Such …