ServiceNow Research
Dataset
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
Computer Vision and Pattern Recognition (CVPR), 2025.
PDF
Citation
Code
Video
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to …
Diego Velazquez, Pau Rodriguez, Sergio Alonso, Josep M. Gonfaus, Jordi Gonzalez, Gerardo Richarte, Javier Marin, Yoshua Bengio, Alexandre Lacoste
Workshop at the Winter Conference on Applications of Computer Vision (WACV), 2025.
PDF
Citation
Code
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
AAAI Demos, 2025.
PDF
Citation
Video
BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks
Vision and language models that can accurately understand both images and text are crucial for deeper document understanding. These …
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba
Workshop at Neural Information Processing Systems (NeurIPS), 2024.
PDF
Citation
Code
Video
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …
João Monteiro, Pierre-André Noël, Étienne Marcotte, Sai Rajeswar Mudumba, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.
PDF
Citation
Code
Video
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …
Andrew Williams, Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Jithendaraa Subramanian, Roland Riachi, James Requeima, Alexandre Lacoste, Irina Rish, Nicolas Chapados, Alexandre Drouin
Montreal AI Symposium (MAIS), 2024.
PDF
Citation
Code
StarCoder 2 and The Stack v2: The Next Generation
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code …
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Terry Yue Zhuo, Nii Osae Osae Dade, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Edoardo Abati, Yekun Chai, Xiangru Tang, Christopher Akiki, Chenghao Mou, Binyuan Hui, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sébastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries, Alex Gu, Armel Zebaze, Evgenii Zheltonozhskii, Jian Zhu, Manan Dey, Marc Marone, Mayank Mishra, Muhtasham Oblokulov, Olivier Dehaene, Qian Liu, Tri Dao, Wenhao Yu, Niklas Muennighoff
ArXiv, 2024.
PDF
Citation
Code
Video
The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
We introduce the StatCan Dialogue Dataset consisting of 4967 conversations between agents working at Statistics Canada and online users …
Xing Han Lu, Siva Reddy, Harm de Vries
European Chapter of the Association for Computational Linguistics (EACL), 2023.
PDF
Citation
The Stack: 3 TB of permissively licensed source code
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI)–not only for natural …
Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries
Transactions on Machine Learning Research (TMLR), 2022.
PDF
Citation
Code
Kubric: A scalable dataset generator
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the …
Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam H. Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi
Computer Vision and Pattern Recognition (CVPR), 2022.
PDF
Citation
Code