Abstract
Vision and language models (VLMs) that can accurately understand both images and text are crucial for deeper document understanding. These models can efficiently perform enterprise-level tasks, such as processing receipts from screenshots, generating websites and business workflows from sketches, and extracting information from structured documents. These tasks often require generating long, structured outputs, an area where models trained on current datasets struggle. Additionally, many existing datasets are not license-permissive, which limits their use to non-commercial applications. To address these limitations, we present BigDocs, a high-quality, specifically curated dataset for training license-permissive VLMs capable of performing a wide variety of tasks. The dataset focuses on acquiring accurate image-text pairs across diverse tasks while adhering to accountability, responsibility, and transparency (ART) standards. Our preliminary experiments demonstrate that pre-training with BigDocs yields performance boosts in document reasoning and in tasks requiring long structured outputs, such as screenshot-to-HTML, table-to-LaTeX, and image-to-SVG. We believe that VLMs trained on BigDocs have the potential to enhance multimodal capabilities significantly, benefiting broader research in multimodal document understanding.
Publication
Workshop at the Conference on Neural Information Processing Systems (NeurIPS)
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.
Applied Research Scientist
Applied Research Scientist at AI Research Deployment located at Montreal, QC, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Vancouver, BC, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Toronto, ON, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.
Visiting Researcher
Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.
Research Scientist
Research Scientist at AI Frontier Research located at Montreal, QC, Canada.
Research Lead
Research Lead at AI Research Deployment located at Montreal, QC, Canada.
VP of Research
VP of Research at AI Research Management located at Montreal, QC, Canada.
AI Ecosystem Director
AI Ecosystem Director at AI Research Partnerships & Ecosystem located at San Diego, CA, USA.
Distinguished Scientist
Distinguished Scientist at AI Research Partnerships & Ecosystem located at Montreal, QC, Canada.
Research Lead
Research Lead at AI Frontier Research located at Montreal, QC, Canada.
Director of AI Research
Director of AI Research at AI Research Management located at Montreal, QC, Canada.
Research Manager
Research Manager at AI Frontier Research located at Vancouver, BC, Canada.
Research Manager
Research Manager at AI Frontier Research located at Montreal, QC, Canada.