1

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Decoder-only Transformers often struggle with complex reasoning tasks, particularly arithmetic reasoning requiring multiple sequential …
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM’s capability to …
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Deployed machine learning systems require some mechanism to detect out-of-distribution (OOD) inputs. Existing research mainly focuses …
Multimodal foundation world models for generalist embodied agents
Learning generalist agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning …
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy
This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), …