mrCAD: Multimodal Refinement of Computer-aided Designs
EMNLP 2025
Multimodal instructions (text, sketch, or both) for iteratively refining CAD designs across human-played games, used to benchmark VLMs on refinement versus generation.
CVPR 2026 (Highlight)
AbstainEQA pairs abstention cases with OpenEQA; frontier models still trail humans on knowing when to abstain.
CogSci 2026
Neurosymbolic approach to Bongard problems: LLMs generate parameterized programmatic rules with Bayesian parameter fitting; evaluated on classification and full problem solving.
ICLR 2026
CoPiC uses LLM-generated planning programs to propose and refine plans, and a trained domain-adaptive critic to pick candidates aligned with long-term rewards, yielding higher success with far fewer LLM queries on ALFWorld, NetHack, and StarCraft II.