ETH AI Digest: #3
AI-powered data cleaning, functional scene understanding, and automated tool testing for more reliable agents
In this week’s digest:
LLMs for Tabular Data Cleaning: Researchers test how well language models identify and fix dataset errors, finding that they handle simple row-level errors but struggle with patterns spanning multiple rows
Open-Vocabulary Functional 3D Scene Graphs: A new approach uses foundation models to represent indoor spaces with interactive elements and functional relationships, moving beyond traditional spatial-only representations
ToolFuzz for Agent Tool Testing: This testing framework finds documentation errors in AI agent tools, identifying 20× more erroneous inputs than baselines while reducing false positives by 45%
Selected Papers of the Week
1. Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets
LLMs can clean tabular data, but with limitations.

✍️ Authors: Tommaso Bendinelli, Artur Dox, Christian Holz
🏛️ Lab: Sensing, Interaction & Perception Lab
⚡ Summary
This paper investigates whether Large Language Models can automate the tedious task of cleaning tabular datasets for machine learning.
Through experiments on corrupted Kaggle datasets, the authors find LLMs can identify and fix simple errors in individual rows but struggle with complex patterns spanning multiple rows.
While providing hints improves performance, no model achieves the full improvement that complete error correction would yield.
The work introduces a framework for evaluating LLMs' data cleaning capabilities and identifies key limitations that future research should address.
2. Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Extending 3D scene graphs with interactive elements and functional relationships using foundation models.

✍️ Authors: Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann
🏛️ Lab: Computer Vision and Geometry Lab
⚡ Summary
Traditional 3D scene graphs only capture spatial relationships between objects, limiting their utility for interaction tasks.
This paper introduces functional 3D scene graphs that model objects, interactive elements, and their functional relationships using foundation models to overcome limited training data.
The proposed OpenFunGraph pipeline significantly outperforms existing methods on both the extended SceneFun3D dataset and a newly collected FunGraph3D dataset.
Applications include 3D question answering and robotic manipulation, demonstrating the versatility of functional scene graphs for complex reasoning about indoor environments.
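As a rough illustration of what a functional scene graph might store, the sketch below models objects and interactive elements as nodes and functional relationships as labelled edges. The class names, fields, and example labels are assumptions made for this digest and do not reflect the actual OpenFunGraph data format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: int
    label: str  # open-vocabulary label, e.g. "ceiling light"
    kind: str   # hypothetical tag: "object" or "interactive_element"


@dataclass(frozen=True)
class FunctionalEdge:
    source: int    # id of the interactive element
    target: int    # id of the object it acts on
    relation: str  # free-text functional relationship


@dataclass
class FunctionalSceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source, target, relation):
        # Refuse edges that point at nodes the graph does not know about.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(FunctionalEdge(source, target, relation))

    def controllers_of(self, target_label):
        """Return (element label, relation) pairs acting on the given object."""
        return [
            (self.nodes[e.source].label, e.relation)
            for e in self.edges
            if self.nodes[e.target].label == target_label
        ]


graph = FunctionalSceneGraph()
graph.add_node(Node(0, "ceiling light", "object"))
graph.add_node(Node(1, "light switch", "interactive_element"))
graph.add_node(Node(2, "cabinet", "object"))
graph.add_node(Node(3, "handle", "interactive_element"))
graph.add_edge(1, 0, "turns on")
graph.add_edge(3, 2, "opens")

print(graph.controllers_of("ceiling light"))
```

A query such as "what turns on the ceiling light?" becomes a lookup over functional edges, which a purely spatial graph with relations like "next to" or "on top of" could not answer.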
3. ToolFuzz -- Automated Agent Tool Testing
Finding documentation errors in AI agent tools through fuzzing and consistency checks.

✍️ Authors: Ivan Milev, Mislav Balunović, Maximilian Baader, Martin Vechev
🏛️ Lab: Secure, Reliable and Intelligent Systems Lab
⚡ Summary
LLM agents often fail because of poorly specified tool documentation, yet until now no automated method existed for testing it.
ToolFuzz combines fuzzing with LLM-based generation to detect runtime errors, and uses synonymous prompts with cascading checks to detect correctness failures.
The method identifies 20× more erroneous inputs than baseline approaches while reducing false positives by 45%.
Fixing documentation based on ToolFuzz findings improved agent performance by approximately 10% in benchmark tests.
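The two failure types can be pictured with a small sketch. This is not the ToolFuzz implementation: the toy tool, the stub agent, and both helper functions are hypothetical, candidate inputs are hard-coded rather than LLM-generated, and the paper's cascading checks are replaced here by a plain comparison of normalized answers.

```python
def stock_count(product):
    """Toy tool. Its (hypothetical) documentation claims: 'accepts any product name'."""
    inventory = {"apple": 3, "pear": 0}  # made-up data
    return inventory[product]  # raises KeyError for inputs the docs do not rule out


def find_runtime_failures(tool, candidate_inputs):
    """Call the tool on each input and record the ones that raise."""
    failures = []
    for value in candidate_inputs:
        try:
            tool(value)
        except Exception as exc:
            failures.append((value, type(exc).__name__))
    return failures


def stub_agent(prompt):
    """Stand-in for a real agent: it only reacts to the word 'stock'."""
    if "stock" in prompt.lower():
        return stock_count("apple")
    return "unknown"


def inconsistent_answers(agent, prompts):
    """Return the per-prompt answers if synonymous prompts disagree, else {}."""
    answers = {p: str(agent(p)).strip().lower() for p in prompts}
    return answers if len(set(answers.values())) > 1 else {}


failures = find_runtime_failures(stock_count, ["apple", "Apple", "banana"])

synonymous_prompts = [
    "How many apples are in stock?",
    "What is the apple stock level?",
    "How many apples do we have?",
]
disagreement = inconsistent_answers(stub_agent, synonymous_prompts)

print(failures)
print(disagreement)
```

Inputs that crash the tool even though the documentation allows them point to an under-specified description. Prompts that mean the same thing but produce different answers point to a correctness problem that never raises an exception.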
Other noteworthy articles
OpenCity3D: What do Vision-Language Models know about Urban Environments?: Extracting urban insights from aerial 3D reconstructions using vision-language models
SplatVoxel: History-Aware Novel View Streaming without Temporal Training: Combining Gaussian splatting with voxel grids to create temporally consistent novel views from sparse video inputs
Deblur Gaussian Splatting SLAM: Modeling camera motion blur enables high-quality 3D reconstruction despite fast camera movements