ETH AI Digest: #3

AI-powered data cleaning, functional scene understanding, and automated tool testing for more reliable agents

Marco

Mar 28, 2025

In this week’s digest:

LLMs for Tabular Data Cleaning — Researchers test how language models identify and fix dataset errors, revealing strengths with simple problems but limitations with complex patterns
Open-Vocabulary Functional 3D Scene Graphs — A new approach transforms indoor spaces with interactive elements and functional relationships using foundation models, moving beyond traditional spatial-only representations
ToolFuzz for Agent Tool Testing — This innovative testing framework identifies documentation errors in AI agent tools, detecting 20× more problems than baselines while reducing false positives by 45%

Selected Papers of the Week

1. Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets

LLMs can clean tabular data, but with limitations.

The model identifies dataset errors to improve performance on held-out data. At each iteration, it runs code via IPython or submits modified datasets for scoring, continuing until reaching a token limit. The highest-scoring dataset version is designated as optimal.

✍️ Authors: Tommaso Bendinelli, Artur Dox, Christian Holz

🏛️ Lab: Sensing, Interaction & Perception Lab

⚡ Summary

This paper investigates whether Large Language Models can automate the tedious task of cleaning tabular datasets for machine learning.

Through experiments on corrupted Kaggle datasets, the authors find LLMs can identify and fix simple errors in individual rows but struggle with complex patterns spanning multiple rows.

While providing hints improves performance, no model achieves the maximum improvement attainable with complete error correction.

The work introduces a framework for evaluating LLMs' data cleaning capabilities and identifies key limitations that future research should address.
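
The iterate-score-keep-best loop described in the figure caption can be sketched roughly as follows. This is a toy illustration, not the authors' framework: `run_cleaning_loop`, `drop_negative_ages`, `fraction_valid`, and the fixed per-step token cost are all hypothetical stand-ins, and the score here is a simple validity ratio rather than model performance on held-out data.

```python
from typing import Callable, Dict, List, Tuple

Dataset = List[Dict[str, float]]


def run_cleaning_loop(
    dataset: Dataset,
    propose: Callable[[Dataset], Tuple[Dataset, int]],
    score: Callable[[Dataset], float],
    token_budget: int,
) -> Tuple[Dataset, float]:
    """Iterate until the token budget is spent and keep the best-scoring version."""
    best, best_score = dataset, score(dataset)
    current, used = dataset, 0
    while used < token_budget:
        # Hypothetical agent step: returns a modified dataset and its token cost.
        candidate, cost = propose(current)
        used += max(cost, 1)  # guard against zero-cost steps looping forever
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        current = candidate
    return best, best_score


def drop_negative_ages(rows: Dataset) -> Tuple[Dataset, int]:
    # Toy "agent": fixes one simple row-level error, at an assumed cost of 100 tokens.
    return [r for r in rows if r["age"] >= 0], 100


def fraction_valid(rows: Dataset) -> float:
    # Toy score: share of rows with a plausible age.
    return sum(0 <= r["age"] <= 120 for r in rows) / len(rows)


rows = [{"age": 34}, {"age": -1}, {"age": 57}, {"age": 999}]
cleaned, best_score = run_cleaning_loop(
    rows, drop_negative_ages, fraction_valid, token_budget=300
)
```

In this toy run the negative age is removed but the implausible value 999 survives, so the score improves without reaching its maximum, which loosely mirrors the partial gains reported in the summary.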

👉 Read the full paper

2. Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Extending 3D scene graphs with interactive elements and functional relationships using foundation models.

✍️ Authors: Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann

🏛️ Lab: Computer Vision and Geometry Lab

⚡ Summary

Traditional 3D scene graphs only capture spatial relationships between objects, limiting their utility for interaction tasks.

This paper introduces functional 3D scene graphs that model objects, interactive elements, and their functional relationships using foundation models to overcome limited training data.

The proposed OpenFunGraph pipeline significantly outperforms existing methods on both the extended SceneFun3D dataset and a newly collected FunGraph3D dataset.

Applications include 3D question answering and robotic manipulation, demonstrating the versatility of functional scene graphs for complex reasoning about indoor environments.
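
To make the idea of a functional scene graph more concrete, here is a minimal sketch of one possible data structure. It is an assumption for illustration only: the class names, fields, and relation strings are hypothetical and do not reflect the OpenFunGraph data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: str
    label: str  # open-vocabulary label, e.g. "ceiling light"
    kind: str   # "object" or "interactive_element"


@dataclass
class FunctionalSceneGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source_id, functional relation, target_id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source_id: str, relation: str, target_id: str) -> None:
        # Reject edges that reference nodes missing from the graph.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source_id, relation, target_id))

    def targets_of(self, source_id: str) -> List[Tuple[str, str]]:
        """Return (relation, target label) pairs for one interactive element."""
        return [
            (relation, self.nodes[target].label)
            for source, relation, target in self.edges
            if source == source_id
        ]


graph = FunctionalSceneGraph()
graph.add_node(Node("switch_1", "light switch", "interactive_element"))
graph.add_node(Node("light_1", "ceiling light", "object"))
graph.add_node(Node("handle_1", "handle", "interactive_element"))
graph.add_node(Node("cabinet_1", "cabinet", "object"))
graph.add_edge("switch_1", "turns on", "light_1")
graph.add_edge("handle_1", "opens", "cabinet_1")
```

A query such as `graph.targets_of("switch_1")` then answers "what does this switch control?", the kind of question a spatial-only graph cannot express.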

👉 Read the full paper

🌐 Project website

3. ToolFuzz -- Automated Agent Tool Testing

Finding documentation errors in AI agent tools through fuzzing and consistency checks.

Overview of TOOLFUZZ's two error detection methods: (1) fuzzing-based approach and (2) invariance-based approach using consistency checks. Prompts, tool calls, tool responses, and agent responses are denoted by p, I, O, and a respectively.

✍️ Authors: Ivan Milev, Mislav Balunović, Maximilian Baader, Martin Vechev

🏛️ Lab: Secure, Reliable and Intelligent Systems Lab

⚡ Summary

LLM agents often fail due to poorly specified tool documentation, yet no automated method for testing that documentation existed until now.

TOOLFUZZ combines fuzzing with LLM-based generation to detect runtime errors and uses synonymous prompts with cascading checks for correctness failures.

The method identifies 20× more erroneous inputs than baseline approaches while reducing false positives by 45%.

Fixing documentation based on TOOLFUZZ findings improved agent performance by approximately 10% in benchmark tests.
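
The consistency idea behind the correctness checks can be sketched as follows. This is a simplified illustration under stated assumptions, not the TOOLFUZZ implementation: the synonymous prompts are hand-written rather than LLM-generated, answers are compared by exact match after normalization instead of cascading checks, and `check_invariance` and `toy_agent` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, List


def check_invariance(
    agent: Callable[[str], str], prompts: List[str]
) -> Dict[str, str]:
    """Run synonymous prompts and return those whose answer deviates from the majority."""
    answers = {p: agent(p).strip().lower() for p in prompts}
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return {p: a for p, a in answers.items() if a != majority}


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for an agent plus tool: the tool only reacts to the
    # literal word "weather", a restriction its documentation fails to mention.
    return "18 C" if "weather" in prompt.lower() else "unknown city"


prompts = [
    "What is the weather in Zurich?",
    "Tell me Zurich's weather.",
    "What is the forecast for Zurich?",
]
suspicious = check_invariance(toy_agent, prompts)
```

Here the third prompt asks the same thing in different words but gets a different answer, so it is flagged as a likely sign of underspecified tool documentation.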

👉 Read the full paper

💻 GitHub page
