ETH AI Digest: #14

AI surgical simulation, cross-datacenter training time cut by up to 33.6%, scanline camera pose estimation

Marco

Jul 03, 2025

In this week's digest:

AI-Powered Surgical Training: A new simulation platform lets AI agents learn ultrasound-guided orthopedic procedures in physics-based and generative ultrasound simulation environments.
Breaking Datacenter Boundaries: A pipeline scheduling framework reduces AI model training time across distributed datacenters by up to 33.6% by explicitly accounting for network latency and bandwidth limits.
Smart Camera Pose Detection: A new approach estimates the relative pose between rolling shutter cameras from a single scanline per image, giving rolling-shutter 3D reconstruction a starting point that needs no camera motion model.

Selected Papers of the Week

1. SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound

SonoGym enables efficient training of AI agents for ultrasound-guided orthopedic surgical tasks.

SonoGym provides model-based and learning-based ultrasound (US) simulation using 3D label maps and CT scans from real patient datasets. Tasks in SonoGym include US navigation, anatomy reconstruction, and US-guided robotic surgery. SonoGym enables benchmarking of various algorithms, including reinforcement learning (RL), safe RL, vision transformers, and diffusion policies.

✍️ Authors: Yunke Ao, Masoud Moghani, Mayank Mittal, Manish Prajapat, Luohong Wu, Frederic Giraud, Fabio Carrillo, Andreas Krause, Philipp Fürnstahl

🏛️ Lab: Learning & Adaptive Systems Group

⚡ Summary

SonoGym addresses the lack of realistic simulation environments for training AI agents in robotic ultrasound-guided orthopedic surgery.

This scalable platform provides physics-based and generative ultrasound simulation from CT scans, supporting navigation, bone reconstruction, and surgical drilling tasks.

Reinforcement learning agents trained in SonoGym outperform traditional approaches, with learned bone reconstruction trajectories showing higher efficiency than heuristic methods.

The trained agents generalize well across ultrasound noise conditions, but the experiments also reveal that adapting to different patient anatomies remains a challenge.

👉 Read the full paper

🌐 Project page

💻 GitHub code

🤗 Dataset

2. CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training

CrossPipe: Reducing training time across distributed datacenters by optimizing pipeline schedules for network constraints.

✍️ Authors: Tiancheng Chen, Ales Kubicek, Langwen Huang, Torsten Hoefler

🏛️ Lab: Scalable Parallel Computing Lab

⚡ Summary

Training large language models increasingly requires resources beyond a single datacenter, but geographic distribution introduces significant communication inefficiencies.

CrossPipe optimizes model training across distributed datacenters by explicitly modeling network latency and bandwidth limitations, generating optimized pipeline schedules using either solver-based or greedy algorithms.
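To make the effect of network constraints concrete, here is a minimal Python sketch of how link latency and bandwidth stretch a pipeline's forward pass. It is a toy model and not CrossPipe's scheduler: the function names, the forward-only simplification, and the uniform per-stage compute time are all assumptions made for illustration.

```python
from typing import Sequence


def transfer_time(latency_s: float, size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to send one activation tensor over a link: latency plus size / bandwidth."""
    return latency_s + size_gb / bandwidth_gb_per_s


def forward_makespan(
    num_microbatches: int, compute_s: float, link_delays_s: Sequence[float]
) -> float:
    """Finish time of the last microbatch on the last stage (forward passes only).

    link_delays_s[i] is the transfer time between stage i and stage i + 1,
    so the pipeline has len(link_delays_s) + 1 stages. Hypothetical toy model.
    """
    num_stages = len(link_delays_s) + 1
    # finish[s] is the finish time of the most recent microbatch on stage s.
    finish = [0.0] * num_stages
    for _ in range(num_microbatches):
        arrival = 0.0  # all inputs are available at stage 0 from the start
        for s in range(num_stages):
            # A stage starts once its input has arrived and it is free.
            start = max(arrival, finish[s])
            finish[s] = start + compute_s
            if s < num_stages - 1:
                arrival = finish[s] + link_delays_s[s]
    return finish[-1]


# Four stages, eight microbatches, one time unit of compute per stage.
single_dc = forward_makespan(8, 1.0, [0.0, 0.0, 0.0])
# Same pipeline, but the middle link crosses datacenters.
cross_dc = forward_makespan(8, 1.0, [0.0, transfer_time(0.5, 4.0, 2.0), 0.0])
print(single_dc, cross_dc)  # 11.0 13.5
```

In this toy model the slow link adds its full delay to the makespan because nothing else is scheduled to hide it. A real schedule also interleaves backward passes and must respect memory limits, which is where the choice of schedule starts to matter.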

The framework reduces training time by up to 33.6% compared to traditional pipeline schedules under identical memory constraints, and can approach the efficiency of single-datacenter training when memory constraints are relaxed.
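A reduction in training time is not the same number as a speedup, so a quick conversion may help when comparing against results reported as throughput gains. The helper below is a hypothetical illustration; only the 33.6% figure comes from the paper summary.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional time reduction (e.g. 0.336) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # New time is (1 - reduction) of the old time, so speedup is old / new.
    return 1.0 / (1.0 - reduction)


print(round(speedup_from_time_reduction(0.336), 3))  # 1.506
```

So the best case of 33.6% less training time corresponds to roughly a 1.5x speedup over the baseline schedule.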

CrossPipe offers improved scalability and resource utilization, particularly in environments with high network latency or limited bandwidth.

👉 Read the full paper

💻 GitHub code

3. Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras

Estimating camera poses from line projections without motion models for rolling shutter cameras.

The authors estimate a relative pose from the intersections of the line projections with a single scanline per image (green). This allows them to estimate relative pose between rolling shutter cameras without modelling the camera motion.

✍️ Authors: Petr Hruby, Marc Pollefeys

🏛️ Lab: Computer Vision and Geometry Lab

⚡ Summary

This paper introduces a novel approach for estimating relative pose between rolling shutter cameras using line projections onto single scanlines, eliminating the need for explicit motion modeling.
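The basic measurement behind this idea, the point where a projected line crosses one image row, can be sketched in a few lines of Python. This shows only the input geometry under standard homogeneous-coordinate conventions, not the paper's minimal solvers; the function names and the (a, b, c) line representation are assumptions chosen for illustration.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def cross(u: Vec3, v: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def intersect_with_scanline(line: Vec3, row: float) -> Optional[Tuple[float, float]]:
    """Intersect the image line a*x + b*y + c = 0 with the scanline y = row.

    Returns the (x, y) pixel position, or None if the line is parallel
    to the scanline and so never crosses it at a finite point.
    """
    scanline = (0.0, 1.0, -row)  # homogeneous form of y - row = 0
    x, y, w = cross(line, scanline)  # two lines meet at their cross product
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


# The diagonal line x - y = 0 crosses row 3 at pixel (3, 3).
print(intersect_with_scanline((1.0, -1.0, 0.0), 3.0))  # (3.0, 3.0)
```

Because every pixel on a single scanline of a rolling shutter image is exposed at the same instant, measurements of this kind are tied to one camera pose, which is why no motion model is required.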

The method transforms the problem into minimal cases with varying configurations (parallel lines, vertical lines, gravity prior) that can be solved using techniques from projective geometry.

While not all camera poses are accurately estimated, experiments on synthetic and real data show the approach can reliably estimate at least one correct pose per sequence, sufficient for initializing structure-from-motion.

This provides a foundation for rolling-shutter 3D reconstruction where each scanline's pose can be computed independently.

👉 Read the full paper
