ETH AI Digest: #14
AI surgical simulation, distributed training time cut by up to 33.6%, scanline camera pose estimation
In this week's digest:
AI-Powered Surgical Training: New simulation platform lets AI agents learn ultrasound-guided orthopedic procedures in physics-based training environments.
Breaking Datacenter Boundaries: Pipeline scheduling framework reduces AI model training time by up to 33.6% across distributed datacenters by accounting for network latency and bandwidth.
Smart Camera Pose Detection: New approach estimates relative camera pose from line projections onto single scanlines, a step toward 3D reconstruction with rolling shutter cameras.
Selected Papers of the Week
1. SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound
SonoGym enables efficient training of AI agents for ultrasound-guided orthopedic surgical tasks.

✍️ Authors: Yunke Ao, Masoud Moghani, Mayank Mittal, Manish Prajapat, Luohong Wu, Frederic Giraud, Fabio Carrillo, Andreas Krause, Philipp Fürnstahl
🏛️ Lab: Learning & Adaptive Systems Group
⚡ Summary
SonoGym addresses the lack of realistic simulation environments for training AI agents in robotic ultrasound-guided orthopedic surgery.
This scalable platform provides physics-based and generative ultrasound simulation from CT scans, supporting navigation, bone reconstruction, and surgical drilling tasks.
Reinforcement learning agents trained in SonoGym outperform traditional approaches, with learned bone reconstruction trajectories showing higher efficiency than heuristic methods.
The trained agents generalize well across ultrasound noise conditions, but the experiments also show that adapting to different patient anatomies remains a challenge.
🤗 Dataset
2. CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
CrossPipe: Reducing training time across distributed datacenters by optimizing pipeline schedules for network constraints.
✍️ Authors: Tiancheng Chen, Ales Kubicek, Langwen Huang, Torsten Hoefler
🏛️ Lab: Scalable Parallel Computing Lab
⚡ Summary
Training large language models increasingly requires resources beyond a single datacenter, but geographic distribution introduces significant communication inefficiencies.
CrossPipe explicitly models the network latency and bandwidth limits between datacenters, then generates pipeline schedules for those constraints using either a solver-based or a greedy algorithm.
The framework reduces training time by up to 33.6% compared to traditional pipeline schedules under identical memory constraints, and can approach the efficiency of single-datacenter training when memory constraints are relaxed.
CrossPipe offers improved scalability and resource utilization, particularly in environments with high network latency or limited bandwidth.
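To make the role of latency and bandwidth concrete, here is a minimal toy sketch of how a slow cross-datacenter link enters a pipeline's timeline. It is not CrossPipe's scheduler or cost model: the function names, the forward-only pipeline, the uniform per-stage compute time, and the assumption that transfers overlap freely with compute are all illustrative choices made for this example.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple latency + size/bandwidth cost for one message (assumed model)."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def pipeline_makespan(num_stages, num_microbatches, compute_s, link_delays):
    """Finish time of a forward-only pipeline (toy model, not CrossPipe).

    link_delays[i] is the transfer time between stage i and stage i + 1.
    Transfers are assumed to overlap with compute and never queue on a link.
    """
    assert len(link_delays) == num_stages - 1
    finish = [[0.0] * num_microbatches for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_microbatches):
            # Input arrives once the previous stage is done and the data has moved.
            input_ready = finish[s - 1][m] + link_delays[s - 1] if s > 0 else 0.0
            # The stage itself is busy until it finishes the previous microbatch.
            stage_free = finish[s][m - 1] if m > 0 else 0.0
            finish[s][m] = max(input_ready, stage_free) + compute_s
    return finish[-1][-1]


if __name__ == "__main__":
    # Hypothetical numbers: 4 stages, 8 microbatches, one slow link in the middle.
    local = pipeline_makespan(4, 8, 1.0, [0.0, 0.0, 0.0])
    cross = pipeline_makespan(4, 8, 1.0, [0.0, 0.5, 0.0])
    print(f"single datacenter: {local}, with cross-datacenter link: {cross}")
```

In this toy model the link delay is paid once per pipeline fill rather than once per microbatch, because later transfers hide behind compute. Schedules with backward passes and memory limits are harder to reason about, and that is the setting the paper targets.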
3. Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras
Estimating rolling shutter camera poses from line projections, without a motion model.

✍️ Authors: Petr Hruby, Marc Pollefeys
🏛️ Lab: Computer Vision and Geometry Lab
⚡ Summary
This paper introduces a novel approach for estimating relative pose between rolling shutter cameras using line projections onto single scanlines, eliminating the need for explicit motion modeling.
The method transforms the problem into minimal cases with varying configurations (parallel lines, vertical lines, gravity prior) that can be solved using techniques from projective geometry.
While not all camera poses are accurately estimated, experiments on synthetic and real data show the approach can reliably estimate at least one correct pose per sequence, sufficient for initializing structure-from-motion.
This provides a foundation for rolling-shutter 3D reconstruction where each scanline's pose can be computed independently.
Other noteworthy articles
VisualChef: Generating Visual Aids in Cooking via Mask Inpainting: Transforming cooking instructions with context-aware visual guidance through intelligent object masking
Evolving HPC services to enable ML workloads on HPE Cray EX: Seven technological solutions to bridge the gap between HPC systems and ML requirements