ETH AI Digest: #20

Constraint-satisfying neural networks, grammar-perfect code generation, and privacy law meets machine learning practice

Marco

Aug 16, 2025

In this week's digest:

Neural Networks with Hard Constraints: Πnet projects network outputs onto feasible regions using operator splitting, improving on existing methods in training time and solution quality for constrained optimization
Grammar-Perfect Code Generation: The first constrained decoding method for diffusion models ensures syntactically valid C++ and JSON output, with near-perfect syntactic correctness and reasonable computational overhead
Data Minimization Framework for ML: The DMML framework bridges GDPR requirements with machine learning practice, systematically analyzing 13 techniques for privacy-preserving AI

Selected Papers of the Week

1. Πnet: Optimizing hard-constrained neural networks with orthogonal projection layers

Πnet: Projecting neural outputs onto convex constraints for faster, more accurate optimization.

Figure: The Πnet architecture projects infeasible backbone network outputs onto the feasible set using operator splitting, and backpropagates gradients through the projection layer via the implicit function theorem.

✍️ Authors: Panagiotis D. Grontas, Antonio Terpin, Efe C. Balta, Raffaello D'Andrea, John Lygeros

🏛️ Lab: Automatic Control Laboratory, Institute for Dynamic Systems and Control

⚡ Summary

This paper introduces Πnet, a neural network architecture that guarantees satisfaction of convex constraints by projecting outputs onto feasible regions.

Using operator splitting for the forward pass and the implicit function theorem for backpropagation, Πnet trains faster and reaches better solutions than existing methods.

The approach is demonstrated on benchmark optimization problems and multi-vehicle motion planning, showing its ability to handle both convex and non-convex objectives while maintaining constraint satisfaction.

The authors provide a GPU-ready implementation with effective tuning heuristics, making Πnet accessible for real-world applications.
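To make the projection idea from the summary concrete, here is a minimal sketch of projecting a raw network output onto a small convex feasible set with an operator-splitting loop. It is not the authors' implementation: Douglas-Rachford splitting is assumed as the concrete scheme, the feasible set is assumed to be one hyperplane intersected with a box, and all function names are hypothetical. Backpropagation through the projection is not shown.

```python
def project_hyperplane(w, a, b):
    """Euclidean projection of w onto the hyperplane {x : a.x = b}."""
    residual = (sum(ai * wi for ai, wi in zip(a, w)) - b) / sum(ai * ai for ai in a)
    return [wi - residual * ai for ai, wi in zip(a, w)]


def project_box(r, lo, hi):
    """Euclidean projection of r onto the box [lo, hi]^n (elementwise clipping)."""
    return [min(max(ri, lo), hi) for ri in r]


def project_feasible(y, a, b, lo, hi, gamma=1.0, iters=200):
    """Approximate the projection of y onto {x : a.x = b} intersected with [lo, hi]^n.

    Douglas-Rachford splitting (an assumption of this sketch) applied to
        f(x) = 0.5 * ||x - y||^2 + indicator of the hyperplane
        g(x) = indicator of the box
    """
    z = list(y)
    v = project_box(z, lo, hi)
    for _ in range(iters):
        # prox of gamma * f: blend z with the target y, then project onto the hyperplane
        w = [(gamma * yi + zi) / (gamma + 1.0) for yi, zi in zip(y, z)]
        x = project_hyperplane(w, a, b)
        # prox of gamma * g: clip the reflected point to the box
        v = project_box([2.0 * xi - zi for xi, zi in zip(x, z)], lo, hi)
        # update of the auxiliary Douglas-Rachford variable
        z = [zi + vi - xi for zi, vi, xi in zip(z, v, x)]
    return v


# Hypothetical "backbone output" that violates both constraints
raw_output = [2.0, 0.0]
feasible = project_feasible(raw_output, a=[1.0, 1.0], b=1.0, lo=0.0, hi=1.0)
print(feasible)
```

In this toy case the feasible set is the segment between (1, 0) and (0, 1), and the loop settles at approximately (1, 0), the closest feasible point to the raw output. Each step uses only simple closed-form projections, which is the kind of structure that suits batched GPU execution.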

👉 Read the full paper

💻 Github repo

2. Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Ensuring syntactically perfect code generation by constraining diffusion language models with context-free grammars.

Overview of grammar-constrained text generation. The method generates text by proposing tokens to fill incomplete regions, checking each proposal against formal grammar rules, and only accepting syntactically valid completions.

✍️ Authors: Niels Mündler, Jasper Dekoninck, Martin Vechev

🏛️ Lab: Secure, Reliable, and Intelligent Systems Lab

⚡ Summary

Current diffusion language models cannot guarantee adherence to formal languages like programming syntax, limiting their practical utility in code generation.

This paper presents the first constrained decoding method for diffusion models that handles context-free grammars, ensuring outputs like C++ code and JSON data are syntactically valid.

By reducing constrained decoding to an infilling problem and developing efficient algorithms to check language intersections, the authors achieve near-perfect syntactic correctness while improving functional correctness.

The method adds a computational overhead of 30-125%, which keeps it practical for real-world applications in software development and structured data extraction.
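The propose-check-accept loop described above can be illustrated with a toy sketch. Several things are assumptions made for illustration: the grammar is balanced brackets rather than C++ or JSON, each masked position holds exactly one token, and completability is checked by brute force rather than with the efficient language-intersection algorithms from the paper. All names are hypothetical.

```python
from itertools import product

ALPHABET = "()[]"  # toy token vocabulary
MASK = "?"         # marks a position the model has not filled yet


def is_balanced(s):
    """Membership test for the toy context-free language of balanced brackets."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif not stack or stack.pop() != pairs.get(ch):
            return False
    return not stack


def completable(template):
    """True if some filling of the masked positions yields a valid string.

    Brute force over all fillings: fine for a toy, exponential in general.
    """
    holes = [i for i, ch in enumerate(template) if ch == MASK]
    for fill in product(ALPHABET, repeat=len(holes)):
        chars = list(template)
        for i, ch in zip(holes, fill):
            chars[i] = ch
        if is_balanced("".join(chars)):
            return True
    return False


def constrained_fill(template, position, ranked_proposals):
    """Accept the highest-ranked proposal that keeps the template completable."""
    for token in ranked_proposals:
        candidate = template[:position] + token + template[position + 1:]
        if completable(candidate):
            return candidate
    return None  # no proposal leaves a valid completion


# The model's top choice ")" would make the string unfixable, so "]" is taken.
print(constrained_fill("([??", 2, [")", "]", "("]))  # prints "([]?"
```

The key point is that a proposal is judged by whether the remaining masked regions can still be filled to reach a valid string, not by whether the partial output is already valid.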

👉 Read the full paper

💻 Github repo

🌐 Project page

3. SoK: Data Minimization in Machine Learning

Bridging regulatory requirements with machine learning practices through a comprehensive data minimization framework.

✍️ Authors: Robin Staab, Nikola Jovanović, Kimberly Mai, Prakhar Ganesh, Martin Vechev, Ferdinando Fioretto, Matthew Jagielski

🏛️ Lab: Secure, Reliable, and Intelligent Systems Lab

⚡ Summary

This paper addresses the disconnect between data minimization regulations (like GDPR) and machine learning practices by introducing a unified framework for Data Minimization in Machine Learning (DMML).

The authors systematically analyze 13 ML techniques that implicitly provide data minimization benefits, categorizing them along dimensions including type of minimization, points of application, and privacy guarantees.

Their framework defines actors, pipelines, adversaries, and evaluation metrics, helping practitioners understand which techniques satisfy regulatory requirements.

By bridging technical implementations with regulatory principles, this work enables more effective privacy-preserving machine learning while maintaining utility.

👉 Read the full paper
