Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Newtonian-Shampoo: Modified Newton-Schulz Adapted for Shampoo Preconditioners
Published:
The Shampoo optimizer was proposed in [1].
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post.
Publications
A Panda? No, It’s a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference
Sanghyun Hong, Yiğitcan Kaya, Ionuţ-Vlad Modoranu, Tudor Dumitraş
Published in ICLR 2021 (Spotlight 🔦)
A new adversarial attack that introduces delays in the predictions of multi-exit deep neural networks.
Error Feedback Can Accurately Compress Preconditioners
Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, Dan Alistarh
Published in ICML 2024
Reduce the memory usage of the M-FAC optimizer via sparsity, low-rank compression, and error feedback.
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtarik, Dan Alistarh
Published in NeurIPS 2024
Reduce the memory usage of the Adam optimizer via sparsity and error feedback.
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu, Dan Alistarh
Published in ICLR 2025
Low-rank optimization for LLMs that improves over GaLore.
The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
Diyuan Wu, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, Dan Alistarh
Published in NeurIPS 2024
Theoretical guarantees for sparse, second-order pruning.
Unified Scaling Laws for Compressed Representations
Andrei Panferov, Alexandra Volkova, Ionut-Vlad Modoranu, Vage Egiazarian, Mher Safaryan, Dan Alistarh
Published on arXiv
Scaling laws for quantization and sparsity.
Optimizers Qualitatively Alter Solutions and We Should Leverage This
Razvan Pascanu, Clare Lyle, Ionut-Vlad Modoranu, Naima Elosegui Borras, Dan Alistarh, Petar Velickovic, Sarath Chandar, Soham De, James Martens
Published on arXiv
Optimizers have traditionally been introduced and benchmarked with respect to how fast they reach a specific loss. In this work we hypothesize that they also have other qualitative effects, such as inducing certain biases in the solutions they find.
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
Ionut-Vlad Modoranu, Mher Safaryan, Erik Schultheis, Max Ryabinin, Artem Chumachenko, Dan Alistarh
Published on arXiv
FFT-based low-rank optimization for LLMs.
DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
Ionut-Vlad Modoranu, Philip Zmushko, Erik Schultheis, Mher Safaryan, Dan Alistarh
Published on arXiv
A faster implementation of Distributed Shampoo that stacks the preconditioner blocks into 3D tensors.
LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
Andrej Jovanović, Alex Iacob, Mher Safaryan, Ionut-Vlad Modoranu, Lorenzo Sani, William F Shen, Xinchi Qiu, Dan Alistarh, Nicholas D Lane
Published on arXiv
A framework that unifies low-rank optimization with infrequent synchronization.
