Call for Papers
We are excited to announce the AdaptFM workshop, taking place in Seoul, Korea, alongside ICML’26! More information about the submission portal will follow soon.
Resource-Adaptive Foundation Model Inference (AdaptFM)
Foundation models (FMs) have achieved remarkable capabilities across language, vision, and multimodal tasks. However, their inference typically follows a rigid, one-size-fits-all paradigm where every input, regardless of complexity, passes through the same fixed architecture with identical computational cost. This inflexibility creates a fundamental mismatch between the diverse resource budgets encountered in real-world deployments and the static nature of model inference.
Adaptation can take many forms: compressing models to meet deployment budgets, designing flexible architectures that support multiple configurations from a single trained model, or making dynamic runtime decisions based on input complexity or resource availability. The central question we explore is: How can foundation model inference flexibly adapt to any resource budget, whether the constraint is memory, compute, latency, energy, or cost, while maximizing output quality?
This challenge spans algorithms, architectures, and systems. We aim to bring together researchers from the ML, systems, and hardware communities to advance techniques that move beyond rigid inference toward flexible, resource-aware foundation models. We welcome contributions in the following areas:
- Flexible & Elastic Architectures: Once-for-all and supernet approaches, slimmable networks, Matryoshka representation learning, MatFormer and elastic transformers, layer skipping, modular architectures
- Adaptive Test-Time Compute: Input-adaptive inference, adaptive reasoning, universal transformer & recursive inference
- Dynamic Networks & Routing: Early-exit & adaptive-depth networks, MoEs & conditional computation, cascade & routing systems
- Efficient Decoding & Token-Level Adaptation: Speculative & parallel decoding, token pruning and merging, adaptive attention mechanisms, dynamic KV cache compression
- Model Compression & Optimization: Quantization, pruning, and sparsity; knowledge distillation; low-rank decomposition; hardware-aware model optimization
- Systems & Hardware: H/W–S/W co-design for adaptive inference, runtime systems for flexible computation, benchmarking and profiling across resource budgets
- Analysis & Trade-offs: Quality-resource tradeoff analysis, energy-efficient and sustainable inference, emerging applications of adaptive inference
We hope that AdaptFM will serve as a forum for researchers across disciplines to raise and discuss challenging topics, share new ideas, and exchange experiences in building flexible, resource-aware foundation models, from both theoretical and experimental perspectives.
The workshop is co-located with ICML’26, held in Seoul, South Korea.
Submission portal: openreview.net
Submission deadline: May 1, 2026, AoE