
Resource-Adaptive Foundation Model Inference (AdaptFM)


Foundation models (FMs) have achieved remarkable capabilities across language, vision, and multimodal tasks. However, their inference typically follows a rigid, one-size-fits-all paradigm where every input, regardless of complexity, passes through the same fixed architecture with identical computational cost. This inflexibility creates a fundamental mismatch between the diverse resource budgets encountered in real-world deployments and the static nature of model inference.

Adaptation can take many forms: compressing models to meet deployment budgets, designing flexible architectures that support multiple configurations from a single trained model, or making dynamic runtime decisions based on input complexity or resource availability. The central question we explore is: How can foundation model inference flexibly adapt to any resource budget, whether constrained by memory, compute, latency, energy, or cost, while maximizing output quality?

Paper Submissions

This challenge spans algorithms, architectures, and systems. We aim to bring together researchers from the ML, systems, and hardware communities to advance techniques that move beyond rigid inference toward flexible, resource-aware foundation models, and we welcome contributions across these areas.

We hope that AdaptFM will serve as a forum for researchers across disciplines to raise and discuss challenging topics, share new ideas, and exchange experience in building flexible, resource-aware foundation models, from both theoretical and experimental perspectives.

The workshop is co-located with ICML’26, held in Seoul, South Korea.

Submission portal: openreview.net

Submission deadline: May 8, 2026, AoE

Efficient Qwen Competition

Alongside the workshop, we are running the Efficient Qwen Competition: a challenge to minimize inference latency for Qwen3.5-4B on a single NVIDIA A10G GPU while maintaining model quality. Quantize it, prune it, rewrite the kernels: anything goes. A total of $6,000 in prizes will be awarded, and top teams will be invited to present at ICML 2026 in Seoul. Submissions open May 8 and close June 8, 2026.
