iGenOrch: Intelligent Orchestration Framework for Multi-Model LLM Inference on Edge Platforms

ORCID

Akram: https://orcid.org/0009-0000-7134-7405; Chowdhury: https://orcid.org/0009-0008-0693-8696; Dunne: https://orcid.org/0009-0003-5285-9386; Williams: https://orcid.org/0009-0004-5660-3625; Love: https://orcid.org/0009-0003-6758-6982; Malik: https://orcid.org/0000-0003-3804-997X; Khan: https://orcid.org/0009-0007-6641-9735

MSU Affiliation

College of Agriculture and Life Sciences; Department of Agricultural and Biological Engineering; James Worth Bagley College of Engineering; nSPARC

Creation Date

2026-06-01

Abstract

The growing demand for on-device Large Language Model (LLM) inference has accelerated the deployment of AI capabilities onto constrained edge platforms. However, effective resource management remains a critical challenge, especially when multiple LLM processes run concurrently with diverse requirements in latency, memory footprint, and workload pattern. Traditional Linux kernel schedulers are designed for general-purpose workloads and fail to account for the bursty, compute- and memory-sensitive behavior of LLMs, resulting in unstable latency and inefficient CPU and memory utilization on edge devices. To address these limitations, we propose iGenOrch, a kernel-integrated orchestration framework that enables intelligent scheduling for multi-LLM workloads on constrained devices. iGenOrch incorporates a monitoring module that captures runtime metrics such as per-request latency, queue size, token throughput, and CPU and memory usage. These metrics are fed directly into a modified device kernel scheduler. iGenOrch dynamically adapts CPU affinity, process priority, concurrency levels, and batch size to approximate the solution of a multi-objective optimization problem, achieving lower latency and improved workload distribution across concurrent LLM instances. Experimental results demonstrate that iGenOrch significantly improves latency stability, reduces memory overhead, and improves overall stability under concurrent multi-LLM execution.
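The adaptation loop described in the abstract can be illustrated with a minimal user-space sketch. It is not the authors' kernel-integrated scheduler: the `Metrics` and `Decision` types, the `pressure` score, the latency target, and the nice and batch-size rules are all hypothetical choices made for illustration. Only `os.sched_setaffinity` and `os.setpriority` are real (Linux/Unix) standard-library calls.

```python
import os
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Metrics:
    """Runtime metrics for one LLM process (fields mirror those named in the abstract)."""
    pid: int
    p95_latency_ms: float
    queue_size: int
    tokens_per_s: float
    cpu_pct: float
    mem_mb: float


@dataclass
class Decision:
    """Hypothetical scheduling decision for one LLM process."""
    pid: int
    cpus: Set[int]
    nice: int
    batch_size: int


def pressure(m: Metrics, latency_target_ms: float) -> float:
    # Assumed scalar score: latency relative to target plus a small backlog term.
    return m.p95_latency_ms / latency_target_ms + 0.1 * m.queue_size


def plan(metrics: List[Metrics], cores: List[int],
         latency_target_ms: float = 500.0, max_batch: int = 8) -> List[Decision]:
    """Split cores in proportion to pressure and pick a nice value and batch size."""
    if not metrics:
        return []
    scores = [pressure(m, latency_target_ms) for m in metrics]
    total = sum(scores) or 1.0  # avoid division by zero when all scores are 0
    spare = max(len(cores) - len(metrics), 0)
    # Every process gets one core; spare cores are shared out by pressure (rounded down).
    counts = [1 + int(spare * s / total) for s in scores]

    decisions = []
    start = 0
    for m, n in zip(metrics, counts):
        # Wrap around so processes share cores when there are fewer cores than processes.
        cpus = {cores[(start + i) % len(cores)] for i in range(n)}
        start += n
        over_target = m.p95_latency_ms > latency_target_ms
        nice = 0 if over_target else 5            # favor processes missing the target
        batch = 1 if over_target else min(max_batch, max(1, m.queue_size))
        decisions.append(Decision(m.pid, cpus, nice, batch))
    return decisions


def apply(decision: Decision) -> None:
    """Linux-only: pin the process and set its priority.

    Lowering a nice value may require elevated privileges.
    """
    os.sched_setaffinity(decision.pid, decision.cpus)
    os.setpriority(os.PRIO_PROCESS, decision.pid, decision.nice)
```

In this sketch `plan` is a pure function, so the policy can be checked without touching any real process; `apply` would then be called for each returned `Decision` on a Linux device. The batch size is only computed here, since passing it to an inference server depends on that server's own interface.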

Publication Date

5-14-2026

Publication Title

ACMSE 2026: Proceedings of the 2026 ACM Southeast Conference

Publisher

Association for Computing Machinery

First Page

624

Last Page

269

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Rights

© 2026 Copyright held by the owner/author(s)

Digital Object Identifier (DOI)

https://doi.org/10.1145/3746467.3801501