iGenOrch: Intelligent Orchestration Framework for Multi-Model LLM Inference on Edge Platforms
ORCID
Akram: https://orcid.org/0009-0000-7134-7405; Chowdhury: https://orcid.org/0009-0008-0693-8696; Dunne: https://orcid.org/0009-0003-5285-9386; Williams: https://orcid.org/0009-0004-5660-3625; Love: https://orcid.org/0009-0003-6758-6982; Malik: https://orcid.org/0000-0003-3804-997X; Khan: https://orcid.org/0009-0007-6641-9735
MSU Affiliation
College of Agriculture and Life Sciences; Department of Agricultural and Biological Engineering; James Worth Bagley College of Engineering; nSPARC
Creation Date
2026-06-01
Abstract
The growing demand for on-device Large Language Model (LLM) inference has accelerated the deployment of AI capabilities onto constrained edge platforms. However, effective resource management remains a critical challenge, especially when multiple LLM processes run concurrently with diverse requirements for latency, memory footprint, and workload pattern. Traditional Linux kernel schedulers are designed for general-purpose workloads and fail to account for the bursty, compute- and memory-sensitive behavior of LLMs, resulting in unstable latency and inefficient CPU and memory utilization on edge devices. To address these limitations, we propose iGenOrch, a kernel-integrated orchestration framework that enables intelligent scheduling for multi-LLM workloads on constrained devices. iGenOrch incorporates a monitoring module that captures runtime metrics such as per-request latency, queue size, token throughput, and CPU and memory usage. These metrics are fed directly into a modified device kernel scheduler. iGenOrch dynamically adapts CPU affinity, process priority, concurrency level, and batch size to approximate the solution of a multi-objective optimization problem, achieving lower latency and better workload distribution across concurrent LLM instances. Experimental results demonstrate that iGenOrch significantly improves latency stability, reduces memory overhead, and improves overall stability under concurrent multi-LLM execution.
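The feedback loop described in the abstract (metrics in, scheduling knobs out) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper modifies the kernel scheduler, whereas the sketch below stands in with user-space Linux calls (`os.sched_setaffinity`, `os.setpriority`). The names `Metrics`, `Knobs`, `cost`, `adapt`, and `apply`, along with every threshold and weight, are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Metrics:
    latency_ms: float    # per-request latency
    queue_size: int      # pending requests
    tokens_per_s: float  # token throughput
    cpu_util: float      # 0.0 to 1.0
    mem_util: float      # 0.0 to 1.0


@dataclass(frozen=True)
class Knobs:
    cpus: frozenset      # CPU affinity set
    nice: int            # process priority (lower value = higher priority)
    concurrency: int     # concurrent requests per LLM instance
    batch_size: int


def cost(m, target_latency_ms, w_latency=1.0, w_mem=0.5, w_queue=0.1):
    """Weighted-sum scalarization of several objectives (weights are assumed)."""
    latency_excess = max(0.0, m.latency_ms / target_latency_ms - 1.0)
    return w_latency * latency_excess + w_mem * m.mem_util + w_queue * m.queue_size


def adapt(k, m, target_latency_ms, all_cpus):
    """One heuristic control step; thresholds are illustrative assumptions."""
    if m.mem_util > 0.9:
        # Memory pressure: halve the batch and drop one concurrent request.
        return replace(k, batch_size=max(1, k.batch_size // 2),
                       concurrency=max(1, k.concurrency - 1))
    if m.latency_ms > target_latency_ms:
        # Latency miss: raise priority and add one spare CPU if any exists.
        spare = sorted(all_cpus - k.cpus)
        cpus = k.cpus | {spare[0]} if spare else k.cpus
        return replace(k, nice=max(-20, k.nice - 1), cpus=frozenset(cpus))
    if m.queue_size > 0 and m.cpu_util < 0.7:
        # Backlog with CPU headroom: grow the batch by one.
        return replace(k, batch_size=k.batch_size + 1)
    return k


def apply(pid, k):
    """Apply affinity and priority from user space (Linux only).

    Lowering the nice value below 0 typically requires elevated privileges.
    """
    os.sched_setaffinity(pid, k.cpus)
    os.setpriority(os.PRIO_PROCESS, pid, k.nice)
```

In use, a monitor would sample `Metrics` for each LLM process at a fixed interval, call `adapt` to obtain new `Knobs`, and pass them to `apply`. The weighted sum in `cost` is only one simple way to collapse competing objectives into a single score for comparing candidate settings; the paper may use a different formulation.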
Publication Date
2026-05-14
Publication Title
ACMSE 2026: Proceedings of the 2026 ACM Southeast Conference
Publisher
Association for Computing Machinery
First Page
264
Last Page
269
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Rights
© 2026 Copyright held by the owner/author(s)
Recommended Citation
Akram, F., Chowdhury, M. R., Dunne, G., Williams, K., Love, K., Malik, A. W., & Khan, S. U. (2026). iGenOrch: Intelligent Orchestration Framework for Multi-Model LLM Inference on Edge Platforms. Proceedings of the 2026 ACM Southeast Conference, 264–269. https://doi.org/10.1145/3746467.3801501