iGenOrch: Intelligent Orchestration Framework for Multi-Model LLM Inference on Edge Platforms

ORCID

Akram: https://orcid.org/0009-0000-7134-7405; Chowdhury: https://orcid.org/0009-0008-0693-8696; Dunne: https://orcid.org/0009-0003-5285-9386; Williams: https://orcid.org/0009-0004-5660-3625; Love: https://orcid.org/0009-0003-6758-6982; Malik: https://orcid.org/0000-0003-3804-997X; Khan: https://orcid.org/0009-0007-6641-9735

MSU Affiliation

College of Agriculture and Life Sciences; Department of Agricultural and Biological Engineering; James Worth Bagley College of Engineering; nSPARC

Creation Date

2026-06-01

Abstract

The growing demand for on-device Large Language Model (LLM) inference has accelerated the deployment of AI capabilities onto constrained edge platforms. However, effective resource management remains a critical challenge, especially when multiple LLM processes run concurrently with diverse requirements for latency, memory footprint, and workload pattern. Traditional Linux kernel schedulers are designed for general-purpose workloads and fail to account for the bursty, compute- and memory-sensitive behavior of LLMs, resulting in unstable latency and inefficient CPU and memory utilization on edge devices. To address these limitations, we propose iGenOrch, a kernel-integrated orchestration framework that enables intelligent scheduling for multi-LLM workloads on constrained devices. iGenOrch incorporates a monitoring module that captures runtime metrics such as per-request latency, queue size, token throughput, and CPU and memory usage. These metrics are fed directly into a modified device kernel scheduler. iGenOrch dynamically adapts CPU affinity, process priority, concurrency levels, and batch size to approximately solve a multi-objective optimization problem, achieving lower latency and improved workload distribution across concurrent LLM instances. Experimental results demonstrate that iGenOrch significantly improves latency stability and reduces memory overhead under concurrent multi-LLM execution.

Publication Date

2026-05-14

Publication Title

ACMSE 2026: Proceedings of the 2026 ACM Southeast Conference

Publisher

Association for Computing Machinery

First Page

264

Last Page

269

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Akram, F., Chowdhury, M. R., Dunne, G., Williams, K., Love, K., Malik, A. W., & Khan, S. U. (2026). iGenOrch: Intelligent Orchestration Framework for Multi-Model LLM Inference on Edge Platforms. Proceedings of the 2026 ACM Southeast Conference, 264–269. https://doi.org/10.1145/3746467.3801501
