Graph Symbolic Regression to Interpret the Propagation of Vesicular Stomatitis Virus Across the U.S. and Mexico

ORCID

Rashme: https://orcid.org/0009-0009-1032-5005; Zhang (Zonghan): https://orcid.org/0009-0008-1578-5556; Weeks: https://orcid.org/0009-0006-8353-8165; Benbrahim: https://orcid.org/0009-0000-3728-9870; Zhang (Zijian): https://orcid.org/0009-0009-1099-4364; Chen: https://orcid.org/0000-0003-4112-9647; Pillai: https://orcid.org/0000-0002-2275-6998; Ramkumar: https://orcid.org/0000-0003-3183-0165; Nanduri: https://orcid.org/0000-0002-9996-2976

MSU Affiliation

James Worth Bagley College of Engineering; Department of Computer Science and Engineering; College of Veterinary Medicine; Department of Comparative Biomedical Sciences

Creation Date

2026-01-15

Abstract

Vesicular stomatitis virus (VSV) causes livestock disease every year in regions of Mexico. Every few years, VSV spreads northward into the U.S. in large outbreaks that affect hundreds of livestock premises across multiple states, leading to significant economic losses from quarantines, trade restrictions, and veterinary expenses. VSV cases are driven mainly by biting arthropod vectors from multiple genera with different ecologies, which makes outbreak control challenging. The sporadic nature of outbreaks and the limited understanding of transmission dynamics further hinder containment and reduce the effectiveness of preemptive measures. In this paper, we propose an interpretable model to elucidate the key rules governing the spread of VSV. The model uses sparse symbolic regression, SINDy (Sparse Identification of Nonlinear Dynamics), to identify the ecological variables that matter most in spread dynamics, considering both spatial and temporal factors. Because many counties had no VSV cases during the study period, counties were grouped into 40 regions using spatially constrained agglomerative clustering based on geographic adjacency and static environmental variables (land cover, soil properties, livestock density, and climate data), giving an average region size of approximately 90 counties. Ecological variables included dynamic and static variables, such as temperature, humidity, wind, soil characteristics, and altitude, associated with vectors and hosts (cattle, horses, and mules). The month-to-month change in cases by region was modeled using two SINDy variants: a baseline model with only ecological features (Normal) and an extended model incorporating spatially derived graph features (Graph). Each alpha was chosen to minimize the cross-validated mean squared error (CV-MSE) while retaining fewer than 11 terms. Graph features greatly reduced model error, and the SINDy model with selected graph features had a slightly better CV-MSE score than the model that included all graph features. All models identified the infected species as important in capturing the dynamics of case differences between regions.
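The alpha selection rule stated in the abstract (minimize CV-MSE while keeping fewer than 11 terms) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that alpha is an L1 (lasso-style) sparsity penalty, which the abstract does not specify, and it runs on synthetic data rather than VSV case data. The helper names `lasso_cd`, `cv_mse`, and `select_alpha`, the contiguous five-fold split, and the alpha grid are all hypothetical choices made for illustration.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent (assumed penalty form):
    minimize (1/2n) * ||y - X w||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # equals y - X @ w because w starts at 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]  # remove feature j's current contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * w[j]
    return w

def cv_mse(X, y, alpha, k=5):
    """Mean squared error averaged over k contiguous folds."""
    n = len(y)
    errors = []
    for test_idx in np.array_split(np.arange(n), k):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        w = lasso_cd(X[train], y[train], alpha)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ w) ** 2))
    return float(np.mean(errors))

def select_alpha(X, y, alphas, max_terms=10):
    """Return (alpha, cv_mse, coefficients) with the lowest CV-MSE among
    alphas whose full-data fit keeps between 1 and max_terms terms."""
    best = None
    for alpha in alphas:
        w = lasso_cd(X, y, alpha)
        n_terms = int(np.count_nonzero(w))
        if n_terms == 0 or n_terms > max_terms:
            continue  # violates the term cap (or is an empty model)
        score = cv_mse(X, y, alpha)
        if best is None or score < best[1]:
            best = (alpha, score, w)
    return best

# Synthetic stand-in for a candidate-term library: 30 features, 3 truly active.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
true_w = np.zeros(30)
true_w[[0, 5, 12]] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=200)

best_alpha, best_score, best_w = select_alpha(X, y, np.logspace(-3, 0, 10))
print("alpha:", best_alpha, "terms:", np.count_nonzero(best_w))
```

In this sketch the term cap acts as a hard filter before CV-MSE is compared, so a denser model with lower error is never chosen over a sparser one that satisfies the cap. Whether the paper applies the cap in the same order is not stated in the abstract.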

Publication Date

2025-12-12

Publication Title

SIGSPATIAL '25: Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems

Publisher

ACM

First Page

977

Last Page

980

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Digital Object Identifier (DOI)

https://doi.org/10.1145/3748636.3764166