Model-Based Computing

Theory and software for computing with scientific models.

Scientific models are traditionally programmed by hand, accreting bells and whistles over years or decades. Even moderately complex models can be difficult to specify in conventional natural and mathematical language, leading some to adopt the slogan that “the code is the model.” This conflation, a response to inadequate conceptual and computational tools, severely impacts both the productivity of scientists and the reliability of their science. Models-as-code are laborious and error prone to create, modify, or extend. They are also difficult to communicate due to the lower level of abstraction demanded by programming. In the era of COVID-19, the challenge of reliably building and validating complex models has been vividly illustrated in high-profile cases. Moreover, no model exists in isolation. Instead, each model belongs to an intricate web of further models, competing on grounds of accuracy, tractability, and interpretability. The exploration of this web of models is a central activity of science.

We are developing the theory and software that will enable scientific and statistical models to be treated as first-class entities, which may be created, transformed, compared, and executed with the same ease as conventional data structures. Mathematically, we draw on ideas from category theory, especially categorical logic, to cleanly separate theories and models in science and statistics and to represent the theories as algebraic structures, which are amenable to machine processing. Morphisms of theories and of their models then formalize the relationships that comprise the web of scientific models.
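To make the separation of theories, models, and morphisms concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's software: the names `Theory`, `Model`, `is_model`, and `is_morphism` are hypothetical, and a "theory" is restricted to named sorts with unary operations between them (the theory of graphs is used as the example). A model assigns a finite set to each sort and a function to each operation; a morphism of models is a family of maps, one per sort, that commutes with every operation.

```python
from dataclasses import dataclass


@dataclass
class Theory:
    """A tiny 'theory' (hypothetical): sorts plus unary operations between them."""
    sorts: tuple
    ops: dict  # operation name -> (domain sort, codomain sort)


@dataclass
class Model:
    """A model of a theory: one finite set per sort, one function per operation."""
    theory: Theory
    sets: dict   # sort -> set of elements
    funcs: dict  # operation name -> dict from domain elements to codomain elements


def is_model(m):
    """Check that every operation is a total function between the right sets."""
    for op, (dom, cod) in m.theory.ops.items():
        f = m.funcs[op]
        if set(f) != m.sets[dom] or not set(f.values()) <= m.sets[cod]:
            return False
    return True


def is_morphism(source, target, components):
    """Check that `components` (sort -> dict) is a structure-preserving map."""
    if source.theory != target.theory:
        return False
    # Each component must be a total function between the corresponding sets.
    for sort in source.theory.sorts:
        h = components[sort]
        if set(h) != source.sets[sort] or not set(h.values()) <= target.sets[sort]:
            return False
    # Applying an operation then mapping must equal mapping then applying it.
    for op, (dom, cod) in source.theory.ops.items():
        for x in source.sets[dom]:
            lhs = components[cod][source.funcs[op][x]]
            rhs = target.funcs[op][components[dom][x]]
            if lhs != rhs:
                return False
    return True


# The theory of directed graphs: edges E, vertices V, source and target maps.
GRAPH = Theory(sorts=("E", "V"), ops={"src": ("E", "V"), "tgt": ("E", "V")})

# Three models of that theory: a path a -> b -> c, a single loop, a single arrow.
path = Model(GRAPH, {"V": {"a", "b", "c"}, "E": {"e1", "e2"}},
             {"src": {"e1": "a", "e2": "b"}, "tgt": {"e1": "b", "e2": "c"}})
loop = Model(GRAPH, {"V": {"x"}, "E": {"l"}},
             {"src": {"l": "x"}, "tgt": {"l": "x"}})
arrow = Model(GRAPH, {"V": {"x", "y"}, "E": {"e"}},
              {"src": {"e": "x"}, "tgt": {"e": "y"}})

# Collapsing the path onto the loop preserves sources and targets.
collapse = {"V": {"a": "x", "b": "x", "c": "x"}, "E": {"e1": "l", "e2": "l"}}
# This assignment does not: edge e2 starts at b, which is sent to y, not x.
bad = {"V": {"a": "x", "b": "y", "c": "y"}, "E": {"e1": "e", "e2": "e"}}
```

Here `is_morphism(path, loop, collapse)` holds while `is_morphism(path, arrow, bad)` does not, which is the kind of check that lets relationships between models be verified by machine rather than asserted informally.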

To put the theory to practical use, we are building new software through the AlgebraicJulia project. This effort includes Catlab.jl, a general-purpose programming library for applications of category theory to STEM fields, as well as specialized packages for specific domains, such as AlgebraicPetri.jl for Petri net models in epidemiology. Future work will instantiate the recently introduced formalism for statistical theories and models. Irrespective of the domain, models are represented as high-level computational objects, supported by rigorous mathematics; complex models are composed out of simpler ones in declarative style; and efficient numerical solvers are generated from the high-level specification, without recourse to manual numerical coding. Morphisms between models are also captured in software, enabling efficient transformation and comparison of models.
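The workflow described above (a model held as a high-level object, composed from simpler parts, with a solver generated from the specification) can be sketched in plain Python. This is a toy under stated assumptions and does not use or reproduce the AlgebraicPetri.jl or Catlab.jl APIs: the names `Transition`, `PetriNet`, `compose`, `vector_field`, and `euler` are hypothetical, composition is simplified to gluing nets along species that share a name, the dynamics are assumed to follow mass-action kinetics, and the rate constants are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    rate: float      # illustrative rate constant
    inputs: tuple    # species consumed (repeat a name for multiplicity)
    outputs: tuple   # species produced


@dataclass(frozen=True)
class PetriNet:
    species: tuple
    transitions: tuple


def compose(a, b):
    """Glue two nets along species with the same name (a simplifying assumption)."""
    species = tuple(dict.fromkeys(a.species + b.species))  # ordered union
    return PetriNet(species, a.transitions + b.transitions)


def vector_field(net):
    """Generate a mass-action right-hand side from the net's structure."""
    def rhs(state):
        deriv = {s: 0.0 for s in net.species}
        for t in net.transitions:
            flux = t.rate
            for s in t.inputs:       # rate times product of input concentrations
                flux *= state[s]
            for s in t.inputs:
                deriv[s] -= flux
            for s in t.outputs:
                deriv[s] += flux
        return deriv
    return rhs


def euler(rhs, state, dt, steps):
    """Fixed-step forward Euler integration, chosen only for brevity."""
    for _ in range(steps):
        d = rhs(state)
        state = {s: state[s] + dt * d[s] for s in state}
    return state


# Two simple building blocks, specified declaratively.
infection = PetriNet(("S", "I"),
                     (Transition("infect", 0.3, ("S", "I"), ("I", "I")),))
recovery = PetriNet(("I", "R"),
                    (Transition("recover", 0.1, ("I",), ("R",)),))

# The SIR model is their composite; no numerical code is written by hand for it.
sir = compose(infection, recovery)
initial = {"S": 0.99, "I": 0.01, "R": 0.0}
final = euler(vector_field(sir), initial, dt=0.1, steps=1000)
```

Because the right-hand side is derived from the net rather than typed out, swapping `recovery` for a different sub-model changes the equations automatically. A production system would replace the Euler loop with a proper ODE solver.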


Research Lead: Evan Patterson