Incremental query updating in adhesive categories

research
acsets
applied category theory
databases
rewriting
AlgebraicJulia
Author
Published

2025-08-15

Abstract

Working in the general setting of adhesive categories, we derive a practical algorithm for incrementally updating a query’s results with respect to small changes in the object being queried.

Category theory often sheds light on old problems by redescribing them in a conceptually cleaner way, but it is less frequently used to develop concrete algorithms for practical problems. In this post, the problem we address concerns maintaining the answer set of some query (e.g. “how many paths of length two are there in this graph?”) while the thing being queried changes frequently. If the changes are frequent enough, we don’t want to rerun the query after each change: we want to incrementally update our answer set given information about the change, letting us find the new answers to the query without repeating any work that was done to find the old answers. This is schematically depicted below:

While this problem of incremental computation has been studied in great depth, it is less often studied in contexts where the changes (\Delta, above) are induced by explicit rules which state the pattern-replacement template used to generate the \Delta. We will use this additional information to address the problem more efficiently, which is relevant to computational bottlenecks in e-graphs (Biondo, Castelnovo, and Gadducci 2025), Datalog queries, and agent-based modeling. Before unifying these different domains, let’s consider them each independently.

Presumed background

This post presumes familiarity with categories and (co)limits. It is also a follow-up to Substitution is (also) pushout, which formally defines “updating” in the categories \mathsf{Grph} and \mathsf{Dat} and shows analogies between graph rewriting and Datalog.

1 Example incremental solutions

1.1 Graphs

Our first example of an incrementalized query is in the context of relational databases. We will focus in particular on databases with the schema \boxed{E\rightrightarrows V}, whose instances are (directed multi-)graphs. The kind of query we can incrementalize is a conjunctive query, such as “give me all pairs of edges such that the target of the first edge is the source of the second edge”. Interestingly, we can represent this query with a graph itself, \boxed{\textcircled{\footnotesize a}\rightarrow\textcircled{\footnotesize b}\rightarrow\textcircled{\footnotesize c}}, now regarded as a “pattern” rather than a piece of concrete data. Call this query Q.

Thinking of the query itself as a special graph also illuminates what answers to our query look like: applying our query to some graph G is tantamount to finding all graph homomorphisms Q\rightarrow G. The data of each such homomorphism (an assignment of vertices and edges) can be thought of as binding the variables that live in Q to the values in G; a brute-force search for such homomorphisms is sketched after the following example. If G := \boxed{\textcircled{\footnotesize 1}\rightarrow\textcircled{\footnotesize 2}\circlearrowleft}, then the answers to the query are:

  • [1,2,2] (shorthand for \{a\mapsto 1,\ b\mapsto 2,\ c \mapsto 2\})1
  • [2,2,2]
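
To make this concrete, here is a minimal brute-force sketch in plain Julia (illustrative scaffolding for this post, not the optimized search implemented in the AlgebraicJulia libraries). It represents a directed multigraph by source/target vectors and enumerates the vertex maps underlying homomorphisms by backtracking over the edges of Q; the optional partial argument will matter later in the post.

```julia
struct Graph
    nv::Int              # vertices are named 1..nv
    src::Vector{Int}     # src[e] is the source of edge e
    tgt::Vector{Int}     # tgt[e] is the target of edge e
end

ne(g::Graph) = length(g.src)

# Enumerate vertex maps underlying homomorphisms Q -> G, optionally extending
# a partial assignment of Q's vertices. (With parallel edges in G, the same
# vertex map can be reported once per compatible choice of edges.)
function homomorphisms(Q::Graph, G::Graph; partial=Dict{Int,Int}())
    results = Vector{Vector{Int}}()
    vmap = zeros(Int, Q.nv)                  # 0 means "unassigned"
    for (v, img) in partial
        vmap[v] = img
    end
    function assign(e)
        if e > ne(Q)                         # every edge of Q is placed
            free = [v for v in 1:Q.nv if vmap[v] == 0]
            for choice in Iterators.product((1:G.nv for _ in free)...)
                for (v, img) in zip(free, choice); vmap[v] = img; end
                push!(results, copy(vmap))
            end
            for v in free; vmap[v] = 0; end
            return
        end
        s, t = Q.src[e], Q.tgt[e]
        for e′ in 1:ne(G)                    # try to send edge e to edge e′
            s′, t′ = G.src[e′], G.tgt[e′]
            vmap[s] in (0, s′) || continue
            set_s = (vmap[s] == 0); vmap[s] = s′
            if vmap[t] in (0, t′)
                set_t = (vmap[t] == 0); vmap[t] = t′
                assign(e + 1)
                set_t && (vmap[t] = 0)
            end
            set_s && (vmap[s] = 0)
        end
    end
    assign(1)
    results
end

Q = Graph(3, [1, 2], [2, 3])    # the pattern a -> b -> c
G = Graph(2, [1, 2], [2, 2])    # the graph 1 -> 2 with a loop at 2
homomorphisms(Q, G)             # [[1, 2, 2], [2, 2, 2]]
```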

One way to describe how a graph can evolve over time is a rewrite rule: for us, a rewrite rule is a graph homomorphism where the domain is the pattern and the codomain is the replacement. In the following example, we consider a rule, f\colon L\to R, which says “given some edge, construct a path of length two which goes from its source to its target”. The pattern of the rule says “given an edge from \textcircled{\footnotesize x} to \textcircled{\footnotesize y}” and the replacement of the rule says “add in a new path of length two going from \textcircled{\footnotesize x} to \textcircled{\footnotesize y}”. An example application of this rule to some graph, G, requires a match L\rightarrowtail G to specify how the rule is to be applied, as shown below.

So, if our non-incremental query Q is “what are all paths of length two?”, the incrementalization problem is “If I assume I have all the paths of length two in some arbitrary G and I apply the rule f (which yields some H as result), what should I do to find the new paths of length two in H?”

Given that we already had [1,2,2] and [2,2,2], the matches which we need to discover are [1,3,2] and [3,2,2]. This can be seen by inspection, but we need an algorithm that works for an arbitrary G and match L\rightarrowtail G. Keeping Q and f fixed, it’s possible to intuitively deduce the optimal thing to do: there are three independent ways to get a new answer for Q by applying f to an arbitrary graph G (codified in the sketch after this list):

  1. One which will always be introduced by the rule directly ([1,3,2] in our example)
  2. One per outgoing edge from the target of the edge we apply f to (this yields just [3,2,2] in our example)
  3. One per incoming edge into the source of the edge we apply f to (this yields no matches in our example)
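
These three cases are easy to codify for this particular f and Q. Below is a sketch using the Graph type from the previous sketch; new_length2_paths is a hypothetical name, and the arguments x, w, y stand for the matched source, the freshly created middle vertex, and the matched target.

```julia
# Applying f to an edge x -> y adds a fresh vertex w plus edges x -> w, w -> y.
# The three cases above become one constant answer plus two linear scans of G.
function new_length2_paths(G::Graph, x::Int, w::Int, y::Int)
    answers = [[x, w, y]]                                  # case 1: always new
    for e in 1:ne(G)
        G.src[e] == y && push!(answers, [w, y, G.tgt[e]])  # case 2: edges out of y
        G.tgt[e] == x && push!(answers, [G.src[e], x, w])  # case 3: edges into x
    end
    answers
end

new_length2_paths(G, 1, 3, 2)   # [[1, 3, 2], [3, 2, 2]]
```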

1.2 Datalog

A Datalog program has a collection of facts about some ground terms, as well as rules for deriving new facts from old ones. As shown in the previous post, we can think of the execution of a Datalog program categorically as a sequence of pushouts in a category \mathsf{Dat}. This is a natural setting for incremental query updating, since each rule application can enable further rule applications.

For example, consider the rule p: path(X,Z) :- path(X,Y), edge(Y,Z).

Applying p creates new match possibilities for p’s own preconditions. Can we say specifically how to look for new composable path + edge pairs in light of applying p to an arbitrary input? Yes: look for outgoing edges from whatever vertex was bound to Z in the previous rule application.
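
In code, the delta computation for p is a single scan. Here is a sketch with hypothetical names, storing the edge relation as a list of pairs:

```julia
# After deriving the new fact path(x, z), the only new matches of p's body
# pair that fact with an edge out of z, yielding new heads path(x, z′).
new_body_matches(edges::Vector{Tuple{Int,Int}}, x::Int, z::Int) =
    [(x, z, z′) for (y, z′) in edges if y == z]

new_body_matches([(1, 2), (2, 3)], 1, 2)   # [(1, 2, 3)], i.e. derive path(1, 3)
```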

1.3 Other examples

This post focuses only on incremental updating with respect to rules which purely ‘add things’ in some sense. A follow-up post will discuss rules which merge or delete, which will be required to talk about some other interesting applications of this approach, including:

  • E-graphs (e-saturation steps as applying rewrite rules, incremental hom search to speed up e-matching)
  • Regex
  • Multiset rewriting (Martens et al. 2023)

2 Solution to the basic incremental search problem: purely additive rules

How can we unify and automate the intuition of the above examples, in the form of an algorithm that is a couple of lines long and self-evidently correct? The following dictionary will help map the intuitive concepts from the above examples onto basic concepts in category theory.

2.1 A dictionary for category theory concepts

| Informal term | Categorical analogue |
| --- | --- |
| Setting for the incremental search problem | An (adhesive2) category \mathsf{C} |
| Pattern / query | An object Q \in \operatorname{Ob}\mathsf{C} |
| State of the world / set of facts | An object G \in \operatorname{Ob}\mathsf{C} |
| Pattern match of Q in G | A morphism Q \rightarrow G |
| Answer set of a query Q in state G | \operatorname{Hom}_\mathsf{C}(Q,G) |
| An (additive) rewrite rule, with pattern L and replacement R | A monomorphism f\colon L \rightarrowtail R |
| Application of a rewrite rule f to state G with L matched via m | A pushout G\xrightarrow{\Delta} H\xleftarrow{r} R of f\colon L\rightarrowtail R and m\colon L\to G |

2.2 Algorithm

Describing an algorithm categorically is a double-edged sword: we abstract away from implementation details, with the benefits of getting at the essence of the algorithm and making it applicable in a wide variety of settings. On the other hand, the algorithm does eventually need to be implemented, details and all. A balance is struck below by making certain informal computational assumptions about the category \mathsf{C} we work in: namely, that we can compute certain things, such as small (co)limits, with known computational complexity in whichever \mathsf{C} we instantiate the algorithm in.

With reference to variables in the above table, our algorithm’s job is to systematically find new answers to Q in H while avoiding any work that might recover the answers we already had in G. More formally, the basic case of the incremental search problem is: when we are given {(G, H, \Delta, r)} as above at runtime, we need to compute {\operatorname{Hom}(Q,H)\setminus \operatorname{Hom}(Q,G)\cdot\Delta} (where \cdot denotes post-composition).
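
As a reference point, here is the naive version of that computation in the sketch language of §1.1, identifying matches with their vertex maps as in footnote 1. It is only useful as a test oracle, since it recomputes both hom-sets from scratch:

```julia
# Brute-force Hom(Q,H) \ Hom(Q,G)⋅Δ, with Δ represented by its action on vertices.
function new_matches_naive(Q::Graph, G::Graph, H::Graph, Δ::Vector{Int})
    old = Set(Δ[h] for h in homomorphisms(Q, G))   # Hom(Q,G) ⋅ Δ
    [h for h in homomorphisms(Q, H) if h ∉ old]
end
```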

2.3 Solving the basic case

2.3.1 Computational assumptions

The above mention of “runtime” is contrasted with “compile time”: long before we begin our process of maintaining an answer set with respect to frequent changes, we know both what Q is and what the possible rules like f are. This is our chance to do expensive calculations and store the results in memory so that our runtime computation is as fast as possible.

Compile-time vs runtime indicated via color

This distinction is shown via color in the figures of this post: black will be used to represent information which is known at compile time, while red is used to denote information that we take to be provided to us at runtime. Green will be used to represent things our algorithm then computes at runtime.

Another computational assumption is that the objects in patterns and rules (L, R, Q) are small for practical purposes, whereas the states being updated by rewrite rules (G, H) can be large. We assume it’s computationally difficult to compute the answer set to a query \operatorname{Hom}(A,B) when B is large, even when A is small (in \mathsf{Grph}, this is the subgraph isomorphism problem); however, it’s easy to solve the rooted subgraph isomorphism problem (Campbell, Romo, and Plump 2020), so long as A is small.

A partial map from A to B can be presented as a span A \leftarrowtail O \rightarrow B whose left leg is monic. Such a partial map specifies a rooted homomorphism search problem when the monic map is componentwise connected, i.e. there is no connected component of A which lies entirely outside the image of O. The intuition here is that, in a constraint satisfaction problem, if every connected component of A has been partially initialized, then we don’t really have to “search” for morphisms from A to B; we just use the connectivity of A to read off all of the matches of A within B, an operation which scales only with the size of A.3

The core reason the algorithm presented below is efficient is that it transforms something analogous to a subgraph isomorphism problem into a collection of rooted subgraph isomorphism problems. This is also a good example of how we can specify something abstractly, such as a partial map from A to B, and leave as a context-specific implementation detail how one actually finds all of the maps from A to B consistent with that partial map.
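
In the sketch from §1.1, rooted search is exactly what the partial keyword provides: seeding the vertex map anchors every connected component of the pattern, so the backtracking only ever walks along edges of Q rather than guessing vertices of G freely.

```julia
# All matches of the length-two-path pattern that send vertex a to vertex 2:
homomorphisms(Q, G; partial = Dict(1 => 2))   # [[2, 2, 2]]
```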

2.3.2 Working backwards towards a solution

To say a category is adhesive is to say something specific about its limits and colimits (see the nLab or Lack and Sobociński (2005)). A consequence of this is that, for any pushout square {H\cong R+_L G} and morphism h\colon Q \rightarrow H, we can take four pullbacks to recover a second pushout square whose apex is Q, depicted below.

For any match into the result of a rewrite, we have a canonical decomposition of our pattern into various subobjects: {Q\cong Q_R +_{Q_L} Q_G}, i.e. Q is just the part of Q which intersects with R and the part which intersects with G, glued together along the part of Q which intersects with L. So every match h has an associated cube where the top and bottom faces are pushouts and the side faces are all pullbacks. For ease of reference, we’ll call this the adhesive cube induced by h.
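
In symbols (a restatement of the cube’s data, using that the map L \to H factors through both G and R), the side objects are pullbacks along h:

{Q_G \cong Q \times_H G, \qquad Q_R \cong Q \times_H R, \qquad Q_L \cong Q \times_H L, \qquad\text{so}\qquad Q \cong Q_R +_{Q_L} Q_G.}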

How much of this cube can be computed at compile time?

We don’t know G or H until runtime, so we can’t precompute their incident maps. However, the rest of the cube (in black) can be anticipated.

2.3.3 The simple algorithm

At compile time:

  1. Enumerate all possible top faces of an adhesive cube
    • Call these faces decompositions Q \cong Q_G +_{Q_L} Q_R
  2. Enumerate all possible side faces above the rewrite rule
    • Call these faces interactions between a Q_L\rightarrowtail Q_R and a rule f: L\rightarrowtail R, so that an interaction is a pair of maps h_L: Q_L\rightarrow L and h_R: Q_R\rightarrow R that form a pullback square.

One of these decompositions is trivial: the one with Q_G = Q, which says the match lies entirely in G. The matches which have this as the top face of their adhesive cube are precisely the matches we want to avoid in solving the incremental search problem, so we ignore this decomposition.

At runtime:

  • Once we know which rule f has been applied, we can separately consider each partial adhesive cube, consisting of a decomposition (top face) and a compatible interaction with f (side face).

  • Given m, we compute all possible h_G\colon Q_G\rightarrow G, keeping only the ones which form a pullback square with m and h_L.

  • Then, for each such h_G, we have a corresponding match {h=[h_G\cdot \Delta, h_R\cdot r]}. (This runtime loop is sketched in code below.)
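
Schematically, the runtime phase is a double loop. The sketch below is generic Julia with hypothetical field names: each precomputed partial cube would package a rooted-search routine, the pullback-square check, and the gluing step, e.g. as a named tuple of closures.

```julia
# Runtime phase: complete each precomputed partial adhesive cube against the
# given match m, keeping only completions whose side faces are pullbacks.
function runtime_update(partial_cubes, m, Δ, r, G)
    new_matches = Any[]
    for cube in partial_cubes                         # one per decomposition+interaction
        for h_G in cube.candidates(G, m)              # rooted search for h_G: Q_G -> G
            cube.is_pullback(h_G, m) || continue      # pullback square with m and h_L?
            push!(new_matches, cube.glue(h_G, Δ, r))  # h = [h_G⋅Δ, h_R⋅r]
        end
    end
    new_matches
end
```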

Correctness

That this procedure recovers exactly the new matches, {\operatorname{Hom}(Q,H)\setminus \operatorname{Hom}(Q,G)\cdot\Delta}, is clear: each new match corresponds to a unique adhesive cube, and we enumerate all possible cubes (deliberately excluding the ones which factor through \Delta).

Showing that this is efficient requires showing that the search for h_G is always a rooted search problem, i.e. that Q_L \rightarrowtail Q_G is componentwise connected. Assume Q is connected.4 Because {Q \cong Q_G +_{Q_L} Q_R}, if some connected component of Q_G were not in the image of Q_L, it would remain a connected component of Q; since Q is connected, this would force Q_G = Q, which is exactly the trivial decomposition we excluded.

2.3.4 Example: finding newly introduced paths of length 2

Let’s revisit our earlier example of finding paths of length two that spring into existence via the rewrite application which introduces a triangle, reproduced on the left.

On top we see a choice of decomposition and interaction that recovers the match [1, 3, 2]. The Q_G of the decomposition asserts that the beginning and end vertex of the path will live in G, but the middle vertex and both edges will come from material newly added by the rule. Note that in this case Q_L = Q_G, so the partial map from Q_G to G is total, and there is always exactly one h_G (and h) associated with this rewrite.

On the bottom, we see a choice of decomposition and interaction that recovers the match [3, 2, 2]. The Q_G of the decomposition asserts that the second edge of the path will live in G, but the first vertex and edge will come from material newly added by the rule. The search for h_G candidates induced by the partial map from Q_G to G looks for outgoing edges from \textcircled{\footnotesize 2} in G. There is only one such edge in this case, but in general this decomposition+interaction pair could lead to zero or many new matches.
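
We can check this example end to end with the sketches from earlier sections, building the pushout H by hand (it adds vertex 3 and the edges 1 \to 3 and 3 \to 2):

```julia
H  = Graph(3, [1, 2, 1, 3], [2, 2, 3, 2])   # G plus vertex 3, edges 1 -> 3, 3 -> 2
Δv = [1, 2]                                 # Δ: G ↪ H is the identity on vertices
Set(new_matches_naive(Q, G, H, Δv)) == Set(new_length2_paths(G, 1, 3, 2))   # true
```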

2.4 An optimization when \mathsf{C} has complements

The previous algorithm might be unsatisfactory in two ways:

  1. There are many ways to express Q as a pushout, and we have to iterate over all of them.
  2. It might seem wasteful to generate h_G candidates and then filter out those which do not form a pullback square, a condition imposed just because adhesive cube side faces are always pullbacks.

We can address both these issues when \mathsf{C} has complements.

Definitions

The union of two subobjects A and B, denoted A \vee B, is computed by gluing them together along their intersection (i.e. pullback, denoted A \wedge B).

The complement of {A\rightarrowtail X}, denoted {{\sim}A}, is the smallest subobject for which {X = A \vee {\sim}A}.

The boundary of A, denoted \partial A, is A \wedge {\sim} A.
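
In \mathsf{Grph} these operations are straightforward to compute. Continuing the sketch from §1.1, where a subgraph is a set of vertices and edges closed under taking endpoints:

```julia
struct Subgraph
    vs::Set{Int}    # vertex subset
    es::Set{Int}    # edge subset, with endpoints contained in vs
end

# ~A: every edge outside A (with its endpoints), plus every vertex outside A.
function complement(X::Graph, A::Subgraph)
    es = Set(e for e in 1:ne(X) if e ∉ A.es)
    vs = Set(v for v in 1:X.nv if v ∉ A.vs)
    for e in es
        push!(vs, X.src[e], X.tgt[e])
    end
    Subgraph(vs, es)
end

# ∂A = A ∧ ~A: the part of A touched by the rest of X.
function boundary(X::Graph, A::Subgraph)
    C = complement(X, A)
    Subgraph(A.vs ∩ C.vs, A.es ∩ C.es)
end

A = Subgraph(Set([1, 2]), Set([1]))   # the first edge of the path pattern Q
complement(Q, A)                      # the second edge, with its endpoints
boundary(Q, A)                        # just the shared middle vertex
```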

Once we have this structure, we can succinctly express the optimized algorithm. We solve the incremental problem by doing the same procedure as before with two changes:

  1. Only consider decompositions where Q_R = {\sim}Q_G.
  2. Allow all of the extensions of the partial map from Q_G to G to be maps h_G used to induce new matches h.

2.4.1 Too good to be true? Why is the optimization correct?

For any match h: Q\rightarrow H, we have not only a unique adhesive cube but also a unique minimal adhesive cube. One obtains the latter by composing two cubes together, as in the diagram below:

Note that this arbitrary h has a unique associated interaction between {\partial Q_G \rightarrowtail {\sim}Q_G} and f, i.e. a pair of maps {(h_\partial, h_\sim)} that forms a pullback with f. We get this by precomposing h_L and h_R with {\partial Q_G \rightarrowtail Q_L} and {{\sim}Q_G \rightarrowtail Q_R}, respectively.

So the algorithm is structurally the same: each h has a unique cube associated with it, so loop over all partial cubes and try to complete them. But it’s nicer to loop over partial minimal adhesive cubes because the possible top faces are generally a small subset of all possible decompositions of Q. Also, the way to complete these cubes is by simply looking for maps h_G which extend the partial map Q_G \leftarrowtail \partial Q_G \xrightarrow{h_\partial\cdot m} G (no filtering based on a pullback condition). By the same argument as above, this partial map induces a rooted search problem because Q is connected and we are excluding the trivial decomposition.

2.4.2 Optimized algorithm example

\mathsf{Grph} has complements. Let’s consider a new scenario, where R=Q and L=G. The rule L\rightarrowtail R looks for a cospan of edges and adds an edge in order to make a path of length two.

The decomposition Q_R+_{Q_L}Q_G above that one gets by taking pullbacks with h is not minimal. Although Q_L and Q_R involve all three vertices of the pattern, the only part of the pattern L that’s actually relevant to the rewrite (i.e. the only part that’s necessary to get us from Q_G to Q) is the \textcircled{\footnotesize y}\rightarrow\textcircled{\footnotesize z} part of it. The complement operation is what allows us to formally ignore the parts of the pattern that do not play an essential role in the rewrite.

If we were running the old algorithm using the minimal interaction, the pullback condition would say that, moreover, \textcircled{\footnotesize 1} doesn’t appear in the pattern. Because of this, the previous algorithm would not find the match associated with h_G = [1, 2, 3] depicted in this figure from the minimal interaction. Instead, that map h_G would have been detected by a non-minimal interaction with Q_L and Q_R, also shown in the figure.

The core idea of this optimization, specialized to this example, is that one shouldn’t need to distinguish special cases for whether \textcircled{\footnotesize 1} appears in the pattern or not.

2.5 Batch update from multiple rule firings

We can apply many additive rewrites at once with a single colimit, rather than as an artificially-ordered sequence of pushouts.

A match into the result of multiple simultaneous rewrites has a corresponding decomposition of the pattern into a colimit of subobjects of the same shape. The adhesive ‘multicube’ induced in the n=2 case is shown below:
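
In symbols, the top face of this multicube glues each replacement part onto a common Q_G along its own Q_{L,i}; for n=2, a sketch of the colimit shape is:

{Q \;\cong\; \operatorname{colim}\big(Q_{R,1} \leftarrow Q_{L,1} \rightarrow Q_G \leftarrow Q_{L,2} \rightarrow Q_{R,2}\big).}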

Our algorithm generalizes: at compile time, enumerate the possible nontrivial decompositions of Q with the above shape, along with all interactions of subobject morphisms with rewrite rules. At runtime, some finite family of rewrite rules (e.g. (f_1,f_3,f_3,f_2)) will have been executed with corresponding matches m_i. We loop over possible partial multicubes and look for morphisms h_G\colon Q_G\rightarrow G that lead to pullback faces above all of the matches; each such h_G uniquely results in a new match h=[h_G\cdot \Delta,\ h_{R,1}\cdot r_1,\ h_{R,2}\cdot r_2,\ \dots].

Optimized version when \mathsf{C} has complements

There is also an optimized batch algorithm that requires just looking for maps Q_G \rightarrow G that commute with all h_{L,i} and m_i rather than filtering for ones which furthermore satisfy a pullback property. In brief, we require Q_{R,i} = {\sim}(Q_G \vee \bigvee_{j\ne i} Q_{R,j}): this codifies that we want each interaction to be as small as possible.

2.5.1 Batch example

Let’s consider a pattern that looks for paths of length three, rather than two. This example batches two applications of the triangle-introducing rule, leading to a match which incorporates material from the old graph as well as newly introduced material from both rewrites:

The resulting match, [4, 2, 2, 5], is ultimately found starting from the decomposition of Q on top, which asserts parts of Q must overlap with G, L_i, and R_i. The middle edge in the pattern Q comes from the loop of the old graph, whereas the first and last vertices and edges are respectively created via the two rule applications.

2.6 Nonmonic matches

Nonmonic matches can implicitly quotient parts of the pattern, leading to new possible ways a match for a pattern Q could be created. For every possible L\twoheadrightarrow L', we can compute the corresponding quotiented rule at compile time. At runtime, we epi-mono factorize any nonmonic match and use the previous algorithm with the quotiented rule. This is shown schematically below:

By factoring the pushout square into a pushout of epis and a pushout of monos, we can ignore the top pushout square and pretend our rewrite rule was r' all along, now with a monic match m'. A particular example is shown below.

Applying our triangle rule to a loop, rather than to an edge between distinct vertices, allows a new kind of path of length two to appear (the two newly-introduced edges, composed in the opposite order), which wouldn’t be detected by our algorithm using the unquotiented rule.
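
On vertex maps, the epi-mono factorization is a quotient onto the image. Here is a sketch in the language of §1.1, assuming (as in footnote 1) that the codomain has at most one edge between any two vertices, so the edge map is determined:

```julia
# Factor a match through its image: L ↠ L′ ↣ G, returning L′ and the epi part.
function image_factor(L::Graph, vmap::Vector{Int})
    vs = unique(vmap)                           # image vertices, named as in G
    ix = Dict(v => i for (i, v) in enumerate(vs))
    es = unique((vmap[L.src[e]], vmap[L.tgt[e]]) for e in 1:ne(L))
    L′ = Graph(length(vs), [ix[s] for (s, _) in es], [ix[t] for (_, t) in es])
    L′, [ix[v] for v in vmap]
end

# Matching the triangle rule's edge x -> y onto a loop identifies x with y:
L′, epi = image_factor(Graph(2, [1], [2]), [1, 1])
(L′.nv, ne(L′), epi)   # (1, 1, [1, 1]): one vertex with a loop
```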

3 Conclusion

Adhesive categories are a general setting for reasoning about pattern matching. We assume some computational primitives can be implemented in any domain of interest, thought of as an adhesive category:

  1. computing decompositions of an object into subobjects
  2. computing small (co)limits
  3. extending a partial map into a set of total ones.

The efficiency of this approach derives from its systematic transformation of a subgraph isomorphism problem into a collection of rooted subgraph isomorphism problems. Another aspect that makes this algorithm special is that it leverages information beyond the extensional difference of an update in order to speed up incremental answer-set updating.

The solution to the base problem generalizes straightforwardly to non-monic matches, non-connected patterns, and batch updates. In cases where one can compute complements of subobjects, we can be even more efficient.

There is an implementation in AlgebraicRewriting.jl for arbitrary \mathsf{C}-Set categories (Patterson, Lynch, and Fairbanks 2022); \mathsf{Grph} is a special case of this. There is more to be said about deletion and merging in future work!

References

Biondo, Roberto, Davide Castelnovo, and Fabio Gadducci. 2025. “EGGs Are Adhesive!” https://arxiv.org/abs/2503.13678.
Campbell, Graham, Jack Romo, and Detlef Plump. 2020. “Improving the GP 2 Compiler.” CoRR abs/2002.02914. https://arxiv.org/abs/2002.02914.
Lack, Stephen, and Paweł Sobociński. 2005. “Adhesive and Quasiadhesive Categories.” RAIRO-Theoretical Informatics and Applications 39 (3): 511–45.
Martens, Chris, Alexander Card, Henry Crain, and Asha Khatri. 2023. “Modeling Game Mechanics with Ceptre.” IEEE Transactions on Games 16 (2): 431–44.
Patterson, Evan, Owen Lynch, and James Fairbanks. 2022. “Categorical Data Structures for Technical Computing.” Compositionality 4 (December): 5. https://doi.org/10.32408/compositionality-4-5.

Footnotes

  1. We can succinctly describe graph homomorphisms by a vector which says where the domain vertices go. This is unambiguous when the domain vertices have an obvious order and the codomain graph has at most one edge between any two vertices.↩︎

  2. What adhesivity means for us in practice will be described below.↩︎

  3. This also assumes some bound on the node degree of B: even though G might be big (say ~10^6 vertices and edges), there won’t be 10^6 edges between any two particular vertices.↩︎

  4. We can assume our original pattern Q was connected without loss of generality. If Q = Q_1+Q_2, we can separately update \operatorname{Hom}(Q_1,G) and \operatorname{Hom}(Q_2,G). This is because \operatorname{Hom}(Q_1+Q_2,G) \cong \operatorname{Hom}(Q_1,G) \times \operatorname{Hom}(Q_2,G).↩︎
