Geometric Type Theory, Done Two Ways
A topos can be specified by the geometric theory that it classifies. Though the sequents of a theory are described formally and syntactically, its interaction with the world of sets (through set-indexed disjunctions and axiom schemas) is often a little hand-wavy. In this post I describe two ways of doing this more precisely using type theory.
In a previous post, David Jaz outlined a vision of a type theory for working in the internal language of all toposes. Now I’d like to nibble at this: how can we represent geometric theories in syntax?
David Jaz and I are not the only ones circling this idea: Steven Vickers has long advocated for finding a good syntax for geometric theories, in which one could do purely “continuous mathematics”. Johannes Schipp von Branitz and Ulrik Buchholtz have recently been investigating what ought to be the propositional fragment of a larger geometric type theory. Taichi Uemura also has a set of notes that describe a type theory for working with toposes synthetically.
These last two works intend to be modelled in sheaves on the category of all toposes, with a suitable topology. I’ll do something simpler, and just describe a syntax intended to be modelled in toposes directly.
There’s a lot of past work on describing theories of various kinds, or of adding sorts with more structure to first-order logic. To name a few, we have Generalised Algebraic Theories, First-Order Logic with Dependent Sorts (FOLDS) and the similar DFOL, picked up by and analysed by Palmgren.
None of these quite capture what we’re looking for. Our wishlist: we want a reasonable language for defining geometric theories, allowing arbitrary set-indexed disjunctions in a principled way. We should be able to describe and manipulate models of these theories, and reason inside the classifying topos of each theory if we like. While we’re really dreaming, the true goal is a syntactic understanding of Ingo’s synthetic quasicoherence, which would let us prove all the interesting non-geometric properties of the universal model inside each classifying topos.
In this post I’ll outline two ideas for what a sensible geometric type theory could look like. The first is not too dissimilar from the way theories are ordinarily presented: a list of sorts and axioms. These have a somewhat involved context structure, in order to properly capture the set-indexed disjunctions that are characteristic of geometric theories. The second is more radical, using a lot of the ideas from Owen Lynch’s EMTT; theories are built up more type theoretically, leaning heavily on a restricted kind of \Pi-type to emulate the more complex context structure of the first version. I’ll end the post with a handful of additional thoughts and concerns, in case any readers have good ideas for solving them!
One note before we begin. When working with geometric logic in the ordinary style, constructions such as the product of sorts must be performed “by hand”. One adds an additional structureless sort together with projection functions, and a collection of axioms which give this new sort the correct universal property; in this sense one builds new types out of logic. A preprint by Vickers contains many examples of this trick.
As type theorists we are more used to doing the opposite; starting with a handful of constructions on types as primitive and carving out logic as a fragment of these. As hinted at in David Jaz’s previous post, we’ll take a leap of faith and build our theories in a proof-relevant, proposition-as-types style, rather than sticking to pure logic. Geometric constructions in the style we present below will be sufficient to, for example, demand that a type is a (h-)proposition, and build the propositional truncation (as a higher inductive type).
1 Geometric Theories Via Propositions as Types
To hit all the things on our wishlist there’s an unfortunate explosion in the number of judgements involved. We’ll build up to the full set gradually.
First, the specification of theories themselves. We’ll focus on the coherent fragment to begin with, leaving aside set-indexed disjunctions. As usual, theories will be built as a list of sorts and terms, each of which may use the sorts and terms that came before it. Like Generalised Algebraic Theories and unlike FOLDS, we’re going to allow free mixing of terms and dependent sorts. That is, a theory is a sequence of judgements, each with one of the shapes
where \phi is a geometric construction in the sorts and terms of the theory so far, \Phi is a context of such geometric constructions, and X and c are fresh names.
In this proof-relevant style, we’ll collapse the distinction present in first-order logic between the “context” of variables and the collection of propositional antecedents of a sequent: rather than writing \psi \vdash_{\Gamma} \phi, we will use \Gamma, p : \psi \vdash c : \phi where \psi and \phi are constructions that happen to be propositions. For this reason, we won’t add separate primitive notions of functions or relations; (proof-relevant) terms will suffice for the former and dependent sorts for the latter.
Let’s see a couple of examples in this style before describing the judgements more formally. First, the theory of monoids looks similar to what one would write down as a GAT.
In contrast to FOLDS or DFOL, the axioms of a monoid are added as terms of certain equality types rather than through a primitive equality judgement or propositional relation.
The “types” \phi that may appear in these judgements form a little type theory that contains only the type formers we expect to be stable under pullback by geometric morphisms; that is, \Sigma, = and a to-be-pinned-down class of (higher) inductive types. Crucially, we leave out function types and universes. Any universal quantification in a theory must be performed by the top-level construction context \Phi, and we are not allowed to quantify over sorts.
As a second example consider the theory of a strict interval, whose classifying topos is simplicial sets. We have a sort with a relation on it:
\begin{align*} \mathord{\cdot}&\vdash I \,\,\mathsf{sort}\\ x : I, y : I &\vdash(x \leq y) \,\,\mathsf{sort}\\ \end{align*}
We can proceed to add all the ordinary terms of the theory:
where P \vee Q :\equiv\exists\left( P + Q\right) is constructed using the propositional truncation in the ordinary way.
The x \leq y relation should be proposition-valued. We can enforce this with an additional term:
\begin{align*} x : I, y : I, p : x \leq y, q : x \leq y &\vdash w : p =_{(x \leq y)} q \end{align*}
Similarly, back in the monoid example we likely want the carrier of the monoid X to be 0-truncated in the homotopical sense; this requirement can also be written as a geometric construction:
\begin{align*} x : X, y : X, p : x =_X y, q : x =_X y &\vdash w : p =_{(x=y)} q \end{align*}
1.1 Formal Judgements
Let us set this up a little more formally. This is a first pass that we’ll have to add features to later. We have the following judgements:
\begin{align*} &\mathbb{T}\,\,\mathsf{theory}&& \quad \text{Theory} \\ &\mathbb{T}\mid \Phi \,\,\mathsf{ctx}&& \quad \text{Construction Context} \\ &\mathbb{T}\mid \Phi \vdash\phi \,\,\mathsf{constr}&& \quad \text{Construction in Construction Context} \\ &\mathbb{T}\mid \Phi \vdash c : \phi && \quad \text{Term of Construction} \end{align*}
There are two ways to extend a theory: declare a new (dependent) sort, or declare new term.
Construction contexts \Phi are ordinary type theoretic contexts: lists of variables with an associated construction as its type; each of which may be dependent on the previous entries.
Each kind of theory extension has an associated “variable rule”, allowing it to be used directly so long as we specify parameters to use for its construction context.
Now, the actual rules for constructions and their terms. First, we have the “inner” variable rule that allows us to use a variable from the current construction context.
The construction formers available are \Sigma, =, +, (some) higher inductive types, all taking place within a fixed \mathbb{T}\,\,\mathsf{theory}. Because the language of constructions lacks universes, we will need to compensate by including large elimination rules for inductive types. And because we lack function types, we will likely also need to include an extra “Frobenius” telescope \Phi' in these induction principles.
1.2 Set-indexed Sorts and Terms
The next step is to get the external world of sets involved. Thus far we are only capable of expressing “finitely axiomatisable” theories, those consisting of a finite collection of sorts and terms. This is not sufficient for many coherent theories encountered in practice, even before we get to geometric theories.
Consider the theory of R-modules for a fixed ring R. With our current tools we are able to express an abelian group structure on a sort M, but scalar multiplication requires a scaling operation \sigma_r : M \to M for all elements of the set r : R.
Put differently, we need to add a notion of “axiom schema” which allows us to add an axiom for each element of a type, and in the general case this axiom may vary over the elements of the type. While we’re at it, we may as well extend the notion to terms and sorts generally – we’ll handle it through the same facility.
Schemas of this kind are achieved as follows: each sort and term has an ordinary context of types coming before the construction context, so that each sort or term is universally quantified over this context. In the R-module example:
On the right-hand side, some of the constructions we have postulated involve operations in the background type theory: we see the multiplication of ring elements r \cdot s in p_1, and addition r + s in p_4. These are not part of the theory being defined, but happening externally to it.
For an example of a “sort schema”, consider the theory of prime filters of a ring R, whose classifying topos is the small Zariski topos \mathrm{Spec}(R). Here, we have one sort P(r) for each ring element r : R, which we think of as the proposition that the ring element lies in the filter. The remaining terms of the theory are then straightforward translations of the theory in the ordinary style.
Formally, these contexts of types are added to all construction judgements and all the rules for interacting with them.
\begin{align*} &\mathbb{T}\,\,\mathsf{theory}&& \quad \text{Theory} \\ &\mathbb{T}\mid \textcolor{Crimson}{\Delta}\mathbin{;}\Phi \,\,\mathsf{ctx}&& \quad \text{Construction Context} \\ &\mathbb{T}\mid \textcolor{Crimson}{\Delta}\mathbin{;}\Phi \vdash\phi \,\,\mathsf{constr}&& \quad \text{Construction in Construction Context} \\ &\mathbb{T}\mid \textcolor{Crimson}{\Delta}\mathbin{;}\Phi \vdash c : \phi && \quad \text{Term of Construction} \end{align*}
When using the “variable rules”, we now need to provide parameters to use for the entries in the prefix of types of the construction context:
Besides quantifying over elements of a type in this opaque way, we may wish to make choices based on elements of an inductive type.
For example, the theory of a ring R is coherent and finitely axiomatisable. Requiring the ring to have characteristic 0 is still coherent but requires infinitely many terms to specify: we ask that the ring is not characteristic 2, not characteristic 3, and so on. We add one term for each n : \mathbb{N}, stating that (n +_{\mathbb{N}} 1) \cdot 1 = 0 yields a contradiction.
In order to define (n \cdot x) : R in the construction world, we need to be able to do construction-valued induction on the type \mathbb{N}; this requires a new induction principle:
Temporarily adopting a more convenient syntax, we might write this as an ordinary pattern match:
\begin{align*} n \cdot x :\equiv\mathsf{case}(n, &\,\mathsf{zero} \mapsto 0_R, \\ &\,\mathsf{succ}(m) \mapsto x +_R (m \cdot x) ) \end{align*}
And then we can state the term schema that forces the ring to be characteristic 0.
\begin{align*} n : \mathbb{N}\mathbin{;}p : ((n+1) \cdot 1_R) = 0_R &\vdash c : \varnothing \end{align*}
(In this particular example, we could have used a version of the natural numbers \mathbb{N} definable in the language of constructions, if we choose to include \mathbb{N} in the class of inductive constructions we allow.)
1.3 The Ax Construction-Former
The feature of geometric logic remaining to be added is infinitary disjunctions, indexed over any type.
The context of types of the last section behaves much like an infinitary disjunction appearing to the left of the turnstile, but we need an equivalent for when an infinitary disjunction appears on the right. For this, we add a new construction former \mathop{\mathrm{\mathbb{A}x}}A, so that including a term of the construction \mathop{\mathrm{\mathbb{A}x}}A in a theory is asserting that we have an element of the type A.
This is a typical “positive” type former, where the eliminator is designed to give a universal mapping-out property. In this case, to build a construction assuming a term of the construction \mathop{\mathrm{\mathbb{A}x}}A is the same as giving a construction for every element of the type A. Of course, each construction having the extra context zone of types is necessary for this to make sense.
The \mathop{\mathrm{\mathbb{A}x}} construction-former is the final missing piece, and we can now capture in syntax all your favourite geometric theories. First, the flip side of the example above: the theory of a ring of finite characteristic. Starting with the theory of a ring R, we can add
\begin{align*} \mathord{\cdot}\mathbin{;}\mathord{\cdot}&\vdash c : \mathop{\mathrm{\mathbb{A}x}}\mathbb{N}\\ \mathord{\cdot}\mathbin{;}\mathord{\cdot}&\vdash p : \mathsf{let} \; {\mathop{\mathrm{ax}}n} := {c} \, \mathsf{in} \, {(n \cdot 1_R= 0)} \end{align*} using the same n \cdot x operation defined above. We assert a term of \mathop{\mathrm{\mathbb{A}x}}\mathbb{N}, corresponding to an element of the type \mathbb{N}, and then must unwrap it into the context of types before we can use it as we did previously.
An important example we can now capture is the theory of a flat functor from \mathcal{C} where \mathcal{C} is a fixed small category. Suppose we have the category provided as a collection of types and functions (not a theory!). In pseudo-Agda notation:
\begin{align*} O &: \mathcal{U}\\ H &: O \to O \to \mathcal{U}\\ i &: (a : O) \to H(a, a) \\ \circ &: \{a : O\} \to \{b : O\} \to \{c : O\} \to H(a, b) \to H(b, c) \to H(a, c) \\ &\text{\dots associative, unital \dots} \end{align*}
From these we can construct, again purely in type theory, the type of cones with endpoints a and b, and the type of equalisers of a pair of morphisms.
\begin{align*} &\mathsf{Cone} : O \to O \to \mathcal{U}\\ &\mathsf{Cone}(a, b) := (c : O) \times H(c, a) \times H(c, b) \\ &~\\ &\mathsf{Eq} : \{a : O\} \to \{b : O\} \to H(a, b) \to H(a, b) \to \mathcal{U}\\ &\mathsf{Eq}(f, g) := (c : O) \times (h : H(c, a)) \times f \circ h =_{H(c, b)} g \circ h \end{align*}
Then, in our style, we can build the theory of a flat functor from \mathcal{C} in pieces.
I find this quite nice: the type level and theory level are both captured in the formal language, and the boundary between the two is made totally explicit.
1.4 Internal Theories
Something we’ve done a couple of times now is judged a theory with reference to some type-theoretic parameters. For example, the ring R to be used in the theory of an R-module, or the data of the category (O, H, \dots) in the theory of a flat functor. These could be provided as closed types and terms, but better would be to allow ourselves to judge a theory internal to an ambient context.
This can be done by sneaking yet another notion of context into our judgements as a prefix, to act as the context that all our work is happening inside:
\begin{align*} &\textcolor{Crimson}{\Gamma} \mid \mathbb{T}\,\,\mathsf{theory}&& \quad \text{Theory} \\ &\textcolor{Crimson}{\Gamma} \mid \mathbb{T}\mid \Delta \mathbin{;}\Phi \,\,\mathsf{ctx}&& \quad \text{Construction Context} \\ &\textcolor{Crimson}{\Gamma} \mid \mathbb{T}\mid \Delta \mathbin{;}\Phi \vdash\phi \,\,\mathsf{constr}&& \quad \text{Construction in Construction Context} \\ &\textcolor{Crimson}{\Gamma} \mid \mathbb{T}\mid \Delta \mathbin{;}\Phi \vdash c : \phi && \quad \text{Term of Construction} \end{align*}
The above judgements presuppose that \Gamma \vdash\Delta \,\,\mathsf{tele} holds. This \Gamma is now sprinkled through all the rules but does not interact with them in any interesting way, other than being present when typing each telescope \Delta.1
We can replay the theories we described above, but properly parameterised over their inputs.
1.5 Judging a Model
Now that we can describe a theory in context, it would be nice if we could express what it means to have a model of that theory. For this, we will introduce a new judgement: \Gamma \vdash m ::\mathbb{T}, the claim that m “models” or “satisfies”2 the theory \mathbb{T}. We can use variables from the ambient context \Gamma when describing a model, so really this judgement expresses a model-in-context of a theory-in-context.
The rules for producing a model have some moving parts that need to be defined mutually inductively. To build up a model of a theory means providing an appropriate construction in the world of types for each field of the theory.
Each component of a model binds the names in its contexts \Delta and \Phi. The construction context \Phi for each field is interpreted using the components we have already specified for the previous fields of the theory. For example, when describing a model of the theory of a monoid, once we’ve chosen a type A to use as the carrier, specifying the operations and equations involves elements of that specific type A occurring entirely in the type-theoretic world.
Carrying this out means using admissible operations that have the following shape. Given a model of a theory, we are able to interpret any construction context, construction, and construction substitution in that model, giving a result in the underlying type theory.
This is done by induction on the structure of the constructions involved, so for example:
\begin{align*} {\llbracket \phi + \psi \rrbracket}_{m} &:\equiv{\llbracket \phi \rrbracket}_{m} + {\llbracket \psi \rrbracket}_{m} \\ {\llbracket \phi \times \psi \rrbracket}_{m} &:\equiv{\llbracket \phi \rrbracket}_{m} \times {\llbracket \psi \rrbracket}_{m} \\ &\dots \end{align*}
Once we reach the base cases of the direct use of a sort or a term, the model provides exactly the type or element to be used in that location.
\begin{align*} {\llbracket X(\tau;\theta) \rrbracket}_{m} &:\equiv m.X[\tau][{\llbracket \theta \rrbracket}_{m}] \\ {\llbracket f(\tau;\theta) \rrbracket}_{m} &:\equiv m.f[\tau][{\llbracket \theta \rrbracket}_{m}] \end{align*} Here, m.X and m.f project the corresponding component of the model.
1.6 Stepping Into a Classifying Topos
Our goal from the start has been to find a way to work in the “internal language of all toposes”. Once we’ve described a theory, we need a way to step into the classifying topos of that theory. We’ll do this by adding a new notion of context extension for the ambient context \Gamma, which allows us to hypothesise a model of a theory rather than an element of a type.
Working inside the ambient context “u ::\mathbb{T}\,\,\mathsf{ctx}” will correspond to working internal to the classifying topos \mathrm{Set}[\mathbb{T}]. The universal model of \mathbb{T} inside \mathrm{Set}[\mathbb{T}] is exactly the model u, which we will have access to using a variable rule (to be described soon) for these model context extensions. More generally, \Gamma \vdash\mathbb{T}\,\,\mathsf{theory} is a theory internal to the topos specified by the context \Gamma , and the extended context \Gamma, u :: \mathbb{T}\,\,\mathsf{ctx} is then the classifying topos \Gamma[\mathbb{T}].
So, “\Gamma” in the judgements and rules we’ve seen so far is now generalised to allow interleaving of ordinary and model context extensions. The “\Delta” context appearing in the description of sorts and terms is not generalised in this way; it must only contain ordinary variables.
This new notion of context extension has corresponding weakening and substitution operations, but here we encounter a wrinkle. Semantically, these correspond to pullback along a geometric morphism, which does not commute with all type formers and operations.
This manifests as weakenings and substitutions getting stuck in certain places. In particular, we must make this kind of weakening explicit rather than silent in the syntax of a term.
We have equations that push both of these operations through geometric term and type constructors, but none of the others.
\begin{align*} ((x : A) \times B)\{{m/u}\} &\equiv(x : A\{{m/u}\}) \times B\{{m/u}\} \\ (A + B)\{{m/u}\} &\equiv(A\{{m/u}\} + B\{{m/u}\}) \\ ~\\ (A \to B)\{{m/u}\} &\not\equiv A\{{m/u}\} \to B\{{m/u}\} \\ \mathcal{U}\{{m/u}\} &\not\equiv\mathcal{U} \end{align*}
Similarly, these model substitutions push into theories and constructions. By their very nature these are all built out of geometric pieces, with the exception of \mathop{\mathrm{\mathbb{A}x}} constructions. Once this shifts us back into the world of types, the model substitution may get stuck again on a non-geometric type former.
\begin{align*} (\mathbb{T}, (\Delta\mathbin{;}\Phi \vdash X \,\,\mathsf{sort}))\{{m/u}\} &\equiv\mathbb{T}\{{m/u}\}, (\Delta\{{m/u}\}\mathbin{;}\Phi\{{m/u}\} \vdash X \,\,\mathsf{sort}) \\ (\mathbb{T}, (\Delta\mathbin{;}\Phi \vdash x : \phi))\{{m/u}\} &\equiv\mathbb{T}\{{m/u}\}, (\Delta\{{m/u}\}\mathbin{;}\Phi\{{m/u}\} \vdash x : \phi\{{m/u}\}) \end{align*}
We now have the language necessary to specify the variable rule:
That is, we have a theory \mathbb{T} internal to a topos \Gamma. Inside the classifying topos \Gamma[\mathbb{T}] there is, from the perspective of \Gamma, a universal model of the theory \mathbb{T}. Because the judgement in the conclusion is interpreted in \Gamma[\mathbb{T}] itself, this needs to be adjusted: the theory \mathbb{T} is pulled back from \Gamma to \Gamma[\mathbb{T}], and this cannot be totally silent in general. Then inside \Gamma[\mathbb{T}], we have a model of this pulled-back theory.
It’s necessary to notate the locations of these explicit weakenings even when the judgement being weakened has no free variables at all. Consider the type \mathbb{N}\to \mathrm{Bool}. This may have a nontrivial intrinsic “topology”, which varies depending on the topos in which it is constructed. Even in the simplest case of the classifier of the theory of an object, the types u ::(X \,\,\mathsf{sort}) \vdash\mathbb{N}\to \mathrm{Bool} \,\,\mathsf{type} and u ::(X \,\,\mathsf{sort}) \vdash(\mathbb{N}\to \mathrm{Bool})\{{\uparrow^{u}}\} \,\,\mathsf{type} are non-isomorphic, even though \mathbb{N}\{{\uparrow^{u}}\} \cong \mathbb{N} and \mathrm{Bool}\{{\uparrow^{u}}\} \cong \mathrm{Bool} individually.
2 Geometric Type Theory a la EMTT
The type theory that I’ve described in the previous section does not depart too far from geometric theories as they are usually understood, besides switching to the propositions-as-types perspective. There is, however, a serious proliferation of additional judgements and operations necessary to get it all to work. I’d like to outline an alternative, closer to Owen Lynch’s EMTT, which was also discussed a little in David Jaz’s previous blog post.
The key insight is that a mixed, type-to-theory notion of \Pi-type is sufficient for many of the features we are looking for, though it takes a bit of work to reconstruct them, and theories written in this style look somewhat different to straight-line style of the last section.
Let’s reset back to plain MLTT as a baseline; we’re back to not having a theory context or a separate language of constructions. The only novel judgements we’ll maintain are the theory-in-context and model-in-context judgements that we added at the end.
\begin{align*} &\Gamma \vdash A \,\,\mathsf{type}&& \quad \text{Type} \\ &\Gamma \vdash a : A && \quad \text{Element} \\ &\Gamma \vdash\mathbb{T}\,\,\mathsf{theory}&& \quad \text{Theory} \\ &\Gamma \vdash m ::\mathbb{T}&& \quad \text{Model} \end{align*}
These still have the associated special context extension, explicit weakening and model substitution, and these latter two still have the special “stuck” behaviour when it comes to non-geometric types.
We’re now going to treat theories and their models much more like an ordinary dependent type theory, albeit a fairly impoverished one. First, we’ll have 1 and \Sigma theories with precisely the rules you would expect (though note the use of model substitution rather than ordinary substitution).
There are then two special base-cases, the first acting as the replacement for the “\Phi \vdash X \,\,\mathsf{sort}” of theories in the previous style. (In this overview I am neglecting universe levels, which would need to be tracked in the “m ::A” judgement.)
Secondly, a new, simpler version of \mathop{\mathrm{\mathbb{A}x}}A.
Finally, the glue that makes this all work: mixed type-theory \Pi-types.
Here the substitution involved is to the ordinary kind, of a term into an ordinary variable. In the non-dependent case, let’s write A \vartriangleright \mathbb{T} to save on notation.
Similar to before, we still need to add additional eliminators to inductive types to eliminate into the new judgements. This time, at least, there is no separate language of constructions that duplicates all the standard type formers.
That’s it! These components have some heavy lifting to do in order to match the features of the previous style. First, the \Sigma-theories are used to replace the “theory context” \mathbb{T} that we were using previously. This is more subtle than it may first appear because, if you inspect the typing of the theory for the second component \mathbb{T}, this is now a theory internal to the classifying topos of the theory \mathbb{S}. In the previous style, the types that could occur in the telescopes present in a theory all came from the fixed ambient context \Gamma. In this style, the types appearing in \mathbb{T} now exist in the classifying topos \Gamma[\mathbb{S}], and so have to be manually weakened from \Gamma if necessary.
Second, we need some way of emulating the construction context \Phi. This relies on the fact we’ve just seen; that each additional field in a theory is now constructed internal to the classifying topos of the theory so far. And so, what was previously a context of constructions written in a restricted language can be simulated using these \vartriangleright-types.
This is a lot to take in, so let’s write out some basic theories in the new style. There are fewer moving parts, but the price we pay is that describing a theory is now more noisy. It’s possible that, eventually, an elaboration algorithm will be able to insert all the necessary conversions between judgements, but for now we will keep things fully explicit.
First, the theory of an object is immediate: we have \mathrm{\mathbb{S}ort}\,\,\mathsf{theory} in the empty context. To add in a point of that sort is a little more involved: First, we describe the theory of “a point of X”, working internal to the topos described by the context X ::\mathrm{\mathbb{S}ort}.
That is, internal to \mathrm{Set}[X], we are looking at the theory with a single axiom, an element of the type for the sort X. In the first line, the model variable rule gives a model of the explicitly weakened \mathrm{\mathbb{S}ort}\{{\uparrow^{X}}\}, but this is equal to \mathrm{\mathbb{S}ort}.
The internal theory we end up with on the last line is exactly of the shape to form a \Sigma-theory, so together our theory of a pointed object in the empty context is: \begin{align*} (X ::\mathrm{\mathbb{S}ort}) \times \mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}X \end{align*}
Let’s add a binary operation. We could do this in a context containing a model of the \Sigma-theory, but it’s a little clearer to instead extend the X ::\mathrm{\mathbb{S}ort} context with the point and proceed from there. To reference X in this extended context, we now have to explicitly weaken X past the point e we’ve just added in.
That is, the use of X is justified by
using the two different kinds of weakening we have available to us. Finally, let’s express that e is a left unit for this multiplication. For this, we’re going to use an axiom of the ordinary equality type, rather than an equality construction as we did previously. Let’s abbreviate the context to save space:
\begin{align*} \Gamma :\equiv X ::\mathrm{\mathbb{S}ort}, e ::\mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}X, m ::{\mathop{\mathrm{ty}}X\{{\uparrow^{}}\}} \vartriangleright{\mathop{\mathrm{ty}}X\{{\uparrow^{}}\}} \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}X\{{\uparrow^{}}\} \end{align*}
Then:
Again, we should be able to hide most of this noise in a practical system!
3 Next Steps
There’s a lot about the design of the type theory which is still up in the air. Here are some of the things I’ve been wondering about.
3.1 Ingo’s Synthetic Quasicoherence
The purpose of this whole exercise was to devise a setting in which we could state Ingo’s quasicoherence principle, and this is where the theory starts to hit the rocks.
In the framing that I want to use, his principle gives a universal property to a certain theory living inside any classifying topos u ::\mathbb{T}; the “slice theory” u \!\downarrow\!\mathbb{T}. This is the theory of “u-algebras”, models of \mathbb{T} equipped with a homomorphism from the universal model u. Because this is happening inside the classifying topos u ::\mathbb{T}, we are not working with models of \mathbb{T} exactly; we must explicitly weaken \mathbb{T} into that extended context.
I won’t attempt a general definition of u \!\downarrow\!\mathbb{T} here. It’s up in the air whether this theory should be defined “observationally” in terms of the theory \mathbb{T}, that is, computing by induction on the structure of \mathbb{T}. Or perhaps the quasicoherence principle I’m about to describe will characterise this slice theory completely, and it can be proved inductively u \!\downarrow\!\mathbb{T} is equivalent to the theory of homomorphisms out of u.
Phrased both more syntactically and more naively than in the previous post, the induction principle has the following shape:
In words:
Working internally to the classifying topos for \mathbb{T}, to show a quasicoherent statement about u-algebras it suffices to prove the statement for the identity u-algebra.
This is highly reminiscent of ordinary path-induction; in order to “map out” of the slice theory u \!\downarrow\!\mathbb{T}, it suffices to give the image at the identity algebra.
We’ve used a judgement that ensures that a context telescope is etale, meaning that it only consists of ordinary context extensions and does not contain any model-extensions. The addition of the etale telescope \Delta is necessary to do anything interesting here; the principle holds for any slice of the classifying topos for \mathbb{T}.
With the above rule in hand, we can attempt to replicate one of Ingo’s proofs of an interesting non-geometric feature of a classifying topos: any two elements of the universal object u ::\mathrm{\mathbb{S}ort} are not not equal, where “not” is defined in the ordinary way as \neg X :\equiv X \to \varnothing. (This is only indicative of what the argument will look like; the precise induction rule is not pinned down enough for this to be formal.)
We’ll do this in excruciating detail, with all the judgement-switching made explicit. In search of a contradiction, suppose we have x, y : \mathop{\mathrm{ty}}u so that c : \neg (x = y); that is, our context is \Gamma :\equiv u ::\mathrm{\mathbb{S}ort}, x : \mathop{\mathrm{ty}}u, y : \mathop{\mathrm{ty}}u, c : \neg(x=y).
The slice theory u \!\downarrow\!\mathrm{\mathbb{S}ort} in this case is the theory of a sort v together with, for every element e : \mathop{\mathrm{ty}}u\{{\uparrow^{}}\}, an axiom a_e : v. Such a map is exactly a model of \mathop{\mathrm{ty}}u\{{\uparrow^{}}\} \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}v, so the slice theory in total is u \!\downarrow\! \mathrm{\mathbb{S}ort}:\equiv(v :: \mathrm{\mathbb{S}ort}) \times (\mathop{\mathrm{ty}}u\{{\uparrow^{}}\} \vartriangleright\mathop{\mathrm{\mathbb{A}x}} \mathop{\mathrm{ty}}v). There is certainly a model of this theory given by the identity homomorphism, which in this case is the pair (u\{{\uparrow^{}}\}, \lambda e. \mathop{\mathrm{ax}}e).
Working further inside the classifying topos for this h ::u \!\downarrow\!\mathrm{\mathbb{S}ort}, we can form the “motive” theory: \mathbb{C}:\equiv \left(\mathop{\mathrm{unax}}(\mathsf{pr}_2 h)(x\{{\uparrow^{}}\}) = \mathop{\mathrm{unax}}(\mathsf{pr}_2 h)(y\{{\uparrow^{}}\})\right) \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\varnothing. If we substitute the identity model for h yielding a model internal only to u::\mathrm{\mathbb{S}ort}, the theory \mathbb{C}\{{(u\{{\uparrow^{}}\}, \lambda e. \mathop{\mathrm{ax}}e)/h}\} reduces all the way to (x = y) \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\varnothing. We certainly have a model of this theory, built easily from c: (\lambda p. \mathop{\mathrm{ax}}c(p)) : (x = y) \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\varnothing.
By the quasicoherence principle, we have a model of \mathbb{C}\{{m/h}\} for any model m ::u \!\downarrow\!\mathrm{\mathbb{S}ort}. We now choose a specific model to use that solves this problem: the quotient type (\mathop{\mathrm{ty}} u)/(x=y), given the structure of a u-algebra by the function \mathsf{quot} : \mathop{\mathrm{ty}}u \to (\mathop{\mathrm{ty}}u)/(x=y). More precisely, we’re forming the model
\begin{align*} &m ::(v :: \mathrm{\mathbb{S}ort}) \times (\mathop{\mathrm{ty}}u\{{\uparrow^{}}\} \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}v) \\ &m :\equiv(\mathop{\mathrm{sort}}((\mathop{\mathrm{ty}}u)/(x=y)), \lambda z. \mathop{\mathrm{ax}}\mathsf{quot}(z)) \end{align*}
Performing the substitution of this model into \mathbb{C}, we see that induction has given us a model of (\mathsf{quot}(x) = \mathsf{quot}(y)) \vartriangleright\mathop{\mathrm{\mathbb{A}x}}\varnothing. Because we do indeed have an element of \mathsf{quot}(x) = \mathsf{quot}(y) in the quotient type (\mathop{\mathrm{ty}}u)/(x=y), this yields a model of \mathop{\mathrm{\mathbb{A}x}}\varnothing, and so an element of \varnothing. Contradiction!
There are a few issues with the rule as presented above. First, one crucial aspect of the induction principle is that it does not apply to arbitrary motives \mathbb{C}, only those that are “quasicoherent” over the context; this is the intended meaning of the \mathsf{theory}^q annotation3. In Ingo’s formulation, this is a restriction on which infinitary coproducts are allowed to appear; in our setting, this corresponds to some restriction on exactly what types are allowed to be used inside \mathop{\mathrm{\mathbb{A}x}}.
Here’s an example of something we’re not supposed to be able to write. Suppose we’re again working directly in the object classifier, so u ::\mathrm{\mathbb{S}ort}, and the slice theory h ::u \!\downarrow\!\mathrm{\mathbb{S}ort} is as above. In the context of u and h we can write down the theory of a section of h: (g :: \mathop{\mathrm{ty}}\mathsf{pr}_1(h) \vartriangleright\mathop{\mathrm{\mathbb{A}x}} \mathop{\mathrm{ty}}u\{{\uparrow^{}}\}) \times \left[(x : \mathop{\mathrm{ty}}\mathsf{pr}_1(h)) \vartriangleright \mathop{\mathrm{\mathbb{A}x}}(\mathop{\mathrm{unax}} g(h\{{\uparrow^{}}\}(x)) = x)\right].
This certainly has a model when h is the identity u-algebra, and so quasicoherent induction applied with this motive would summon a section for any u-algebra. Worse, the existence of a section would pull back to any morphism in any topos. The culprit in this case is the use of the theory \mathop{\mathrm{\mathbb{A}x}}\mathop{\mathrm{ty}}u\{{\uparrow^{}}\}; this corresponds to an infinitary coproduct that is not quasicoherent over the context. We need some syntactic condition that restricts exactly which types we are allowed to apply \mathop{\mathrm{\mathbb{A}x}} to in the motive of the rule, but how to formulate this condition is unclear.
Second: as written, the rule is syntactically unpleasant due to the non-generic shape of the context in the conclusion. The type theorist’s solution is to cut in for the variable u ::\mathbb{T} in the conclusion (and probably the extra telescope \Delta as well). The ordinary justification is that we lose no generality, because if we want to recover the original version of the rule with the variable in the context, we can simply apply the new rule with a context that happens to contain the variable, and use that variable when applying the rule.
Writing out a more familiar example, suppose we have judgements
\begin{align*} &\Gamma, x : A, y : A, p : x = y &&\vdash C \,\,\mathsf{type}\\ &\Gamma, x : A &&\vdash c : C[x/x, x/y, \mathsf{refl}_x/p] \end{align*} and we wish to construct \begin{align*} \Gamma, x : A, y : A, p : x = y \vdash\mathsf{ind}({\dots}) : C \end{align*} using the ordinary path induction rule. When we apply the rule, we are given additional variables in the context to type the motive, and we choose to supply the term p (which happens to be a variable) as the target.
In our setting, substituting in for the model variable u ::\mathbb{T} gets stuck on any non-geometric types used in the motive \mathbb{C}: we would no longer be able to obtain a result “internal to \mathrm{Set}[\mathbb{T}]” directly, only one pulled back from \mathrm{Set}[\mathbb{T}] to an arbitrary model.
This is unsatisfying, but may actually be sufficient in practice. How might we use the resulting model of \mathbb{C}\{{m/u}\}? Well, if \mathbb{C} contains non-positive constructions then we are stuck for now. However, in combination with other models of theories of the form \mathbb{S}\{{m/u}\}, we may be able to prove a positive fact beneath the -\{{m/u}\}. Then \{{m/u}\} is no longer stuck, and we obtain a proof that the fact holds for the model m specifically. It is unclear to me how feasible this kind of reasoning would be to do by hand.
Finally, the rule above is actually underpowered compared to the principle described by Ingo. In its original formulation, the conclusion of the quasicoherence principle is not merely that the quasicoherent statement holds for every u-algebra, but that there is a proof in geometric logic modulo u \!\downarrow\!\mathbb{T} that the statement holds (again, all happening internal to u ::\mathbb{T}).
This distinction is impossible to express in our type theory; we do not have a method of “introspection” that lets us manipulate proofs in this way. When might that additional power come in handy? One situation is when we have a model of u \!\downarrow\!\mathbb{T} not internal to u ::\mathbb{T} itself, but internal to a topos that is internal to u ::\mathbb{T}. Because we have a geometric proof of the statement, that proof can be replayed in this inner topos too; and so ought to hold for models of u \!\downarrow\!\mathbb{T} that live there:
Heaping speculation on speculation, we may be able to allow the motive of the induction to also lie in an inner topos. Temporarily removing the cut for h, the rule then has exactly the same shape as the left-rule for the ordinary identity type:
Allowing a general \Gamma' telescope subsumes the need for the intermediate etale telescope \Delta between u and h: the contents of \Delta can be explicitly weakened and placed into \Gamma' instead.
The rule certainly needs some quasicoherence restriction on the motive \mathbb{C}, but likely also a restriction on the form of the telescopes \Gamma' that may be used. Exactly what is necessary is a question for the semantics, and I am not the best person to answer it unfortunately.
3.2 Converting Between Styles
Converting between the EMTT style and the all-in-one style of theory is not entirely mechanical. Consider the \Sigma-theory for the following internal theory: \begin{align*} X ::\mathrm{\mathbb{S}ort}\vdash(\mathop{\mathrm{ty}}X \to \mathop{\mathrm{ty}}X) \vartriangleright\mathrm{\mathbb{S}ort}\,\,\mathsf{theory} \end{align*} That is, we have an internal theory with one sort for each function X \to X. This should correspond to some geometric theory in the base topos, but which one?
Naturally, (\mathop{\mathrm{ty}}X \to \mathop{\mathrm{ty}}X) is not a geometric construction in the theory X ::\mathrm{\mathbb{S}ort}, and so we cannot write in the old style what would be the natural thing: \begin{align*} &\mathord{\cdot}&&\mathbin{;}\mathord{\cdot}&&\vdash X \,\,\mathsf{sort}\\ &\mathord{\cdot}&&\mathbin{;}f : X \to X &&\vdash Y \,\,\mathsf{sort} \end{align*}
Our only option here is to appeal to quasicoherence. We happen to know, via quasicoherence, the non-geometric fact that inside X :: \mathrm{\mathbb{S}ort} there is an isomorphism (\mathop{\mathrm{ty}}X \to \mathop{\mathrm{ty}}X) \simeq(1 + \mathop{\mathrm{ty}}X). This lets us write the correct all-in-one version of the theory:
\begin{align*} &\mathord{\cdot}&&\mathbin{;}\mathord{\cdot}&&\vdash X \,\,\mathsf{sort}\\ &\mathord{\cdot}&&\mathbin{;}f : 1 + X &&\vdash Y \,\,\mathsf{sort} \end{align*} Applying this transformation seems difficult to do mechanically!
3.3 Should Model Weakening Be Modal?
One nit in both versions of this type theory is that the rules for model substitution have to be tuned by hand to do the right thing; commuting with certain type and theory constructors but not others.
Here’s an alternative that feels quite natural: we treat weakening past a model context extension as a modal operator that must be introduced and eliminated explicitly. Here’s what that might look like.
We can then aim to prove that certain constructions commute with \mathsf{Const}_u internally, if we give them sufficiently powerful “crisp” induction principles that can be applied anywhere in the context. For sums, we have the following:
\begin{align*} &f : \mathsf{Const}_u (A + B) \to \mathsf{Const}_u A + \mathsf{Const}_u B \\ &f(d) := \mathsf{let} \; {\delta\, s} := {d} \, \mathsf{in} \, {\mathsf{case}(s, \mathop{\mathrm{inl}}a \mapsto \mathop{\mathrm{inl}}\delta\, a, \mathop{\mathrm{inr}}b \mapsto \mathop{\mathrm{inr}}\delta\, b)} \\ \end{align*}
As a derivation tree, we are typing the body of the definition by:
Key here is that we can apply +-induction on the term s that is well-formed in the prefix of the context that comes before u ::\mathbb{T}.
Using \mathsf{Const} rather than explicit substitutions to isolate the special nature of geometric constructions feels much more satisfying and uniform than building them into the rules directly. But pushing down this bump in the carpet causes it to pop up elsewhere. First, it introduces a lot of wrapping and unwrapping \mathsf{Const}s, even for the simple theories we’ve seen already.
Additionally, this interferes with the typing of the model variable rule. In context \Gamma :\equiv u ::\mathbb{T}, we should still be able to use the variable u directly, giving a model of some theory. What theory is u a model of? With explicit weakening, this was \mathbb{T}\{{\uparrow^{u}}\}. Now, we have to reach into \mathbb{T} and sprinkle the use of \mathsf{Const}_u everywhere. A symmetric issue happens when applying model substitution as occurs in the rules for \Sigma-theories (and elsewhere, in the future). There, the substitution operation has to strip all uses of \mathsf{Const}_u. I don’t immediately see any blockers to this working out, but it certainly complicates the metatheory.
Finally, this doesn’t help with the model substitution rule, which should get stuck in all the same places as weakening. I have been unable to come up with sensible rules for a modal operator corresponding to model substitution; model substitution doesn’t seem to be “judgementalisable” in the same way that weakening is:
This is not the first time the issue of stuck substitutions has been encountered in dependent type theory. The most similar precursors can be seen in the line of work on guarded dependent type theory, which uses the “later” modality \triangleright to ensure that recursive definitions are productive. There, delayed substitutions are used to give typing to an applicative structure on \triangleright that works on dependent functions, not just non-dependent functions. Unfortunately the remainder of the setting is quite different, and I’m not sure what lessons can be drawn.
Even more tantalising is that one of the intended models of guarded dependent type theory is the topos of trees; the classifying topos of an easily described theory. And so, it is possible that the rules of GDTT will one day fall out of geometric type theory as a particular instance.
3.4 Should Model Weakening Be Multimodal?
Another option is to use the modal frameworks MTT or MATT to produce a type theory, taking the “mode theory” to be the entire category of toposes and (inverse image parts of) geometric morphisms. Finite diagrams of toposes are the intended application of MATT, and so we are only doing something new if we mix in the ability to specify new modes and mode morphisms as we go. But as a first step, let’s write out MATT in a syntax closer to what we’ve already seen above.
Each context is annotated with a theory \Gamma \,\,\mathsf{ctx}_\mathbb{T} specifying the topos in which it lives. We’ll suppose there is a judgement for geometric morphisms \mathbb{T}\vdash\mathbb{S}\,\,\mathsf{theory} (representing the geometric morphism \mathrm{Set}[\mathbb{T}][\mathbb{S}] \to \mathrm{Set}[\mathbb{T}]), with rules to be determined, and which doesn’t interact with the contexts \Gamma at all.
To have the MATT variable rule we need a syntactic notion of 2-cell between geometric morphisms, and if we are presenting morphisms as “theories in context”, it’s difficult to make sense of what this would mean.
The general inverse image modality would look like the following:
With rules along these lines we could probably do some very basic examples, such as the object classifier \cdot \vdash(X \,\,\mathsf{sort}) \,\,\mathsf{theory}. But as written there’s no advantage over just using MATT directly, so I’m not sure this is going anywhere useful.
References
Footnotes
We may want to allow inductive types in \Gamma to eliminate into construction and theory judgements, though this can be simulated by doing the same elimination in the \Delta context of each field in the Theory.↩︎
The \models symbol would be more appropriate here, but doesn’t display so well in this blogging software.↩︎
In Ingo’s note he calls these special motives geometric{}^*. So geometric, but terms and conditions apply.↩︎