The premise of a language designed for the era of AI-written code is interesting. Formal verification at the language level could help with the trust problem — when AI generates code, you need stronger guarantees than tests provide. Curious how this differs from approaches like TLA+ or model checking embedded in the workflow rather than the language itself.
Good question. The main difference is where the spec lives and who enforces it.
TLA+ is a specification language separate from your implementation. TLC model-checks the spec. Whether the code faithfully implements it is on you. Catching a bug in the design is not catching it in the code.
Workflow tools like CBMC and KLEE run on the code itself (SPIN model-checks a separate Promela model), but they are bolted on: usually bounded, usually assertion-style, no type-level integration.
Verum puts the refinement into the type itself. Int { self > 0 } is the type, and SMT discharges it during type checking. Same pass. User predicates declared @logic become SMT-LIB define-fun-rec axioms, so the solver reasons about them structurally rather than treating them as oracles. If SMT gives up, 22 named tactics are first-class; if you need cross-verification, @verify(thorough) races Z3 and CVC5 in parallel and treats disagreement as a diagnostic.
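To make the lowering concrete, here is a sketch of what an obligation might look like once it reaches the solver. The predicate name `fact`, the variable names, and the exact encoding are illustrative assumptions, not Verum's actual compiler output.

```smtlib
; A user predicate marked @logic becomes a recursive definition the
; solver can unfold structurally, not an opaque uninterpreted function:
(define-fun-rec fact ((n Int)) Int
  (ite (<= n 0) 1 (* n (fact (- n 1)))))

; Obligation for assigning y := x + 1, where x : Int { self >= 0 },
; into a target of type Int { self > 0 }. Validity is checked by
; asserting the negated goal and expecting unsat:
(declare-const x Int)
(assert (>= x 0))            ; assumption from x's refinement
(assert (not (> (+ x 1) 0))) ; negated goal
(check-sat)                  ; unsat => obligation discharged
```

The point is that the refinement check is just an ordinary SMT query, which is what lets it run inside the type-checking pass.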
Second axis: verification is a gradient. @verify(runtime) through @verify(certified), same source. TLA+ has no gradient. CBMC either discharges an assertion or does not.
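The low end of that gradient is easy to picture: the same refinement predicate, enforced as a runtime check instead of a static proof. A minimal Python sketch of the idea (the `refine` decorator is hypothetical, not Verum's runtime API):

```python
# Sketch: a refinement like Int { self > 0 }, degraded to a runtime
# check when the static proof is not requested. `refine` is an
# illustrative name, not part of any real Verum runtime.

def refine(pred, msg="refinement violated"):
    """Wrap a function so its return value is checked against `pred`."""
    def wrap(fn):
        def checked(*args, **kwargs):
            out = fn(*args, **kwargs)
            if not pred(out):
                raise ValueError(f"{fn.__name__}: {msg} (got {out!r})")
            return out
        return checked
    return wrap

@refine(lambda v: v > 0, "expected Int { self > 0 }")
def abs_plus_one(x: int) -> int:
    return abs(x) + 1

print(abs_plus_one(-3))  # 4
```

Promoting the annotation from @verify(runtime) upward means the same predicate is discharged at compile time and the check disappears, which is what "same source, no rewrite" buys you.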
Third axis, the one I think is underappreciated: Verum's verification layer is backend-agnostic. Obligations are classified by theory (arrays, LIA, strings, nonlinear) and routed to whichever solver handles that fragment best. When a better solver appears, existing proofs don't break. That is closer to a capability system than a monolithic verifier.
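A toy version of that routing makes the shape clear. The classification here is crudely syntactic and the routing table is an assumption for illustration, not Verum's actual policy:

```python
# Sketch: classify each proof obligation by the SMT theory it touches
# and dispatch to a preferred backend. Table contents are illustrative.

ROUTES = {
    "arrays":    "z3",
    "lia":       "z3",      # linear integer arithmetic
    "strings":   "cvc5",
    "nonlinear": "cvc5",
}

def classify(obligation: str) -> str:
    """Crude syntactic classification of an SMT-LIB obligation."""
    if "str." in obligation:
        return "strings"
    if "select" in obligation or "store" in obligation:
        return "arrays"
    if "*" in obligation or "div" in obligation:
        return "nonlinear"
    return "lia"

def route(obligation: str) -> str:
    return ROUTES[classify(obligation)]

print(route("(assert (> (+ x 1) 0))"))      # linear arithmetic -> z3
print(route("(assert (= (str.len s) 3))"))  # strings -> cvc5
```

Because the obligation format is solver-neutral, swapping the table entry is the whole migration when a better backend appears.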
For @verify(certified), the compiler extracts the proof term from the solver's log and embeds it in a .verum-cert archive exportable to Coq, Lean, Dedukti, or Metamath. A downstream consumer can re-validate offline with an orthogonal tool. That is the move TLA+ and CBMC structurally cannot make.
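The archive itself is just a container for the proof term plus metadata a downstream checker needs. A sketch of the packing step, where the file layout and field names are assumptions, not the documented .verum-cert format:

```python
# Sketch: bundle a proof term with re-checking metadata into a
# zip-style certificate archive. Layout and field names are
# illustrative assumptions about the .verum-cert format.
import io
import json
import zipfile

def write_cert(proof_term: str, goal: str, solver: str) -> bytes:
    """Pack a proof term and its metadata into an archive blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("proof.term", proof_term)
        z.writestr("meta.json", json.dumps({
            "goal": goal,
            "solver": solver,
            "exports": ["coq", "lean", "dedukti", "metamath"],
        }))
    return buf.getvalue()

cert = write_cert("(proof ...)", "(> (+ x 1) 0)", "z3")
with zipfile.ZipFile(io.BytesIO(cert)) as z:
    print(sorted(z.namelist()))  # ['meta.json', 'proof.term']
```

The offline re-validation story only works because everything the checker needs travels inside the archive rather than living in the original build environment.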
By focusing on mathematical verification, this method aims to solve the trust deficit in mission-critical AI systems, moving beyond probabilistic RAG solutions.
Thanks. I would phrase the pitch differently, though. Verum is a systems language, not a method for AI systems. What is actually load-bearing: verification is a knob (@verify has seven levels; the same source promotes from runtime checks to certified proofs without a rewrite), memory safety has three cost tiers you pick per function, and runtime dependency injection and compile-time metaprogramming share one primitive. RAG is a retrieval technique for LLM prompts and is unrelated. The nearest prior art for the verification gradient is SPARK, not anything AI-adjacent.