The bitemporal truth store that powers CFS implementations. Append-only, graph-native, verifiable. Designed for org reality, built for AI.
Substrate is a database architecture designed for "org reality": the web of entities, relationships, events, and artifacts that make up how an organization actually works.
Every fact has two time dimensions: valid time (when it was true in reality) and system time (when the system learned it). You can ask "what was true then?" and "what did we believe then?" as two separate questions.
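To make the two time dimensions concrete, here is a minimal in-memory sketch of an as-of lookup. It is an illustration only, not Substrate's storage format or query API: the `Fact` fields, the `as_of` function, and the sample data are all hypothetical, and the "latest recorded assertion wins" rule is an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: date           # valid time: when this became true in reality
    valid_to: Optional[date]   # exclusive end of validity; None means still true
    recorded_at: date          # system time: when the store learned it


def as_of(facts, key, valid_at, system_at):
    """Return the value believed at `system_at` to hold at `valid_at`, or None."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.recorded_at <= system_at   # only what was known by then
        and f.valid_from <= valid_at     # and was true at the asked-about time
        and (f.valid_to is None or valid_at < f.valid_to)
    ]
    if not candidates:
        return None
    # Assumption for this sketch: the most recently recorded assertion wins.
    return max(candidates, key=lambda f: f.recorded_at).value


facts = [
    Fact("svc-a.owner", "team-red", date(2024, 12, 1), None, date(2024, 12, 1)),
    # A correction recorded on Jan 10: ownership had actually moved on Dec 15.
    Fact("svc-a.owner", "team-blue", date(2024, 12, 15), None, date(2025, 1, 10)),
]

# "What did we believe on Jan 5 about Dec 20?" The correction was not yet known.
believed_then = as_of(facts, "svc-a.owner", date(2024, 12, 20), date(2025, 1, 5))
# "What was true on Dec 20, given what we know on Jan 15?"
true_then = as_of(facts, "svc-a.owner", date(2024, 12, 20), date(2025, 1, 15))
```

Here `believed_then` is `"team-red"` and `true_then` is `"team-blue"`. The correction is a second record; the first one is never modified.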
Entities and relationships are first-class. Traversal is indexed. "What depends on this service?" is a fast query, not a join explosion. Property graph model aligned with ISO GQL.
The canonical log never changes. Corrections append new records; they don't rewrite history. Every commit is hash-chained. Merkle proofs verify integrity.
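The hash-chaining idea can be sketched in a few lines. This is a simplified illustration under assumptions, not Substrate's commit format: the entry layout, the `GENESIS` constant, and the function names are hypothetical, and SHA-256 over canonical JSON is simply a convenient choice for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash that precedes the first commit


def commit_hash(prev_hash, payload):
    """Hash a commit together with the hash of its predecessor."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log, payload):
    """Append a new commit; existing entries are never touched."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": commit_hash(prev, payload)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every link; any edit to an earlier commit breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != commit_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"op": "assert", "fact": "svc-a owned by team-red"})
# A correction is a new commit, not an edit of the first one.
append(log, {"op": "correct", "fact": "svc-a owned by team-blue"})
```

Because each hash covers the previous hash, changing the payload of an early commit invalidates that commit and every one after it, which is what makes silent rewrites detectable.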
Built for AI from the start. Capability-scoped access. Citations on every claim. Safe write workflows. Minimal disclosure by default.
Substrate separates concerns into three physical planes. The canonical truthlog is the source of all authority; everything else is derived.
Immutable append-only commit log. Hash chain + merkle tree. Signed roots. This is the audit-proof foundation.
LSM-based indexes + columnar snapshots. Fast queries. Fully rebuildable from truthlog if needed.
Content-addressed chunk store. Deduplicated, encrypted, policy-aware. Docs, code, logs, builds, SBOMs.
Key insight: Indexes are never the source of truth. If an index disagrees with the truthlog, the index is wrong and gets rebuilt. This is how you get both performance and correctness.
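The "index is derived, log is authoritative" rule can be sketched as a replay plus a comparison. This is a toy model under assumptions: the record shape (`op`, `src`, `dst`), the operation names, and both functions are hypothetical, and a real system would compare checksums or ranges instead of whole in-memory structures.

```python
from collections import defaultdict


def build_index(truthlog):
    """Derive an adjacency index (source -> set of targets) by replaying the log."""
    index = defaultdict(set)
    for record in truthlog:
        if record["op"] == "add_edge":
            index[record["src"]].add(record["dst"])
        elif record["op"] == "retract_edge":
            # A retraction is itself an appended record, not a deletion from the log.
            index[record["src"]].discard(record["dst"])
    # Drop sources whose edges were all retracted.
    return {src: dsts for src, dsts in index.items() if dsts}


def check_and_repair(truthlog, index):
    """Return (index, repaired). On any disagreement the log wins."""
    expected = build_index(truthlog)
    if index == expected:
        return index, False
    return expected, True


truthlog = [
    {"op": "add_edge", "src": "api", "dst": "db"},
    {"op": "add_edge", "src": "api", "dst": "cache"},
    {"op": "retract_edge", "src": "api", "dst": "cache"},
]

# An index that missed the retraction is stale and gets rebuilt.
stale_index = {"api": {"db", "cache"}}
fixed_index, repaired = check_and_repair(truthlog, stale_index)
```

After the call, `fixed_index` is `{"api": {"db"}}` and `repaired` is `True`.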
AS-OF: Query the graph at any point in valid-time or system-time. "What did we know on January 5th about what was true in December?"
DIFF: "What changed between Monday and Friday?" Fast delta queries using system-time partitioned indexes.
PROVENANCE: Every fact traces to an assertion: who asserted it, from what sources, using what derivation. Citations are first-class.
TRAVERSE: Efficient 1-hop, n-hop, and pattern matching. Dependency graphs, blast radius, ownership chains - all indexed.
REDACT: Cryptographic erasure via key shredding. The audit trail stays intact ("something existed here") but content is unrecoverable.
VERIFY: Inclusion proofs for any commit. Consistency proofs for the log. Independent witnesses can verify no history was rewritten.
Substrate optimizes for interactive queries on "system=now" while supporting arbitrary historical access. These are the design targets:
Expand neighbors for a node with degree ≤5,000
Depth-3 traversal with 50k node cap
"What changed in the last 24 hours?"
Sustained facts per second per shard
p99 proof generation for any committed tx
Rebuild indexes from truthlog (parallel)
These targets assume "system=now" queries (the common case). Historical queries may use snapshot lookup + delta application with modest overhead.
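The depth-3 traversal target with a 50k node cap implies a bounded expansion that stops early instead of running away on dense graphs. A minimal breadth-first sketch of that guardrail follows. The function name, the adjacency-dict input, and the `(visited, truncated)` return shape are assumptions for illustration, not Substrate's query engine.

```python
from collections import deque


def bounded_traverse(adjacency, start, max_depth=3, node_cap=50_000):
    """Breadth-first expansion limited by depth and by total nodes visited.

    Returns (visited, truncated); `truncated` is True if the cap cut the walk short.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor in visited:
                continue
            if len(visited) >= node_cap:
                return visited, True  # cap reached: report a partial result
            visited.add(neighbor)
            queue.append((neighbor, depth + 1))
    return visited, False


# A chain a -> b -> c -> d -> e: depth 3 from "a" reaches "d" but not "e".
adjacency = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reached, truncated = bounded_traverse(adjacency, "a")
capped, was_capped = bounded_traverse(adjacency, "a", node_cap=2)
```

Returning an explicit truncation flag lets the caller distinguish "this is the whole neighborhood" from "this is as far as the guardrail allowed".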
Valid-time vs system-time. Query models. How corrections, backfills, and retractions work without rewriting history.
On-disk layout. Write path. Index structures. Interval trees for bitemporal queries. Merkle structures for integrity.
GQL-T temporal extensions. Example queries. Query planning. Guardrails and resource limits.
Zero-trust model. Zanzibar-style authz. Encryption at rest. Audit integrity. Retention and redaction.
Capability tokens. Context bundles. Citations. Safe write workflows. Minimal disclosure.
Substrate is designed for orgs like this: a company with repos, services, tickets, incidents, contracts, and policies. A new leader joins and needs to understand reality fast.
Bitemporal query: valid-time = last month, system-time = now. Returns what was actually true then, with current knowledge.
System-time query. Shows what the org knew at that point, even if later corrections changed the picture.
Diff query + blast radius. Uses system-time deltas and graph traversal to surface relevant changes.
Provenance query. Traces the claim back through assertions, sources, and derivations. Cites every artifact.
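The diff plus blast radius combination described above can be sketched as two small steps: select the records whose system time falls inside a window, then walk dependency edges in reverse from the changed entities. The record shape, the `depends_on` mapping, and both function names are hypothetical and chosen only for this illustration.

```python
from datetime import date


def changes_between(records, start, end):
    """Records whose system time falls in the half-open window (start, end]."""
    return [r for r in records if start < r["recorded_at"] <= end]


def blast_radius(depends_on, changed):
    """Everything that transitively depends on any changed entity."""
    # Invert "X depends on Y" into "Y is depended on by X".
    dependents = {}
    for src, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(src)
    affected, stack = set(), list(changed)
    while stack:
        node = stack.pop()
        for dependent in dependents.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


records = [
    {"entity": "db", "recorded_at": date(2025, 1, 6)},
    {"entity": "cache", "recorded_at": date(2025, 1, 15)},
]
depends_on = {"api": ["db", "cache"], "web": ["api"]}

# Changes recorded after Jan 5 and up to Jan 10, then what they could affect.
changed = {r["entity"] for r in changes_between(records, date(2025, 1, 5), date(2025, 1, 10))}
affected = blast_radius(depends_on, changed)
```

With this data, only `db` changed in the window, and the affected set is `{"api", "web"}`.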