Coframe in 30 Minutes¶
An orientation for practitioners and newcomers. It reads in 25–35 minutes and is meant to take you from "what is this?" to "I see how the pieces fit and where to read next." If you'd rather see Coframe actually run on a dataset, the Demo is the place to go.
1. What you'd experience¶
Coframe is a query layer. The fastest way to understand it is to see what a user (an analyst, a BI tool, or an AI agent) experiences when they interact with it. Three short scenarios.
Scenario A: A query that just works¶
You ask:
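Something on the order of the following Frame-QL. The original query text is not shown here; this is an illustrative reconstruction inferred from the result table below, not a verbatim transcript:

```
SUM(revenue) BY (region, year) WHERE year = 2026
```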
Coframe returns:
region | year | revenue | coherence_posture | notes
------------|------|---------------|----------------------------|-------------------
east | 2026 | $14,221,308 | unconditional_within_scope | resolved via store_monthly_summary; siblings agree (I10 ✓)
midwest | 2026 | $8,997,015 | unconditional_within_scope | resolved via store_monthly_summary; siblings agree (I10 ✓)
south | 2026 | $11,560,442 | unconditional_within_scope | resolved via store_monthly_summary; siblings agree (I10 ✓)
west | 2026 | $16,089,773 | unconditional_within_scope | resolved via store_monthly_summary; siblings agree (I10 ✓)
You did not write a JOIN. You did not write a GROUP BY. You did not say which physical schema to read from. Coframe's resolver:
- Identified the family `revenue`.
- Walked the FD-DAG from `region` and `year` to find a schema whose grain is reachable.
- Selected `store_monthly_summary` (cheaper than aggregating `transactions` from scratch).
- Applied `SUM` automatically because `revenue`'s identity-preserving reducer is `SUM`.
- Verified that the result is identical to what you'd get from `transactions` aggregated up (the Multi-Table Invariance theorem; pre-checked per-DNA-edge at AC validation time).
- Annotated the result with how it was produced.
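The walk described above can be sketched in a few lines of Python. Everything here is illustrative: the FD edges, schema costs, and function names are toy stand-ins, not resolver internals.

```python
# Toy sketch of the resolver's schema choice: among schemas whose grain can
# reach the requested (region, year) grain through the FD-DAG, pick the
# cheapest. Names and costs are illustrative, not Coframe internals.
FD_EDGES = {                      # fine grain -> coarser grains it determines
    "transaction": {"store", "month"},
    "store": {"region"},
    "month": {"year"},
}

def reachable(grain: str) -> set:
    """All grains derivable from `grain` by walking FD edges."""
    seen, stack = {grain}, [grain]
    while stack:
        for nxt in FD_EDGES.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

SCHEMAS = {                       # schema -> (grain columns, relative scan cost)
    "transactions": ({"transaction"}, 100),
    "store_monthly_summary": ({"store", "month"}, 3),
}

def resolve(target: set) -> str:
    """Pick the cheapest schema whose grain reaches every target dimension."""
    candidates = [
        (cost, name) for name, (grain, cost) in SCHEMAS.items()
        if target <= set().union(*(reachable(g) for g in grain))
    ]
    return min(candidates)[1]

print(resolve({"region", "year"}))  # store_monthly_summary
```

Both schemas can answer the query; the summary wins on cost, which is exactly the choice Scenario A reports.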
Scenario B: A query that's refused, with a structured reason¶
You ask:
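Judging from the disambiguation options in the error below, the request was the unqualified form. Illustrative reconstruction, not a verbatim transcript:

```
SUM(revenue) BY region WHERE region = 'midwest'
```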
Coframe responds:
DubiousQueryError: family 'revenue' has cousins.
Found two columns named 'revenue' in the AC:
• transactions.revenue — family-root: transactions.revenue (anchored at transaction)
• returns.revenue — family-root: returns.revenue (anchored at return_event)
These share a family-name but have different family-roots. They are
structurally independent observations; Coframe cannot resolve them
to a single answer without explicit disambiguation.
To resolve, choose one:
(a) Qualify the reference:
SUM(transactions.revenue) BY region WHERE region = 'midwest'
(b) Restrict via FROM:
SUM(revenue) FROM transactions BY region WHERE region = 'midwest'
(c) Restate intent (if you want both):
SUM(transactions.revenue) - SUM(returns.revenue) AS net_revenue ...
A semantic layer would have silently picked one of them — likely whichever happened to be defined first. Coframe refuses. It's the same outcome as a compiler refusing to compile ambiguous code: better to fail loudly than to produce a number that looks plausible.
Scenario C: A query an AI agent constructs¶
An agent receives the user request "give me peak weekly revenue per region last quarter, and tell me which stores had any failed transactions." It looks up the family list on Coframe's MCP server and finds:
- `revenue` (additive, `ip_reducer = SUM`, has a sibling at store-week grain)
- `transaction_failed` (boolean, `ip_reducer = BOOL_OR`)
- AC-dimensions: `region`, `store`, `week`, `quarter`, `year`
It constructs:
WITH weekly AS (
SUM(revenue) AS weekly_revenue
BY (region, week)
WHERE quarter = 'Q1-2026'
)
SELECT
region,
MAX(weekly_revenue) AS peak_weekly_revenue
FROM weekly
BY region;
SELECT
store,
BOOL_OR(transaction_failed) AS had_any_failure
WHERE quarter = 'Q1-2026'
BY store
HAVING had_any_failure;
Coframe resolves both. The agent doesn't have to reason about joins, group-by cardinality, or null handling. Its job was to express what — at what grain, with what filter. The framework's job was to execute how, correctly. The two queries return verified answers, and the agent composes them into its reply.
This is the architectural alignment: the structural commitments that make Coframe rigorous for human-authored ACs are exactly the structural commitments that make agent-mediated analytics trustworthy.
2. The three primitives¶
Coframe's grammar layer rests on three primitives. Once you have these, everything else is consequence.
- Entity. What an observation is about. Customers, stores, transactions, dates. Entities are the things your data identifies. In Coframe vocabulary, every column's `E(c, S)` field names the entities the column is anchored to.
- Family. What's being observed about those entities. Revenue, count, status, name. A family is a conceptual quantity; many columns may belong to the same family at different anchorings (e.g., revenue at transaction-grain and revenue at store-month-grain).
- Operator. How observations relate. SUM, MONTH_OF, BUCKET. Operators take a predecessor observation and produce a successor — with well-defined relationships between their entity-anchorings and their family-membership.
These three are sufficient. Every structural rule, every integrity check, every query-resolution decision in Coframe can be expressed in terms of how entities, families, and operators relate. The framework's commitment is that you can't have structured analytical observation with fewer than these three; you don't need more.
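As a minimal sketch, the triple can be modeled directly. The classes and field names below are illustrative stand-ins, not Coframe's actual types:

```python
# The three primitives as a toy data model. All names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str                 # e.g. "transaction", "store"

@dataclass(frozen=True)
class Family:
    name: str                 # e.g. "revenue"
    root: str                 # family-root; distinguishes siblings from cousins

@dataclass(frozen=True)
class Operator:
    name: str                 # e.g. "SUM"
    partition_invariant: bool # can it be re-applied at coarser grains?

# Same family-name, different family-roots: cousins, refused if ambiguous.
rev_tx = Family("revenue", root="transactions.revenue")
rev_ret = Family("revenue", root="returns.revenue")
assert rev_tx.name == rev_ret.name and rev_tx.root != rev_ret.root
```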
Two consequences worth pulling out now:
- Names are family-names. Two columns sharing a name are either siblings (same family-name, same family-root — the same conceptual quantity at different anchors, navigable via aggregation) or cousins (same family-name, different family-roots — observationally distinct, refused as dubious if ambiguously referenced). Coframe verifies this from the AC's metric genealogy; it doesn't trust names blindly.
- Operators carry algebraic properties. SUM is partition-invariant (you can apply it at coarser grains and get the same answer as if you'd aggregated from scratch). AVG is not. The operator catalog records this and the framework reasons over it — refusing to navigate across grains via name-preserving aggregation for non-partition-invariant operators.
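The partition-invariance distinction is easy to see with plain Python and made-up numbers:

```python
# Summing per-group subtotals gives the same answer as summing raw rows;
# averaging per-group averages does not. Numbers are illustrative.
raw = [10, 20, 30, 40, 50]            # five transaction-level observations
partition = [[10, 20], [30, 40, 50]]  # the same rows, grouped by store

total_direct = sum(raw)
total_via_groups = sum(sum(g) for g in partition)
assert total_direct == total_via_groups  # SUM: 150 == 150, grains agree

avg_direct = sum(raw) / len(raw)                                      # 30.0
avg_of_group_avgs = sum(sum(g) / len(g) for g in partition) / len(partition)
print(avg_direct, avg_of_group_avgs)  # 30.0 vs 27.5: grains disagree
```

This is why the framework will navigate `SUM(revenue)` through a pre-aggregated sibling but refuses the same shortcut for an average.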
3. The Analytics Collection¶
The Analytics Collection (AC) is the authoring artifact. An AC is a YAML file (and the loaded Python object) that declares:
- A collection of schemas, each binding to one backend table or view.
- For each schema, a list of ColumnSpecs, one per column the AC author chose to include.
- An AC-level `name_map` mapping logical names (what queries reference) to physical names (what's in the backend).
- Other AC-level metadata: descriptions, scope declarations, optional naming function.
Each ColumnSpec carries the column's structural commitment: its data type, its anchoring (E), its missingness signature (M), its operator and derivation history (op and dna), and its family-name.
Important asymmetry: the AC is essential for authors and effectively invisible to consumers. An author commits to an AC's scope (which columns are included, what they're called, what structural facts hold). A consumer just queries by family-name and gets a correct answer. The AC is the framework's foundation but not its user-facing surface.
The same backend data can support multiple ACs — a finance AC and a marketing AC over the same transactions table, each exposing different columns under different names. ACs are deliberate authoring artifacts, not auto-generated reflections of the warehouse.
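As a hedged sketch of the shape such a file might take (every key name here is illustrative; the normative AC schema is defined by coframe-core):

```yaml
# Illustrative only: key names are hypothetical, not the normative AC schema.
name: finance_ac
scope: Revenue reporting for the finance team
name_map:
  revenue: rev_usd_net          # logical name -> physical column
  region: sales_region_cd
schemas:
  - table: transactions
    columns:
      - name: revenue
        dtype: decimal
        anchoring: [transaction]  # E: the entities the column is anchored to
        family: revenue
  - table: store_monthly_summary
    columns:
      - name: revenue
        dtype: decimal
        anchoring: [store, month]
        op: SUM                   # derived via SUM over transaction-grain revenue
        dna: [transactions.revenue]
        family: revenue           # sibling of transactions.revenue
```

A marketing AC over the same `transactions` table would simply declare a different column list and `name_map`.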
4. Verification levels¶
Coframe Core publishes each AC's rigor posture as an ordinal verification level: A, AA, or AAA. The levels are monotonically stronger and a consumer (human, BI tool, AI agent) branches on them when deciding how much to trust a result.
| Level | What's verified | Practical meaning |
|---|---|---|
| A | Structural well-formedness. The AC's metadata is internally consistent. | Free; any well-formed AC achieves this automatically. |
| AA | Level A + every dimensional structural commitment is grounded — either attested against data (functional dependencies actually hold) or established by construction (operator catalog semantics make them true). | Most existing semantic-layer products effectively claim this. |
| AAA | Level AA + every metric coherence commitment is grounded. Pre-aggregation drift is verified absent on data-attested edges. Multi-Table Invariance is unconditional within scope. | This is the rigor level Coframe Core's defaults are designed to make achievable. |
The level is computed deterministically from the AC's grounding map (which records, per integrity-condition edge, how it was verified), and it propagates through MCP onto query results via the `coherence_posture` field.
Levels are informational in v1.0 and become a stable surface in v1.x. The high-level commitment to three monotonically stronger levels is firm; specific edge-case classifications may be refined after field experience.
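A consumer-side sketch of branching on the level. The policy names and function below are hypothetical, not part of any Coframe surface:

```python
# Illustrative: how a consumer (human tool or AI agent) might branch on an
# AC's verification level. Level values A < AA < AAA come from the docs;
# everything else here is a made-up policy.
LEVEL_ORDER = {"A": 0, "AA": 1, "AAA": 2}

def trust_policy(level: str) -> str:
    """Map an ordinal verification level to a handling policy."""
    if LEVEL_ORDER[level] >= LEVEL_ORDER["AAA"]:
        return "use_directly"     # metric coherence verified end to end
    if LEVEL_ORDER[level] >= LEVEL_ORDER["AA"]:
        return "use_with_caveat"  # dimensional commitments grounded only
    return "flag_for_review"      # structural well-formedness only

print(trust_policy("AAA"))  # use_directly
```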
5. The architecture, in one page¶
Five Python packages plus a meta-distribution:
┌─────────────────┐
│ coframe-mcp │ MCP server: query + nl_query for LLM clients
└────────┬────────┘
│ (backend-blind; discovers backends via entry points)
▼
┌────────────────────────────────┐
│ coframe-core │
│ • AC loading / validation │
│ • Integrity I0–I10 │
│ • Frame-QL parse + resolve │
│ • coframe.dialogue (NL→QL) │
└────────┬───────────────────────┘
▲
│
┌────────┴───────────┐
│ coframe-connect │ Backend protocol + source bindings + entry-point discovery
└────────┬───────────┘
▲
┌────────────────┼────────────────┐
│ │
┌─────────┴────────────┐ ┌─────────┴────────────┐
│ coframe-polars │ │ coframe-duckdb │ Reference execution backends + authoring
│ + .author CLI │ │ + .author CLI │
└──────────────────────┘ └──────────────────────┘
The meta-distribution coframe is what you install:
pip install coframe[polars] # foundation + Polars backend
pip install coframe[mcp] # foundation + MCP server
pip install coframe[all] # everything
Three things worth knowing about this architecture:
- The `Backend` protocol is open. Any execution engine that can host (name, entity) data-series and respond to operators can be a Coframe backend. Polars and DuckDB are the reference implementations; Snowflake, BigQuery, and arbitrary engines plug in via the same protocol.
- There are two AI surfaces, deliberately separated. `coframe.dialogue` translates natural language to Frame-QL — it sees logical names, no data. The authoring assistance inside each backend's `.author` toolchain helps with naming and FD-candidate review — it sees physical names and data. Different privilege, different purpose.
- What's queryable is exactly what the AC author exposed. A backend table with 300 columns may produce an AC exposing 30. The other 270 are outside scope — not invisible by accident, invisible by deliberate authoring choice.
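A minimal sketch of what an open backend protocol can look like in Python. The method names and signatures are illustrative of the idea (host (name, entity) data-series, respond to operators), not the actual coframe-connect protocol:

```python
# Hypothetical sketch: structural typing means any engine with the right
# methods counts as a backend, with no inheritance required.
from typing import Any, Protocol, Sequence, runtime_checkable

@runtime_checkable
class Backend(Protocol):
    def series(self, name: str, entities: Sequence[str]) -> Any:
        """Return the data-series for a (name, entity-anchoring) pair."""
        ...

    def apply(self, operator: str, series: Any) -> Any:
        """Apply a catalog operator (e.g. SUM) to a series."""
        ...

class InMemoryBackend:
    """Toy engine: stores series in a dict, supports SUM."""
    def __init__(self, data: dict):
        self._data = data

    def series(self, name, entities):
        return self._data[(name, tuple(entities))]

    def apply(self, operator, series):
        if operator == "SUM":
            return sum(series)
        raise NotImplementedError(operator)

backend = InMemoryBackend({("revenue", ("transaction",)): [100, 250, 75]})
assert isinstance(backend, Backend)  # satisfies the protocol structurally
print(backend.apply("SUM", backend.series("revenue", ["transaction"])))  # 425
```

A warehouse binding would implement the same two-method surface against its own query engine instead of a dict.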
6. What to read next¶
This page is the surface. Three deeper docs, picked based on what you need:
- If you want the conceptual argument for why this matters → Article. Practitioner-oriented prose, ~30 minutes. Carries the thesis with more breathing room than this page.
- If you want the specification — every primitive, every integrity condition, every operator → Manual. Reference document. Read Chapter 2 (Foundations) end-to-end on your first pass; everything else is for lookup.
- If you want the engineering design — package structure, build phasing, public surfaces → Platform Design. For implementers and contributors.
For the bigger story — the AI-agent-as-consumer thesis, the (entity, family, operator) triple's relationship to a possible future family of frameworks, the track separation that keeps Coframe focused while the bigger ideas develop — see the Vision summary.
And when the Demo is wired up, that's where Coframe stops being prose and starts being something you can actually watch run.