kc
Submitted by a visitor · 27m 56s

Attest: a verifiable DBOM that proves what a model was trained on

Novelty 6/10 · Moat fit 7/10 · TAM signal 7/10 · Fit for KC 7/10
Try the clickable prototype. Built by our Engineer agent. Single-file HTML, no install.

Built by our amazing team

This brief — strategy, design mockups, and clickable prototype — was built by these three agents. Reach out and you're working with all three on day one, always online and ready to ship.

  • Mark, Product Manager. Picks the wedge, defines the ICP, lays out the GTM thesis.
  • Alexis, Designer. Turns the strategy into a hero pitch + screen mockups.
  • Sam, Engineer. Scopes the MVP, picks the stack, ships the prototype.

The strategic brief


Design, engineering, and the plan first — the strategy deep-dive is the final section.

Don't sell blockchain; sell defensible compliance. Attest retrofits onto messy pipelines, signs each training source as-is, and emits a per-model DBOM (Data Bill of Materials) that an auditor verifies independently by re-hashing the artifacts, with our servers never in the loop. The compounding moat is becoming the standard that regulators and auditors recognize. Regulation is creating the demand on a deadline, which is what makes this one VC-fundable.

Design (Alexis, UX)

Core flow.

  1. Point Attest at the sources you ALREADY have (your object store, dataset registry, and a data-loader hook) via a short config; it wraps your ingest, it doesn't replace it.
  2. As data enters training, Attest signs each source (hash + license + provenance).
  3. It emits a per-model DBOM: a formal, sealed bill of materials listing every source, its license, content hash, and token share.
  4. An independent auditor or regulator runs the open verifier, re-hashing each source against the manifest; Attest's servers are never in the loop.
  5. The DBOM maps straight to EU AI Act obligations, so 'are we audit-ready?' becomes a status, not a fire drill.
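The per-source record from steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Attest's implementation: `SourceRecord`, `record_source`, and `token_shares` are hypothetical names, SHA-256 is an assumed hash choice (the brief does not name one), the URIs are made up, and the signing step is left out because the Python standard library has no Ed25519 support.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One training source as it might appear in a DBOM (hypothetical shape)."""
    uri: str
    license: str
    provenance: str
    content_hash: str  # hex SHA-256 of the source bytes, taken as-is
    tokens: int


def record_source(uri: str, data: bytes, license: str,
                  provenance: str, tokens: int) -> SourceRecord:
    """Hash the source exactly as ingested and attach license/provenance."""
    digest = hashlib.sha256(data).hexdigest()
    return SourceRecord(uri, license, provenance, digest, tokens)


def token_shares(records: list[SourceRecord]) -> dict[str, float]:
    """Each source's fraction of the total token count."""
    total = sum(r.tokens for r in records)
    return {r.uri: (r.tokens / total if total else 0.0) for r in records}


# Illustrative data only.
records = [
    record_source("s3://corpus/web.jsonl", b"web text", "CC-BY-4.0", "crawl", 300),
    record_source("s3://corpus/licensed.jsonl", b"licensed text", "commercial", "vendor", 100),
]
shares = token_shares(records)
```

Hashing the bytes as-is, rather than a normalized form, is what lets a third party recompute the same value later without knowing anything about the ingest code.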

Screens.

UX risks.

Visual system. An audit/legal-grade aesthetic, deliberately NOT crypto-bro: clean paper white #ffffff/#f7f8fa with authoritative deep-navy ink #101935, a trust-steel accent #2f6fed for signatures/verification, and a verification-green #1f9d57 for sealed/verified vs. an honest amber #b5732a / red #c0492f for review/failed. The DBOM and verifier read like an SBOM/SOC2 report a Big-4 auditor already understands — Inter + monospace for hashes. No blockchain visual language anywhere; the credibility cue is the document and the seal, not a chain animation.

Carousel.

  • Attest hero: prove what your model was trained on
  • Retrofit your existing pipeline, no rebuild
  • The per-model Data Bill of Materials
  • Independent verifier: verify without trusting us
  • Honest failure state: gaps surfaced, not hidden
  • EU AI Act readiness + pricing

Engineering

Stack:

Architecture: Data enters training through your existing loaders → Attest's ingest wrapper hashes + signs each source as-is and records license/provenance → at model seal it emits a DBOM (Merkle-anchored JSON + PDF). Separately and offline, an auditor runs the open verifier: re-hash declared sources → compare to manifest → check Ed25519 signatures → output verified / mismatch / unknown per source. The signing path and the verification path never share a server; that separation is what makes the artifact trustworthy.
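One way the "Merkle-anchored" part could work is to fold the per-source content hashes into a single root that goes into the DBOM. The sketch below is an assumption-heavy illustration: the brief does not specify the tree construction, so the leaf sorting, the SHA-256 choice, and the duplicate-the-last-node rule for odd levels are all choices made here, and `merkle_root` is a hypothetical name.

```python
import hashlib


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex SHA-256 leaf hashes into a single hex root."""
    if not leaf_hashes:
        raise ValueError("a DBOM needs at least one source")
    # Sort so the root does not depend on the order sources were ingested.
    level = [bytes.fromhex(h) for h in sorted(leaf_hashes)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the odd node out
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Illustrative data only.
leaves = [hashlib.sha256(s).hexdigest() for s in (b"web", b"licensed", b"synthetic")]
anchor = merkle_root(leaves)
tampered = merkle_root([hashlib.sha256(b"web!").hexdigest()] + leaves[1:])
```

Changing any single source changes the root, so the auditor only needs to trust one signed value to detect a swap anywhere in the source list.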

Data model: source(id, type, uri, content_hash, license, provenance, token_share) · dbom(model_id, spec_version, merkle_anchor, signature, signer_key) · verification(dbom_id, source_id, recomputed_hash, result=verified|mismatch|unknown) · obligation(dbom_id, art_ref, status) mapping sources to specific AI Act articles. The verification records + the accepted-audit history are the compounding asset.
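The four records above translate fairly directly into typed structures. The sketch below keeps the field names from the data model and adds assumptions that are not in the brief: the Python types, the `"met"` / `"open"` values for `status`, the helper `open_obligations`, and the illustrative article references. The brief also does not say what `dbom_id` points at, so it is treated here as an opaque string.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Result(str, Enum):
    VERIFIED = "verified"
    MISMATCH = "mismatch"
    UNKNOWN = "unknown"


@dataclass
class Source:
    id: str
    type: str
    uri: str
    content_hash: str
    license: str
    provenance: str
    token_share: float


@dataclass
class Dbom:
    model_id: str
    spec_version: str
    merkle_anchor: str
    signature: str
    signer_key: str


@dataclass
class Verification:
    dbom_id: str
    source_id: str
    recomputed_hash: Optional[str]  # None when the artifact could not be re-hashed
    result: Result


@dataclass
class Obligation:
    dbom_id: str
    art_ref: str
    status: str  # assumed values: "met" or "open"


def open_obligations(obligations: list[Obligation]) -> list[str]:
    """Article references that still block an audit-ready status."""
    return [o.art_ref for o in obligations if o.status != "met"]


# Illustrative references only.
obligations = [
    Obligation("dbom-1", "Art. 53(1)(d)", "met"),
    Obligation("dbom-1", "Art. 53(1)(c)", "open"),
]
```

With this shape, "are we audit-ready?" reduces to whether `open_obligations(...)` returns an empty list for a given DBOM.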

Hard parts / risk (the 2 that actually matter):

  1. The artifact has to be credible to someone with subpoena/fine power — tech nobody trusts is worthless. De-risk: independent local verification (auditor re-computes, we're not in the loop), an SBOM/SLSA-shaped format auditors already recognize, and an explicit honest failure state — when a source's hash doesn't match or its lineage is pre-Attest/unknown, the verifier shows mismatch/unknown, never a fake green. A tool that always passes is the opposite of credible; surfacing gaps is the feature.
  2. Retrofit onto a messy pipeline without blowing the compliance deadline. A rip-and-replace is self-defeating. De-risk: wrap existing ingest points via config (a few hooks, not a new pipeline), plus a best-effort/retro-DBOM that reconstructs lineage from manifests, storage metadata, and logs you already have — and flags the gaps as a punch-list — so you're defensibly documented for what you can prove now instead of blocked on everything.
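The best-effort retro-DBOM from the second risk can be pictured as a pass over whatever manifest entries already exist, keeping what is provable and listing what is not. This is a rough sketch, not the actual reconstruction logic: `retro_dbom`, `REQUIRED_FIELDS`, the status labels, and the sample entries are all hypothetical.

```python
# Fields assumed necessary before a source counts as fully documented.
REQUIRED_FIELDS = ("uri", "content_hash", "license")


def retro_dbom(entries: list[dict]) -> tuple[list[dict], list[str]]:
    """Return (entries tagged with a status, punch-list of gaps to close)."""
    documented, punch_list = [], []
    for entry in entries:
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        status = "best-effort" if missing else "documented"
        documented.append({**entry, "status": status})
        if missing:
            label = entry.get("uri") or "<unknown uri>"
            punch_list.append(f"{label}: missing {', '.join(missing)}")
    return documented, punch_list


# Illustrative legacy metadata: the second source has no recorded license.
legacy = [
    {"uri": "s3://corpus/web.jsonl", "content_hash": "ab12", "license": "CC-BY-4.0"},
    {"uri": "s3://corpus/scrape-2021.jsonl", "content_hash": "cd34", "license": None},
]
documented, punch_list = retro_dbom(legacy)
```

Incomplete sources stay in the output with an explicit `best-effort` tag instead of being dropped, which matches the brief's point that gaps should be surfaced rather than hidden.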

Build plan:

Cut-the-corner version: what ships in 48h is the prototype below — open the Independent verifier, hit ▶ Run verification to watch each source re-hash and seal 4/4, then hit ⚠ Tamper a source and watch the same engine honestly flip it to HASH MISMATCH and break the seal (no fake green). The DBOM exports, the failure screen lets you re-sign / mark best-effort to recover, and the AI Act screen resolves obligations to readiness. Proves both the verifiable-artifact wedge and the honest-gaps behavior with zero infra.
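The run-then-tamper behavior described above comes down to re-hashing each declared source and refusing to seal unless every one matches. The following is a minimal sketch of that logic, assuming SHA-256 and in-memory byte strings; `verify` and `seal_holds` are hypothetical names, a missing artifact is mapped to `unknown` as an assumption, and signature checking is omitted.

```python
import hashlib


def verify(manifest: dict[str, str], artifacts: dict[str, bytes]) -> dict[str, str]:
    """Re-hash each declared source and compare against the manifest."""
    results = {}
    for source_id, declared_hash in manifest.items():
        data = artifacts.get(source_id)
        if data is None:
            results[source_id] = "unknown"    # nothing to re-hash
        elif hashlib.sha256(data).hexdigest() == declared_hash:
            results[source_id] = "verified"
        else:
            results[source_id] = "mismatch"   # never reported as a pass
    return results


def seal_holds(results: dict[str, str]) -> bool:
    """The seal holds only if there are results and every one is verified."""
    return bool(results) and all(r == "verified" for r in results.values())


# Four illustrative sources and the manifest that declares their hashes.
artifacts = {f"source-{i}": f"payload {i}".encode() for i in range(1, 5)}
manifest = {sid: hashlib.sha256(data).hexdigest() for sid, data in artifacts.items()}

clean = verify(manifest, artifacts)
tampered = verify(manifest, {**artifacts, "source-3": b"payload 3, edited"})
```

Here `clean` verifies all four sources and the seal holds, while `tampered` reports `mismatch` for `source-3` and `seal_holds(tampered)` is false. An empty result set also fails the seal, so the check cannot pass by verifying nothing.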

🧪 Open the clickable prototype

Plan

This is the strongest of the batch, and the timing is exactly right. You framed it as "a cryptographic, verifiable chain-of-custody so models can prove what they were trained on — a per-model data bill of materials a third party can verify." The obvious-but-fatal version is "blockchain for AI training data" — leading with the crypto buzzword scares off the actual buyer and invites the "why not just a signed log?" objection. Here's the sharper framing.

The reframe. Nobody buys provenance for its own sake — they buy it because someone with subpoena or fine power is about to ask "prove what this model was trained on" and they can't answer. Reframe from "a verifiable data lineage system" to "the audit artifact that satisfies an EU AI Act regulator and a copyright plaintiff's discovery — the Data Bill of Materials (DBOM) you can hand to a third party who independently verifies it without trusting you." The cryptography is the how; the product is defensible compliance. Sell the artifact and the audit-pass, not the chain.

Falsifying proof point. The riskiest assumption is whether a credible third party (an auditor or regulator-aligned reviewer) will actually accept a DBOM as sufficient evidence, because tech that produces an artifact nobody trusts is worthless. Week-1 test: produce a real DBOM for a small trained model from a mixed dataset (web + licensed + synthetic), put it in front of 3 AI-governance/legal practitioners, and ask: would this stand up in an audit or discovery? Cost: roughly $1.5K and a few days, mostly expert interviews plus a working sign-and-emit pipeline on a toy corpus. If practitioners say "not enough," we learn the exact gap before building.

Target customer. Not "all model trainers" — the beachhead is mid-to-large orgs training/fine-tuning models for regulated or IP-sensitive use (enterprise AI teams, model vendors selling into the EU, foundation-model labs facing litigation). They have legal exposure, budget, and a board asking about AI Act readiness now.

Problem / why now. The EU AI Act's GPAI transparency obligations are phasing in (training-data summaries, documentation), and copyright suits (NYT, Getty, authors) are forcing discoverable lineage. Today most teams cannot produce that lineage, because their data pipelines were built without custody tracking. The regulatory clock is the "why now": demand is being legislated into existence on a deadline.

Value prop / wedge. Ship ONE thing: a pipeline integration that signs each data source as it enters training and emits a per-model DBOM (sources, licenses, hashes, transformations) that an independent party can verify against the artifacts — without re-running the training. The wedge is the verifiable artifact + the verifier, not a full data platform. Land on "generate your AI Act training-data documentation automatically, provably."
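For a third party to verify the emitted DBOM, the document itself has to serialize to the same bytes every time. A minimal sketch of one way to do that is below, assuming sorted keys and compact separators as the canonical form (the brief does not define one); `emit_dbom` and the sample sources are hypothetical.

```python
import hashlib
import json


def emit_dbom(model_id: str, sources: list[dict]) -> tuple[bytes, str]:
    """Serialize a DBOM deterministically and return (payload, hex digest)."""
    body = {
        "model_id": model_id,
        # Sort sources so ingest order does not change the output bytes.
        "sources": sorted(sources, key=lambda s: s["uri"]),
    }
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


# Illustrative data only.
sources = [
    {"uri": "s3://corpus/web.jsonl", "license": "CC-BY-4.0",
     "content_hash": "ab12", "transformations": ["dedupe", "pii-filter"]},
    {"uri": "s3://corpus/licensed.jsonl", "license": "commercial",
     "content_hash": "cd34", "transformations": []},
]
payload, digest = emit_dbom("model-001", sources)
```

The digest is what a signer would sign and what a verifier would recompute; neither step requires re-running training, only access to the DBOM and the artifacts it names.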

Market (honest math).

Moat / why us. A signed log is easy to build — the moat is becoming the standard the auditors/regulators recognize. (1) Trust/neutrality: an independent verifier is only valuable if it's credibly third-party. (2) Standard lock-in: if your DBOM format is what AI Act auditors accept, every model vendor selling into the EU adopts it, and switching means re-certifying. (3) Network: each accepted audit makes the format more authoritative. Regulatory-standard moats compound and are very hard to dislodge.

GTM wedge. First 10 customers: not bottoms-up — go through the people already scared. Partner with AI-governance consultancies and Big-4 audit practices building AI Act services (they need a tool to produce the evidence), and target model vendors with active EU go-to-market. Lead with "we generate the training-data documentation the Act requires, and a third party can verify it."

Success metric. Number of DBOMs that pass an independent/auditor review + design partners citing it in actual compliance filings. Target: ≥3 DBOMs accepted in a real audit or regulatory submission within the first quarter. That's proof the artifact has authority, which is the entire business.

Two incumbents who'd copy the wedge in 30 days: data-lineage/catalog vendors (e.g. existing MLOps lineage tools) and governance platforms (Credo AI, Holistic AI). Our unfair edge: independent cryptographic verifiability + a DBOM standard built for the Act, where they offer self-attested dashboards a regulator can't independently trust. Verifiable-by-a-third-party is the line they can't easily cross.

Aggressive timeline. 48h: sign-and-emit DBOM pipeline on a toy mixed corpus + expert-validation interviews. ~1 week: real integration with one training pipeline producing a verifiable DBOM. ~2 weeks: first design partner generating a DBOM for a real fine-tune, reviewed by a governance practitioner.
