🎉 WE HAVE A FORUM — come in, say hello →
Account
Understand
What is Buddy How it works
Applications
All apps Victim Advocate Debunker Little Lairs
Research
Laws Corpus Proof Papers
Agentic
All tools LAC Memory Kit AIIT-Voice2 wAIste Not AnchorForge Shop
Services
ProofDesk
Community
Forum Updates Join
⬡ Measured · Hash-Sealed · AIIT-DISC-0001

Same answer.
Forty times less.

We asked Buddy and DeepSeek-R1 the same simple question — on the identical 14B base model, the same machine, the same 50 prompts. Both got it right. One carried far less waste.

"What is the capital of France?"
Buddy 7 tokens · 2.0s → "Paris."
R1 279 tokens · 7.3s → "Paris."
Both got "Paris." — same answer. R1 spent ~40× the tokens and ~3.6× the time to say it. Each = one token.
The scoreboard · 50 frozen prompts

Not theory. Measurement.

MeasuredBuddy-14BR1-Distill-14BWhat it means
Mean latency3.8 s17.4 s~4.6× faster
Worst-case (p95)7.4 s38.3 s~5.2× faster
Tokens / answer59621~10× fewer (mean); ~40× on a simple answer
Accuracy*95%85%equal-or-better (*small n — see note)
Sycophancy (folds under pressure)~8%~6%tie — tiny n, both hold
Reasoning depthnot optimizeddeeper, by designlikely R1's edge — not benchmarked
Base modelQwen2.5-14BQwen2.5-14Bidentical — every gap is what each team added

*Accuracy: first-try on keyed science MC — Buddy 38/40, R1 17/20. R1's run was capped at n=20 because its per-answer latency (5–40 s) made longer runs costly — not a sample-bias choice. Sycophancy denominators are tiny (3/38 vs 1/17) → "tie within noise," not a robust win.

Three lenses · one fact

The same efficiency, three ways to see it

Brevity is a resource advantage — and on real hardware it compounds.

Tokens
~10× / ~40×
aggregate / per simple answer — for engineers
Water-drops 💧
7 vs 279
if each token were a drop — intuitive
Joules & °C 🌡️
14 J vs 2,567 J
+10°C vs +27°C, measured on the GPU

GPU energy is one measurement, board-watts only (not whole-datacenter cooling/water); consumer single-inference. Directional magnitude, not a precise constant.

The asymmetry

One person. One consumer GPU.

Buddy was built by one person over 57 active development days — ~355 wall-clock hours, ~142 verified GPU-training hours, a consumer RTX 3090-class workstation, ~20 TB of Starlink-routed data, limited cloud bursts. Capital basis ~$5,525–5,770; direct operating cost ~$784–884.

The model we benchmarked — R1-Distill-Qwen-14B — is the distilled student of a far larger teacher. By DeepSeek's own disclosures, R1's final reasoning run cost $294,000 (512× H800, 80 hours, peer-reviewed in Nature), built atop the ~$5.6M DeepSeek-V3 base (V3 technical report) — and even that excludes prior research. Independent analysis (SemiAnalysis — an estimate, not a DeepSeek figure) puts the parent's GPU infrastructure near $1.6 billion in servers. Those are the teacher's costs, not the 14B distill's — we cite the disclosed numbers and label the estimate as one.

Even measured against the smallest honest figure — R1's $294K final run alone — Buddy's ~$800 operating cost is ~350× less. That is the point: frontier results don't require frontier budgets — they require the right architecture.

⬡ Provenance

Every number here is tied to a measured run or the build log; the frontier-lab side is reported-only. The full report and all raw data are hash-sealed — entry AIIT-DISC-0001 in the AIIT Discovery Ledger, evidence sha256 c76adc8c….

Here's the technical details → Download the sealed PDF →

buddy's in the shop — in-house training right now.

peek at the shop →
⚡ Help keep the lights on — support our research →