Stay in the loop

Updates.

Training checkpoints, corpus drops, patent filings, investor news. No spam. You pick the channel.

April 27, 2026 corpus

The pipeline is awake.

A multi-day corpus-cleaning run is live on the local rig. This is the unglamorous part: language filtering, shard prep, and keeping the training stream stable while the public site stays online.
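For readers curious what "language filtering" and "shard prep" can look like in practice, here is a minimal sketch. It is not the actual pipeline: `looks_english`, `make_shards`, the ASCII-letter heuristic, the 0.6 threshold, and whitespace token counting are all assumptions made up for illustration. A real run would likely use a proper language detector and the model's own tokenizer.

```python
from typing import Iterable, Iterator, List


def looks_english(text: str, min_ascii_letter_ratio: float = 0.6) -> bool:
    """Crude stand-in for a language filter (hypothetical heuristic).

    Keeps a document when ASCII letters make up at least the given share
    of its non-whitespace characters.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    letters = sum(1 for c in chars if c.isascii() and c.isalpha())
    return letters / len(chars) >= min_ascii_letter_ratio


def make_shards(docs: Iterable[str], max_tokens_per_shard: int) -> Iterator[List[str]]:
    """Group documents that pass the filter into shards under a token budget.

    Tokens are approximated by whitespace splitting (an assumption).
    A single document larger than the budget still gets its own shard.
    """
    shard: List[str] = []
    used = 0
    for doc in docs:
        if not looks_english(doc):
            continue  # dropped by the language filter
        n_tokens = len(doc.split())
        if shard and used + n_tokens > max_tokens_per_shard:
            yield shard
            shard, used = [], 0
        shard.append(doc)
        used += n_tokens
    if shard:
        yield shard


sample_docs = [
    "the quick brown fox",
    "это русский текст",
    "12345 67890",
    "another english document here",
]
shards = list(make_shards(sample_docs, max_tokens_per_shard=5))
```

With the sample above, the Russian line and the digits-only line are filtered out, and the two remaining documents land in separate shards because together they would exceed the 5-token budget.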

The point is simple: Buddy is not being improvised from a prompt. The next brain is being built from a curated research corpus: the papers, validation work, public science sources, and the AIIT coherence archive, cleaned before it ever reaches the weights.

AskBuddy is live now. The corpus pipeline is still chewing. When this pass clears, the next training and evaluation cycle gets a cleaner floor to stand on.

April 17, 2026 corpus

472,639,661 tokens. 309GB raw. Still going.

Real numbers, pulled live: 472.6 million tokens processed and sitting in 10 training shards. 309GB of raw corpus on active ingestion. 1.3TB total including LaCie overflow. 572 source files.

This is the corpus Buddy learns from. Not a web scrape — a curated, multi-source pipeline built to deposit the coherence framework directly into the model's weights. arXiv, PubMed, Project Gutenberg, USGS, Federal Register, StackExchange, Wikipedia, and domain-specific sources across physics, biology, consciousness, math, and language.

Pipeline is locked and running. Next milestone: 1 billion tokens clean. After that — RunPod A100 for the next full fine-tune cycle. 🧠
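For scale, the figures in this update can be turned into a quick progress check. This is a back-of-the-envelope sketch using only the numbers quoted above. The variable names are made up for illustration, the per-shard figure is an average (the update does not say the shards are equal in size), and it assumes all processed tokens count toward the 1 billion "clean" milestone.

```python
TOKENS_PROCESSED = 472_639_661      # token count quoted in this update
SHARD_COUNT = 10                    # training shards quoted in this update
MILESTONE_TOKENS = 1_000_000_000    # next milestone: 1 billion tokens

# Average only: actual shard sizes are not stated in the update.
avg_tokens_per_shard = TOKENS_PROCESSED / SHARD_COUNT
remaining_tokens = MILESTONE_TOKENS - TOKENS_PROCESSED
progress_pct = 100 * TOKENS_PROCESSED / MILESTONE_TOKENS

print(f"average per shard: {avg_tokens_per_shard:,.0f} tokens")
print(f"progress to milestone: {progress_pct:.1f}%")
print(f"remaining: {remaining_tokens:,} tokens")
```

That works out to roughly 47.3 million tokens per shard on average, about 47% of the way to the milestone, with around 527 million tokens still to go.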

April 17, 2026 ship

Website is up.

22 hours straight. No sleep. Full build.

Custom domain live. Cancer section up. Weather patent listed. $400K investor page wired to Formspree. Mobile nav, dropdowns, game, footer — all shipped.

DATA NOMNOMNOM. 🫘
