corpus
The pipeline is awake.
A multi-day corpus-cleaning run is live on the local rig. This is the unglamorous part: language filtering, shard prep, and keeping the training stream stable while the public site stays online.
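For anyone curious what "language filtering" and "shard prep" look like in practice, here is a minimal sketch. It is not the actual pipeline code. The ASCII-ratio check is a crude stand-in for a real language detector, whitespace splitting is a stand-in for a real tokenizer, and the names `looks_english`, `count_tokens`, and `shard_documents` are hypothetical.

```python
from typing import Iterable, Iterator


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude language filter (assumption): keep text that is mostly ASCII."""
    if not text.strip():
        return False
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= min_ascii_ratio


def count_tokens(text: str) -> int:
    """Whitespace token count, standing in for a real tokenizer."""
    return len(text.split())


def shard_documents(
    docs: Iterable[str], max_tokens_per_shard: int
) -> Iterator[list[str]]:
    """Drop documents that fail the filter, then pack the rest into shards
    that stay under a token budget (a single oversized document still gets
    its own shard)."""
    shard: list[str] = []
    used = 0
    for doc in docs:
        if not looks_english(doc):
            continue  # filtered out before it reaches any shard
        n = count_tokens(doc)
        if shard and used + n > max_tokens_per_shard:
            yield shard
            shard, used = [], 0
        shard.append(doc)
        used += n
    if shard:
        yield shard


docs = [
    "the quick brown fox",
    "這是一個測試文件",
    "jumps over the lazy dog",
    "again and again",
]
shards = list(shard_documents(docs, max_tokens_per_shard=8))
print(shards)
```

Filtering before packing means rejected documents never count against a shard's token budget, so shard sizes reflect only what will actually be trained on.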
The point is simple: Buddy is not being improvised from a prompt. The next brain is being built from a curated research corpus: the papers, validation work, public science sources, and the AIIT coherence archive, cleaned before it ever reaches weights.
AskBuddy is live now. The corpus pipeline is still chewing. When this pass clears, the next training and evaluation cycle gets a cleaner floor to stand on.
corpus
472,639,661 tokens. 309GB raw. Still going.
Real numbers, pulled live: 472.6 million tokens processed and sitting in 10 training shards. 309GB of raw corpus on active ingestion. 1.3TB total including LaCie overflow. 572 source files.
This is the corpus Buddy learns from. It is not a web scrape. It comes from a curated, multi-source pipeline built to deposit the coherence framework directly into the model's weights: arXiv, PubMed, Project Gutenberg, USGS, Federal Register, StackExchange, Wikipedia, and domain-specific sources across physics, biology, consciousness, math, and language.
Pipeline is locked and running. Next milestone: 1 billion clean tokens. After that: a RunPod A100 for the next full fine-tune cycle. 🧠
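A quick back-of-the-envelope check on those numbers, as a sketch. The inputs are the figures quoted in this post. The per-shard value is only an average, since the post does not say the shards are equal in size.

```python
# Figures quoted in this post.
TOKENS_PROCESSED = 472_639_661
SHARDS = 10
MILESTONE_TOKENS = 1_000_000_000

# Average only: assumes nothing about how evenly the shards are filled.
avg_tokens_per_shard = TOKENS_PROCESSED / SHARDS

# Progress toward the 1 billion token milestone.
progress = TOKENS_PROCESSED / MILESTONE_TOKENS
remaining = MILESTONE_TOKENS - TOKENS_PROCESSED

print(f"average per shard: {avg_tokens_per_shard:,.0f} tokens")
print(f"progress to milestone: {progress:.1%}")
print(f"remaining: {remaining:,} tokens")
```

So the run is a little under halfway to the milestone, with roughly 527 million tokens still to go.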
ship
Website is up.
22 hours straight. No sleep. Full build.
Custom domain live. Cancer section up. Weather patent listed. $400K investor page wired to Formspree. Mobile nav, dropdowns, game, footer: all shipped.