2026-03-31

Publication review round 2 closed; draft 039 entered pipeline; 35/39 posts published.

Issue resolution commits landed for posts 021 and 036. Thematic corpus mapping extended to 6 new posts. Psyche CAT optimization in planning with 63-test suite green and multi-phase refactor in progress.

feature refactor milestone

2026-03-30

Publication review round 1 committed: 6 MUST FIX + 12 SHOULD FIX resolved across 4 posts.

Blogger research pipeline refreshed end-to-end (ChromaDB re-embed + top 50 clusters extracted). Psyche iteration 3 entered CAT performance optimization with ReadonlyMap caching validated. Adversarial review of post 009 confirmed core AI-as-judge framing.

bugfix feature refactor

2026-03-28

Retroactive factcheck sweep: all 38 published posts attested, 9 errors fixed.

Added a factcheck.json attestation file to every published post. Nine factual errors surfaced and were corrected inline. The full catalog now carries a permanent, per-post record of factual verification.

milestone health-check

2026-03-28

Retroactive factcheck complete: all 38 posts covered, 9 errors fixed.

First full-archive verification pass. Nine factual errors corrected across published posts, glossary entries enriched. All posts now carry factcheck.json metadata. Sixteen uncommitted changes pending review.

bugfix health-check

2026-03-24

3-iteration codebase review complete; XSS vulnerability patched; 9 commits.

JSON-LD XSS escape vulnerability in PostClient patched. Methodology briefs updated for three posts. Psyche instrument runner gained text input support. All 35 published posts stable.

security bugfix refactor
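
The log does not describe the JSON-LD fix in detail. As a rough sketch of the general technique, assuming the issue was unescaped JSON-LD written into a script tag, a helper along these lines escapes the characters that could close the tag early. The name serializeJsonLd and the sample data are hypothetical and are not taken from PostClient.

```typescript
// Hypothetical helper, not the actual PostClient code.
// Serializes a JSON-LD object so it can be placed inside
// <script type="application/ld+json"> without letting a string value
// such as "</script>" terminate the tag early.
function serializeJsonLd(data: Record<string, unknown>): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c") // blocks "</script>" and "<!--"
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028") // line separators that older JS parsers reject
    .replace(/\u2029/g, "\\u2029");
}

// Illustrative input only: a headline carrying a script-breaking payload.
const jsonLd = serializeJsonLd({
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "</script><script>alert(1)</script>",
});
```

The escapes are ordinary JSON unicode escapes, so a JSON-LD consumer parsing the output still recovers the original strings.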

2026-03-24

Codebase review iteration 3 complete; XSS vulnerability patched; 35 posts published.

9 commits across the development cycle: 15 findings fixed from 3-model review pass, 21 additional fixes including forum GUI and sidebar corrections. JSON-LD XSS vulnerability in PostClient resolved. Text input support added to Psyche instrument runner.

bugfix refactor security

2026-03-22

Codebase audit: 33 deficiencies found (6 critical), fix plan scoped but not committed.

Critical issues: missing page_views schema table, undefined --color-accent CSS variable, React hooks misused in .map() callbacks. Psyche Iteration 3 CAT optimization also designed. Five sessions, zero commits — planning-only day.

architecture refactor blocked

2026-03-19

Post 037 (The Model-Generation Audit) published after round 3 review.

Applied 7 publication review corrections (3 critical, 4 recommended) and deployed. 29 files pending in uncommitted changes for the next cycle.

deploy milestone

2026-03-18

Post 037 updated with Gemini Pro audit findings; site tagline revised.

Gemini Pro audit identified 4 MUST + 3 SHOULD + 3 NICE improvements across 5 posts, integrated into post 037. Category guidance and review audit data added. Stale project count removed from metadata.

refactor feature

2026-03-16

33-post model-generation audit plan drafted; multi-model review architecture defined.

Five-phase workflow established: triage → iterative review → batch fixes → content writing → two-wave deploy. GPT-5.4, Gemini 3.1 Pro, and Opus 4.6 form the review panel. Agent in the Wild series (posts 017–020 and 033) flagged for cross-post continuity review. Execution pending.

architecture automation

2026-03-14

Published posts 034 and 035; Psyche framework redesigned with 3-tier battery and Kimi K2.5 switch.

Empath scoring deprecated; new CAT adaptive framework introduces Lite/Standard/Heavy tiers covering 1,130+ items. Kimi K2.5 replaces DeepSeek V3 with safety prompt engineering for clinical edge cases. Bulk audit of 33 posts entering triage.

feature milestone refactor

2026-03-13

Research: LLM judge bias hypothesis developed for upcoming post.

Investigated the bullshit-benchmark project for evidence of systematic judge favoritism across model families. Phase 2 analysis focused on interjudge reliability and differential bias testing methodology.

experiment

2026-03-12

BullshitBench v2: 8,000 responses reviewed, benchmark bias hypothesis forming.

Analysis suggests the benchmark may measure Claude-alignment and refusal behavior rather than nonsense detection. Interjudge differential analysis ongoing before publishing conclusions. 20 uncommitted changes staged from last deploy.

experiment feature

2026-03-11

Bullshit-benchmark bias research; etymology-tax post live; site redeployed.

Investigated structural bias in LLM leaderboard methodology: judges appear to systematically favor Claude-family models, potentially due to training data contamination. Partial detection scoring inconsistencies also examined. Site audited, canvas refactored, redeployed 2026-03-09.

feature deploy refactor

2026-03-09

Psyche Phase 4 archival finalized; DeepSeek V3.2 agent integration designed.

Schema v3→v4 migration preserved with planned git tags marking pre- and post-phase-4 states. Blog agent regrounding underway: DeepSeek V3.2 selected, system prompt expanding 1KB→4-5KB with anti-hallucination rules, Phase 2 RAG via Vectorize scoped for later.

architecture feature milestone

2026-03-08

Post 033 shipped (OpenClaw autopsy); 11 API security issues remediated.

Post documents 37-day autonomous OpenClaw runtime with 842+ heartbeats and 553 deliverables. API fixes covered input validation, batch atomicity, redirect vulnerability across 6 routes. Home layout refactor and instrument battery expansion planned.

milestone security feature

2026-03-07

Empath reframed as ordinal corpus characterizer; 7 commits, research paper v3 deployed.

Empath removed from personality synthesis layer and repositioned as a corpus-relative emotional distribution tool. Six files updated, personality profile regenerated with three methods, pre-ordinal version archived for reference.

refactor deploy architecture

2026-03-04

Privacy gate added to publishing pipeline; research paper drafted.

Real names in Post 031 replaced with pseudonyms before publication. A mandatory privacy-check step is now enforced between draft generation and style measurement in the /write-post workflow. Research paper structure drafted for quantitative personality signal validation study.

feature security milestone

2026-03-02

Security audit: 2 critical issues fixed, 24 deferred; Post 030 restructured.

IndieAuth disabled (410 Gone, auto-approval vulnerability). API doc headers corrected for method changes. 278 uncommitted changes; git history 20 days stale.

security bugfix refactor

2026-03-01

Workspace assessment completed; blog cadence and 275 uncommitted changes flagged.

Priorities mapped across 5 categories. Psyche framework research complete and ready for spec phase. Publishing cadence has stalled since Feb 17, with two drafts in progress.

health-check architecture

2026-02-28

Posts 029-030 scope reduced to memoir-only; em-dash violations root-caused across 10 posts.

Four narrative variants per arc cut to a single first-person memoir format. Em-dash density in posts 021-030 measured at 18.7 per 1,000 words against a target of zero, traced to a style guide bypass in Claude Code sessions. New /write-post skill planned to enforce constraints and route through the writing-review pipeline.

refactor bugfix
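
A density figure like the one above can be reproduced with a short counter. This is a minimal sketch, assuming words are whitespace-separated tokens; the function name emDashDensity is hypothetical, and the measurement tool actually used is not described in the log.

```typescript
// Hypothetical sketch: em-dash occurrences per 1,000 words.
// Assumes a "word" is any whitespace-separated token, so a dash that stands
// alone between spaces is itself counted as a token.
function emDashDensity(text: string): number {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const dashes = (text.match(/\u2014/g) ?? []).length;
  return (dashes / words.length) * 1000;
}

// Four tokens and one em-dash (written as an escape so the sample stays ASCII).
const density = emDashDensity("alpha\u2014beta gamma delta epsilon");
```

A check like this could run as a gate in a writing workflow, failing any draft whose density is above zero.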

2026-02-27

Crawler pollution found in reactions/comments API; IndieAuth disabled; API docs synchronized.

58 bot reactions and 7 template-placeholder comments identified. Critical fixes: IndieAuth routes converted to 410 Gone, stale API documentation brought in sync with actual signatures. Two-post publication plan drafted.

security bugfix feature
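
Retiring an endpoint with 410 Gone can be sketched in a framework-agnostic way. The HttpResult shape and the retiredEndpoint helper are assumptions made for illustration; the blog's actual route handlers and framework are not shown in the log.

```typescript
// Hypothetical, framework-agnostic response shape.
interface HttpResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Returns a 410 Gone result for an endpoint that has been permanently removed.
// Unlike 404, 410 tells clients and crawlers the removal is intentional.
function retiredEndpoint(name: string): HttpResult {
  return {
    status: 410,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ error: "gone", detail: `${name} has been retired` }),
  };
}

const indieAuthResult = retiredEndpoint("IndieAuth");
```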

2026-02-26

Psyche Phase 2 scoped to 10-instrument battery; Event Bus MCP architecture defined.

Comprehensive psychological assessment expanded from 3 identified gaps to a 340-item battery, with instruments including IPIP-NEO-300 and HEXACO-60. Strategic pivot toward behavioral specificity patterns over trait labels. Event Bus MCP server architecture finalized: TypeScript, Tailscale, SQLite, bearer token auth.

architecture feature

2026-02-25

Psyche framework planning complete: 37+ psychometrics, 10 instruments selected, stack decided.

Research phase finished covering 23+ papers and 20+ platforms. Assessment-optimized prompting shows r=.443 vs r=.117 baseline. React 19/Vite/Python/uv stack chosen under MIT license. Implementation not yet started.

architecture experiment

2026-02-24

MeansEndsRatio bug traced to root cause; Event Bus MCP initiated; 17 title renames queued.

Root cause: three files share a ≤0.5 threshold, which produces a skewed 78/22 category split. Event Bus TypeScript project scaffolded on Tailscale with SQLite WAL; schema design incomplete. Blog title style reform planned for 17 posts.

bugfix feature refactor
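
The effect of a single cut-off on a category split can be checked with a small helper. This is only a sketch: the constant MEANS_ENDS_THRESHOLD, the function splitAtThreshold, and the sample ratios are hypothetical, and the real category names and post data are not given in the log.

```typescript
// Hypothetical shared constant; defining it once avoids copies drifting apart.
const MEANS_ENDS_THRESHOLD = 0.5;

// Returns the percentage of ratios at or below the threshold and above it.
function splitAtThreshold(
  ratios: readonly number[],
  threshold: number,
): { atOrBelow: number; above: number } {
  if (ratios.length === 0) return { atOrBelow: 0, above: 0 };
  const atOrBelowCount = ratios.filter((r) => r <= threshold).length;
  const atOrBelow = (atOrBelowCount / ratios.length) * 100;
  return { atOrBelow, above: 100 - atOrBelow };
}

// Made-up sample values, not the blog's data.
const split = splitAtThreshold([0.1, 0.2, 0.5, 0.9], MEANS_ENDS_THRESHOLD);
```

Running the same helper over candidate thresholds shows how far the cut-off would need to move to rebalance the categories.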

2026-02-22

7 posts flagged for recategorization; cognitive interface research scoped for publication.

meansEndsRatio adjustments planned across 7 posts to balance category distribution (systems 30%→22%, craft 22%→30%). 47-site cognitive interface analysis (~8,000 words) scoped into two posts, privacy-cleared. Tag normalization and means indicator UI refactor identified but not yet executed.

refactor feature

2026-02-19

Workspace orchestrator and pulse collection system specifications documented.

Documentation pass on how the daily pulse pipeline collects workspace state and feeds into the orchestrator. No code changes; specification and design work.

automation

2026-02-19

Pulse data collection infrastructure spec completed.

JSON schema defined for daily workspace snapshots covering commits, sessions, state changes, and health signals per project. Enables structured input for automated pulse generation.

automation architecture
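
A snapshot covering the four areas named above might be typed and validated as follows. Every field name here (project, commits, sessions, stateChanges, healthSignals, date, projects) is an assumption made for illustration; the actual JSON schema is not reproduced in the log.

```typescript
// Hypothetical shape for one project's entry in a daily snapshot.
interface ProjectSnapshot {
  project: string;
  commits: number;
  sessions: number;
  stateChanges: string[];
  healthSignals: string[];
}

// Hypothetical top-level snapshot: one date, many projects.
interface DailySnapshot {
  date: string; // ISO date string, e.g. "2026-02-19"
  projects: ProjectSnapshot[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Runtime guard, since data parsed from JSON is untyped.
function isProjectSnapshot(value: unknown): value is ProjectSnapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["project"] === "string" &&
    typeof v["commits"] === "number" &&
    typeof v["sessions"] === "number" &&
    isStringArray(v["stateChanges"]) &&
    isStringArray(v["healthSignals"])
  );
}

function isDailySnapshot(value: unknown): value is DailySnapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const projects = v["projects"];
  return (
    typeof v["date"] === "string" &&
    Array.isArray(projects) &&
    projects.every((p) => isProjectSnapshot(p))
  );
}
```

Validating at the boundary means the pulse generator can rely on the typed shape rather than re-checking fields downstream.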

2026-02-18

Tier-3 sitemap gap found (posts 015–020 missing); post 025 research complete.

Silent consistency bug: posts 015–020 present in shared source and tier-1 but absent from tier-3 sitemap. Research for 'The Logistics Gap' finalized with verified academic sources. Glossary pipeline gap for posts 025–026 also flagged.

bugfix feature
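
A silent gap of this kind can be caught with a set-difference check between the source list and each generated tier. The function missingFrom and the sample IDs are hypothetical; the real build and sitemap generation steps are not described in the log.

```typescript
// Hypothetical consistency check: IDs present in the source but absent from a target.
function missingFrom(source: readonly string[], target: readonly string[]): string[] {
  const present = new Set(target);
  return source.filter((id) => !present.has(id));
}

// Illustrative IDs only, shaped like the post numbers in this log.
const sharedSource = ["014", "015", "016", "017"];
const tier3Sitemap = ["014"];
const gaps = missingFrom(sharedSource, tier3Sitemap);
```

Failing the build when the result is non-empty would surface the mismatch at deploy time instead of leaving it to be noticed later.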

2026-02-20

Pulse data collection pipeline initialized; orchestrator reconfigured with 5-tier priority matrix.

Daily metrics aggregation infrastructure set up for automated pulse generation. Workspace orchestrator now monitors all projects with structured escalation tiers from GitHub community activity down to research.

automation feature

Ashita Orbis Blog

Activity Timeline