Ashita Orbis Blog
This blog. A three-tier exploration of web development complexity: raw HTML, Astro, and Next.js. Features an agent-accessible API, a comment system, and an embedded AI chat.
Activity Timeline
-
Publication review round 2 closed; draft 039 entered pipeline; 35/39 posts published.
Issue resolution commits landed for posts 021 and 036. Thematic corpus mapping extended to 6 new posts. Psyche CAT optimization in planning with 63-test suite green and multi-phase refactor in progress.
-
Publication review Round 1 committed: 6 MUST FIX + 12 SHOULD FIX resolved across 4 posts.
Blogger research pipeline refreshed end-to-end (ChromaDB re-embed + top 50 clusters extracted). Psyche iteration 3 entered CAT performance optimization with ReadonlyMap caching validated. Adversarial review of post 009 confirmed core AI-as-judge framing.
-
Retroactive factcheck sweep: all 38 published posts attested, 9 errors fixed.
Added factcheck.json attestation file to every published post. Nine factual errors surfaced and corrected inline. Full catalog now has permanent factual accuracy accountability.
-
Retroactive factcheck complete: all 38 posts covered, 9 errors fixed.
First full-archive verification pass. Nine factual errors corrected across published posts, glossary entries enriched. All posts now carry factcheck.json metadata. Sixteen uncommitted changes pending review.
-
3-iteration codebase review complete; XSS vulnerability patched; 9 commits.
JSON-LD XSS escape vulnerability in PostClient patched. Methodology briefs updated for three posts. Psyche instrument runner gained text input support. All 35 published posts stable.
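A common way to close a JSON-LD injection hole of this kind is to escape HTML-significant characters after serialization, so a string such as "</script>" in post metadata cannot terminate the script element. A minimal sketch under that assumption follows; serializeJsonLd is a hypothetical helper name, and the actual PostClient patch may differ.

```typescript
// Hypothetical helper, not the actual PostClient code.
// Serializes structured data for embedding inside
// <script type="application/ld+json"> so that string values
// cannot break out of the script element.
function serializeJsonLd(data: Record<string, unknown>): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c") // blocks "</script>" and "<!--"
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028") // line separators that older parsers mishandle
    .replace(/\u2029/g, "\\u2029");
}

// Illustrative payload with a hostile headline (invented for this example).
const jsonLd = serializeJsonLd({
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "</script><script>alert(1)</script>",
});
```

The escaped output is still valid JSON, so consumers that parse it recover the original strings unchanged.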
-
Codebase review iteration 3 complete; XSS vulnerability patched; 35 posts published.
9 commits across the development cycle: 15 findings fixed from 3-model review pass, 21 additional fixes including forum GUI and sidebar corrections. JSON-LD XSS vulnerability in PostClient resolved. Text input support added to Psyche instrument runner.
-
Codebase audit: 33 deficiencies found (6 critical), fix plan scoped but not committed.
Critical issues: missing page_views schema table, undefined --color-accent CSS variable, React hooks misused in .map() callbacks. Psyche Iteration 3 CAT optimization also designed. Five sessions, zero commits: a planning-only day.
-
Post 037 (The Model-Generation Audit) published after round 3 review.
Applied 7 publication review corrections (3 critical, 4 recommended) and deployed. 29 files pending in uncommitted changes for the next cycle.
-
Post 037 updated with Gemini Pro audit findings; site tagline revised.
Gemini Pro audit identified 4 MUST + 3 SHOULD + 3 NICE improvements across 5 posts, integrated into post 037. Category guidance and review audit data added. Stale project count removed from metadata.
-
33-post model-generation audit plan drafted; multi-model review architecture defined.
Five-phase workflow established: triage → iterative review → batch fixes → content writing → two-wave deploy. GPT-5.4, Gemini 3.1 Pro, and Opus 4.6 form the review panel. Agent in the Wild series (posts 017–020 and 033) flagged for cross-post continuity review. Execution pending.
-
Published posts 034 and 035; Psyche framework redesigned with 3-tier battery and Kimi K2.5 switch.
Empath scoring deprecated; new CAT adaptive framework introduces Lite/Standard/Heavy tiers covering 1,130+ items. Kimi K2.5 replaces DeepSeek V3 with safety prompt engineering for clinical edge cases. Bulk audit of 33 posts entering triage.
-
Research: LLM judge bias hypothesis developed for upcoming post.
Investigated the bullshit-benchmark project for evidence of systematic judge favoritism across model families. Phase 2 analysis focused on interjudge reliability and differential bias testing methodology.
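Interjudge reliability on categorical verdicts is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The entry does not say which statistic the analysis uses, so the sketch below is an assumption; the function name and the labels in the usage note are hypothetical.

```typescript
// Cohen's kappa for two judges labeling the same items.
// Assumption: this is one standard reliability statistic, not
// necessarily the one used in the Phase 2 analysis.
function cohensKappa(judgeA: string[], judgeB: string[]): number {
  if (judgeA.length !== judgeB.length || judgeA.length === 0) {
    throw new Error("Both judges must label the same non-empty set of items");
  }
  const n = judgeA.length;
  let agreements = 0;
  const countsA = new Map<string, number>();
  const countsB = new Map<string, number>();
  for (let i = 0; i < n; i++) {
    if (judgeA[i] === judgeB[i]) agreements++;
    countsA.set(judgeA[i], (countsA.get(judgeA[i]) ?? 0) + 1);
    countsB.set(judgeB[i], (countsB.get(judgeB[i]) ?? 0) + 1);
  }
  const observed = agreements / n;
  // Chance agreement: sum over labels of the product of marginal rates.
  let expected = 0;
  countsA.forEach((countA, label) => {
    expected += (countA / n) * ((countsB.get(label) ?? 0) / n);
  });
  if (expected === 1) return 1; // both judges used one identical label throughout
  return (observed - expected) / (1 - expected);
}
```

Two judges who agree on every item score 1, while agreement no better than chance scores 0. Comparing kappa across judge pairs from different model families is one way to look for differential bias.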
-
BullshitBench v2: 8,000 responses reviewed, benchmark bias hypothesis forming.
Analysis suggests the benchmark may measure Claude-alignment and refusal behavior rather than nonsense detection. Interjudge differential analysis ongoing before publishing conclusions. 20 uncommitted changes staged from last deploy.
-
Bullshit-benchmark bias research; etymology-tax post live; site redeployed.
Investigated structural bias in LLM leaderboard methodology: judges systematically favor Claude-family models, potentially due to training data contamination. Partial detection scoring inconsistencies also examined. Site audited, canvas refactored, redeployed 2026-03-09.
-
Psyche Phase 4 archival finalized; DeepSeek V3.2 agent integration designed.
Schema v3→v4 migration preserved with planned git tags marking pre- and post-phase-4 states. Blog agent regrounding underway: DeepSeek V3.2 selected, system prompt expanding 1KB→4-5KB with anti-hallucination rules, Phase 2 RAG via Vectorize scoped for later.
-
Post 033 shipped (OpenClaw autopsy); 11 API security issues remediated.
Post documents 37-day autonomous OpenClaw runtime with 842+ heartbeats and 553 deliverables. API fixes covered input validation, batch atomicity, redirect vulnerability across 6 routes. Home layout refactor and instrument battery expansion planned.
-
Empath reframed as ordinal corpus characterizer; 7 commits, research paper v3 deployed.
Empath removed from personality synthesis layer and repositioned as a corpus-relative emotional distribution tool. Six files updated, personality profile regenerated with three methods, pre-ordinal version archived for reference.
-
Privacy gate added to publishing pipeline; research paper drafted.
Real names in Post 031 replaced with pseudonyms before publication. A mandatory privacy-check step is now enforced between draft generation and style measurement in the /write-post workflow. Research paper structure drafted for quantitative personality signal validation study.
-
Security audit: 2 criticals fixed, 24 deferred; Post 030 restructured.
IndieAuth disabled (410 Gone, auto-approval vulnerability). API doc headers corrected for method changes. 278 uncommitted changes; git history 20 days stale.
-
Workspace assessment completed; blog cadence and 275 uncommitted changes flagged.
Priorities mapped across 5 categories. Psyche framework research complete and ready for spec phase. Publishing cadence has stalled since Feb 17, with two drafts in progress.
-
Posts 029-030 scope reduced to memoir-only; em-dash violations root-caused across 10 posts.
Four narrative variants per arc cut to a single first-person memoir format. Em-dash density in posts 021-030 measured at 18.7 per 1,000 words against a target of zero, traced to style guide bypass in Claude Code sessions. New /write-post skill planned to enforce constraints and route through the writing-review pipeline.
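The density figure above is a count per 1,000 words. A minimal sketch of such a measurement follows, assuming words are whitespace-separated tokens containing at least one ASCII letter or digit; emDashDensity is a hypothetical name, and the real style-measurement step may tokenize differently.

```typescript
// Hypothetical metric: em dashes (U+2014) per 1,000 words.
// Assumption: a "word" is a whitespace-separated token with at least
// one ASCII letter or digit, so a free-standing dash is not a word.
function emDashDensity(text: string): number {
  const words = text
    .split(/\s+/)
    .filter((token) => /[A-Za-z0-9]/.test(token)).length;
  if (words === 0) return 0;
  const dashes = (text.match(/\u2014/g) ?? []).length;
  return (dashes / words) * 1000;
}
```

A gate in the publishing workflow could reject any draft where this value is above the zero target.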
-
Crawler pollution found in reactions/comments API; IndieAuth disabled; API docs synchronized.
58 bot reactions and 7 template-placeholder comments identified. Critical fixes: IndieAuth routes converted to 410 Gone, stale API documentation brought in sync with actual signatures. Two-post publication plan drafted.
-
Psyche Phase 2 scoped to 10-instrument battery; Event Bus MCP architecture defined.
Comprehensive psychological assessment expanded from 3 gaps to a 340-item battery, with instruments including IPIP-NEO-300 and HEXACO-60. Strategic pivot toward behavioral specificity patterns over trait labels. Event Bus MCP server architecture finalized: TypeScript, Tailscale, SQLite, bearer token auth.
-
Psyche framework planning complete: 37+ psychometrics, 10 instruments selected, stack decided.
Research phase finished covering 23+ papers and 20+ platforms. Assessment-optimized prompting shows r=.443 vs r=.117 baseline. React 19/Vite/Python/uv stack chosen under MIT license. Implementation not yet started.
-
meansEndsRatio bug traced to root cause; Event Bus MCP initiated; 17 title renames queued.
Root cause found in three files using a shared ≤0.5 threshold producing a skewed 78/22 category split. Event Bus TypeScript project scaffolded on Tailscale with SQLite WAL, schema design incomplete. Blog title style reform planned for 17 posts.
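To illustrate how a single fixed cutoff can skew a two-way split, the sketch below computes the share of values at or below a threshold. The function name and the sample ratios are invented for illustration and are not the blog's data.

```typescript
// Hypothetical illustration: share of ratios that fall at or below a cutoff.
function shareAtOrBelow(ratios: number[], threshold: number): number {
  if (ratios.length === 0) return 0;
  const hits = ratios.filter((r) => r <= threshold).length;
  return hits / ratios.length;
}

// Invented sample where most values sit in the lower half of the range.
const sampleRatios = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.7, 0.9];

const shareAtHalf = shareAtOrBelow(sampleRatios, 0.5); // 8 of 10 values
const shareAtLower = shareAtOrBelow(sampleRatios, 0.3); // 4 of 10 values
```

When the same cutoff is hardcoded in several files, defining it once and importing it keeps any later adjustment consistent across them.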
-
7 posts flagged for recategorization; cognitive interface research scoped for publication.
meansEndsRatio adjustments planned across 7 posts to balance category distribution (systems 30%→22%, craft 22%→30%). 47-site cognitive interface analysis (~8,000 words) scoped into two posts, privacy-cleared. Tag normalization and means indicator UI refactor identified but not yet executed.
-
Workspace orchestrator and pulse collection system specifications documented.
Documentation pass on how the daily pulse pipeline collects workspace state and feeds into the orchestrator. No code changes; specification and design work.
-
Pulse data collection infrastructure spec completed.
JSON schema defined for daily workspace snapshots covering commits, sessions, state changes, and health signals per project. Enables structured input for automated pulse generation.
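A snapshot shape consistent with that description might look like the sketch below. All field names, the status values, and the example data are assumptions; the entry only names the four kinds of data recorded per project.

```typescript
// Hypothetical field names; only the four data categories come from the spec summary.
type HealthStatus = "ok" | "warn" | "fail";

interface ProjectSnapshot {
  project: string;
  commits: number;
  sessions: number;
  stateChanges: string[];
  healthSignals: Record<string, HealthStatus>;
}

interface DailySnapshot {
  date: string; // ISO 8601 calendar date
  projects: ProjectSnapshot[];
}

// Flags projects that had sessions but no commits on the day.
function planningOnlyProjects(snapshot: DailySnapshot): string[] {
  return snapshot.projects
    .filter((p) => p.sessions > 0 && p.commits === 0)
    .map((p) => p.project);
}

// Invented example data.
const exampleSnapshot: DailySnapshot = {
  date: "2026-01-01",
  projects: [
    { project: "blog", commits: 0, sessions: 5, stateChanges: [], healthSignals: { build: "ok" } },
    { project: "psyche", commits: 3, sessions: 2, stateChanges: ["tests green"], healthSignals: { tests: "ok" } },
  ],
};
```

Typed snapshots let a pulse generator derive summaries such as a planning-only day directly from the counts.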
-
Tier-3 sitemap gap found (posts 015–020 missing); post 025 research complete.
Silent consistency bug: posts 015–020 present in shared source and tier-1 but absent from tier-3 sitemap. Research for 'The Logistics Gap' finalized with verified academic sources. Glossary pipeline gap for posts 025–026 also flagged.
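A gap like this can be caught with a set difference between the shared source and each tier's sitemap. The sketch below assumes post IDs are available as string arrays; missingFromTier and the sample IDs are hypothetical.

```typescript
// Hypothetical check: which post IDs exist in the shared source
// but are absent from a given tier's sitemap?
function missingFromTier(sourceIds: string[], tierIds: string[]): string[] {
  const present = new Set(tierIds);
  return sourceIds.filter((id) => !present.has(id));
}

// Invented sample mirroring the shape of the reported gap.
const sharedSource = ["013", "014", "015", "016", "017", "018", "019", "020", "021"];
const tier3Sitemap = ["013", "014", "021"];
const gap = missingFromTier(sharedSource, tier3Sitemap);
```

Running the check for every tier at build time and failing on a non-empty result would make this class of bug loud instead of silent.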
-
Pulse data collection pipeline initialized; orchestrator reconfigured with 5-tier priority matrix.
Daily metrics aggregation infrastructure set up for automated pulse generation. Workspace orchestrator now monitors all projects with structured escalation tiers from GitHub community activity down to research.