LexDev

Bismarck AI

Bismarck v5 is a game-experiment platform where players play Walreign Empire web games that trigger real development work through Codex/GPT-5.4; v0.4 was torn down on 2026-04-23 and the domain was retained for the rebuild.

Competes with Devin

bismarckhq.com
v5 Phase 1
LexDev
Reborn
2 product eras
  1. Era 1: Bismarck v0.4
    v0.4

    Cloud-hosted AI coding agent platform with persistent workspaces, Moltke orchestration, team leaderboards, XP/ranks/medals, story mode, and Theater of War gamification.

    Born Feb 16, 2026
    Died Apr 23, 2026
    Superseded by Bismarck v5
    • AWS staging/prod infrastructure was destroyed on 2026-04-23; by explicit decision, no app data was backed up.
    • Domain bismarckhq.com was retained for v5; DNS/backend callbacks were removed or left inactive until the rebuild ships.
    • Archived Stripe state belongs to v0.4: Bismarck Pro, Team, and Enterprise monthly/yearly prices remain in Stripe history, with the old webhook disabled.
    • Historical key metrics were waitlist_signups, weekly_workspace_provisions, and workspace_provision_p50.
  2. Era 2: Bismarck v5
    v0.5
    Active

    Video-game-first rebuild: players play Walreign Empire web games that trigger real Codex/GPT-5.4 development work through a shared agent-service backend.

    Born Apr 24, 2026
    • CliqMake has been folded under the Bismarck product line as source material and product-line lineage, not as a standalone portfolio product.
    • Historical CliqMake code remains at `experimental/cliqmake` and should not be ingested as an independent product repo.
Key metrics

Metrics being defined

Sub-Products

1 sub-product

Competitive Intel

1 entry
  • vs. Devin
    building

    Edge: Devin focuses on autonomous coding agents; Bismarck v5 differentiates through game-driven development where players trigger real engineering work via Codex/GPT-5.4.

Research Hub

9 types

Roadmap

464 items
Done: 356
`$game-audience` - created `research/game-audience.md` on 2026-05-13 with target segments, platform fit, genre familiarity, motivations, adjacent audiences, validation plan, and risky assumptions.
`$game-comparables` - created `research/game-comparables.md` on 2026-05-13 with comparable titles, tags/category signals, price points, review/traction signals, update cadence lessons, creator traction hypotheses, and positioning frames.
`$game-core-loop` - created `research/game-core-loop.md` on 2026-05-13 with 10-second, 1-minute, 5-minute, 30-minute, and multi-day loops plus reward cadence, novelty sources, genre loop fit, risks, and prototype priority.
`$game-fantasy` - created `research/game-fantasy.md` on 2026-05-13 with the player fantasy, emotional pillars, vibe references, one-sentence hook, first-session promise, and validation questions.
`$game-genre-map` - created `research/game-genre-map.md` on 2026-05-13 with genre conventions, player complaints, overused mechanics, underserved combinations, player tolerance risks, and acceptance checks.
`$game-playtest-metrics` - created `research/game-playtest-metrics.md` on 2026-05-14 with first-session completion, time-to-fun, replay, confusion, quit, share, demo conversion, wishlist, and retention metrics.
`$game-prototype-test` - created `research/game-prototype-test.md` on 2026-05-14 with prototype scope, test questions, playtest script, observation checklist, success criteria, and cut/keep/amplify decisions.
`$pack install game` - enable the game research pack before running game-pack documentation skills. Verified `.agents/project.json` now declares `project_type: game` with `enabled_packs: ["game"]`; refreshed local Claude/Codex game-pack skill links on 2026-05-13.
`packages/game-test-harness` exists and owns harness-specific contracts and React UI
`pnpm dev` starts all packages with hot reload
`pnpm turbo lint` — all packages pass
`pnpm turbo test` — no regressions
`pnpm turbo typecheck` — all packages pass
A player can create a Work Order from a plain-language coding goal, choose a hat, see the recommended skill and named agent, and confirm the draft.
A player can start from a Workflow Structure and create a pre-filtered Work Order for that structure's workflow family.
A tester can create or confirm a colony name through a modal/function, and invalid/share-risky names are rejected or clearly deferred behind validation.
Add `@bismarck/house-of-walreign` workspace package.
Add a local-dev harness reset control that clears current experiment state and harness progress.
Add a random colony name generator built from two word lists.
Add a React `CityMap` surface for `colony-map` with clickable plots, district painting, saved district selection, Work Order route/status evidence, and loop action shortcuts.
Add a reducer-derived readability helper for active work, idle colonists, crisis count, council review count, and next recommended command.
Add an optional reset-state callback to the shared `GameTestHarnessPanel`.
Add default-map identity banner and first-run naming modal.
Add executable persistence coverage for manual task priority, assigned colonist, assigned zone, and status.
Add focused coverage for reset behavior in the shared harness and game dev harness.
Add focused coverage proving:
Add focused executable coverage for category defaults and role-fit guidance.
Add focused executable coverage for summary values and next-command copy.
Add focused regression coverage for empty, title-only, and title-plus-description submissions.
Add focused shell/toolbar tests for collapsed and expanded heights.
Add focused tests for generated-name shape and modal copy.
Add focused tests proving defaults, invalid names, valid rename, and older saved-state migration.
Add game dev harness coverage for clearing the current experiment state prefix.
Add or update plan/prod-absence tests only where harness copy changes require coverage.
Add package tests for variant manifest, story-frame labels, minimum-loop controls, local/mock review honesty, and storage fallback.
Add reducer-backed colony identity defaults, validation, persisted-state migration, and update action.
Add reducer/UI coverage proving:
Add review notes with decisions, residual risks, and recommended next command.
Add review notes with tests run, skipped browser checks if any, and residual risk.
Add shared harness UI coverage for resetting progress and invoking the external reset callback.
Add task category metadata or inference, including a docs-oriented category.
Add the smallest missing executable coverage for identity, persistence, readability, role-fit, QA crisis/review, and shared harness expectations.
Add visible local QA controls or seeded empty-state actions for a representative crisis and council-review path.
Agent abstraction layer exposes all 5 core methods with TypeScript types
Agent integration works (mock and live modes)
Agent sessions create feature branches on linked repos
All 17 steps complete. 34 tests pass. Typecheck/lint pass.
All 18 steps complete. Regression suite commit `cabc120`. Tests pass. Typecheck/lint pass.
All 18 steps complete. Tests pass. Typecheck/lint pass.
All 22 steps + Step 23 review complete. Cleanup commit `46b176c`. Tests pass. Typecheck/lint pass.
All 9 game packages exist as stubs with correct dependency wiring (9 specs exist, not 10)
All 9 games use consistent faction names, colors, and terminology
All Phase 16 acceptance criteria pass
All Phase 18 acceptance criteria pass.
All phase tests pass
All spec-defined variants render and are playable
All steps complete. Package tests/typecheck/lint pass.
Answer the first three first-principles questions from UI evidence only.
Apply the smallest fix that restores validation, success feedback, queue evidence, assignment defaults, and persisted task state.
Archive the prior variation plan and interview log before replacing the canonical specs.
Audit existing Phase 17 tests against each remaining acceptance criterion.
Auth flow works end-to-end (login → authenticated shell)
Auto-battler and roguelike deckbuilder have skeletal shared test plans
Browser verification confirms harness visibility, navigation, finding capture, scroll containment, and exports
Browser verification confirms the default route can complete the core loop through Work Order draft, routing, review evidence, and district planning surfaces.
Browser verification confirms the end-to-end default workflow and district planning surfaces.
Browser verification confirms the first three first-principles harness objectives can be answered from the UI.
Browser verifies Last-Session Report, Plan City district planning, and Planning Inbox organization surfaces.
Browser verifies the workflow-colony shared harness/default route.
Browser verifies Work Order draft, routing, review evidence, and named-agent roster surfaces.
Change the secondary modal action to populate the custom-name input with a generated name.
CI pipeline passes lint, type-check, test, and build
Clarify naming controls so random-name generation fills the input and Save persists the current input value.
Cleanup commit: `16416bb`
Cleanup commit: `6ecc5d2`
Cleanup commit: `e566b81`
Codex API integration passes the same test suite as the mock layer
Colonist actions trigger agent tasks and display results
Colony Sim runs through shared harness plans instead of package-local overlay semantics
Confirm assumptions for target user, story-frame parity, first judge, minimum testable loop, and evaluation method.
Confirm route, variant dimensions, fidelity, global shell, shared inspector, review evidence surface, and responsive behavior.
Confirm that the exploration should include both a Three.js 3D/isometric path and a richer 2D path.
Confirm the city as both workflow machine and project mindmap.
Confirm the exploration should compare both a Three.js 3D/isometric living-diorama path and a richer 2D renderer path.
Confirm Tinker Mode, Planning Inbox, expansion proposals, and milestone snapshot behavior.
Cost tracking accumulates per-session and per-aggregate
Decide asset strategy, interaction model, specific benchmark metrics, and migration boundary.
Decide benchmark host: integrate renderer benchmark mode into the existing local-dev Playtest harness rather than creating a separate route.
Decide benchmark result storage: local persistence plus export for run-to-run renderer regression comparison.
Decide evaluation model: formal measured renderer benchmarks plus Playtest harness qualitative taste-pass objectives.
Decide prototype comparison scope and success criteria.
Decide renderer architecture baseline: hybrid diegetic overlay from the start, not React-only HUD over a renderer stage.
Decide renderer candidates: benchmark Three.js, PixiJS, and the incumbent Phaser renderer against the same visual-fantasy slice.
Decide shared visual target: isometric 2.5D so Three.js and strict 2D renderers can be compared against the same colony composition.
Decide story-frame strategy, player role, core loop, WorkItem contract, failure model, state model, room/object model, execution boundary, and primary screens.
Define `House of Walreign` as a life-sim experiment with Royal Household and Modern Studio as equal first-class variants.
Define asset strategy, interaction model, specific benchmark metrics, and migration boundary.
Define benchmark host: integrate renderer benchmark mode into the existing local-dev Playtest harness rather than creating a separate route.
Define benchmark result storage: local persistence plus export for run-to-run renderer regression comparison.
Define district, tag, and Thread responsibilities.
Define evaluation model: formal measured renderer benchmarks plus Playtest harness qualitative taste-pass objectives.
Define five clean-sheet workflow metaphors: Codebase Colony, Agent Settlement Ops, Bug Frontier Survival, Product Expedition, and Release Train Colony.
Define implementation-ready UI anatomy for all five UX variations.
Define prototype comparison scope and success criteria: visual-fantasy-first comparison using the same tiny playable slice in both renderers.
Define renderer architecture baseline: hybrid diegetic overlay from the start so plain React chrome does not cause premature negative evaluations.
Define renderer candidates: benchmark Three.js, PixiJS, and the incumbent Phaser renderer against the same visual-fantasy slice.
Define shared visual target: isometric 2.5D so Three.js and strict 2D renderers can be compared against the same colony composition.
Demo codebase can be provisioned for new players
Demo mode and live mode entry paths both load
Dev toolbar renders, floats over game UI, and controls experiment/variant/screen switching
District Tinker Mode lets a player paint, name, color/pattern, save, cancel, and inspect at least one non-overlapping district.
Documentation and task history capture deviations, residual risk, and next-step routing.
Documentation updated with implementation deviations and follow-ups
Event streaming delivers progress updates to a test client
Existing variant switching remains available without becoming the primary test workflow
Factory-builder: all 10 UI components migrated to CSS modules
Factory-builder: all 23 sound hooks fire custom events
Factory-builder: all 3 commission hub variants functional
Factory-builder: all 4 layout variants render and are switchable via variant selector
Factory-builder: all 4 station click variants functional
Factory-builder: animation transitions match spec (durations, easings)
Factory-builder: both inspection diff variants functional
Factory-builder: both palette variants apply correctly (CSS custom properties)
Factory-builder: HUD shows all 7 data points with animation (value pulses, power warnings)
Factory-builder: StationToolbar renders 6 stations with color coding, locked/unlocked states
Factory-builder: view navigation works per layout variant (hotkeys, tabs, sidebar icons, panel toggles)
Findings export as both JSON and triage-ready Markdown
Froggy Empire presentation is consistent across genres
Game shell experiment top bar redesigned per ui-game-shell.md §6.2 (collapsed/expanded, 4 accordion sections, agent status, perf badge)
Game shell hub page redesigned per ui-game-shell.md §6.1 (row anatomy, action cluster, hypothesis preview, keyboard nav)
Gather existing Bismarck genre experiment context.
GitHub OAuth login/logout flow works end-to-end
Ground the interview in existing Colony Sim specs, game research, and current renderer implementation.
Hats affect labels, suggestions, review emphasis, and routing copy, but tests prove they do not alter execution speed, tool capability, validation strictness, cost, or permissions.
Help overlay renders dynamic shortcut cheatsheet
HITL callbacks route correctly through the abstraction
HITL review flow works within the colony theme
Identify that the current map is too plain because it relies on flat shape/plot rendering.
Implement all five UX variations: Dollhouse Director, Household Command Ledger, Agent Stories Timeline, Studio Day Planner, and Build/Live/Review Loop.
Implement one experiment route with independent `UX Variation` and `Story Frame` variant dimensions.
Implement shared fixtures, status strip, inspector, review evidence contract, and stylized 2D room/token/card surfaces.
Initial named plans exist: `starter-template-mvp`, `genre-taste-pass`, `regression-smoke`
Inspect task docs for stale unchecked Phase 18 items.
Inspect warnings and either fix or record accepted warnings with rationale.
Interview the district physical model: building count, duplicate structures, complex growth, map freedom, and city scale.
Interview the Sims-inspired genre concept against existing Bismarck game experiment constraints.
Keep `Plan City` as a focused planning table, but make city interaction available from the default map.
Keep five UX concepts: Dollhouse Director, Household Command Ledger, Agent Stories Timeline, Studio Day Planner, and Build/Live/Review Loop.
Keep objectives one-question-at-a-time with stable expected answers and failure signals.
Keep QA affordances framed as local demo/test controls, not production agent triggers.
Live agent smoke follow-up planned (commit `8c54ba6`).
Local dev shows the harness by default
Lock per-variation anatomy decisions for Dollhouse Director, Household Command Ledger, Agent Stories Timeline, Studio Day Planner, and Build/Live/Review Loop.
Manual tasks persist across Tasks/Council/Crisis navigation and local session reloads.
Mark the Step 17.2 persistence acceptance criteria complete after validation.
Mock layer returns realistic simulated responses with configurable delays
Mock/live toggle switches cleanly without code changes
Monorepo builds successfully with Turborepo
Named agents are generated from skill metadata or local fallback skill fixtures, can be recommended by hat/team, and can be saved as profile-only or profile-with-history without implying live parallel execution.
No lore contradictions between games
No regressions in Phase 17 readability, task persistence, crisis, council, or shared harness behavior.
No regressions in previous phase tests
No regressions in shell routing, variant switching, or existing game builds
Offset the experiment viewport by the measured toolbar height during local dev testing.
Per-game acceptance criteria from colony-sim.md are met
Plan five high-contrast UX variations for first-play and repeat-play evaluation.
Player can link/unlink a GitHub repo
Present and validate assumptions checkpoint.
Preserve reducer and persistence contracts by keeping the summary derived-only.
Promote the district city map from secondary overlay to the default playable Colony Sim surface after the 2026-05-14 playtest found the old map non-interactive and the loop unclear.
Prove crisis and council review counts update the default-map summary.
Read House of Walreign source spec, interview log, game audience research, game fantasy research, game core-loop research, and Colony Sim UX direction.
Recommend prototype ordering and validation criteria for the next build phase.
Record already-covered criteria instead of duplicating tests.
Register the experiment in game types, app route loading, experiment labels, app dependency graph, and shared harness plan loaders.
Rename the submit action to `Save` so it clearly saves the current input value.
Render a secondary `Reset state` action in the harness header when reset is available.
Resolve UI context from House of Walreign spec, UX variation plan, research docs, and existing experiment shell patterns.
Restart Colony Sim UX variation planning from a coding/product-development workflow premise.
Run `git diff --check`.
Run `pnpm --filter @bismarck/colony-sim lint`.
Run `pnpm --filter @bismarck/colony-sim test -- districts`.
Run `pnpm --filter @bismarck/colony-sim test -- planning-organization`.
Run `pnpm --filter @bismarck/colony-sim test -- test-plans`.
Run `pnpm --filter @bismarck/colony-sim test`.
Run `pnpm --filter @bismarck/colony-sim typecheck`.
Run affected Colony Sim and game validation before marking the step complete.
Run affected harness and game tests before shipping.
Run affected package/app tests.
Run Colony Sim package test, typecheck, and lint.
Run Colony Sim package typecheck/lint.
Run Colony Sim test/typecheck/lint.
Run focused Colony Sim tests for district and harness plan coverage.
Run game app test/typecheck/lint/build.
Run package verification and record investigation results before shipping.
Run production harness absence check and record accepted warnings.
Run relevant app-level test/typecheck/lint/build checks discovered from package metadata.
Run targeted package/app validation only if source changes were required.
Session lifecycle works end-to-end: create → stream → review → apply/reject
Shared Colony Sim harness objectives still load and first-principles questions have stable expected answers.
Shared Colony Sim harness plans include workflow-colony objectives for Work Order creation, hat routing, named-agent recommendation, structure routing, returning report, and district planning.
Shared harness plans validate the workflow-colony loop.
Shell app loads and routes between experiment stubs
Shell error handling: soft-error banner and hard-error overlay functional
Show recommended colonist specialization fit in task surfaces.
Specify shared mechanics for motives, relationships, routines, rooms, functional objects, WorkItems, soft failure, local/mock review events, and visible agent intent.
Staging/production builds do not render harness UI or expose objectives/debug metadata
Start a local game dev server and open `/experiments/colony_sim` in Browser Use.
Step 17.1: Add colony identity state and first-run naming surface
Step 17.10: Update task docs, history, and phase closeout
Step 17.2: Move manual task and priority state into ColonyStateContext
Step 17.3: Add default-map operational summary and next-command guidance
Step 17.4: Add task category and colonist role-fit affordances
Step 17.5: Make crisis and council QA states exercisable
Step 17.6: Align Colony Sim shared harness plans with the improved scenario
Step 17.7: Write regression tests covering acceptance criteria
Step 17.8: Run package and app validation
Step 17.9: Run browser-use verification through the shared harness
Step 18.1: Define Work Order, Hat, Skill, Named Agent, and Workflow Structure contracts
Step 18.10: Run package and app validation
Step 18.11: Run browser-use verification through the shared harness
Step 18.12: Update task docs, history, and phase closeout
Step 18.2: Add reducer-backed Work Order lifecycle and persistence
Step 18.3: Build the goal-first Work Order creation flow
Step 18.4: Build Workflow Structure routing and structure-first creation
Step 18.5: Add named-agent roster and save-profile behavior
Step 18.6: Add Last-Session Report and Colony Health Map overlay
Step 18.7: Add district state, Tinker Mode, and district inspector foundation
Step 18.7a: Add failing district domain tests
Step 18.7b: Add district state, reducer actions, persistence, and Tinker Mode UI
Step 18.8: Add Planning Inbox, expansion proposals, tags, Threads, and milestone snapshots
Step 18.8a: Add failing organization-domain tests
Step 18.8b: Add Planning Inbox, Tags, Threads, and milestone snapshot state/UI
Step 18.9: Align Colony Sim shared harness plans with the workflow-colony loop
Steps 1-18 complete. Typecheck/lint pass.
Steps 1-20 complete. Tests pass. Typecheck/lint pass.
Style guide document exists and covers all factions and terminology
Surface that summary on the default map/HUD without obstructing the shared Playtest drawer.
Tactical layout follow-up applied (commit `33f9d2b`).
Tags and Threads are distinct in state and UI: tags are lightweight labels; Threads have timelines, evidence, related Work Orders, related districts, and optional steward.
Task assignment preserves status, colonist, work zone, and priority when leaving and returning to the Task Board.
The Crisis objective can be exercised from visible local QA controls or a seeded scenario without source-code intervention.
The default Colony Sim map shows a colony-level name, ownership signal, and purpose before relying on building labels.
The default map/HUD summarizes current work, idle colonists, crisis count, council review count, and the recommended next command.
The Engineer-to-Docs objective has visible supporting UI for task category/role fit.
The Planning Inbox can preview/approve/reject/snooze/archive at least one expansion proposal and one Thread suggestion.
The returning-player screen shows a Last-Session Report with pending reviews, stale blockers, active Work Orders, validation pressure, and recommended next action.
Trace the submission path from `TaskBoard` through task creation, queue state, assignment, and evidence surfaces.
Trace toolbar height ownership through `ExperimentToolbar` and `ExperimentLayout`.
Unit tests cover all abstraction methods and mock behaviors
Update Colony Sim first-principles harness expected answers for the improved named-colony scenario.
Update harness objectives and tests so future playtests catch regressions to a passive old map.
Update roadmap and todo with the planning result.
Validate how Colony Sim persisted state prevents first-run name-model retests.
Validate manual task creation observations against Colony Sim code and recent git history.
Validate the expanded Playtest drawer overlap against the current game viewport.
Variant switching works via dev toolbar
Variant system registers, lists, and switches between variants
Verify manual task persistence plus demo crisis/review paths from visible controls.
Verify manual tasks and priority updates are reducer-backed rather than `TaskBoard` local state.
Verify the fix in Browser Use against `/experiments/colony_sim`.
Verify the Playtest drawer loads the Colony Sim first-principles plan.
Wire the game dev harness to clear the current experiment's local state keys and reload.
Work Orders route visibly to a Workflow Structure and maintain draft, queued, active, review-ready, sent-back, shipped, blocked, and archived states in reducer-backed persisted state.
Write `specs/colony-sim-rendering-experiments-interview.md`.
Write `specs/colony-sim-rendering-experiments.md` and `specs/colony-sim-rendering-experiments-interview.md`.
Write `specs/colony-sim-rendering-experiments.md`.
Write `specs/house-of-walreign-interview.md`.
Write `specs/house-of-walreign.md`.
Write `specs/ui-house-of-walreign-variations-interview.md`.
Write `specs/ui-house-of-walreign-variations.md`.
Write `specs/ux-variations-house-of-walreign-interview.md`.
Write `specs/ux-variations-house-of-walreign.md`.
Write the canonical spec and interview log under `specs/`.
Write the district UI spec and interview log.
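
One Done item above says Work Orders keep draft, queued, active, review-ready, sent-back, shipped, blocked, and archived states in reducer-backed persisted state. The following is a minimal sketch of what such a reducer could look like, not the project's actual implementation. Only the eight status names come from the roadmap; the `WorkOrder` fields, the `ALLOWED` transition table, the action shape, and the JSON helpers are hypothetical and chosen purely for illustration.

```typescript
// Sketch only. Status names are taken from the roadmap item;
// every other name and rule below is an assumption.
type WorkOrderStatus =
  | "draft"
  | "queued"
  | "active"
  | "review-ready"
  | "sent-back"
  | "shipped"
  | "blocked"
  | "archived";

interface WorkOrder {
  id: string;
  goal: string; // plain-language coding goal
  structureId: string; // hypothetical Workflow Structure reference
  status: WorkOrderStatus;
}

// Hypothetical transition table; the real rules are not given in the roadmap.
const ALLOWED: Record<WorkOrderStatus, readonly WorkOrderStatus[]> = {
  draft: ["queued", "archived"],
  queued: ["active", "blocked", "archived"],
  active: ["review-ready", "blocked"],
  "review-ready": ["shipped", "sent-back"],
  "sent-back": ["active", "archived"],
  shipped: ["archived"],
  blocked: ["queued", "active", "archived"],
  archived: [],
};

type WorkOrderAction = { type: "transition"; id: string; to: WorkOrderStatus };

// Pure reducer: applies a transition only when the table allows it,
// otherwise returns the order unchanged.
function workOrderReducer(
  state: readonly WorkOrder[],
  action: WorkOrderAction,
): WorkOrder[] {
  return state.map((order) =>
    order.id === action.id && ALLOWED[order.status].includes(action.to)
      ? { ...order, status: action.to }
      : order,
  );
}

// Persistence is modeled as a JSON round trip; the storage backend
// (for example browser local storage) is left to the caller.
function serialize(state: readonly WorkOrder[]): string {
  return JSON.stringify(state);
}

function deserialize(raw: string): WorkOrder[] {
  return JSON.parse(raw) as WorkOrder[];
}
```

Keeping the reducer pure and the status set a closed union means a disallowed transition is a silent no-op here; a real implementation might instead surface an error or a blocked-state reason to the player.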
Planned: 108
`$game-launch` - create `research/game-launch.md` after `$game-store-page-test`; currently blocked because `research/game-store-page-test.md` is missing.
`$game-roadmap` - update `tasks/roadmap.md` after `$game-launch`; currently blocked because `research/game-launch.md` is missing and the current `tasks/roadmap.md` predates the missing game-market research sequence.
`$game-store-page-test` - create `research/game-store-page-test.md` after `$game-playtest-metrics`; unblocked by `research/game-playtest-metrics.md`.
`$game-store-page-test` - create `research/game-store-page-test.md` because `tasks/todo.md` § `Priority Documentation Todo` has this unchecked item unblocked by `research/game-playtest-metrics.md` (metrics research updated 2026-05-14).
`$plan-phase 19` - decompose the new Colony Sim Renderer Benchmark phase because `tasks/roadmap.md` now includes Phase 19 from `specs/colony-sim-rendering-experiments.md` (roadmap/spec updated 2026-05-15), but `tasks/todo.md` still has no implementation steps for that phase.
`$reconcile-dev-docs fix tasks` - reconcile stale manual/advisory task docs because `tasks/manual-todo.md` was last updated 2026-05-14 and still references Phase 13 plus old dogfood/UAT follow-ups while roadmap Phases 13-18 are complete.
`$spec-drift fix all` - reconcile specs against implementation because source-code commits under `apps/` and `packages/` landed after many canonical specs.
`pnpm turbo lint` — all packages pass
`pnpm turbo test` — no regressions
`pnpm turbo typecheck` — all packages pass
1 failing test: budget calculation in `agent-integration.test.tsx:185` (`spentTodayUsd` accumulation is off by ~1.2)
Agent abstraction layer exposes all 5 core methods with TypeScript types
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent integration works (mock and live modes)
Agent sessions create feature branches on linked repos
All 10 games use consistent faction names, colors, and terminology
All 3 HITL patterns function within naval theme
All 3 HITL touchpoints work together (Log + Periscope + Radio)
All 3 visual styles render correctly (PixiJS, Three.js, terminal)
All 8 spec-defined variants render and are playable
All 9 spec-defined variants render and are playable
All spec-defined variants render and are playable
All spec-defined variants render and are playable
All spec-defined variants render and are playable
All spec-defined variants render and are playable
All spec-defined variants render and are playable
All spec-defined variants render and are playable
Belt logistics and station interactions function correctly
Benchmark output includes first render time, FPS/frame-time, interaction latency, bundle size impact, memory trend, and nonblank canvas verification.
Card mechanics trigger agent tasks correctly
Codex API integration passes the same test suite as the mock layer
Combat and equipment systems function
Companion actions trigger agent tasks correctly
Cost guardrails prevent runaway agent spending
Cost tracking accumulates per-session and per-aggregate
Demo codebase can be provisioned for new players
Depth mechanic affects scope of agent work
Dialogue system produces valid agent prompts
Dive/surface toggle changes agent behavior mode (analysis vs execution)
Each candidate supports the same minimal interactions and emits equivalent benchmark events.
Event streaming delivers progress updates to a test client
Factory actions trigger agent tasks and display results
Factory-builder: all 10 UI components migrated to CSS modules
Factory-builder: all 23 sound hooks fire custom events
Factory-builder: all 3 commission hub variants functional
Factory-builder: all 4 layout variants render and are switchable via variant selector
Factory-builder: all 4 station click variants functional
Factory-builder: animation transitions match spec (durations, easings)
Factory-builder: both inspection diff variants functional
Factory-builder: both palette variants apply correctly (CSS custom properties)
Factory-builder: HUD shows all 7 data points with animation (value pulses, power warnings)
Factory-builder: StationToolbar renders 6 stations with color coding, locked/unlocked states
Factory-builder: view navigation works per layout variant (hotkeys, tabs, sidebar icons, panel toggles)
Fleet actions trigger agent tasks correctly
Fleet formation mechanic affects agent coordination pattern
Froggy Empire presentation is consistent across genres
Game shell experiment top bar redesigned per ui-game-shell.md §6.2 (collapsed/expanded, 4 accordion sections, agent status, perf badge)
Game shell hub page redesigned per ui-game-shell.md §6.1 (row anatomy, action cluster, hypothesis preview, keyboard nav)
GitHub OAuth login/logout flow works end-to-end
Help overlay renders dynamic shortcut cheatsheet
HITL callbacks route correctly through the abstraction
Idle progression and prestige systems function correctly
Live-agent smoke and full per-game spec acceptance are deferred by design.
Management actions trigger agent tasks and display results
Mock layer returns realistic simulated responses with configurable delays
Mock/live toggle switches cleanly without code changes
No lore contradictions between games
Per-game acceptance criteria from auto-battler-tactics.md are met
Per-game acceptance criteria from crpg.md are met
Per-game acceptance criteria from factory-builder.md are met
Per-game acceptance criteria from idle-incremental.md are met
Per-game acceptance criteria from management-tycoon.md are met
Per-game acceptance criteria from naval-combat.md are met
Per-game acceptance criteria from roguelike-deckbuilder.md are met
Per-game acceptance criteria from submarine-combat.md are met
PixiJS 2D and Three.js 3D renderers both work
Player can link/unlink a GitHub repo
Qualitative harness objectives evaluate route legibility, structure/worksite/state readability, living-colony feel, and future readiness for districts, agents, particles, labels, and evidence overlays.
Review the "Weekly game-experiment dogfood sweep" entry in `tasks/recurring-todo.md` (next due date was 2026-05-05); promote it to `tasks/todo.md` only if it now requires execution work.
Round lifecycle works (deploy → execute → results → adjust)
Run lifecycle works (start → play → boss → complete/fail)
Scoring and progression systems function correctly
Selected assets are vendored into `packages/colony-sim/assets` with a reproducible manifest and license/source metadata.
Session lifecycle works end-to-end: create → stream → review → apply/reject
Shell error handling: soft-error banner and hard-error overlay functional
Style guide document exists and covers all factions and terminology
The local Playtest harness can switch renderer candidates, reset state, persist benchmark runs, capture findings, and export benchmark notes.
The recommended renderer is justified by recorded evidence instead of preference alone.
Three.js, PixiJS, and Phaser candidates render the same benchmark scene from the same renderer-neutral view model.
Torpedo system triggers targeted task executions
U-boat pen upgrades persist between sessions (localStorage)
Unit deployment triggers agent tasks correctly
Unit tests cover all abstraction methods and mock behaviors
Variant presentation parity, live-agent smoke, and full spec acceptance are deferred by design.
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar
Variant switching works via dev toolbar

Timeline

20 events
May 2026
docs
docs: roadmap colony sim renderer benchmark
May 16 · static
docs
docs: specify colony sim renderer benchmarks
May 15 · static
feature
feat: add playtest harness state reset
May 15 · static
fix
fix(colony-sim): clarify colony naming controls
May 15 · static
docs
docs: add house of walreign uat plan
May 14 · static
fix
fix(colony-sim): make city map the default loop surface
May 14 · static
feature
feat: add house of walreign UI prototypes
May 14 · static
docs
docs: add house of walreign ui variation spec
May 14 · static
docs
docs: add game playtest metrics
May 14 · static
docs
docs: add game prototype test plan
May 14 · static
docs
docs(tasks): close phase 18
May 14 · static
fix
fix(colony-sim): seed browser-verifiable planning inbox
May 14 · static
docs
docs(tasks): record phase 18 validation pass
May 14 · static
feature
feat(colony-sim): align workflow harness objectives
May 14 · static
feature
feat(colony-sim): add planning organization inbox
May 14 · static
test
test: add colony planning organization contracts
May 14 · static
feature
feat(colony-sim): add district tinker mode
May 14 · static
docs
docs: add house of walreign ux variations
May 14 · static
test
test(colony-sim): add district red contracts
May 14 · static
feature
feat(colony-sim): add last-session report
May 14 · static

Dev Docs

47 files

Specs

  • Auto-battler/Tactics — "Walreign Vanguard" — Bismarck v5
    May 15, 2026 · 21.9 KB
  • Bismarck v5 — Game-First Development Interface
    May 15, 2026 · 13.1 KB
  • Bismarck v5 Game Experiments — Interview Log
    May 15, 2026 · 7.8 KB
  • Colony Sim — "Walreign Outpost" — Bismarck v5
    May 15, 2026 · 27.3 KB
  • Colony Sim — "Walreign Outpost" — Interview Log
    May 15, 2026 · 6.9 KB
  • CRPG — "Walreign Chronicle" — Bismarck v5
    May 15, 2026 · 29.7 KB
  • CRPG "Walreign Chronicle" — Interview Log
    May 15, 2026 · 9.4 KB
  • Factory-Builder — "Walreign Forge" — Bismarck v5
    May 15, 2026 · 25.2 KB
  • Factory-Builder — "Walreign Forge" — Interview Log
    May 15, 2026 · 4.2 KB
  • House of Walreign
    May 15, 2026 · 17.3 KB
  • House of Walreign Spec Interview
    May 15, 2026 · 9.6 KB
  • Idle/Incremental — "Walreign Eternal" — Bismarck v5
    May 15, 2026 · 22.6 KB
  • Idle/Incremental — "Walreign Eternal" — Interview Log
    May 15, 2026 · 3.9 KB
  • Management Tycoon — "Walreign Works" — Bismarck v5
    May 15, 2026 · 26.3 KB
  • Management Tycoon — "Walreign Works" — Interview Log
    May 15, 2026 · 5.9 KB
  • Monorepo Structure & Playtest Platform — Bismarck v5
    May 15, 2026 · 16.2 KB
  • Monorepo Structure & Playtest Platform — Interview Log
    May 15, 2026 · 8.1 KB
  • Roguelike Deckbuilder — "Walreign Sortie" — Bismarck v5
    May 15, 2026 · 41.3 KB
  • Shared Backend Service — Bismarck v5
    May 15, 2026 · 13.0 KB
  • Shared Backend Service — Interview Log
    May 15, 2026 · 6.7 KB
  • Shared Game Test Harness
    May 15, 2026 · 10.9 KB
  • Shared Game Test Harness Interview
    May 15, 2026 · 10.4 KB
  • UI Interview - House of Walreign Variations
    May 15, 2026 · 6.9 KB
  • UI Interview Log - Colony Sim Districts, Tags, Threads, and City Planning
    May 15, 2026 · 8.5 KB
  • UI Interview Log — Bismarck v5 Game Shell
    May 15, 2026 · 6.7 KB
  • UI Interview Log — Colony Sim ("Walreign Outpost")
    May 15, 2026 · 7.0 KB
  • UI Interview Log — Factory Builder (Walreign Forge)
    May 15, 2026 · 6.9 KB
  • UI Interview Log — Management Tycoon ("Walreign Works")
    May 15, 2026 · 5.6 KB
  • UI Spec - Colony Sim Districts, Tags, Threads, and City Planning
    May 15, 2026 · 19.7 KB
  • UI Spec - House of Walreign Variations
    May 15, 2026 · 20.1 KB
  • UI Spec — Bismarck v5 Game Shell (Dev / Playtest Surface)
    May 15, 2026 · 23.0 KB
  • UI Spec — Colony Sim ("Walreign Outpost")
    May 15, 2026 · 38.2 KB
  • UI Spec — Factory Builder (Walreign Forge)
    May 15, 2026 · 37.8 KB
  • UI Spec — Management Tycoon ("Walreign Works")
    May 15, 2026 · 31.5 KB
  • UX Direction - Colony Sim Workflow Colony
    May 15, 2026 · 12.3 KB
  • UX Direction Interview Log - Colony Sim Workflow Colony
    May 15, 2026 · 9.7 KB
  • UX Variation Interview Log — Management Tycoon ("Walreign Works")
    May 15, 2026 · 4.3 KB
  • UX Variations - House of Walreign
    May 15, 2026 · 25.6 KB
  • UX Variations - House of Walreign Interview
    May 15, 2026 · 6.7 KB
  • UX Variations — Management Tycoon ("Walreign Works")
    May 15, 2026 · 55.3 KB
  • Walreign Empire Narrative Style Guide
    May 15, 2026 · 16.2 KB
  • Walreign High Seas — Naval Combat Experiment
    May 15, 2026 · 16.0 KB
  • Walreign High Seas — Spec Interview Log
    May 15, 2026 · 6.0 KB
  • Walreign Hunters — Spec Interview Log
    May 15, 2026 · 6.6 KB
  • Walreign Hunters — Submarine Combat & Exploration Experiment
    May 15, 2026 · 23.7 KB
  • Walreign Sortie — Roguelike Deckbuilder — Interview Log
    May 15, 2026 · 6.8 KB
  • Walreign Vanguard — Spec Interview Log
    May 15, 2026 · 7.5 KB