April 2: Timers, Path Filters, and TDD
Split the hardcoded job timeout into configurable per-stage timers, added per-stack path filtering to skip unrelated CI jobs, and — because a friend in the channel kept needling about TDD — tried having Claude write the tests first for one of the fixes. It worked well enough to be a little annoying.
Developer Journal
Hardcoded 10-minute timeout split into stages
The composite job had a single 10-minute deadline for "everything from FITS load to sRGB encode." That was fine when the pipeline was simple, but now long-running MIRI + NIRCAM jobs can legitimately spend 7 minutes in the blend stage alone, which leaves almost nothing for everything else.
Split the single timeout into per-stage budgets: load (90s), reproject (180s), stretch+color (120s), blend (300s), post-stack (60s), encode (30s). Each stage's timer resets when it begins. Total envelope is longer but individual stalls are now bounded — a stuck reproject can't eat into the blend budget and vice versa.
Budgets are env-var configurable so staging can run with tighter limits for early failure detection.
Per-stack path filtering in CI
dorny/paths-filter was already in the CI pipeline for docs-only fast path, but each test job still ran on every PR regardless of whether its stack was touched. A processing-engine-only PR was running frontend unit tests, backend unit tests, and backend integration tests — all of which are no-op.
Added per-stack path filters: frontend/**, backend/**, processing-engine/**, docker/**, docs/**. Each test job gates on the relevant filter. PRs that only touch one stack now run ~1/3 the work, and the CI wall clock dropped noticeably for the typical single-stack change.
Content-type check before response.json()
A frontend call was assuming response.ok meant the body was JSON, but some error paths return HTML (the reverse proxy's default error page) with a 200 status code after redirect. Calling .json() on HTML threw inside a Promise chain that had no error handler, surfacing as a silent UI freeze.
Fixed by checking response.headers.get('content-type') starts with application/json before parsing. If it doesn't, throw a clear "unexpected content type" error with the actual content-type value so the next person debugging doesn't waste an hour staring at a silent Promise.
TDD for real this time
A friend in the AI channel was giving me grief about not doing TDD. The argument that stuck: LLMs don't care about time, so the cost argument ("TDD is slow because you type the tests first") doesn't apply the same way. Tried it on the content-type fix — asked Claude to write the test first from a plain-language description, confirm it fails red, then implement just enough to make it green.
It worked. The test was cleaner than what I'd have written retrofitting after the fact, and the implementation stayed narrower because there was no temptation to "while I'm here" myself into adjacent changes. Not sure if it'll become the default, but adding it to the workflow toolkit.
Code review artifact cleanup
Deleted a stray code-review-summary.md that had been committed by an agent run and left in the repo root. Also merged the automated code review summary doc for today.
What shipped
| PR | Title |
|---|---|
| #942 | chore(ci): add per-stack path filtering to skip unrelated test jobs |
| #941 | fix: replace hardcoded 10-min job timeout with configurable split timers |
| #940 | chore: remove code-review-summary.md artifact |
| #938 | fix: check content-type before calling response.json() |
| — | docs: add automated code review summary for 2026-04-02 |