
March 13: The c-Prefix Problem

Four PRs today, all circling the same issue: MAST's observation naming conventions are more complicated than they look, and our recipe engine was tripping over them.

Developer Journal

Observation prefixes: o vs c

MAST has two kinds of observation IDs for the same program. The o-prefix observations are the individual detector exposures — the raw building blocks. The c-prefix observations are pre-combined mosaics that STScI has already stitched together. Both show up in search results for the same target.
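To make the distinction concrete, here is a minimal sketch that classifies an observation ID by its prefix. It assumes JWST-style IDs of the form jw<program>-o<nnn>_... and jw<program>-c<nnnn>_...; the regex and the example IDs are illustrative assumptions, not taken from our codebase or from MAST documentation.

```python
import re
from typing import Optional

# Assumed ID shape (illustrative): "jw<program>-<o|c><digits>_..."
# e.g. "jw02736-o001_t001_nircam_clear-f090w" or "jw02736-c1009_t001_nircam_clear-f090w".
_PREFIX_RE = re.compile(r"^jw\d+-([oc])\d+_", re.IGNORECASE)


def observation_prefix(obs_id: str) -> Optional[str]:
    """Return "o" or "c" for a matching obs_id, or None if it doesn't fit the assumed pattern."""
    match = _PREFIX_RE.match(obs_id)
    return match.group(1).lower() if match else None


for obs_id in (
    "jw02736-o001_t001_nircam_clear-f090w",   # o-prefix: individual exposures
    "jw02736-c1009_t001_nircam_clear-f090w",  # c-prefix: pre-combined mosaic
):
    print(obs_id, "->", observation_prefix(obs_id))
```

IDs that don't match the assumed pattern return None rather than guessing, so a caller can decide how to treat them.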

Our recipe engine was naively pulling both. In some cases it would download all the individual exposures and the combined mosaic, which doubled the download time and confused the composite pipeline. Worse, the c-prefix mosaics are enormous files that MAST doesn't always put on S3, so those downloads would fall back to slow HTTP transfers or fail outright with an unhelpful "no products available" error.

The fix came in stages across the day:

First (#801): restricted discovery searches to calibration level 3 (L3) to match what we actually download. Previously, discovery could surface filters that only had L2b products and no L3 products, and the recipe would then fail at download time.
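The mismatch #801 addressed can be shown with plain Python: a discovery step that ignores calibration level can offer a filter that an L3-only download step can never satisfy. The row dicts below are a hypothetical stand-in for MAST observation rows, with field names loosely modeled on MAST's calib_level and filters columns. With astroquery, passing calib_level=3 to Observations.query_criteria should push the same restriction to the server instead of filtering in memory.

```python
# Hypothetical MAST-like rows; only the fields this sketch needs.
rows = [
    {"obs_id": "obs-a", "filters": "F090W", "calib_level": 3},
    {"obs_id": "obs-b", "filters": "F200W", "calib_level": 2},  # L2b only, no L3 product
]


def filters_at_level(rows: list[dict], level: int = 3) -> set[str]:
    """Filters with at least one observation at the given calibration level."""
    return {row["filters"] for row in rows if row["calib_level"] == level}


discovered_any_level = {row["filters"] for row in rows}  # old discovery behaviour
downloadable = filters_at_level(rows)  # what an L3-only download step can fetch

# Filters the old discovery step offered but the download step could never fulfil.
print(sorted(discovered_any_level - downloadable))  # ['F200W']
```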

Second (#803): added deduplication logic to the recipe engine — when both c-prefix and o-prefix observations exist for the same filter, keep only one. But which one?
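Before choosing what to keep, the engine first has to notice the overlap. A minimal sketch of that detection step, assuming JWST-style IDs (jw<program>-o<nnn>_... vs jw<program>-c<nnnn>_...) and hypothetical observation dicts with obs_id and filters keys. It only reports which filters have both kinds and leaves the keep-which-one decision open, as this step did.

```python
import re
from collections import defaultdict

# Assumed ID shape: "jw<program>-<o|c><digits>_..." (illustrative, not authoritative).
_PREFIX_RE = re.compile(r"^jw\d+-([oc])\d+_", re.IGNORECASE)


def conflicting_filters(observations: list[dict]) -> dict[str, list[str]]:
    """Return {filter: obs_ids} for every filter that has BOTH o- and c-prefix observations."""
    ids_by_filter: dict[str, list[str]] = defaultdict(list)
    prefixes_by_filter: dict[str, set[str]] = defaultdict(set)
    for obs in observations:
        filt = obs["filters"]
        ids_by_filter[filt].append(obs["obs_id"])
        match = _PREFIX_RE.match(obs["obs_id"])
        if match:
            prefixes_by_filter[filt].add(match.group(1).lower())
    # A filter is a conflict only if both prefixes were seen for it.
    return {
        filt: ids
        for filt, ids in ids_by_filter.items()
        if {"o", "c"} <= prefixes_by_filter[filt]
    }


observations = [
    {"obs_id": "jw02736-o001_t001_nircam_clear-f090w", "filters": "F090W"},
    {"obs_id": "jw02736-c1009_t001_nircam_clear-f090w", "filters": "F090W"},
    {"obs_id": "jw02736-o001_t001_nircam_clear-f200w", "filters": "F200W"},
]
print(conflicting_filters(observations))
```

Here F090W is flagged because it has both an o-prefix and a c-prefix ID, while F200W has only an o-prefix ID and passes through.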

Third (#804): switched from filtering downloads against raw MAST search results to using the recipe's own observation_ids list. The recipe already knows which observations it wants — we were second-guessing it with a broader MAST query that pulled in duplicates.
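The #804 change amounts to filtering against an explicit allow-list instead of a fresh, broader search. A sketch, assuming a hypothetical Recipe type with an observation_ids field and product records carrying an obs_id key. The real types in our code may differ, and the filenames are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical stand-in for a recipe; only observation_ids matters here."""
    observation_ids: list[str] = field(default_factory=list)


def products_to_download(recipe: Recipe, products: list[dict]) -> list[dict]:
    """Keep only products belonging to observations the recipe explicitly lists."""
    wanted = set(recipe.observation_ids)  # set for O(1) membership checks
    return [product for product in products if product["obs_id"] in wanted]


recipe = Recipe(observation_ids=["jw02736-o001_t001_nircam_clear-f090w"])
# A broad query may also return products for observations the recipe never asked for,
# such as a c-prefix mosaic covering the same filter.
products = [
    {"obs_id": "jw02736-o001_t001_nircam_clear-f090w", "filename": "o001_i2d.fits"},
    {"obs_id": "jw02736-c1009_t001_nircam_clear-f090w", "filename": "c1009_i2d.fits"},
]
for product in products_to_download(recipe, products):
    print(product["filename"])
```

The design point is that the recipe, not the search, becomes the source of truth for what gets downloaded.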

Fourth (#808): settled the preference question — always prefer o-prefix observations over c-prefix. The individual exposures are smaller, consistently available on S3, and our composite pipeline is designed to combine them anyway. The pre-combined mosaics are useful for quick previews but not for building custom composites.
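Here is one plausible reading of the #808 rule, sketched under the same kind of assumptions (JWST-style IDs, hypothetical dicts with obs_id and filters keys). Within each filter it keeps every o-prefix observation and drops the c-prefix mosaics. A filter that has only a c-prefix mosaic keeps it, so nothing silently disappears. That fallback is an assumption of this sketch, not something #808 states.

```python
import re
from collections import defaultdict

# Assumed ID shape: "jw<program>-<o|c><digits>_..." (illustrative).
_PREFIX_RE = re.compile(r"^jw\d+-([oc])\d+_", re.IGNORECASE)
# Lower rank wins: o-prefix exposures, then c-prefix mosaics, then unrecognised IDs.
_RANK = {"o": 0, "c": 1}
_UNKNOWN_RANK = 2


def _rank(obs_id: str) -> int:
    match = _PREFIX_RE.match(obs_id)
    return _RANK[match.group(1).lower()] if match else _UNKNOWN_RANK


def prefer_o_prefix(observations: list[dict]) -> list[dict]:
    """Per filter, keep only the best-ranked prefix class (o over c), preserving input order."""
    by_filter: dict[str, list[dict]] = defaultdict(list)
    for obs in observations:
        by_filter[obs["filters"]].append(obs)
    kept: list[dict] = []
    for group in by_filter.values():
        best = min(_rank(obs["obs_id"]) for obs in group)
        kept.extend(obs for obs in group if _rank(obs["obs_id"]) == best)
    return kept


observations = [
    {"obs_id": "jw02736-o001_t001_nircam_clear-f090w", "filters": "F090W"},
    {"obs_id": "jw02736-c1009_t001_nircam_clear-f090w", "filters": "F090W"},
    {"obs_id": "jw02736-o002_t001_nircam_clear-f090w", "filters": "F090W"},
    {"obs_id": "jw02736-c1010_t001_nircam_clear-f444w", "filters": "F444W"},  # mosaic only
]
for obs in prefer_o_prefix(observations):
    print(obs["obs_id"])
```

Ranking instead of hard-coding "drop c" keeps the rule in one table, which makes the preference easy to revisit if availability on S3 changes.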

The cascade pattern

This is a pattern I've noticed: data pipeline bugs rarely come alone. You fix the obvious symptom (downloads failing), which reveals the next layer (duplicate observations), which reveals the assumption underneath (we trusted MAST search results to be de-duplicated). Four PRs in one day isn't scope creep — it's peeling back the onion.

What shipped

PR Title
#801 fix: filter discovery search to L3 calibration level to match downloads
#803 fix: deduplicate c-prefix mosaic vs o-prefix observations in recipe engine
#804 fix: use recipe observation_ids to filter downloads instead of raw MAST results
#808 fix: always prefer o-prefix observations over c-prefix in deduplication