JWST Data Analysis Application — Completed Phases

This document archives the completed development phases (1–5) of the JWST Data Analysis Application. For the active roadmap, see development-plan.md.

Phase 1: Foundation & Architecture ✅ Completed

Key Components:

Data Ingestion Layer for various JWST data formats
Storage Layer with flexible MongoDB schemas
Processing Engine for scientific computations
API Gateway for orchestration
React dashboard for data visualization

Current Status:

Phase 1 Deliverables:

Complete project architecture
.NET 10 Web API with MongoDB integration
React TypeScript frontend with modern UI
Flexible data models for various JWST data types
Docker containerization for all services
Python processing engine foundation
Comprehensive documentation and setup guides

Phase 2: Core Infrastructure ✅ Complete

Backend Development:

Database Design:

Design flexible document schemas for:
Image data (metadata + binary storage)
Raw sensor data (time series, spectral data)
Processing results and analysis outputs
User sessions and preferences

Phase 2 Summary:

Enhanced data models with comprehensive metadata
Improved API endpoints for search, statistics, bulk operations, and export
Robust MongoDB service with advanced querying and aggregation
Successful testing of all new features
Documentation updated

Deliverables:

Functional .NET API with MongoDB integration
Data models for various JWST data types
Basic authentication system
File upload and storage capabilities
Advanced endpoints for search, statistics, bulk update, and export
Robust validation and error handling
Updated documentation

Phase 3: Data Processing Engine ✅ Complete

Python Microservice:

Create Python service for scientific computations
Integrate with Astropy for astronomical data processing
MAST Portal integration with astroquery

MAST Portal Integration: ✅ Complete

Search MAST by target name (e.g., "NGC 3132", "Carina Nebula")
Search MAST by RA/Dec coordinates with configurable radius
Search MAST by observation ID
Search MAST by program/proposal ID
Download FITS files from MAST to local storage
Import downloaded files into MongoDB with metadata extraction
Frontend UI for MAST search and import workflow

Processing Level Tracking: ✅ Complete

Parse JWST filename patterns to extract processing level (L1/L2a/L2b/L3)
Track observation base ID and exposure ID for lineage grouping
Establish parent-child relationships between processing levels
Add lineage API endpoints (/api/jwstdata/lineage)
Frontend lineage tree view with collapsible hierarchy
Color-coded level badges (L1:red, L2a:amber, L2b:emerald, L3:blue)
Migration endpoint to backfill existing data

MAST Import Progress Indicator: ✅ Complete

Background job tracking for import operations
Real-time progress polling from frontend
Visual progress bar with stage indicators
Async download with file-by-file progress tracking

Chunked Downloads & Resume: ✅ Complete

HTTP Range header support for chunked downloads (5MB chunks)
Parallel file downloads using asyncio (3 concurrent files)
Byte-level progress tracking with speed (MB/s) and ETA
State persistence for resume capability (JSON state files)
Resume interrupted downloads from last byte position
Import-from-existing endpoint for recovering completed downloads
Frontend progress UI with per-file progress bars

FITS File Type Detection: ✅ Complete

Classify FITS files by filename suffix (image vs table)
Visual type badges in file listings (🖼️ image, 📊 table)
Disable View button for non-viewable table files
Graceful error handling for non-image FITS files in viewer

MAST Metadata Preservation: ✅ Complete

Preserve ALL MAST fields (~30+) with mast_ prefix in Metadata dictionary
Enhanced ImageMetadata with proposal info, calibration level, wavelength range
Robust observation date extraction with fallbacks (t_min → t_max → t_obs_release)
Refresh metadata endpoint for single observation
Bulk refresh metadata endpoint for all MAST imports
Frontend "Refresh Metadata" button in dashboard
JsonElement to basic type conversion for MongoDB serialization

Phase 3 Deliverables:

Python microservice with scientific computing capabilities
Integration with .NET backend (HTTP client communication)
MAST Portal search and download functionality
Processing level tracking and lineage visualization
Import progress indicator with real-time updates
Chunked downloads with HTTP Range headers and resume capability
Byte-level progress tracking with speed and ETA
FITS file type detection and viewer improvements
MAST metadata preservation and refresh capability

Phase 4: Frontend & FITS Viewer Features ✅ Complete

Complete React frontend application with advanced FITS visualization capabilities inspired by OpenFITS and similar tools.

React Application:

Modern, responsive dashboard design
File upload interface for JWST data
Real-time processing status updates
Interactive data visualization components
Results display with export capabilities

Centralized API Service Layer: ✅ Complete

Core HTTP client (apiClient.ts) with automatic JSON handling and error extraction
Custom error class (ApiError.ts) with status codes and type guards
JWST data service (jwstDataService.ts) for CRUD, processing, archive operations
MAST service (mastService.ts) for search, import, progress tracking, resume
Service re-exports (index.ts) for clean imports
Replaced 15 inline fetch() calls across 4 components
Consistent error handling across all API operations

Core Viewer Features (A-series):

Color & Composite (B-series):

B1: RGB Composite Creator (Epic) — Wizard-based workflow for creating false-color composites

Task	Description	Blocked By	Status
B1.1	Composite generation backend (processing engine + API endpoint)	—	[x]
B1.2	Reusable Wizard/Stepper UI component	—	[x]
B1.3	Observation selection step (card grid with thumbnails)	B1.2	[x]
B1.4	Channel assignment step with auto-suggest (wavelength sorting)	B1.3	[x]
B1.5	Preview and export step (generate composite, download PNG/JPEG)	B1.1, B1.4	[x]
B1.6	Per-channel adjustment controls (enhancement - stretch/levels per channel)	B1.5	[x]
B1.7	UI refresh: merge to 2-step wizard with drag-and-drop + thumbnails	B1.6	[x]
B1.8	Per-channel weight sliders (0–200% intensity balance)	B1.7	[x]

Architecture Decision: Wizard flow chosen over simple modal for better UX, guided experience, and reusability of stepper component for future multi-step workflows (batch export, guided import, etc.)

UI Refresh (B1.7–B1.8): Consolidated original 3-step wizard into 2 steps — Step 1: Assign Channels (drag-and-drop with FITS thumbnails, target-scoped auto-sort) → Step 2: Preview & Export (per-channel stretch controls, weight sliders, channel swap, live preview, export). Added per-channel weight multiplier across the full stack (frontend → C# backend → Python processing engine).

B2: WCS Mosaic Generator (Epic) — Combine multiple observations into seamless large-area images

Task	Description	Blocked By	Status
B2.1	Add `reproject` dependency and mosaic engine (processing engine)	—	[x]
B2.2	Mosaic API endpoints (MosaicController, MosaicService)	B2.1	[x]
B2.3	Footprint preview endpoint (show combined coverage before generation)	B2.1	[x]
B2.4	MosaicDialog component with multi-file selection	B2.2	[x]
B2.5	Footprint preview visualization in dialog	B2.3, B2.4	[x]
B2.6	Mosaic result display and export	B2.4	[x]
B2.7	Mosaic wizard UI refresh: 2-step flow, thumbnail cards, reusable WizardStepper	B2.6	[x]

Key Difference from RGB Composite: RGB composite stacks 3 images as R/G/B color channels (same sky field, different filters). Mosaic spatially combines N images from different sky positions using WCS reprojection to create larger coverage area.

B3: Multi-Channel Composite (4+ filters) — Extend RGB composite to support N-channel color mapping

NASA's published JWST composites typically use 4–6 filters mapped to distinct color channels (e.g., Southern Ring Nebula MIRI uses F770W→Blue, F1130W→Cyan, F1280W→Green, F1800W→Red). The current wizard only supports 3 channels (R/G/B), which limits how closely users can recreate reference images.

Task	Description	Blocked By	Status
B3.1	N-channel color mapping engine (processing engine — map N filters to RGB via hue)	—	[x]
B3.2	Backend API support for N-channel composite requests	B3.1	[x]
B3.3	Wizard UI: dynamic channel list with color picker / wavelength-to-hue auto-assign	B3.2	[x]
B3.4	Luminance channel support (L in LRGB — broadband or combined for detail)	B3.1	[x]
B3.5	Preset color mappings for common JWST filter sets (NIRCam, MIRI)	B3.3	[x]
B3.6	Remove deprecated `/composite/generate` endpoint and frontend references	B3.3	[x]

Motivation: Professional tools like PixInsight and SAOImageDS9 support arbitrary filter-to-hue mapping. JWST programs routinely observe in 4–8 filters per target. Limiting to 3 channels forces users to either drop filters or awkwardly combine filters into a single channel.

Related: Issue #357 (refine default stretch/background neutralization)

Data Acquisition (F-series):

F1: S3 Direct Access for FITS Downloads — Use `s3://stpubdata/jwst/public/` for faster data access

STScI mirrors the full JWST public archive on AWS S3 (s3://stpubdata/jwst/public/). Downloading via S3 is significantly faster than HTTP from MAST (no rate limiting, AWS-native throughput, supports multipart downloads). The bucket is public — no authentication required, only a --no-sign-request flag.

Task	Description	Blocked By	Status
F1.1	S3 client integration in processing engine (boto3, anonymous access)	—	[x]
F1.2	S3 path resolution via MAST `get_cloud_uris()` API (PR #396)	F1.1	[x]
F1.3	Download engine: S3 multipart download with progress tracking	F1.1	[x]
F1.4	Backend API to select download source (S3 preferred, HTTP fallback)	F1.2, F1.3	[x]
F1.5	Frontend: download source indicator and preference setting	F1.4	[x]

F2: Storage Abstraction Layer — Decouple file storage from local filesystem

The application currently reads/writes all data to a shared /app/data/ Docker volume. Before migrating to S3, introduce a storage abstraction so providers can be swapped via config. This is the foundation for F3.

Task	Description	Blocked By	Status
F2.1	`IStorageProvider` interface in backend (.NET): Write, ReadStream, Exists, Delete, GetPresignedUrl, List	—	[x]
F2.2	`LocalStorageProvider` implementation (wraps current `/app/data/` filesystem)	F2.1	[x]
F2.3	Python `StorageProvider` ABC with `read_to_temp()`, `write_from_path()`, `write_from_bytes()`, `presigned_url()`	—	[x]
F2.4	`LocalStorage` Python implementation (current filesystem behavior)	F2.3	[x]
F2.5	MongoDB migration — normalize `FilePath` values to storage keys (strip `/app/data/` prefix)	F2.2	[x]
F2.6	Environment switch: `STORAGE_PROVIDER=local\|s3` with DI registration	F2.2, F2.4	[x]

F3: S3 Storage for Application Data — Migrate MAST downloads, uploads, and outputs to S3

Replace the shared Docker volume with S3 for all application data. Bucket structure: jwst-data-{env}/mast/{obs_id}/{file}.fits, uploads/{user_id}/{uuid}.fits, mosaic/{uuid}_i2d.fits, exports/{export_id}.json.

Task	Description	Blocked By	Status
F3.1	`S3StorageProvider` implementation (backend .NET, AWS SDK)	F2.1, F2.2	[x]
F3.2	`S3Storage` implementation (processing engine Python, boto3)	F2.3, F2.4	[x]
F3.3	MAST downloads to S3 — stream via S3 multipart upload, LRU temp cache for processing	F3.1, F3.2	[x]
F3.4	User uploads to S3 — stream multipart form data to `uploads/{userId}/{guid}{ext}`	F3.1	[x]
F3.5	Generated outputs to S3 — mosaic/composite results to `mosaic/` and `exports/` prefixes	F3.2	[x]
F3.6	Presigned URLs for file downloads (15-min expiry, skip proxying through backend)	F3.1	[x]
F3.7	S3 Intelligent-Tiering lifecycle policy on `mast/` prefix (manual script)	F3.1	[x]
F3.8	Local dev parity — SeaweedFS in docker-compose.yml (s3 profile)	F3.1	[x]

Image Analysis (C-series):

C2: Region selection and statistics (mean, median, std, min, max, sum, pixel count)
C3: Image comparison/blink mode (toggle, side-by-side, opacity overlay)
C4: Color balance and curves adjustment

Note: C1 (Smoothing/Noise Reduction) moved to Phase 8 (requires backend endpoint wiring)

Visualization & Export (D-series):

D3: WCS grid overlay (PR #180)
D4: Scale bar (PR #183)
D5: Annotation tools (text, arrows, circles) (PR #181)
D6: AVM metadata embedding on export (PR #208)

Note: D1 (Batch Processing) moved to Phase 8

Dashboard & UX (E-series):

E1: Search by target name in top search bar (filter local observations by targetName)
E2: Automatic FITS thumbnail generation for dashboard cards

Reliability & UX Polish (G-series):

G1: Auto-recovery startup scan & data visibility model (PR #385)
G2: MAST error propagation — show actual errors, not generic 503 (PR #395)
G3: S3 cloud URI resolution via MAST API (PR #396)
G4: Docker healthcheck probe for processing engine (PR #382) — other services use service_started dependency
G5: Smart mosaic pre-selection with target priority & warnings (PR #387)
G6: Floating analysis bar & unified file selection (PR #386)
G7: Dynamic file size warnings on mosaic cards (PR #388)
G8: E2E tests for MAST download workflow (PR #380)

Design System Polish (P-series):

Token-based design system established in P14–P16. This series audits adoption and closes remaining gaps.

P17: Design Token Audit & Migration — Audit all CSS against system.md, fix violations

Task	Description	Status
P17.1	Add foundation tokens (overlay, shadow-xl, text-3xl) to index.css	[x]
P17.2	Spacing violations — 14 hardcoded values → nearest --space-* token	[x]
P17.3	Radius violations — 12 hardcoded values → nearest --radius-* token	[x]
P17.4	Typography violations — 97 hardcoded font-sizes → --text-* tokens	[x]
P17.5	Shadow violations — 5 simple migrations + 3 modal shadows → tokens	[x]
P17.6	Color/overlay violations — ~25 rgba backgrounds → --overlay-* tokens	[x]

P18: Button Standardization — Shared base class + variant system

Task	Description	Status
P18.1	Add --text-inverse token, replace hardcoded `white` across 18 files	[x]
P18.2	Deduplicate .btn-action into index.css, remove from component files	[x]
P18.3	Deduplicate .btn-export into index.css, keep component-specific overrides	[x]

P19: Button Base Class (.btn-base) — Requires JSX changes across 25+ components

Task	Description	Status
P19.1	Define .btn-base shared class (padding, radius, font-size, cursor, transition)	[x]
P19.2	Add .btn-base className to all button components (~25 files)	[x]
P19.3	Consolidate padding to 3 tiers (compact, standard, large) via modifiers	[x]
P19.4	Enforce min-height standard (38px regular, 36px icon-only)	[x]
P19.5	Standardize 30px icon buttons to shared class (#609)	[x]

Note: P19.6 (micro buttons) moved to Phase 8 — tracked as #610

Phase 4 Deliverables:

Phase 5: Scientific Processing & Infrastructure ✅ Complete

Backend processing capabilities, infrastructure improvements, and remaining viewer features.

Tier 1 — Core Science Features:

FITS table viewer for non-image FITS products (binary tables, catalog data)
Spectral data visualization (1D spectrum plotting for MOS/IFU)
Job queue + WebSocket progress (replace polling, enable large operations)
SignalR hub, unified job tracker, queue pattern infrastructure
MAST import progress via SignalR (Phase 3)
Async composite export via job queue + SignalR (Phase 4)
Async mosaic export via job queue + SignalR (Phase 5)
Async mosaic save-to-library via job queue + SignalR (Phase 5)
Large mosaic generation resilience — cap preview resolution to 2048px (configurable via Mosaic:MaxPreviewDimension) with structured timing logs for evidence-based monitoring

Note: Permalinkable viewer state moved to Phase 8

Guided Discovery Experience (v1 UX pivot):

Transforms the app from tool-first to content-first. Full design in docs/plans/design/guided-discovery-experience.md.

Phase A — React Router + layout shell (routes: /, /library, /target/:name, /create; move dashboard to My Library)
Phase B — Suggestion engine + chromatic ordering (Python recipe endpoint, featured targets config, color mapping fix)
Phase C — New frontend pages (discovery home, target detail, guided creation flow)
Phase D — Polish (loading skeletons, error states, end-to-end verification of featured targets)

Tier 2 — Image Processing:

D2: Source detection overlay

Note: C1 (Smoothing) and D1 (Batch processing) moved to Phase 8

Phase 5 Deliverables:

FITS table viewer
Spectral viewer
Job queue with WebSocket progress (SignalR hub, unified tracker, MAST import, composite export, mosaic export/save)
Guided Discovery Experience (v1 UX pivot — discovery home, target detail, guided creation, chromatic ordering)
D2: Source detection overlay