JWST Data Analysis Application — Completed Phases
This document archives the completed development phases (1–5) of the JWST Data Analysis Application. For the active roadmap, see development-plan.md.
Phase 1: Foundation & Architecture ✅ Complete
Key Components:
- Data Ingestion Layer for various JWST data formats
- Storage Layer with flexible MongoDB schemas
- Processing Engine for scientific computations
- API Gateway for orchestration
- React dashboard for data visualization
Current Status:
- Project structure setup
- Development plan documentation
- Backend .NET project initialization
- Frontend React project setup
- MongoDB connection configuration
- Basic API structure
- Flexible data models for JWST data
- CRUD operations for data management
- Modern React dashboard with search and filtering
- Docker configuration for all services
- Python processing engine foundation
- Comprehensive setup documentation
Phase 1 Deliverables:
- Complete project architecture
- .NET 10 Web API with MongoDB integration
- React TypeScript frontend with modern UI
- Flexible data models for various JWST data types
- Docker containerization for all services
- Python processing engine foundation
- Comprehensive documentation and setup guides
Phase 2: Core Infrastructure ✅ Complete
Backend Development:
- Set up .NET 10 Web API project
- Implement MongoDB connection and basic CRUD operations
- Create flexible data models for different JWST data types
- Build data ingestion pipeline for FITS files and raw sensor data
- Implement authentication and authorization
- Enhance data models with rich metadata (image, sensor, spectral, calibration, processing results, etc.)
- Add DTOs and validation attributes for robust API requests/responses
- Improve MongoDBService with advanced querying, aggregation, statistics, and bulk operations
- Merge advanced endpoints into JwstDataController (search, statistics, bulk update, export)
- Fix nullable reference type issues and ensure all endpoints are discoverable and functional
- Robust error handling and validation
- Update documentation and setup guide
Database Design:
- Design flexible document schemas for:
- Image data (metadata + binary storage)
- Raw sensor data (time series, spectral data)
- Processing results and analysis outputs
- User sessions and preferences
Phase 2 Summary:
- Enhanced data models with comprehensive metadata
- Improved API endpoints for search, statistics, bulk operations, and export
- Robust MongoDB service with advanced querying and aggregation
- Successful testing of all new features
- Documentation updated
Deliverables:
- Functional .NET API with MongoDB integration
- Data models for various JWST data types
- Basic authentication system
- File upload and storage capabilities
- Advanced endpoints for search, statistics, bulk update, and export
- Robust validation and error handling
- Updated documentation
Phase 3: Data Processing Engine ✅ Complete
Python Microservice:
- Create Python service for scientific computations
- Integrate with Astropy for astronomical data processing
- MAST Portal integration with astroquery
MAST Portal Integration: ✅ Complete
- Search MAST by target name (e.g., "NGC 3132", "Carina Nebula")
- Search MAST by RA/Dec coordinates with configurable radius
- Search MAST by observation ID
- Search MAST by program/proposal ID
- Download FITS files from MAST to local storage
- Import downloaded files into MongoDB with metadata extraction
- Frontend UI for MAST search and import workflow
Processing Level Tracking: ✅ Complete
- Parse JWST filename patterns to extract processing level (L1/L2a/L2b/L3)
- Track observation base ID and exposure ID for lineage grouping
- Establish parent-child relationships between processing levels
- Add lineage API endpoints (/api/jwstdata/lineage)
- Frontend lineage tree view with collapsible hierarchy
- Color-coded level badges (L1:red, L2a:amber, L2b:emerald, L3:blue)
- Migration endpoint to backfill existing data
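The filename parsing described above can be sketched as follows. This is a minimal illustration, not the application's parser: the suffix-to-level table, the regular expressions, and the name `parse_jwst_filename` are assumptions based on the public JWST naming pattern `jw<program><observation><visit>_<group>_<exposure>_<detector>_<suffix>.fits`, and combined (association-style) products are simply treated as L3.

```python
import re
from typing import Optional

# Assumed suffix-to-level mapping, for illustration only.
SUFFIX_TO_LEVEL = {
    "uncal": "L1",
    "rate": "L2a",
    "rateints": "L2a",
    "cal": "L2b",
    "calints": "L2b",
}

# Exposure-level product, e.g. jw02733001001_02101_00001_nrcb1_cal.fits
EXPOSURE_RE = re.compile(
    r"^(?P<obs_base>jw\d{5}\d{3}\d{3})_(?P<group>\d{5})_(?P<exposure>\d{5})"
    r"_(?P<detector>[a-z0-9]+)_(?P<suffix>[a-z0-9]+)\.fits$"
)

# Combined product, e.g. jw02733-o001_t001_nircam_clear-f090w_i2d.fits
COMBINED_RE = re.compile(r"^jw\d{5}-[oc]\d{3,4}_.+\.fits$")


def parse_jwst_filename(name: str) -> Optional[dict]:
    """Return lineage fields for a JWST filename, or None if unrecognized."""
    lowered = name.lower()
    match = EXPOSURE_RE.match(lowered)
    if match:
        base = match["obs_base"]
        return {
            "observation_base_id": base,
            "exposure_id": f"{base}_{match['group']}_{match['exposure']}",
            # Unknown suffixes yield None rather than a guessed level.
            "level": SUFFIX_TO_LEVEL.get(match["suffix"]),
        }
    if COMBINED_RE.match(lowered):
        return {"observation_base_id": None, "exposure_id": None, "level": "L3"}
    return None
```

Files that share an `exposure_id` but differ in level can then be linked as parent and child, which is one way the lineage grouping could be derived.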
MAST Import Progress Indicator: ✅ Complete
- Background job tracking for import operations
- Real-time progress polling from frontend
- Visual progress bar with stage indicators
- Async download with file-by-file progress tracking
Chunked Downloads & Resume: ✅ Complete
- HTTP Range header support for chunked downloads (5MB chunks)
- Parallel file downloads using asyncio (3 concurrent files)
- Byte-level progress tracking with speed (MB/s) and ETA
- State persistence for resume capability (JSON state files)
- Resume interrupted downloads from last byte position
- Import-from-existing endpoint for recovering completed downloads
- Frontend progress UI with per-file progress bars
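As a rough sketch of the chunking and resume logic listed above, the helpers below plan HTTP Range requests from a saved byte offset and estimate the remaining time. The function names are hypothetical, "5MB" is assumed to mean 5 MiB, and the actual network and state-file handling is left out.

```python
from typing import Dict, Iterator, Optional, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # assumed: 5 MiB per request


def plan_ranges(total_size: int, resume_from: int = 0,
                chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that still need fetching."""
    start = resume_from
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build the Range header for one chunk (byte offsets are inclusive)."""
    return {"Range": f"bytes={start}-{end}"}


def eta_seconds(downloaded: int, total: int, elapsed_s: float) -> Optional[float]:
    """Estimate seconds remaining from the average speed so far."""
    if downloaded <= 0 or elapsed_s <= 0:
        return None  # no throughput measured yet
    speed = downloaded / elapsed_s  # bytes per second
    return (total - downloaded) / speed
```

To resume an interrupted download, the last confirmed byte position from the persisted state would be passed as `resume_from`, so already-written chunks are never requested again.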
FITS File Type Detection: ✅ Complete
- Classify FITS files by filename suffix (image vs table)
- Visual type badges in file listings (🖼️ image, 📊 table)
- Disable View button for non-viewable table files
- Graceful error handling for non-image FITS files in viewer
MAST Metadata Preservation: ✅ Complete
- Preserve ALL MAST fields (~30+) with mast_ prefix in Metadata dictionary
- Enhanced ImageMetadata with proposal info, calibration level, wavelength range
- Robust observation date extraction with fallbacks (t_min → t_max → t_obs_release)
- Refresh metadata endpoint for single observation
- Bulk refresh metadata endpoint for all MAST imports
- Frontend "Refresh Metadata" button in dashboard
- JsonElement to basic type conversion for MongoDB serialization
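The date fallback chain (t_min → t_max → t_obs_release) could look roughly like the sketch below. It assumes the three MAST fields arrive as Modified Julian Dates, possibly missing, empty, or NaN; the function name `observation_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # MJD 0
DATE_FIELDS = ("t_min", "t_max", "t_obs_release")  # fallback order from the list above


def observation_date(row: Mapping[str, object]) -> Optional[datetime]:
    """Return the first usable date in fallback order, or None."""
    for field in DATE_FIELDS:
        try:
            mjd = float(row.get(field))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value: try the next field
        if mjd != mjd or mjd <= 0:
            continue  # NaN or placeholder value: try the next field
        return MJD_EPOCH + timedelta(days=mjd)
    return None
```

Returning `None` instead of raising lets the caller decide whether an observation without any usable date should still be imported.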
Phase 3 Deliverables:
- Python microservice with scientific computing capabilities
- Integration with .NET backend (HTTP client communication)
- MAST Portal search and download functionality
- Processing level tracking and lineage visualization
- Import progress indicator with real-time updates
- Chunked downloads with HTTP Range headers and resume capability
- Byte-level progress tracking with speed and ETA
- FITS file type detection and viewer improvements
- MAST metadata preservation and refresh capability
Phase 4: Frontend & FITS Viewer Features ✅ Complete
Complete React frontend application with advanced FITS visualization capabilities inspired by OpenFITS and similar tools.
React Application:
- Modern, responsive dashboard design
- File upload interface for JWST data
- Real-time processing status updates
- Interactive data visualization components
- Results display with export capabilities
Centralized API Service Layer: ✅ Complete
- Core HTTP client (apiClient.ts) with automatic JSON handling and error extraction
- Custom error class (ApiError.ts) with status codes and type guards
- JWST data service (jwstDataService.ts) for CRUD, processing, archive operations
- MAST service (mastService.ts) for search, import, progress tracking, resume
- Service re-exports (index.ts) for clean imports
- Replaced 15 inline fetch() calls across 4 components
- Consistent error handling across all API operations
Core Viewer Features (A-series):
- A0: Delete/archive by processing level (L1/L2a/L2b/L3)
- A1: Interactive stretch and level controls
- A2: Histogram display panel with adjustable black/white points
- A3: Pixel coordinate and value display on hover
- A4: Export processed image as PNG/JPEG
- Format selection (PNG lossless, JPEG with quality control)
- Resolution presets (1200px, 2048px, 4096px, custom 10-8000px)
- JPEG quality slider (1-100%)
- Export options popover UI
- Input validation (backend + processing engine)
- E2E tests for export workflow
- A5: 3D data cube navigator for wavelength/time slices
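The A4 export constraints (format, 10-8000px resolution, 1-100% JPEG quality) lend themselves to a small validation routine on the processing-engine side. The sketch below is illustrative only: `ExportOptions`, its field names, and the default values are assumptions, not the application's request model.

```python
from dataclasses import dataclass
from typing import List

MIN_PX, MAX_PX = 10, 8000          # custom resolution bounds from A4
ALLOWED_FORMATS = ("png", "jpeg")


@dataclass(frozen=True)
class ExportOptions:
    fmt: str = "png"     # hypothetical field names and defaults
    width: int = 2048
    quality: int = 90    # only meaningful for JPEG


def validate_export(opts: ExportOptions) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if opts.fmt not in ALLOWED_FORMATS:
        errors.append(f"unsupported format: {opts.fmt}")
    if not MIN_PX <= opts.width <= MAX_PX:
        errors.append(f"width must be between {MIN_PX} and {MAX_PX} px")
    # PNG is lossless, so the quality setting is ignored for it.
    if opts.fmt == "jpeg" and not 1 <= opts.quality <= 100:
        errors.append("JPEG quality must be between 1 and 100")
    return errors
```

Collecting all problems rather than failing on the first one lets the frontend popover show every invalid field at once.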
Color & Composite (B-series):
B1: RGB Composite Creator (Epic) — Wizard-based workflow for creating false-color composites
| Task | Description | Blocked By | Status |
|---|---|---|---|
| B1.1 | Composite generation backend (processing engine + API endpoint) | — | [x] |
| B1.2 | Reusable Wizard/Stepper UI component | — | [x] |
| B1.3 | Observation selection step (card grid with thumbnails) | B1.2 | [x] |
| B1.4 | Channel assignment step with auto-suggest (wavelength sorting) | B1.3 | [x] |
| B1.5 | Preview and export step (generate composite, download PNG/JPEG) | B1.1, B1.4 | [x] |
| B1.6 | Per-channel adjustment controls (enhancement - stretch/levels per channel) | B1.5 | [x] |
| B1.7 | UI refresh: merge to 2-step wizard with drag-and-drop + thumbnails | B1.6 | [x] |
| B1.8 | Per-channel weight sliders (0–200% intensity balance) | B1.7 | [x] |
Architecture Decision: Wizard flow chosen over simple modal for better UX, guided experience, and reusability of stepper component for future multi-step workflows (batch export, guided import, etc.)
UI Refresh (B1.7–B1.8): Consolidated original 3-step wizard into 2 steps — Step 1: Assign Channels (drag-and-drop with FITS thumbnails, target-scoped auto-sort) → Step 2: Preview & Export (per-channel stretch controls, weight sliders, channel swap, live preview, export). Added per-channel weight multiplier across the full stack (frontend → C# backend → Python processing engine).
B2: WCS Mosaic Generator (Epic) — Combine multiple observations into seamless large-area images
| Task | Description | Blocked By | Status |
|---|---|---|---|
| B2.1 | Add reproject dependency and mosaic engine (processing engine) | — | [x] |
| B2.2 | Mosaic API endpoints (MosaicController, MosaicService) | B2.1 | [x] |
| B2.3 | Footprint preview endpoint (show combined coverage before generation) | B2.1 | [x] |
| B2.4 | MosaicDialog component with multi-file selection | B2.2 | [x] |
| B2.5 | Footprint preview visualization in dialog | B2.3, B2.4 | [x] |
| B2.6 | Mosaic result display and export | B2.4 | [x] |
| B2.7 | Mosaic wizard UI refresh: 2-step flow, thumbnail cards, reusable WizardStepper | B2.6 | [x] |
Key Difference from RGB Composite: RGB composite stacks 3 images as R/G/B color channels (same sky field, different filters). Mosaic spatially combines N images from different sky positions using WCS reprojection to create larger coverage area.
B3: Multi-Channel Composite (4+ filters) — Extend RGB composite to support N-channel color mapping
NASA's published JWST composites typically use 4–6 filters mapped to distinct color channels (e.g., Southern Ring Nebula MIRI uses F770W→Blue, F1130W→Cyan, F1280W→Green, F1800W→Red). The current wizard only supports 3 channels (R/G/B), which limits how closely users can recreate reference images.
| Task | Description | Blocked By | Status |
|---|---|---|---|
| B3.1 | N-channel color mapping engine (processing engine — map N filters to RGB via hue) | — | [x] |
| B3.2 | Backend API support for N-channel composite requests | B3.1 | [x] |
| B3.3 | Wizard UI: dynamic channel list with color picker / wavelength-to-hue auto-assign | B3.2 | [x] |
| B3.4 | Luminance channel support (L in LRGB — broadband or combined for detail) | B3.1 | [x] |
| B3.5 | Preset color mappings for common JWST filter sets (NIRCam, MIRI) | B3.3 | [x] |
| B3.6 | Remove deprecated /composite/generate endpoint and frontend references | B3.3 | [x] |
Motivation: Professional tools like PixInsight and SAOImageDS9 support arbitrary filter-to-hue mapping. JWST programs routinely observe in 4–8 filters per target. Limiting to 3 channels forces users to either drop filters or awkwardly combine filters into a single channel.
Related: Issue #357 (refine default stretch/background neutralization)
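To make the wavelength-to-hue idea behind B3.1 and B3.3 concrete, here is a minimal sketch that sorts filters by wavelength and spreads them evenly from blue (shortest) to red (longest). The even spacing, the per-channel normalization, and both function names are assumptions; evenly spaced hues only approximate hand-tuned mappings such as the published MIRI example.

```python
import colorsys
from typing import Dict, Tuple

RGB = Tuple[float, float, float]


def assign_hues(wavelengths_um: Dict[str, float]) -> Dict[str, RGB]:
    """Map each filter to an RGB weight: shortest wavelength blue, longest red."""
    names = sorted(wavelengths_um, key=wavelengths_um.get)
    n = len(names)
    colors = {}
    for rank, name in enumerate(names):
        frac = rank / (n - 1) if n > 1 else 0.5
        hue = (2.0 / 3.0) * (1.0 - frac)  # 240 degrees (blue) down to 0 (red)
        colors[name] = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return colors


def blend_pixel(values: Dict[str, float], colors: Dict[str, RGB]) -> RGB:
    """Combine one pixel's per-filter intensities (0..1) into a single RGB value."""
    rgb = [0.0, 0.0, 0.0]
    norm = [0.0, 0.0, 0.0]
    for name, value in values.items():
        for c in range(3):
            rgb[c] += value * colors[name][c]
            norm[c] += colors[name][c]
    # Normalize per channel so that equal input in all filters maps to white.
    return tuple(rgb[c] / norm[c] if norm[c] > 0 else 0.0 for c in range(3))
```

For the four MIRI filters named above, this assigns blue to F770W and red to F1800W, with the two middle filters landing in the cyan-green to yellow-green range.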
Data Acquisition (F-series):
F1: S3 Direct Access for FITS Downloads — Use s3://stpubdata/jwst/public/ for faster data access
STScI mirrors the full JWST public archive on AWS S3 (s3://stpubdata/jwst/public/). Downloading via S3 is significantly faster than HTTP from MAST (no rate limiting, AWS-native throughput, multipart download support). The bucket is public, so no credentials are required; clients only need to send unsigned requests (for example, --no-sign-request with the AWS CLI).
| Task | Description | Blocked By | Status |
|---|---|---|---|
| F1.1 | S3 client integration in processing engine (boto3, anonymous access) | — | [x] |
| F1.2 | S3 path resolution via MAST get_cloud_uris() API (PR #396) | F1.1 | [x] |
| F1.3 | Download engine: S3 multipart download with progress tracking | F1.1 | [x] |
| F1.4 | Backend API to select download source (S3 preferred, HTTP fallback) | F1.2, F1.3 | [x] |
| F1.5 | Frontend: download source indicator and preference setting | F1.4 | [x] |
F2: Storage Abstraction Layer — Decouple file storage from local filesystem
The application currently reads/writes all data to a shared /app/data/ Docker volume. Before migrating to S3, introduce a storage abstraction so providers can be swapped via config. This is the foundation for F3.
| Task | Description | Blocked By | Status |
|---|---|---|---|
| F2.1 | IStorageProvider interface in backend (.NET): Write, ReadStream, Exists, Delete, GetPresignedUrl, List | — | [x] |
| F2.2 | LocalStorageProvider implementation (wraps current /app/data/ filesystem) | F2.1 | [x] |
| F2.3 | Python StorageProvider ABC with read_to_temp(), write_from_path(), write_from_bytes(), presigned_url() | — | [x] |
| F2.4 | LocalStorage Python implementation (current filesystem behavior) | F2.3 | [x] |
| F2.5 | MongoDB migration — normalize FilePath values to storage keys (strip /app/data/ prefix) | F2.2 | [x] |
| F2.6 | Environment switch: STORAGE_PROVIDER=local\|s3 with DI registration | F2.2, F2.4 | [x] |
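The Python side of this abstraction (F2.3, F2.4, F2.6) might be shaped like the sketch below. The method names come from the table; the signatures, the `STORAGE_ROOT` variable, the `get_storage` factory, and the choice to return a file:// URI for local "presigned" URLs are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Mapping


class StorageProvider(ABC):
    @abstractmethod
    def read_to_temp(self, key: str) -> Path: ...
    @abstractmethod
    def write_from_path(self, key: str, source: Path) -> None: ...
    @abstractmethod
    def write_from_bytes(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def presigned_url(self, key: str, expires_s: int = 900) -> str: ...


class LocalStorage(StorageProvider):
    """Filesystem-backed provider; keys are paths relative to a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        return self.root / key.lstrip("/")

    def read_to_temp(self, key: str) -> Path:
        fd, tmp = tempfile.mkstemp(suffix=Path(key).suffix)
        os.close(fd)
        shutil.copyfile(self._path(key), tmp)
        return Path(tmp)

    def write_from_path(self, key: str, source: Path) -> None:
        dest = self._path(key)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)

    def write_from_bytes(self, key: str, data: bytes) -> None:
        dest = self._path(key)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)

    def presigned_url(self, key: str, expires_s: int = 900) -> str:
        # Local files cannot be signed; a file:// URI stands in here (assumption).
        return self._path(key).resolve().as_uri()


def get_storage(env: Mapping[str, str] = os.environ) -> StorageProvider:
    """Pick a provider from STORAGE_PROVIDER (local or s3)."""
    provider = env.get("STORAGE_PROVIDER", "local").lower()
    if provider == "local":
        return LocalStorage(env.get("STORAGE_ROOT", "/app/data"))  # STORAGE_ROOT is hypothetical
    if provider == "s3":
        raise NotImplementedError("S3 provider is outside this sketch")
    raise ValueError(f"unknown STORAGE_PROVIDER: {provider}")
```

Because callers only see `StorageProvider`, an S3-backed class can later be registered in the factory without touching the processing code.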
F3: S3 Storage for Application Data — Migrate MAST downloads, uploads, and outputs to S3
Replace the shared Docker volume with S3 for all application data. Bucket structure: jwst-data-{env}/mast/{obs_id}/{file}.fits, uploads/{user_id}/{uuid}.fits, mosaic/{uuid}_i2d.fits, exports/{export_id}.json.
| Task | Description | Blocked By | Status |
|---|---|---|---|
| F3.1 | S3StorageProvider implementation (backend .NET, AWS SDK) | F2.1, F2.2 | [x] |
| F3.2 | S3Storage implementation (processing engine Python, boto3) | F2.3, F2.4 | [x] |
| F3.3 | MAST downloads to S3 — stream via S3 multipart upload, LRU temp cache for processing | F3.1, F3.2 | [x] |
| F3.4 | User uploads to S3 — stream multipart form data to uploads/{userId}/{guid}{ext} | F3.1 | [x] |
| F3.5 | Generated outputs to S3 — mosaic/composite results to mosaic/ and exports/ prefixes | F3.2 | [x] |
| F3.6 | Presigned URLs for file downloads (15-min expiry, skip proxying through backend) | F3.1 | [x] |
| F3.7 | S3 Intelligent-Tiering lifecycle policy on mast/ prefix (manual script) | F3.1 | [x] |
| F3.8 | Local dev parity — SeaweedFS in docker-compose.yml (s3 profile) | F3.1 | [x] |
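The key layout described for F3, together with the FilePath normalization from F2.5, can be expressed as a few small helpers. This is a sketch under assumptions: the function names are hypothetical, and lower-casing the upload extension is an illustrative choice rather than documented behavior.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional

LEGACY_PREFIX = "/app/data/"  # shared Docker volume path used before the migration


def to_storage_key(file_path: str) -> str:
    """Strip the legacy volume prefix so the value works with any provider."""
    if file_path.startswith(LEGACY_PREFIX):
        return file_path[len(LEGACY_PREFIX):]
    return file_path.lstrip("/")


def mast_key(obs_id: str, filename: str) -> str:
    return f"mast/{obs_id}/{filename}"


def upload_key(user_id: str, original_name: str, file_id: Optional[str] = None) -> str:
    """Build uploads/{user_id}/{uuid}{ext}; the original filename is not kept."""
    file_id = file_id or uuid.uuid4().hex
    ext = PurePosixPath(original_name).suffix.lower()  # assumption: normalize case
    return f"uploads/{user_id}/{file_id}{ext}"


def mosaic_key(mosaic_id: str) -> str:
    return f"mosaic/{mosaic_id}_i2d.fits"


def export_key(export_id: str) -> str:
    return f"exports/{export_id}.json"
```

Keeping key construction in one place means the bucket layout can change without hunting for string concatenations across the code base.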
Image Analysis (C-series):
- C2: Region selection and statistics (mean, median, std, min, max, sum, pixel count)
- C3: Image comparison/blink mode (toggle, side-by-side, opacity overlay)
- C4: Color balance and curves adjustment
Note: C1 (Smoothing/Noise Reduction) moved to Phase 8 (requires backend endpoint wiring)
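The region statistics listed under C2 (mean, median, std, min, max, sum, pixel count) can be computed as in the sketch below. It uses only the standard library on a nested list for clarity, restricts itself to rectangular regions, skips NaN pixels, and reports the population standard deviation; those choices and the name `region_stats` are assumptions, and an array library would be the natural choice for real images.

```python
import math
import statistics
from typing import Dict, Sequence


def region_stats(image: Sequence[Sequence[float]],
                 x0: int, y0: int, x1: int, y1: int) -> Dict[str, float]:
    """Statistics over the half-open rectangle [x0, x1) x [y0, y1), skipping NaNs."""
    pixels = [v for row in image[y0:y1] for v in row[x0:x1] if v == v]
    if not pixels:
        raise ValueError("region contains no valid pixels")
    return {
        "mean": statistics.fmean(pixels),
        "median": statistics.median(pixels),
        "std": statistics.pstdev(pixels),  # population standard deviation
        "min": min(pixels),
        "max": max(pixels),
        "sum": math.fsum(pixels),
        "pixel_count": len(pixels),
    }
```

For example, `region_stats(image, 0, 0, 2, 2)` summarizes the top-left 2x2 block of `image`.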
Visualization & Export (D-series):
- D3: WCS grid overlay (PR #180)
- D4: Scale bar (PR #183)
- D5: Annotation tools (text, arrows, circles) (PR #181)
- D6: AVM metadata embedding on export (PR #208)
Note: D1 (Batch Processing) moved to Phase 8
Dashboard & UX (E-series):
- E1: Search by target name in top search bar (filter local observations by targetName)
- E2: Automatic FITS thumbnail generation for dashboard cards
Reliability & UX Polish (G-series):
- G1: Auto-recovery startup scan & data visibility model (PR #385)
- G2: MAST error propagation — show actual errors, not generic 503 (PR #395)
- G3: S3 cloud URI resolution via MAST API (PR #396)
- G4: Docker healthcheck probe for processing engine (PR #382) — other services use service_started dependency
- G5: Smart mosaic pre-selection with target priority & warnings (PR #387)
- G6: Floating analysis bar & unified file selection (PR #386)
- G7: Dynamic file size warnings on mosaic cards (PR #388)
- G8: E2E tests for MAST download workflow (PR #380)
Design System Polish (P-series):
Token-based design system established in P14–P16. This series audits adoption and closes remaining gaps.
P17: Design Token Audit & Migration — Audit all CSS against system.md, fix violations
| Task | Description | Status |
|---|---|---|
| P17.1 | Add foundation tokens (overlay, shadow-xl, text-3xl) to index.css | [x] |
| P17.2 | Spacing violations — 14 hardcoded values → nearest --space-* token | [x] |
| P17.3 | Radius violations — 12 hardcoded values → nearest --radius-* token | [x] |
| P17.4 | Typography violations — 97 hardcoded font-sizes → --text-* tokens | [x] |
| P17.5 | Shadow violations — 5 simple migrations + 3 modal shadows → tokens | [x] |
| P17.6 | Color/overlay violations — ~25 rgba backgrounds → --overlay-* tokens | [x] |
P18: Button Standardization — Shared base class + variant system
| Task | Description | Status |
|---|---|---|
| P18.1 | Add --text-inverse token, replace hardcoded white across 18 files | [x] |
| P18.2 | Deduplicate .btn-action into index.css, remove from component files | [x] |
| P18.3 | Deduplicate .btn-export into index.css, keep component-specific overrides | [x] |
P19: Button Base Class (.btn-base) — Requires JSX changes across 25+ components
| Task | Description | Status |
|---|---|---|
| P19.1 | Define .btn-base shared class (padding, radius, font-size, cursor, transition) | [x] |
| P19.2 | Add .btn-base className to all button components (~25 files) | [x] |
| P19.3 | Consolidate padding to 3 tiers (compact, standard, large) via modifiers | [x] |
| P19.4 | Enforce min-height standard (38px regular, 36px icon-only) | [x] |
| P19.5 | Standardize 30px icon buttons to shared class (#609) | [x] |
Note: P19.6 (micro buttons) moved to Phase 8 — tracked as #610
Phase 4 Deliverables:
- Centralized API service layer with type-safe error handling
- File upload and management interface
- Real-time processing status dashboard
- Delete/archive by processing level
- Interactive stretch and level controls
- Complete React frontend application
- Interactive data visualization components
- Histogram display panel
- Pixel coordinate and value display
- Export processed images (PNG/JPEG with quality/resolution presets)
- 3D data cube navigation (slice navigation with playback)
Phase 5: Scientific Processing & Infrastructure ✅ Complete
Backend processing capabilities, infrastructure improvements, and remaining viewer features.
Tier 1 — Core Science Features:
- FITS table viewer for non-image FITS products (binary tables, catalog data)
- Spectral data visualization (1D spectrum plotting for MOS/IFU)
- Job queue + WebSocket progress (replace polling, enable large operations)
- SignalR hub, unified job tracker, queue pattern infrastructure
- MAST import progress via SignalR (Phase 3)
- Async composite export via job queue + SignalR (Phase 4)
- Async mosaic export via job queue + SignalR (Phase 5)
- Async mosaic save-to-library via job queue + SignalR (Phase 5)
- Large mosaic generation resilience — cap preview resolution to 2048px (configurable via Mosaic:MaxPreviewDimension) with structured timing logs for evidence-based monitoring
Note: Permalinkable viewer state moved to Phase 8
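The preview cap mentioned for large mosaics amounts to scaling the output so its longest side does not exceed the configured limit. A minimal sketch follows; it assumes aspect ratio is preserved and that smaller mosaics are left untouched, and the function name is hypothetical.

```python
from typing import Tuple


def capped_preview_size(width: int, height: int,
                        max_dim: int = 2048) -> Tuple[int, int, float]:
    """Return (preview_width, preview_height, scale) with the longest side <= max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height, 1.0  # already small enough, no resampling
    scale = max_dim / longest
    # Guard against a zero-pixel side for extremely elongated mosaics.
    return max(1, round(width * scale)), max(1, round(height * scale)), scale
```

Here `max_dim` plays the role of the Mosaic:MaxPreviewDimension setting; an 8192x4096 mosaic would be previewed at 2048x1024.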
Guided Discovery Experience (v1 UX pivot):
Transforms the app from tool-first to content-first. Full design in docs/plans/design/guided-discovery-experience.md.
- Phase A — React Router + layout shell (routes: /, /library, /target/:name, /create; move dashboard to My Library)
- Phase B — Suggestion engine + chromatic ordering (Python recipe endpoint, featured targets config, color mapping fix)
- Phase C — New frontend pages (discovery home, target detail, guided creation flow)
- Phase D — Polish (loading skeletons, error states, end-to-end verification of featured targets)
Tier 2 — Image Processing:
- D2: Source detection overlay
Note: C1 (Smoothing) and D1 (Batch processing) moved to Phase 8
Phase 5 Deliverables:
- FITS table viewer
- Spectral viewer
- Job queue with WebSocket progress (SignalR hub, unified tracker, MAST import, composite export, mosaic export/save)
- Guided Discovery Experience (v1 UX pivot — discovery home, target detail, guided creation, chromatic ordering)
- D2: Source detection overlay