Processing Engine Scaling
Exploratory analysis of how to scale composite and mosaic processing from a single user to many concurrent users.
Status: Brainstorm (Feb 2026) — no decisions made, no implementation planned yet.
Current Architecture
User → .NET API → Background Service → HTTP call → Python container (1 instance)
          ↓                                               ↓
     Job Tracker                                    Process FITS
     SignalR push                                   Return bytes
There is one processing container, and each worker handles one request at a time. The .NET job queue serializes work, which is fine for a single user but becomes a bottleneck once several users submit jobs concurrently.
Scaling Levels
Level 0: Caching (biggest bang, zero infrastructure)
Before scaling compute, eliminate redundant compute. Most users visiting a popular target will request the same composite with default settings.
Composite result cache: hash the inputs (dataIds + color mapping + stretch params + dimensions) → check if result exists → return cached blob. Store in S3, keyed by hash. TTL-based expiry.
100 users looking at M16 = 1 composite generation + 99 cache hits.
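A minimal sketch of the hash-then-lookup flow described above. The function names, the JSON field names, and the in-memory dict standing in for S3 are all hypothetical; the sketch also assumes that the order of `dataIds` does not affect the output, which would need to be confirmed against the real compositing code.

```python
import hashlib
import json
from typing import Callable, Dict, Mapping, Sequence


def composite_cache_key(
    data_ids: Sequence[str],
    color_mapping: Mapping[str, str],
    stretch: Mapping[str, float],
    width: int,
    height: int,
) -> str:
    """Return a stable SHA-256 hex digest for one composite request."""
    payload = {
        # Assumption: ID order is irrelevant because color_mapping already
        # says which ID feeds which channel, so sorting is safe.
        "dataIds": sorted(data_ids),
        "colorMapping": dict(color_mapping),
        "stretch": dict(stretch),
        "dimensions": [width, height],
    }
    # sort_keys plus fixed separators make the JSON text reproducible,
    # so equal inputs always hash to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_generate(
    store: Dict[str, bytes], key: str, generate: Callable[[], bytes]
) -> bytes:
    """Return the cached blob for key, generating and storing it on a miss."""
    cached = store.get(key)
    if cached is not None:
        return cached
    blob = generate()
    store[key] = blob  # stand-in for an S3 put keyed by the hash
    return blob
```

With this shape, 100 sequential requests for the same key call `generate` once. Requests that arrive concurrently before the first result is stored would each miss, so a real implementation would likely need some form of in-flight deduplication, which this sketch does not attempt.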
Pre-generation for featured targets: the 12 featured targets have curated recipes. Generate default composites on deploy or nightly. Users see instant results for the most common paths.
Ties into the "permalinkable viewer state" roadmap item — a cached composite with a stable hash is naturally shareable.
Effort: Small. High leverage at every scale level.
Level 1: Multiple Workers (several users)
Bump uvicorn to N workers, with N matching the number of CPU cores. Combined with caching, this should handle roughly 5-10 concurrent unique composite requests.
Effort: One-line change in Docker CMD. Zero architecture change.
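As a rough illustration of what that command could look like, the helper below builds a uvicorn argument list with one worker per core. The app path `main:app`, the host, and the port are placeholders rather than the project's real values; `--workers` is uvicorn's own CLI flag.

```python
import os
from typing import List, Optional


def uvicorn_command(app: str = "main:app", workers: Optional[int] = None) -> List[str]:
    """Build a uvicorn argv, defaulting to one worker per CPU core."""
    if workers is None:
        # os.cpu_count() can return None; fall back to a single worker.
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [
        "uvicorn", app,            # hypothetical module:attribute path
        "--host", "0.0.0.0",       # placeholder bind address
        "--port", "8000",          # placeholder port
        "--workers", str(workers),
    ]
```

Two caveats worth keeping in mind: each worker is a separate process, so peak memory grows with the worker count, and `os.cpu_count()` may report more cores than the CPU limit actually assigned to the container.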
Level 2: SQS + Fargate Auto-Scaling Workers (tens of users)
Decouple job submission from processing. Instead of the .NET background service calling the processing engine via HTTP, it pushes a message to SQS:
User → .NET API → SQS queue → Worker containers (auto-scaled)
          ↓                           ↓
     Job Tracker              Pull job, process,
     SignalR push ←────────── write result to S3,
                              notify via SNS/callback
Why this fits the project well:
- The job queue pattern already exists in .NET: CompositeQueue, CompositeBackgroundService, JobTracker, SignalR push. The bounded channel just becomes SQS.
- Workers are the same Python Docker image, running a queue consumer instead of a web server.
- ECS Fargate auto-scales workers based on queue depth (0 workers when idle = $0).
- Each worker processes one job, pulls the next when done. No coordination needed.
- Results go to S3 (already have S3 storage provider).
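The "scale on queue depth" rule can be sketched as a small function. In practice this would more likely be expressed as an ECS auto-scaling policy driven by an SQS metric such as `ApproximateNumberOfMessagesVisible` than as application code; the function name and default limits here are assumptions for illustration only.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int = 1, max_workers: int = 10) -> int:
    """Return how many workers to run for a given backlog.

    An empty queue yields 0 workers (scale to zero); otherwise the backlog
    is divided across workers and capped at max_workers.
    """
    if queue_depth < 0:
        raise ValueError("queue_depth must be >= 0")
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be >= 1")
    if max_workers < 0:
        raise ValueError("max_workers must be >= 0")
    return min(max_workers, math.ceil(queue_depth / jobs_per_worker))
```

The cap matters for cost control: without it, a burst of submissions would translate directly into a burst of Fargate tasks.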
What changes:
- Processing engine gets a queue consumer mode (read SQS, process, write to S3, ack)
- .NET CompositeBackgroundService pushes to SQS instead of calling HTTP
- Job completion via SNS → webhook or polling result in S3
- FITS files must be in S3 (not local disk) so workers can access them
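The queue consumer mode from the list above (read, process, write, ack) might look roughly like the loop below. It is written against two small hypothetical interfaces so it can run without AWS; with boto3 these would map onto SQS `receive_message` / `delete_message` and S3 `put_object`. The `jobId` field and the `results/` key layout are assumptions.

```python
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class JobQueue(Protocol):
    def receive(self) -> Optional[Tuple[str, dict]]: ...
    def ack(self, receipt: str) -> None: ...


class ResultStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...


def run_worker(
    queue: JobQueue,
    store: ResultStore,
    process: Callable[[dict], bytes],
    max_jobs: Optional[int] = None,
) -> int:
    """Pull jobs one at a time until the queue is empty; return jobs completed."""
    done = 0
    while max_jobs is None or done < max_jobs:
        message = queue.receive()
        if message is None:
            break  # queue is empty; exiting lets the service scale in
        receipt, job = message
        result = process(job)
        store.put(f"results/{job['jobId']}.png", result)  # hypothetical key layout
        # Ack only after the result is stored. If the worker dies earlier,
        # the message is never acked and can be delivered again.
        queue.ack(receipt)
        done += 1
    return done


class InMemoryQueue:
    """Test double that mimics receive-then-ack semantics."""

    def __init__(self, jobs: List[dict]) -> None:
        self.pending = list(jobs)
        self.in_flight: Dict[str, dict] = {}
        self._counter = 0

    def receive(self) -> Optional[Tuple[str, dict]]:
        if not self.pending:
            return None
        job = self.pending.pop(0)
        self._counter += 1
        receipt = f"r{self._counter}"
        self.in_flight[receipt] = job
        return receipt, job

    def ack(self, receipt: str) -> None:
        del self.in_flight[receipt]


class InMemoryStore:
    """Test double standing in for the S3 bucket."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
```

Acking after the write means a job can occasionally run twice, so the processing step should be safe to repeat. Writing results under a deterministic key helps here, since a retry simply overwrites the same object.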
What doesn't change:
- Frontend (still gets SignalR progress, same job IDs)
- Job tracker (still tracks status, just updated differently)
- Processing Python code (same functions, different entry point)
Effort: Medium. Most work is in the SQS/SNS plumbing and Fargate task definitions.
Level 3: Step Functions for Complex Pipelines (many users)
Large mosaics are multi-step: download sources → reproject each → combine → stretch → encode. With Step Functions, each step is a separate Fargate task:
Step Functions:
1. Fan-out: reproject each source file (parallel Fargate tasks)
2. Combine reprojected files (single task, needs all outputs)
3. Stretch + encode (single task)
4. Write result to S3, notify
This parallelizes the expensive reprojection step and handles mosaics that would time out or run out of memory (OOM) in a single container.
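The fan-out/fan-in shape of that pipeline can be illustrated locally. In the sketch below a thread pool stands in for the parallel Fargate tasks, and the three step functions are passed in as parameters; none of this is Step Functions code, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # a source file reference
R = TypeVar("R")  # a reprojected intermediate
C = TypeVar("C")  # the combined mosaic


def run_mosaic(
    sources: Sequence[S],
    reproject: Callable[[S], R],
    combine: Callable[[List[R]], C],
    stretch_encode: Callable[[C], bytes],
    max_parallel: int = 4,
) -> bytes:
    """Run reproject in parallel, then combine and encode sequentially."""
    if not sources:
        raise ValueError("at least one source is required")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Fan-out: one reprojection per source. pool.map returns results
        # in input order, regardless of which task finishes first.
        reprojected = list(pool.map(reproject, sources))
    # Fan-in: combine needs every reprojected output before it can start.
    combined = combine(reprojected)
    return stretch_encode(combined)
```

The structure shows why only step 1 benefits from extra workers: combine and stretch/encode each wait on the full output of the previous step.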
Honest assessment: real engineering effort, only worth it if users regularly generate mosaics with 10+ sources. For mosaics with 2-4 sources (the common case), Level 2 handles it fine.
Effort: Large.
Lambda Assessment
Evaluated Lambda as a processing backend. Summary: poor fit for the core workloads.
| Factor | Impact |
|---|---|
| Cold starts | 10-30s for numpy/astropy/scipy stack, even with container images |
| File sizes | FITS files are 100MB-5GB; Lambda's synchronous response payload limit is 6MB, so everything requires an S3 intermediary |
| Execution time | Large mosaics can exceed Lambda's 15-minute limit |
| Memory | 10GB max may be tight for large multi-source mosaics |
Where Lambda does fit: thumbnail generation (small input/output, embarrassingly parallel, stateless), recipe suggestions (lightweight, fast), and scheduled tasks (cleanup, metadata refresh).
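The hard limits in the table can be turned into a quick screening check. This is only a sketch: the `Workload` fields and the example estimates are hypothetical, cold starts are not modeled, and the constants reflect Lambda's documented quotas (15-minute timeout, 10,240 MB memory, 6 MB synchronous response payload).

```python
from dataclasses import dataclass
from typing import List

LAMBDA_MAX_SECONDS = 15 * 60      # maximum function timeout
LAMBDA_MAX_MEMORY_MB = 10_240     # maximum configurable memory
LAMBDA_MAX_RESPONSE_MB = 6        # synchronous invocation payload limit


@dataclass
class Workload:
    name: str
    est_seconds: float
    est_memory_mb: float
    response_mb: float
    output_via_s3: bool = False   # True if results are written to S3 instead


def lambda_blockers(w: Workload) -> List[str]:
    """Return the hard Lambda limits this workload would exceed."""
    blockers = []
    if w.est_seconds > LAMBDA_MAX_SECONDS:
        blockers.append("execution time")
    if w.est_memory_mb > LAMBDA_MAX_MEMORY_MB:
        blockers.append("memory")
    # The response limit only applies when bytes are returned directly.
    if not w.output_via_s3 and w.response_mb > LAMBDA_MAX_RESPONSE_MB:
        blockers.append("response size")
    return blockers
```

An empty list means no hard limit is hit, not that Lambda is a good fit; cold-start latency still has to be judged separately.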
Scaling Summary
| Users | Architecture | Effort | Monthly Cost Delta |
|---|---|---|---|
| 1-5 | Current + caching + multiple workers | Small | ~$0 |
| 5-30 | SQS + Fargate auto-scaling workers | Medium | ~$10-30 (scales to zero) |
| 30-100 | Above + Step Functions for mosaics, CDN for cached results | Large | ~$30-100 |
| 100+ | Kubernetes, GPU workers, tiered processing | Very large | Varies widely |
Recommended Path
- Now: Implement composite result caching (Level 0). Highest leverage, works at every scale.
- When needed: Add multiple workers (Level 1). One-line change.
- With real users: SQS + Fargate (Level 2). Natural evolution of existing job queue pattern.
- At scale: Step Functions for mosaics (Level 3). Only if large mosaics become common.
Caching is the prerequisite that makes everything else cheaper. Build it first regardless of which scaling path is chosen.