# Git History Security Audit

- **Date**: 2026-01-31
- **Task**: #35 - Review and Clean Git History
- **Status**: Complete - No critical issues found

## Executive Summary

The git history was scanned for secrets, credentials, and sensitive information. No real secrets or API keys were found. The only credentials in the history are intentional development placeholders (admin/password) that are documented in repository setup/security docs.

## Scan Results

### Secrets and Credentials

| Category            | Status      | Details                                              |
| ------------------- | ----------- | ---------------------------------------------------- |
| API Keys            | ✅ Clean    | No API keys found                                    |
| AWS Credentials     | ✅ Clean    | No `AKIA` patterns found                             |
| GitHub Tokens       | ✅ Clean    | No `ghp_`/`gho_` patterns found                      |
| Private Keys        | ✅ Clean    | No PEM/key files found                               |
| `.env` Files        | ✅ Clean    | Never committed                                      |
| MongoDB Credentials | ⚠️ Dev Only | `admin:password` placeholder in `docker-compose.yml` |
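The report does not say which tool or patterns produced these results. As a rough sketch of how such a scan might be reproduced, the function below filters text on standard input for the token shapes named in the table. The function name `scan_for_secrets` and the exact regular expressions are assumptions for illustration, not the audit's actual method.

```bash
# Hypothetical helper: print (with line numbers) any stdin lines that look
# like common secret formats. The patterns are illustrative approximations,
# not an exhaustive ruleset.
scan_for_secrets() {
  grep -nE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e 'gh[po]_[A-Za-z0-9]{36}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----'
}
```

To scan the full history rather than the working tree, the patch output can be piped in: `git log -p --all | scan_for_secrets`. The function exits with status 1 when nothing matches. Dedicated scanners such as gitleaks or trufflehog cover far more formats than this sketch.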

### Files with Development Credentials

Only one file contains credentials (development placeholders):

```text
./docker/docker-compose.yml:
  MONGO_INITDB_ROOT_PASSWORD: password
  MongoDB__ConnectionString=mongodb://admin:password@mongodb:27017
```

**Action Required**: Task #18 will address this by moving to environment variables.

### Sensitive File Extensions

No sensitive files (`.pem`, `.key`, `.p12`, `.credentials`, etc.) were ever committed.
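One way such a check could be run is to list every path ever added in any commit and keep only those with a sensitive extension. The helper name `filter_sensitive_paths` is hypothetical, and the extension list simply mirrors the ones named above.

```bash
# Hypothetical helper: from a list of paths on stdin, keep only those whose
# extension suggests key or credential material, de-duplicated.
filter_sensitive_paths() {
  grep -E '\.(pem|key|p12|credentials)$' | sort -u
}
```

A possible invocation is `git log --all --diff-filter=A --name-only --pretty=format: | filter_sensitive_paths`. Empty output is consistent with the finding above.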

### Email Addresses in Git History

- 2 personal email addresses found in commit history (redacted from this report)
- 1 GitHub noreply address (current)
- 1 bot address (dependabot)
- 1 co-author address (Claude noreply)

**Note**: This is normal for public repositories. If you prefer privacy, you can:

1. Use GitHub's noreply address for future commits (now in use)
2. Optionally rewrite history (not recommended unless necessary)
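The address counts above could be reproduced with something like the following sketch. The helper name `tally_emails` is an assumption; it only counts addresses supplied one per line on standard input.

```bash
# Hypothetical helper: count how often each e-mail address appears on stdin
# and print "count address", most frequent first.
tally_emails() {
  sort | uniq -c | awk '{print $1, $2}' | sort -k1,1nr -k2,2
}
```

Author and committer addresses can be fed in with `git log --all --format='%ae%n%ce' | tally_emails`. Co-author addresses live in commit message trailers, so they need a separate pass such as `git log --all --format=%B | grep -i '^Co-authored-by:'`.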

### Large Files in History

| Size   | File                   | Notes                     |
| ------ | ---------------------- | ------------------------- |
| 666 KB | `package-lock.json`    | Normal, multiple versions |
| 547 KB | `deep-space.png`       | Background image          |
| 528 KB | `constellation_bg.png` | Background image          |

No unusually large files that would bloat the repository were found.
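A table like the one above can be produced from git's object database. The sketch below assumes input lines of the form `type sha size path`; the helper name `largest_blobs` is hypothetical.

```bash
# Hypothetical helper: from "type sha size path" lines on stdin, print the
# N largest blobs as "size path" (N defaults to 10). Sizes are in bytes.
largest_blobs() {
  awk '$1 == "blob" { size = $3; $1 = $2 = $3 = ""; sub(/^ +/, ""); print size, $0 }' \
    | sort -k1,1nr \
    | head -n "${1:-10}"
}
```

The expected input format comes from `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`, so appending `| largest_blobs 3` to that pipeline would list the three largest reachable blobs.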

### Repository Size

**Before cleanup**:

- Git directory: 2.8 GB
- Cause: Orphaned FITS file blobs from removed commits

**After cleanup**:

- Git directory: **1.8 MB** ✅
- Pack file: 1.61 MB

**Cleanup performed**:

```bash
git reflog expire --expire=now --all
git gc --prune=now --aggressive
```

The large blobs were unreachable (orphaned) FITS files that had been committed at some point but later removed. They were pruned from the pack file.

**Note**: The `data/` directory (20 GB) is properly gitignored and not tracked.
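To compare object storage before and after a cleanup like this, the output of `git count-objects -v` can be summed. This is a minimal sketch; the helper name `object_store_kib` is an assumption, and it counts only loose and packed objects, so it will not match the full size of the `.git` directory exactly.

```bash
# Hypothetical helper: read `git count-objects -v` output on stdin and print
# the combined size of loose objects ("size") and packs ("size-pack") in KiB.
object_store_kib() {
  awk -F': ' '$1 == "size" || $1 == "size-pack" { total += $2 } END { print total + 0 }'
}
```

Running `git count-objects -v | object_store_kib` before and after `git gc` gives two comparable numbers. `git count-objects -vH` prints the same fields in human-readable units if only a quick look is needed.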

## .gitignore Coverage

Current `.gitignore` properly excludes:

- `data/` and `data/mast/` - FITS files
- `.env` files - covered only by the `*.env` pattern; explicit entries should be added (see the recommended additions below)
- `node_modules/`
- `__pycache__/`, `venv/`
- `.vscode/`
- `.claude/`
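The exclusions listed above can be spot-checked with `git check-ignore`, which reports whether a path matches an ignore rule. The wrapper below is a sketch; the name `report_ignored` is hypothetical, and it must be run from inside the repository.

```bash
# Hypothetical helper: for each path argument, report whether the repository
# in the current directory would ignore it. Paths need not exist on disk.
# Note: files that are already tracked are not reported as ignored.
report_ignored() {
  for path in "$@"; do
    if git check-ignore -q "$path"; then
      echo "ignored: $path"
    else
      echo "not ignored: $path"
    fi
  done
}
```

For example, `report_ignored .env .env.local node_modules/` prints one line per path. Using `git check-ignore -v <path>` directly also shows which `.gitignore` line matched.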

### Recommended .gitignore Additions

```gitignore
# Environment files (explicit)
.env
.env.local
.env.*.local
*.env

# IDE
.idea/

# Logs
*.log
logs/

# Credentials (safety net)
*.pem
*.key
*.p12
credentials.json
secrets.json
```

## Action Items

### Previously Required (Now Complete)

1. ~~**Task #18**: Remove hardcoded credentials from `docker-compose.yml`~~ - Resolved. Credentials moved to `.env` and referenced with `${VARIABLE}` syntax in `docker-compose.yml`. `.env.example` provided.
2. ~~**Update .gitignore**: Add explicit `.env` patterns~~ - Resolved.
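As a quick regression check that no inline credentials remain after this change, a compose file can be searched for URIs of the form `scheme://user:password@host` in which neither part is a `${VARIABLE}` reference. The helper name `find_inline_credentials` and the regular expression are assumptions for illustration.

```bash
# Hypothetical helper: print lines containing a URI with literal user:password
# credentials. Values starting with "$" (variable references) are skipped.
# Reads the files given as arguments, or stdin when none are given.
find_inline_credentials() {
  grep -nE '://[^/:@$ ]+:[^/@$ ]+@' "$@"
}
```

Running `find_inline_credentials docker/docker-compose.yml` should print nothing and exit with status 1 once the credentials have been externalized.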

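### Optional

1. **Git GC**: Run `git gc --aggressive` to optimize repository size (already run once during the cleanup described under Repository Size)
2. **Email Privacy**: Consider using GitHub noreply for future commits (already in use, as noted above)
3. **Squash History**: Not recommended unless you want a cleaner history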

## Conclusion

The repository is safe to make public. Task #18 (credential externalization) is complete. No real secrets, API keys, or sensitive data were found in the git history.

The `admin`/`password` credentials are:

- Clearly documented as development-only
- Only used for local MongoDB in Docker
- Standard practice for development environments

No history rewriting (`git-filter-repo`) is necessary.