Charon/docs/plans/history_rewrite.md
GitHub Actions 9adf2735dd feat(history-rewrite): Enhance history rewrite process with detailed backup and validation steps
- Added a comprehensive plan for history rewrites in `docs/plans/history_rewrite.md`, including backup requirements and a checklist for destructive operations.
- Created a QA report for history-rewrite scripts in `docs/reports/qa_report.md`, summarizing tests, findings, and recommendations.
- Introduced `check_refs.sh` script to list branches and tags, saving a tarball of tag references.
- Updated `clean_history.sh` to include non-interactive mode and improved error handling for backup branch pushes.
- Enhanced `preview_removals.sh` to support JSON output format and added shallow clone detection.
- Added Bats tests for `clean_history.sh` and `validate_after_rewrite.sh` to ensure functionality and error handling.
- Implemented pre-commit hook to block commits to `data/backups/` directory.
- Improved validation script to check for backup branch existence and run pre-commit checks.
- Created temporary test scripts for validating `clean_history.sh` and `validate_after_rewrite.sh` functionality.
2025-12-09 14:07:17 +00:00


# History Rewrite: Plan, Checklist, and Recovery
## Summary
- This document describes the agreed process, checks, and recovery steps for destructive history rewrites performed with the scripts in `scripts/history-rewrite/`.
- It updates the previous guidance by adding explicit backup requirements, tag backups, and a `--backup-branch` argument or `BACKUP_BRANCH` env variable that must be set and pushed to a remote before running a destructive rewrite.
## Minimum Requirements
- Tools: `git` (>=2.25), `git-filter-repo` (Python-based utility), `pre-commit`.
- Optional tools: `bats-core` for tests, `shellcheck` for linting scripts.
## Overview
Use the `preview_removals.sh` script to preview which commits/objects will be removed. Before any destructive operation, run `clean_history.sh` with `--dry-run`, push a backup branch to the remote, and save a tag backup tarball in `data/backups/`. After a rewrite, run `validate_after_rewrite.sh` to confirm the repository matches expectations.
## Naming Conventions & Backup Policy
- Backup branch name format: `backup/history-YYYYMMDD-HHMMSS`.
- Tag backup tarball: `data/backups/tags-YYYYMMDD-HHMMSS.tar.gz`.
- Metadata: `data/backups/history-YYYYMMDD-HHMMSS.json` with keys `backup_branch`, `tag_tar`, `created_at`, `remote`.
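
The naming scheme above works best when the branch, tarball, and metadata file all share one timestamp. The following is a minimal sketch of that idea. The helper name `write_backup_metadata`, the ISO 8601 format used for `created_at`, and the default remote `origin` are assumptions for illustration, not the behavior of the repository scripts.

```bash
# Sketch: derive all backup names from one timestamp and write the metadata JSON.
# write_backup_metadata is a hypothetical helper, not part of scripts/history-rewrite/.
write_backup_metadata() {
  local ts="$1" remote="${2:-origin}" out_dir="${3:-data/backups}"
  local branch="backup/history-${ts}"
  local tag_tar="${out_dir}/tags-${ts}.tar.gz"
  local meta="${out_dir}/history-${ts}.json"
  mkdir -p "$out_dir"
  # Values are written verbatim; this sketch assumes they contain no quotes.
  printf '{\n  "backup_branch": "%s",\n  "tag_tar": "%s",\n  "created_at": "%s",\n  "remote": "%s"\n}\n' \
    "$branch" "$tag_tar" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$remote" > "$meta"
  # Print the metadata path so callers can record or attach it.
  printf '%s\n' "$meta"
}
```

A call such as `write_backup_metadata "$(date -u +%Y%m%d-%H%M%S)"` would write `data/backups/history-<timestamp>.json` and print its path.
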
## Checklist (Before a Destructive Rewrite)
1. Run the preview step and attach output to the PR:
- `scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db' --strip-size 50 --format json`
- Attach the output (or paste it into the PR) for reviewer consumption.
2. Create a local and remote backup branch:
- `git checkout -b backup/history-YYYYMMDD-HHMMSS`
- `git push origin backup/history-YYYYMMDD-HHMMSS`
- Record the branch name in `--backup-branch` or set `BACKUP_BRANCH` env var so validators can find it.
3. Capture tags:
- `git show-ref --tags` lists every tag with its object ID. Push tags to the origin, or create a tarball of the tag list in `data/backups/`.
- Example tag tarball: `git show-ref --tags > data/backups/tags-YYYYMMDD-HHMMSS.txt && tar -czf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz -C data/backups tags-YYYYMMDD-HHMMSS.txt` (create a scripted helper if needed).
4. Ensure `data/backups` exists and is included as a tarball or log attachment in the PR:
- `mkdir -p data/backups && tar -czf data/backups/history-YYYYMMDD-HHMMSS.tar.gz --exclude='history-*.tar.gz' data/backups/` (if logs are present; the `--exclude` keeps the archive from trying to include itself).
5. Run the CI dry-run job and ensure it completes successfully. If `dry-run` reports findings, address them first.
6. Ensure maintainers approve and that you have a scheduled maintenance window. Do not run a destructive `--force` push without explicit approvals.
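
Items 2 and 3 of the checklist can be checked mechanically before anyone asks for approval. The sketch below assumes `BACKUP_BRANCH` is set as described above. `valid_backup_branch_name` and `preflight` are hypothetical helpers, not functions from the repository scripts.

```bash
# Sketch of a pre-flight check for the checklist above (hypothetical helpers).

# Succeeds only for names shaped like backup/history-YYYYMMDD-HHMMSS.
valid_backup_branch_name() {
  [[ "$1" =~ ^backup/history-[0-9]{8}-[0-9]{6}$ ]]
}

# Verifies that the backup branch exists on the remote and that a tag
# tarball is present locally.
preflight() {
  local branch="${BACKUP_BRANCH:-}" remote="${1:-origin}"
  if ! valid_backup_branch_name "$branch"; then
    echo "BACKUP_BRANCH is unset or malformed: '$branch'" >&2
    return 1
  fi
  # --exit-code makes ls-remote exit non-zero when no matching ref is found.
  if ! git ls-remote --exit-code --heads "$remote" "$branch" >/dev/null; then
    echo "backup branch '$branch' not found on $remote" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one file.
  if ! compgen -G 'data/backups/tags-*.tar.gz' >/dev/null; then
    echo "no tag tarball found in data/backups/" >&2
    return 1
  fi
  echo "preflight ok: $branch"
}
```

Running `preflight` (or `preflight upstream` for a differently named remote) from the repository root exits non-zero on the first missing item.
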
## Typical Usage Examples
Preview candidates to remove:
```bash
scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,import' --strip-size 50 --format json
```
Create a backup branch and push:
```bash
git checkout -b backup/history-$(date -u +%Y%m%d-%H%M%S)
git push origin HEAD
export BACKUP_BRANCH=$(git rev-parse --abbrev-ref HEAD)
```
Create a tarball of tags and save logs in `data/backups/`:
```bash
mkdir -p data/backups
# Compute the timestamp once so the list and the tarball share the same name.
ts=$(date -u +%Y%m%d-%H%M%S)
# One "<object-id> refs/tags/<name>" line per tag; show-ref exits non-zero if there are no tags.
git show-ref --tags > "data/backups/tags-$ts.txt" || true
tar -czf "data/backups/tags-$ts.tar.gz" -C data/backups "tags-$ts.txt"
```
Dry-run the rewrite (do not push):
```bash
scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --dry-run --backup-branch "$BACKUP_BRANCH"
```
Perform the rewrite (coordinated action, after approvals):
```bash
scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --backup-branch "$BACKUP_BRANCH" --force
# After local rewrite, force-push coordinated with maintainers: `git push origin --all --force`
```
Validate after rewrite:
```bash
scripts/history-rewrite/validate_after_rewrite.sh --backup-branch "$BACKUP_BRANCH"
```
## Recovery Steps (if things go wrong)
1. Ensure your local clone still has the `backup/history-...` branch. If the branch was pushed to origin, check it using:
- `git ls-remote origin | grep backup/history-` or `git fetch origin backup/history-YYYY...`.
2. Create a restore branch from the backup branch:
- `git checkout -b restore-YYYY backup/history-YYYYMMDD-HHMMSS`
- `git push origin restore-YYYY` and open a PR to restore history.
3. For tags: restore from tarball or tag list by re-creating tags and pushing them to the remote:
- `mkdir -p /tmp/tags && tar -xzf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz -C /tmp/tags`
- Recreate tags as needed and `git push origin --tags`.
4. If a destructive push changed history on the remote: coordinate with maintainers to either push restore branches or restore from the backup branch using `git push origin refs/heads/restore-YYYY:refs/heads/main`. This is a maintainer-only action, and because the restored history does not descend from the rewritten `main`, the push is rejected unless it is forced.
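
Step 3 can be partly scripted if the tag backup is a saved `git show-ref --tags` listing, with one `<object-id> refs/tags/<name>` pair per line. That file format is an assumption about what the tarball contains, and `tag_restore_commands` is a hypothetical helper. The sketch only prints commands so they can be reviewed before anything changes. It also assumes the tagged objects still exist in the local clone, for example because they are reachable from the backup branch.

```bash
# Sketch: turn a saved `git show-ref --tags` listing into restore commands.
# tag_restore_commands is a hypothetical helper; it prints commands only.
tag_restore_commands() {
  local sha ref
  while read -r sha ref; do
    # Skip blank lines and anything that is not a tag ref.
    [[ "$ref" == refs/tags/* ]] || continue
    printf 'git update-ref %q %q\n' "$ref" "$sha"
  done
}
```

For example, `tag_restore_commands < /tmp/tags/tags-YYYYMMDD-HHMMSS.txt` prints one `git update-ref` line per tag. After review, the lines can be run and followed by `git push origin --tags`.
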
## Checklist for PR Reviewers
- Confirm `data/backups` is present or attached in the PR.
- Confirm the backup branch (`backup/history-YYYYMMDD-HHMMSS`) is pushed to origin.
- Confirm tag backups exist and are included in the backup tarball.
- Ensure `preview_removals` output is attached to the PR as evidence.
- Ensure maintainers have scheduled the maintenance window and have approved the change.
## Notes & Safety
- Avoid running destructive pushes from forks without a coordinated maintainers plan.
- The scripts are non-destructive by default (`--dry-run`); use `--force` only after approvals.
- The `validate_after_rewrite.sh` script accepts `--backup-branch` or reads the `BACKUP_BRANCH` env var. Make sure one of them is set; otherwise the script exits non-zero.
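
The flag-or-environment lookup described in the last note is a common shell pattern. The sketch below shows one way it could work. `resolve_backup_branch` is a hypothetical name, and the precedence (flag over environment variable) is an assumption; the actual validator may parse its arguments differently.

```bash
# Sketch: resolve the backup branch from --backup-branch or BACKUP_BRANCH.
# resolve_backup_branch is hypothetical; flag-over-env precedence is assumed.
resolve_backup_branch() {
  local branch="${BACKUP_BRANCH:-}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --backup-branch)   branch="${2:-}"; shift ;;   # value is the next argument
      --backup-branch=*) branch="${1#*=}" ;;         # value follows the equals sign
    esac
    shift || true
  done
  if [ -z "$branch" ]; then
    echo "error: pass --backup-branch or set BACKUP_BRANCH" >&2
    return 1
  fi
  printf '%s\n' "$branch"
}
```
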
---
For implementation details, see `scripts/history-rewrite/` and current CI workflows that run the script tests.
History rewrite plan
====================
Rationale
---------
Some committed CodeQL DB directories or large binary blobs can bloat clones, CI cache sizes, and repository size overall. This plan provides a non-destructive, auditable history-rewrite solution to remove these directories and optionally strip out huge blobs.
Scope
-----
This plan targets CodeQL DB directories (e.g., backend/codeql-db, codeql-db, codeql-db-js, codeql-db-go) and other large blobs. Scripts are non-destructive by default and require `--force` to make destructive changes.
Risk & Mitigation
-----------------
- Rewriting history changes commit hashes. We never force-push in the scripts automatically; the maintainer must coordinate before running `git push --force`.
- Always create a backup branch before rewriting; the script creates `backup/history-YYYYMMDD-HHMMSS` and pushes it to `origin`.
- Require the manual confirmation string `I UNDERSTAND` before running any destructive change.
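
The confirmation requirement in the last bullet could be implemented roughly as follows. This is a sketch only: `confirm_destructive` is a hypothetical name, the prompt wording is invented, and the real script's handling of non-interactive mode is not shown here.

```bash
# Sketch of the confirmation gate; confirm_destructive is a hypothetical name.
confirm_destructive() {
  local reply
  printf 'Type "I UNDERSTAND" to rewrite history: ' >&2
  # read fails on EOF, so a run with no input is refused rather than assumed.
  IFS= read -r reply || return 1
  # Exact match only: "i understand" or "yes" are rejected.
  [ "$reply" = "I UNDERSTAND" ]
}
```
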
Overview of steps
-----------------
1. Prepare: create and checkout a non-main feature branch (do not run on `main` or `master`).
2. Dry-run and preview: run a dry-run to preview commits and blobs to remove.
- `scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50`
3. Optional detailed preview:
- `scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,codeql-db' --strip-size 50`
4. With approval, run the destructive rewrite in a local clone or dedicated environment.
- `scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50`
- When prompted, type `I UNDERSTAND` to proceed.
5. Validation: run the validator script and ensure the checks pass locally:
- `scripts/history-rewrite/validate_after_rewrite.sh`
6. Coordinate with maintainers and force-push only after consensus.
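
Step 1 of the list above (never run on `main` or `master`) is easy to enforce with a guard at the top of a script. The following is a sketch under that assumption. `is_protected_branch` and `guard_current_branch` are hypothetical helpers, and the protected list contains only the two names mentioned above.

```bash
# Sketch: refuse to rewrite history while on a protected branch.
# Both helpers are hypothetical; the protected list mirrors step 1 above.
is_protected_branch() {
  case "$1" in
    main|master) return 0 ;;
    *)           return 1 ;;
  esac
}

guard_current_branch() {
  local current
  # Prints the current branch name, or "HEAD" when detached.
  current=$(git rev-parse --abbrev-ref HEAD) || return 1
  if is_protected_branch "$current"; then
    echo "refusing to run on '$current'; switch to a feature branch first" >&2
    return 1
  fi
}
```
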
Installation & prerequisites
----------------------------
- git >= 2.25
- git-filter-repo: install via package manager or pip. See https://github.com/newren/git-filter-repo.
- pre-commit (optional): installed in the repository virtual environment (`.venv`).
Sample commands and dry-run outputs
----------------------------------
Dry-run:
```
scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50
```
Sample dry-run output (excerpt):

    --- Path: backend/codeql-db
    2b7c6f8d1a... (commits touching this path)
    --- Objects in paths
    f6a9abcd... backend/codeql-db/project.sarif
    --- Example large objects (candidate for --strip-size)
    f3ae1234... size=104857600

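
The large-object section of the sample output can be approximated with plain git plumbing, which is useful for cross-checking a dry run. This is a sketch, not the script's implementation: `filter_large_blobs` and `list_large_blobs` are hypothetical helpers, and treating the `--strip-size` value as MiB is an assumption.

```bash
# Sketch: list blobs at or above a size threshold (hypothetical helpers).
# Assumption: the threshold is given in MiB, as --strip-size 50 might be.
filter_large_blobs() {
  local min_mib="$1"
  # Input lines: "<type> <object-id> <size-in-bytes> <path>".
  # Paths containing spaces are cut at the first space in this sketch.
  awk -v min="$((min_mib * 1024 * 1024))" \
    '$1 == "blob" && $3 >= min { printf "%s size=%d %s\n", $2, $3, $4 }'
}

list_large_blobs() {
  # Walk every object reachable from any ref and ask git for type and size.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    filter_large_blobs "$1"
}
```
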
Force-run (coordination required):
```
scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50
```
Followed by verification and manual force-push:
- Check `data/backups/history_cleanup-YYYYMMDD-HHMMSS.log`
- `scripts/history-rewrite/validate_after_rewrite.sh`
- `git push --all --force` (only after maintainers approve)
Rollback plan
-------------
If problems occur, restore from the backup branch:
- `git checkout -b restore/YYYYMMDD-HHMMSS backup/history-YYYYMMDD-HHMMSS`
- `git push origin restore/YYYYMMDD-HHMMSS`
Post rewrite maintenance
------------------------
- Run `git gc --aggressive --prune=now` on clones and local copies.
- Run `git count-objects -vH` to confirm size improvements.
- Refresh CI caches and mirrors after the change.
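
To make the size improvement concrete, the packed size can be captured before and after the rewrite and compared. The sketch below parses the `size-pack` field of `git count-objects -v`, which reports KiB when `-H` is omitted. `pack_size_kib` and `percent_saved` are hypothetical helpers.

```bash
# Sketch: compare packed repository size before and after (hypothetical helpers).

# Reads `git count-objects -v` output on stdin and prints size-pack in KiB.
pack_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Percentage saved between two KiB values, rounded down.
percent_saved() {
  local before="$1" after="$2"
  echo $(( (before - after) * 100 / before ))
}

# Example flow:
#   before=$(git count-objects -v | pack_size_kib)
#   ... rewrite, then git gc --aggressive --prune=now ...
#   after=$(git count-objects -v | pack_size_kib)
#   echo "saved $(percent_saved "$before" "$after")%"
```
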
Communication & Approval
------------------------
Open a PR with dry-run logs and `preview_removals` output, and tag maintainers for approval before `--force` is used.
CI automation
-------------
- A CI dry-run workflow `.github/workflows/dry-run-history-rewrite.yml` runs a non-destructive check that fails CI when banned history entries or large objects are found. It is triggered on PRs and a daily schedule.
- A PR checklist template `.github/PULL_REQUEST_TEMPLATE/history-rewrite.md` and a checklist validator `.github/workflows/pr-checklist.yml` ensure contributors attach the preview output and backups before seeking approval.
- The PR checklist validator is conditional: it only enforces the checklist when the PR modifies `scripts/history-rewrite/*`, `docs/plans/history_rewrite.md`, or similar history-rewrite related files. This avoids blocking unrelated PRs.
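
The conditional behavior in the last bullet comes down to a path match over the files a PR changes. The sketch below illustrates that decision in shell. `touches_history_rewrite` is a hypothetical helper, it covers only the two paths named above (not the "similar" files), and the real workflow's filter may be expressed differently.

```bash
# Sketch: decide whether the checklist applies, given changed paths on stdin
# (one per line). touches_history_rewrite is a hypothetical helper.
touches_history_rewrite() {
  local path
  while IFS= read -r path; do
    case "$path" in
      scripts/history-rewrite/*|docs/plans/history_rewrite.md) return 0 ;;
    esac
  done
  # No history-rewrite file was changed, so the checklist is not enforced.
  return 1
}

# Example (origin/main as the base branch is an assumption):
#   git diff --name-only origin/main...HEAD | touches_history_rewrite && echo "enforce checklist"
```
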