History Rewrite: Plan, Checklist, and Recovery
Summary
- This document describes the agreed process, checks, and recovery steps for destructive history rewrites performed with the scripts in `scripts/history-rewrite/`.
- It updates the previous guidance by adding explicit backup requirements, tag backups, and a `--backup-branch` argument (or `BACKUP_BRANCH` env variable) that must name a backup branch already pushed to a remote before a destructive rewrite is run.
Minimum Requirements
- Tools: `git` (>=2.25), `git-filter-repo` (Python-based utility), `pre-commit`.
- Optional tools: `bats-core` for tests, `shellcheck` for linting scripts.
Overview
Use the preview_removals.sh script to preview which commits/objects will be removed. Always run clean_history.sh with --dry-run and create a remote backup branch and a tag backup tarball in data/backups/ before any destructive operation. After a rewrite, run validate_after_rewrite.sh to confirm the repository matches expectations.
Naming Conventions & Backup Policy
- Backup branch name format: `backup/history-YYYYMMDD-HHMMSS`.
- Tag backup tarball: `data/backups/tags-YYYYMMDD-HHMMSS.tar.gz`.
- Metadata: `data/backups/history-YYYYMMDD-HHMMSS.json` with keys `backup_branch`, `tag_tar`, `created_at`, `remote`.
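The naming rules above can be tied together in a small helper. This is a minimal sketch, not part of `scripts/history-rewrite/`: the function name `write_backup_metadata` is hypothetical, and it assumes `created_at` can reuse the same `YYYYMMDD-HHMMSS` timestamp, which the source does not specify.

```shell
# Hypothetical helper: write the metadata file using the naming conventions.
# Usage: write_backup_metadata <timestamp> <remote> <backup-dir>
write_backup_metadata() {
  meta_ts=$1
  meta_remote=$2
  meta_dir=$3
  mkdir -p "$meta_dir" || return 1
  # Assumption: values contain no characters that need JSON escaping
  # (they are timestamps, slash-separated paths, and a remote name).
  printf '{\n  "backup_branch": "%s",\n  "tag_tar": "%s",\n  "created_at": "%s",\n  "remote": "%s"\n}\n' \
    "backup/history-$meta_ts" \
    "$meta_dir/tags-$meta_ts.tar.gz" \
    "$meta_ts" \
    "$meta_remote" > "$meta_dir/history-$meta_ts.json"
}
```

A possible call is `write_backup_metadata "$(date -u +%Y%m%d-%H%M%S)" origin data/backups`; computing the timestamp once keeps the branch, tarball, and metadata names in sync.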
Checklist (Before a Destructive Rewrite)
- Run the preview step and attach output to the PR:
  - `scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db' --strip-size 50 --format json`
  - Attach the output (or paste it into the PR) for reviewer consumption.
- Create a local and remote backup branch:
  - `git checkout -b backup/history-YYYYMMDD-HHMMSS`
  - `git push origin backup/history-YYYYMMDD-HHMMSS`
  - Record the branch name in `--backup-branch` or set the `BACKUP_BRANCH` env var so validators can find it.
- Capture tags:
  - List tags with `git tag -l | xargs -n1 git show-ref --tags` and push them to the origin, or create a tarball of tags in `data/backups/`.
  - Example tag tarball: `git show-ref --tags > data/backups/tags-YYYYMMDD-HHMMSS.txt && tar -czf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz data/backups/tags-YYYYMMDD-HHMMSS.txt` (create a scripted helper if needed).
- Ensure `data/backups` exists and is included as a tarball or log attachment in the PR:
  - `mkdir -p data/backups && tar -czf data/backups/history-YYYYMMDD-HHMMSS.tar.gz data/backups/` (if logs are present).
- Run the CI dry-run job and ensure it completes successfully. If `dry-run` reports findings, address them first.
- Ensure maintainers approve and that you have a scheduled maintenance window. Do not run a destructive `--force` push without explicit approvals.
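Parts of this checklist can be verified mechanically before asking for approval. The sketch below is illustrative only: `is_backup_branch_name` and `preflight_check` are hypothetical names, not helpers shipped in `scripts/history-rewrite/`, and the checks cover only the branch name, its presence on the remote, and the tag tarball.

```shell
# Hypothetical pre-flight helpers; not part of scripts/history-rewrite/.

# Succeeds only for names of the form backup/history-YYYYMMDD-HHMMSS.
is_backup_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^backup/history-[0-9]{8}-[0-9]{6}$'
}

# Checks the branch name, its presence on the remote, and the tag tarball.
# Usage: preflight_check <backup-branch> <tag-tarball> [remote]
preflight_check() {
  pf_branch=$1
  pf_tag_tar=$2
  pf_remote=${3:-origin}
  is_backup_branch_name "$pf_branch" \
    || { echo "bad backup branch name: $pf_branch" >&2; return 1; }
  # --exit-code makes ls-remote exit with status 2 when no ref matches.
  git ls-remote --exit-code --heads "$pf_remote" "$pf_branch" >/dev/null \
    || { echo "backup branch not found on $pf_remote: $pf_branch" >&2; return 1; }
  # -s: the file exists and is not empty.
  [ -s "$pf_tag_tar" ] \
    || { echo "missing or empty tag backup: $pf_tag_tar" >&2; return 1; }
}
```

For example, `preflight_check "$BACKUP_BRANCH" data/backups/tags-YYYYMMDD-HHMMSS.tar.gz` would fail fast if the backup branch was created locally but never pushed.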
Typical Usage Examples
Preview candidates to remove:
scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,import' --strip-size 50 --format json
Create a backup branch and push:
git checkout -b backup/history-$(date -u +%Y%m%d-%H%M%S)
git push origin HEAD
export BACKUP_BRANCH=$(git rev-parse --abbrev-ref HEAD)
Create a tarball of tags and save logs in data/backups/:
mkdir -p data/backups
ts=$(date -u +%Y%m%d-%H%M%S)  # capture once so both file names match
git show-ref --tags > "data/backups/tags-$ts.txt"
tar -czf "data/backups/tags-$ts.tar.gz" "data/backups/tags-$ts.txt"
Dry-run the rewrite (do not push):
scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --dry-run --backup-branch "$BACKUP_BRANCH"
Perform the rewrite (coordinated action, after approvals):
scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --backup-branch "$BACKUP_BRANCH" --force
# After local rewrite, force-push coordinated with maintainers: `git push origin --all --force`
Validate after rewrite:
scripts/history-rewrite/validate_after_rewrite.sh --backup-branch "$BACKUP_BRANCH"
Recovery Steps (if things go wrong)
- Ensure your local clone still has the `backup/history-...` branch. If the branch was pushed to origin, check it using `git ls-remote origin | grep backup/history-` or `git fetch origin backup/history-YYYY...`.
- Restore the branch to a new or restored head:
  - `git checkout -b restore-YYYY backup/history-YYYYMMDD-HHMMSS`
  - `git push origin restore-YYYY` and open a PR to restore history.
- For tags: restore from tarball or tag list by re-creating tags and pushing them to the remote:
  - `mkdir -p /tmp/tags && tar -xzf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz -C /tmp/tags`
  - Recreate tags as needed and `git push origin --tags`.
- If a destructive push changed history on remote: coordinate with maintainers to either push restore branches or restore from the backup branch using `git push origin refs/heads/restore-YYYY:refs/heads/main` (a maintainers-only action).
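Recreating tags from a saved list can be scripted. The following is a sketch under stated assumptions: the backup contains lines in the `<sha> refs/tags/<name>` format printed by `git show-ref --tags`, and the referenced objects still exist in the local clone. The function name `restore_tags_from_list` and its optional "runner" argument are hypothetical.

```shell
# Hypothetical helper: recreate tag refs from "<sha> refs/tags/<name>" lines
# read on stdin. The optional first argument replaces `git` (pass `echo`
# for a dry run that only prints what would be executed).
restore_tags_from_list() {
  rt_runner=${1:-git}
  while read -r rt_sha rt_ref; do
    case $rt_ref in
      # Only touch tag refs; ignore anything else in the list.
      refs/tags/*) "$rt_runner" update-ref "$rt_ref" "$rt_sha" || return 1 ;;
    esac
  done
}
```

A dry run such as `restore_tags_from_list echo < tags.txt` shows the planned `update-ref` calls; after running it for real, the tags can be pushed with `git push origin --tags`.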
Checklist for PR Reviewers
- Confirm `data/backups` is present or attached in the PR.
- Confirm the backup branch (`backup/history-YYYYMMDD-HHMMSS`) is pushed to origin.
- Confirm tag backups exist and are included in the backup tarball.
- Ensure `preview_removals` output is attached to the PR as evidence.
- Ensure maintainers have scheduled the maintenance window and have approved the change.
Notes & Safety
- Avoid running destructive pushes from forks without a coordinated maintainers plan.
- The default behavior of the scripts is non-destructive (`--dry-run`); use `--force` only after approvals.
- The `validate_after_rewrite.sh` script accepts `--backup-branch` or reads the `BACKUP_BRANCH` env var; make sure it's present (or the script will exit non-zero).
For implementation details, see scripts/history-rewrite/ and current CI workflows that run the script tests.
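As a rough illustration of the `--backup-branch` / `BACKUP_BRANCH` behavior noted above, the lookup might resemble the sketch below. This is not the validator's actual code: the function name `resolve_backup_branch` and the precedence (flag over environment variable) are assumptions.

```shell
# Hypothetical sketch: --backup-branch wins, then the BACKUP_BRANCH env var;
# with neither present the function fails with a non-zero status.
resolve_backup_branch() {
  rb_branch=${BACKUP_BRANCH:-}
  while [ "$#" -gt 0 ]; do
    case $1 in
      --backup-branch)
        [ "$#" -ge 2 ] || { echo "--backup-branch needs a value" >&2; return 2; }
        rb_branch=$2
        shift 2
        ;;
      *)
        # Other arguments are not this helper's concern; skip them.
        shift
        ;;
    esac
  done
  [ -n "$rb_branch" ] || { echo "no backup branch given" >&2; return 1; }
  printf '%s\n' "$rb_branch"
}
```

A script could call it as `branch=$(resolve_backup_branch "$@") || exit 1`, so a missing backup branch stops the run before any validation starts.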
History rewrite plan
Rationale
Some committed CodeQL DB directories or large binary blobs can bloat clones, CI cache sizes, and repository size overall. This plan provides a non-destructive, auditable history-rewrite solution to remove these directories and optionally strip out huge blobs.
Scope
This plan targets CodeQL DB directories (e.g., backend/codeql-db, codeql-db, codeql-db-js, codeql-db-go) and other large blobs. Scripts are non-destructive by default and require --force to make destructive changes.
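The scripts' internals are not reproduced in this plan. As a hedged illustration only, and assuming they wrap `git-filter-repo` (listed as a required tool), a `--paths` / `--strip-size` pair could map onto its `--invert-paths`, `--path`, and `--strip-blobs-bigger-than` options roughly as follows. The function name is hypothetical, and treating `--strip-size` as megabytes is an assumption.

```shell
# Hypothetical: translate --paths/--strip-size values into git-filter-repo
# arguments. Assumes paths contain no whitespace or glob characters and
# that --strip-size is expressed in megabytes.
build_filter_repo_args() {
  fr_paths=$1
  fr_strip=$2
  fr_args="--invert-paths"
  fr_old_ifs=$IFS
  IFS=,
  # Split the comma-separated list; each entry becomes a --path to remove.
  for fr_p in $fr_paths; do
    fr_args="$fr_args --path $fr_p"
  done
  IFS=$fr_old_ifs
  if [ -n "$fr_strip" ]; then
    fr_args="$fr_args --strip-blobs-bigger-than ${fr_strip}M"
  fi
  printf '%s\n' "$fr_args"
}
```

Calling `build_filter_repo_args 'backend/codeql-db,codeql-db' 50` prints `--invert-paths --path backend/codeql-db --path codeql-db --strip-blobs-bigger-than 50M`, which could then be passed to `git filter-repo` in a scratch clone.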
Risk & Mitigation
- Rewriting history changes commit hashes. We never force-push in the scripts automatically; the maintainer must coordinate before running `git push --force`.
- Always create a backup branch before rewriting; the script creates `backup/history-YYYYMMDD-HHMMSS` and pushes it to `origin`.
- Require the manual confirmation string `I UNDERSTAND` before running any destructive change.
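The confirmation requirement can be pictured as a small gate that reads one line and compares it exactly. This is a sketch, not the scripts' actual prompt: the function name `confirm_destructive` and the prompt wording are assumptions.

```shell
# Hypothetical confirmation gate: succeeds only if the line read from stdin
# is exactly "I UNDERSTAND" (case-sensitive, no extra whitespace).
confirm_destructive() {
  printf 'Type "I UNDERSTAND" to continue: ' >&2
  IFS= read -r cd_answer || return 1
  [ "$cd_answer" = "I UNDERSTAND" ]
}
```

A destructive step could be wrapped as `confirm_destructive || { echo "aborted" >&2; exit 1; }`, so anything other than the exact string leaves the repository untouched.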
Overview of steps
- Prepare: create and checkout a non-main feature branch (do not run on `main` or `master`).
- Dry-run and preview: run a dry-run to preview commits and blobs to remove.
  - `scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50`
- Optional detailed preview:
  - `scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,codeql-db' --strip-size 50`
- With approval, run the destructive rewrite in a local clone or dedicated environment.
  - `scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50`
  - When prompted, type `I UNDERSTAND` to proceed.
- Validation: run the validator script and ensure CI passes locally:
  - `scripts/history-rewrite/validate_after_rewrite.sh`
- Coordinate with maintainers and force-push only after consensus.
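The first step above (never operating on `main` or `master`) is easy to enforce with a guard at the top of any wrapper script. A minimal sketch with hypothetical function names, not taken from the existing scripts:

```shell
# Hypothetical guard: succeeds when the given branch name is protected.
is_protected_branch() {
  case $1 in
    main|master) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring: abort early when the current branch is protected.
guard_current_branch() {
  gc_current=$(git rev-parse --abbrev-ref HEAD) || return 1
  if is_protected_branch "$gc_current"; then
    echo "refusing to rewrite history on $gc_current" >&2
    return 1
  fi
}
```

Running `guard_current_branch || exit 1` before the dry-run keeps an accidental invocation on the default branch from getting any further.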
Installation & prerequisites
- git >= 2.25
- git-filter-repo: install via package manager or pip. See https://github.com/newren/git-filter-repo.
- pre-commit (optional): installed in the repository virtual environment (`.venv`).
Sample commands and dry-run outputs
Dry-run:
scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50
Sample dry-run output (excerpt):
--- Path: backend/codeql-db
2b7c6f8d1a... (commits touching this path)
--- Objects in paths
f6a9abcd... backend/codeql-db/project.sarif
--- Example large objects (candidate for --strip-size)
f3ae1234... size=104857600
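The "large objects" part of a report like this can be approximated with plain git plumbing. The sketch below is not how `clean_history.sh` necessarily works; the function names are hypothetical, and the 50 MiB threshold is an assumption about how `--strip-size 50` is interpreted.

```shell
# Reads "<type> <sha> <size> <path>" lines on stdin and prints blobs whose
# size in bytes is at least the given threshold.
filter_large_blobs() {
  awk -v min="$1" '$1 == "blob" && $3 >= min { print $2, "size=" $3 }'
}

# Example wiring against a real repository. The threshold of 50 MiB is an
# assumption about the unit used by --strip-size.
list_large_blobs() {
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | filter_large_blobs $((50 * 1024 * 1024))
}
```

Keeping the filter separate from the git pipeline makes it possible to check the threshold logic against canned input before running it on a large repository.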
Force-run (coordination required):
scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50
Followed by verification and manual force-push:
- Check `data/backups/history_cleanup-YYYYMMDD-HHMMSS.log`
- `scripts/history-rewrite/validate_after_rewrite.sh`
- `git push --all --force` (only after maintainers approve)
Rollback plan
If problems occur, restore from the backup branch:
git checkout -b restore/YYYYMMDD-HHMMSS backup/history-YYYYMMDD-HHMMSS
git push origin restore/YYYYMMDD-HHMMSS
Post rewrite maintenance
- Run `git gc --aggressive --prune=now` on clones and local copies.
- Run `git count-objects -vH` to confirm size improvements.
- Refresh CI caches and mirrors after the change.
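To put a number on the size improvement, the pack size can be captured before and after garbage collection. A minimal sketch with a hypothetical helper name; it relies on `git count-objects -v` (without `-H`) reporting `size-pack` in KiB.

```shell
# Extracts the "size-pack" value (in KiB) from `git count-objects -v`
# output supplied on stdin.
pack_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Example wiring: report how many KiB were reclaimed by gc.
report_gc_savings() {
  gs_before=$(git count-objects -v | pack_size_kib) || return 1
  git gc --aggressive --prune=now || return 1
  gs_after=$(git count-objects -v | pack_size_kib) || return 1
  echo "size-pack: ${gs_before} KiB -> ${gs_after} KiB (saved $((gs_before - gs_after)) KiB)"
}
```

The before/after figures are also useful evidence to paste into the PR alongside the preview output.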
Communication & Approval
Open a PR with dry-run logs and preview_removals output, and tag maintainers for approval before --force is used.
CI automation
- A CI dry-run workflow `.github/workflows/dry-run-history-rewrite.yml` runs a non-destructive check that fails CI when banned history entries or large objects are found. It is triggered on PRs and a daily schedule.
- A PR checklist template `.github/PULL_REQUEST_TEMPLATE/history-rewrite.md` and a checklist validator `.github/workflows/pr-checklist.yml` ensure contributors attach the preview output and backups before seeking approval.
- The PR checklist validator is conditional: it only enforces the checklist when the PR modifies `scripts/history-rewrite/*`, `docs/plans/history_rewrite.md`, or similar history-rewrite related files. This avoids blocking unrelated PRs.
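The conditional rule for the checklist validator amounts to a path match over the PR's changed files. The sketch below is illustrative and not the workflow's actual implementation; the function name is hypothetical, and only the two patterns named above are covered (the "similar files" are left out because they are not specified).

```shell
# Hypothetical: succeed when any changed path (one per line on stdin) is
# history-rewrite related, so the checklist should be enforced.
needs_history_checklist() {
  while IFS= read -r nc_path; do
    case $nc_path in
      scripts/history-rewrite/*|docs/plans/history_rewrite.md) return 0 ;;
    esac
  done
  return 1
}
```

In a workflow step this could be fed from something like `git diff --name-only origin/main...HEAD | needs_history_checklist`, where the base ref `origin/main` is an assumption about the repository's default branch.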