Charon/docs/plans/history_rewrite.md
2026-02-09 21:55:55 +00:00

History Rewrite: Plan, Checklist, and Recovery

Summary

  • This document describes the agreed process, checks, and recovery steps for destructive history rewrites performed with the scripts in scripts/history-rewrite/.
  • It updates the previous guidance by adding explicit backup requirements, tag backups, and a mandatory backup branch: the branch must be pushed to a remote and named via the --backup-branch argument or the BACKUP_BRANCH environment variable before running a destructive rewrite.

Minimum Requirements

  • Tools: git (>=2.25), git-filter-repo (Python-based utility), pre-commit.
  • Optional tools: bats-core for tests, shellcheck for linting scripts.

Overview

Use the preview_removals.sh script to preview which commits/objects will be removed. Always run clean_history.sh with --dry-run and create a remote backup branch and a tag backup tarball in data/backups/ before any destructive operation. After a rewrite, run validate_after_rewrite.sh to confirm the repository matches expectations.

Naming Conventions & Backup Policy

  • Backup branch name format: backup/history-YYYYMMDD-HHMMSS.
  • Tag backup tarball: data/backups/tags-YYYYMMDD-HHMMSS.tar.gz.
  • Metadata: data/backups/history-YYYYMMDD-HHMMSS.json with keys backup_branch, tag_tar, created_at, remote.
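
As a minimal sketch of how these names fit together, the helpers below derive all three from a single UTC timestamp and write the metadata file. The function names (make_backup_names, write_backup_metadata) and the ISO 8601 format used for created_at are assumptions for illustration; they are not taken from the existing scripts.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: derive every backup name from ONE timestamp so the
# branch, tag tarball, and metadata file always share the same suffix.
make_backup_names() {
  ts="${1:-$(date -u +%Y%m%d-%H%M%S)}"
  BACKUP_BRANCH="backup/history-$ts"
  TAG_TAR="data/backups/tags-$ts.tar.gz"
  META_FILE="data/backups/history-$ts.json"
}

# Hypothetical helper: write the four documented keys as JSON.
# Assumes the values contain no characters that need JSON escaping.
write_backup_metadata() {
  remote="${1:-origin}"
  mkdir -p data/backups
  printf '{\n  "backup_branch": "%s",\n  "tag_tar": "%s",\n  "created_at": "%s",\n  "remote": "%s"\n}\n' \
    "$BACKUP_BRANCH" "$TAG_TAR" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$remote" \
    > "$META_FILE"
}
```

Run from the repository root, make_backup_names followed by write_backup_metadata origin would leave a data/backups/history-<timestamp>.json file whose backup_branch and tag_tar values carry the same timestamp.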

Checklist (Before a Destructive Rewrite)

  1. Run the preview step and attach output to the PR:
    • scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db' --strip-size 50 --format json
    • Attach the output (or paste it into the PR) for reviewer consumption.
  2. Create a local and remote backup branch:
    • git checkout -b backup/history-YYYYMMDD-HHMMSS
    • git push origin backup/history-YYYYMMDD-HHMMSS
    • Pass the branch name via --backup-branch or set the BACKUP_BRANCH env var so validators can find it.
  3. Capture tags:
    • List tags with git show-ref --tags and push them to origin (git push origin --tags), or create a tarball of the tag list in data/backups/.
    • Example tag tarball: git show-ref --tags > data/backups/tags-YYYYMMDD-HHMMSS.txt && tar -czf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz -C data/backups tags-YYYYMMDD-HHMMSS.txt (create a scripted helper if needed). The tarball records tag names and object IDs only, not the tagged objects themselves.
  4. Ensure data/backups exists and is included as a tarball or log attachment in the PR:
    • mkdir -p data/backups && tar -czf data/backups/history-YYYYMMDD-HHMMSS.tar.gz --exclude='history-*.tar.gz' data/backups/ (if logs are present; the --exclude keeps the archive from trying to include itself).
  5. Run the CI dry-run job and ensure it completes successfully. If dry-run reports findings, address them first.
  6. Ensure maintainers approve and that you have a scheduled maintenance window. Do not run a destructive --force push without explicit approvals.
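
Parts of this checklist can be verified mechanically before anyone runs a destructive command. The sketch below is one hypothetical way to do so; preflight_check is not an existing script, and it covers only the backup branch (item 2) and the tag tarball (item 3), leaving approvals and the CI dry-run to humans.

```shell
#!/bin/sh
set -eu

# Hypothetical preflight check; returns non-zero on the first failed item.
preflight_check() {
  remote="${1:-origin}"
  branch="${BACKUP_BRANCH:-}"

  # Item 2: the branch name must follow backup/history-YYYYMMDD-HHMMSS.
  case "$branch" in
    backup/history-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "BACKUP_BRANCH is unset or malformed: '$branch'" >&2; return 1 ;;
  esac

  # Item 2: the branch must exist on the remote. With --exit-code,
  # git ls-remote exits 2 when no matching ref is found.
  if ! git ls-remote --exit-code "$remote" "refs/heads/$branch" >/dev/null 2>&1; then
    echo "backup branch $branch not found on $remote" >&2; return 1
  fi

  # Item 3: at least one non-empty tag tarball must be in data/backups/.
  set -- data/backups/tags-*.tar.gz
  if [ ! -s "$1" ]; then
    echo "no tag tarball in data/backups/" >&2; return 1
  fi

  echo "preflight ok: $branch"
}
```

A caller would typically export BACKUP_BRANCH, run preflight_check origin from the repository root, and stop if it returns non-zero.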

Typical Usage Examples

Preview candidates to remove:

scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,import' --strip-size 50 --format json

Create a backup branch and push:

git checkout -b backup/history-$(date -u +%Y%m%d-%H%M%S)
git push origin HEAD
export BACKUP_BRANCH=$(git rev-parse --abbrev-ref HEAD)

Create a tarball of tags and save logs in data/backups/:

mkdir -p data/backups
TS=$(date -u +%Y%m%d-%H%M%S)  # capture once so both file names share one timestamp
git show-ref --tags > "data/backups/tags-$TS.txt" || true  # show-ref exits non-zero when no tags exist
tar -czf "data/backups/tags-$TS.tar.gz" -C data/backups "tags-$TS.txt"

Dry-run the rewrite (do not push):

scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --dry-run --backup-branch "$BACKUP_BRANCH"

Perform the rewrite (coordinated action, after approvals):

scripts/history-rewrite/clean_history.sh --paths 'backend/codeql-db,import' --strip-size 50 --backup-branch "$BACKUP_BRANCH" --force
# After the local rewrite, force-push only in coordination with maintainers: git push origin --all --force

Validate after rewrite:

scripts/history-rewrite/validate_after_rewrite.sh --backup-branch "$BACKUP_BRANCH"

Recovery Steps (if things go wrong)

  1. Ensure your local clone still has the backup/history-... branch. If the branch was pushed to origin, check it using:
    • git ls-remote origin | grep backup/history- or git fetch origin backup/history-YYYY....
  2. Restore the branch to a new or restored head:
    • git checkout -b restore-YYYY backup/history-YYYYMMDD-HHMMSS
    • git push origin restore-YYYY and open a PR to restore history.
  3. For tags: restore from tarball or tag list by re-creating tags and pushing them to the remote:
    • mkdir -p /tmp/tags && tar -xzf data/backups/tags-YYYYMMDD-HHMMSS.tar.gz -C /tmp/tags
    • Recreate tags as needed and git push origin --tags.
  4. If a destructive push changed history on the remote, coordinate with maintainers to either push restore branches or restore main from the backup using git push origin refs/heads/restore-YYYY:refs/heads/main (a maintainers-only action; because it overwrites the rewritten main, the push is non-fast-forward and needs --force).
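
For step 3, tags can be recreated from a saved tag list rather than by hand. The sketch below assumes the list is in git show-ref --tags format (one "<sha> refs/tags/<name>" pair per line) and that the listed objects still exist in the local clone, for example because the backup branch has been fetched. The helper name restore_tags_from_list is hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: recreate tags from a list in `git show-ref --tags`
# format, i.e. one "<sha> refs/tags/<name>" pair per line.
# Assumes the listed objects are still present in this clone.
restore_tags_from_list() {
  list="$1"
  count=0
  while read -r sha ref; do
    case "$ref" in
      refs/tags/*) ;;
      *) continue ;;            # skip blank lines and non-tag refs
    esac
    # Point the tag ref back at the recorded object ID.
    git update-ref "$ref" "$sha"
    count=$((count + 1))
  done < "$list"
  echo "restored $count tag(s)"
}
```

After running it against the extracted list, review the result with git tag -l before pushing with git push origin --tags.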

Checklist for PR Reviewers

  • Confirm data/backups is present or attached in the PR.
  • Confirm the backup branch (backup/history-YYYYMMDD-HHMMSS) is pushed to origin.
  • Confirm tag backups exist and are included in the backup tarball.
  • Ensure preview_removals output is attached to the PR as evidence.
  • Ensure maintainers have scheduled the maintenance window and have approved the change.

Notes & Safety

  • Avoid running destructive pushes from forks without a coordinated maintainers plan.
  • The scripts are non-destructive by default (--dry-run); use --force only after approvals.
  • The validate_after_rewrite.sh script accepts --backup-branch or reads BACKUP_BRANCH env var; make sure it's present (or the script will exit non-zero).
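
The argument-or-environment lookup described in the last note could look roughly like the sketch below. This is not the validator's actual code: the function name resolve_backup_branch and the precedence (flag over environment variable) are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of the lookup: --backup-branch (ASSUMED to win) over
# the BACKUP_BRANCH environment variable; with neither, return non-zero.
resolve_backup_branch() {
  branch="${BACKUP_BRANCH:-}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --backup-branch)
        [ "$#" -ge 2 ] || { echo "--backup-branch needs a value" >&2; return 2; }
        branch="$2"
        shift 2 ;;
      *)
        shift ;;                # ignore arguments this sketch does not handle
    esac
  done
  if [ -z "$branch" ]; then
    echo "error: pass --backup-branch or set BACKUP_BRANCH" >&2
    return 1
  fi
  printf '%s\n' "$branch"
}
```

A script would call it as branch=$(resolve_backup_branch "$@") and let the non-zero status abort the run when no branch is supplied.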

For implementation details, see scripts/history-rewrite/ and current CI workflows that run the script tests.

History rewrite plan

Rationale

Committed CodeQL DB directories and large binary blobs bloat clones, CI caches, and overall repository size. This plan provides an auditable history-rewrite process, non-destructive by default, that removes these directories and optionally strips large blobs.

Scope

This plan targets CodeQL DB directories (e.g., backend/codeql-db, codeql-db, codeql-db-js, codeql-db-go) and other large blobs. Scripts are non-destructive by default and require --force to make destructive changes.

Risk & Mitigation

  • Rewriting history changes commit hashes. The scripts never force-push automatically; the maintainer must coordinate before running git push --force.
  • Always create a backup branch before rewriting; the script creates backup/history-YYYYMMDD-HHMMSS and pushes it to origin.
  • Require the manual confirmation string I UNDERSTAND before running any destructive change.
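
A confirmation gate of this kind can be very small. The sketch below is illustrative only: confirm_destructive is a hypothetical name, and the exact, case-sensitive match is an assumption about how strict the check should be.

```shell
#!/bin/sh
set -eu

# Hypothetical confirmation gate: reads one line from stdin and succeeds
# only on an exact, case-sensitive match of "I UNDERSTAND".
confirm_destructive() {
  printf 'Type "I UNDERSTAND" to rewrite history: ' >&2
  reply=""
  IFS= read -r reply || true    # EOF leaves reply empty or partial
  if [ "$reply" = "I UNDERSTAND" ]; then
    return 0
  fi
  echo "confirmation not given; aborting" >&2
  return 1
}
```

Reading from stdin means a non-interactive run with no input fails closed instead of proceeding.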

Overview of steps

  1. Prepare: create and checkout a non-main feature branch (do not run on main or master).
  2. Dry-run and preview: run a dry-run to preview commits and blobs to remove.
    • scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50
  3. Optional detailed preview:
    • scripts/history-rewrite/preview_removals.sh --paths 'backend/codeql-db,codeql-db' --strip-size 50
  4. With approval, run the destructive rewrite in a local clone or dedicated environment.
    • scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50
    • When prompted, type I UNDERSTAND to proceed.
  5. Validation: run the validator script and ensure CI passes locally:
    • scripts/history-rewrite/validate_after_rewrite.sh
  6. Coordinate with maintainers and force-push only after consensus.
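
Step 1 can be enforced with a small guard at the top of any wrapper script. The function below is a hypothetical sketch (require_feature_branch is not part of the existing scripts); it also rejects a detached HEAD, which is an added assumption.

```shell
#!/bin/sh
set -eu

# Hypothetical guard: refuse to continue on main, master, or a detached HEAD.
require_feature_branch() {
  # Prints the branch name, or the literal "HEAD" when detached.
  current="$(git rev-parse --abbrev-ref HEAD)"
  case "$current" in
    main|master)
      echo "refusing to run on '$current'; switch to a feature branch" >&2
      return 1 ;;
    HEAD)
      echo "detached HEAD; check out a feature branch first" >&2
      return 1 ;;
  esac
  echo "on branch $current"
}
```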

Installation & prerequisites

Sample commands and dry-run outputs

Dry-run:

scripts/history-rewrite/clean_history.sh --dry-run --paths 'backend/codeql-db,codeql-db' --strip-size 50

Sample dry-run output (excerpt):

--- Path: backend/codeql-db
2b7c6f8d1a... (commits touching this path)
--- Objects in paths
f6a9abcd... backend/codeql-db/project.sarif
--- Example large objects (candidate for --strip-size)
f3ae1234... size=104857600
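
To see independently which blobs would qualify as large objects, the object list can be filtered by size. The sketch below is not how the scripts compute their output; blobs_over_threshold is a hypothetical helper, and it assumes the --strip-size value is expressed in MiB, which this document does not state.

```shell
#!/bin/sh
set -eu

# Hypothetical filter: reads "<sha> <type> <size>" lines (the default
# `git cat-file --batch-check` output) on stdin and prints blobs whose
# size in bytes exceeds the threshold. ASSUMPTION: threshold is in MiB.
blobs_over_threshold() {
  awk -v mib="$1" '
    $2 == "blob" && $3 > mib * 1024 * 1024 { printf "%s size=%s\n", $1, $3 }
  '
}
```

It could be fed from a real repository with git rev-list --objects --all | cut -d' ' -f1 | git cat-file --batch-check | blobs_over_threshold 50. Under the MiB assumption, the 104857600-byte object in the excerpt (100 MiB) would be reported for a threshold of 50.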

Force-run (coordination required):

scripts/history-rewrite/clean_history.sh --force --paths 'backend/codeql-db,codeql-db' --strip-size 50

Followed by verification and manual force-push:

  • Check data/backups/history_cleanup-YYYYMMDD-HHMMSS.log
  • scripts/history-rewrite/validate_after_rewrite.sh
  • git push --all --force (only after maintainers approve)

Rollback plan

If problems occur, restore from the backup branch:

git checkout -b restore/YYYYMMDD-HHMMSS backup/history-YYYYMMDD-HHMMSS
git push origin restore/YYYYMMDD-HHMMSS

Post rewrite maintenance

  • Run git gc --aggressive --prune=now on clones and local copies.
  • Run git count-objects -vH to confirm size improvements.
  • Refresh CI caches and mirrors after the change.
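
The local cleanup can be wrapped in one function, sketched below under the name post_rewrite_cleanup (hypothetical). It adds a reflog expiry before the gc, since reflog entries otherwise keep the pre-rewrite commits reachable and gc will not prune them.

```shell
#!/bin/sh
set -eu

# Sketch of a local cleanup pass, to run after the rewrite is validated.
post_rewrite_cleanup() {
  # Expire reflog entries first; they keep old commits reachable.
  git reflog expire --expire=now --all
  # Repack and drop unreachable objects immediately.
  git gc --aggressive --prune=now
  # Report the object store size in human-readable units.
  git count-objects -vH
}
```

Any clone that still has a local backup/history-... branch keeps the old objects reachable through it, so the size reported there will not shrink until that branch is deleted locally.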

Communication & Approval

Open a PR with dry-run logs and preview_removals output, and tag maintainers for approval before --force is used.

CI automation

  • A CI dry-run workflow .github/workflows/dry-run-history-rewrite.yml runs a non-destructive check that fails CI when banned history entries or large objects are found. It is triggered on PRs and a daily schedule.
  • A PR checklist template .github/PULL_REQUEST_TEMPLATE/history-rewrite.md and a checklist validator .github/workflows/pr-checklist.yml ensure contributors attach the preview output and backups before seeking approval.
  • The PR checklist validator is conditional: it only enforces the checklist when the PR modifies scripts/history-rewrite/*, docs/plans/history_rewrite.md, or similar history-rewrite related files. This avoids blocking unrelated PRs.
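
The conditional enforcement amounts to a path filter over the PR's changed files. The sketch below shows the idea in shell rather than workflow syntax; touches_history_rewrite is a hypothetical name, and it matches only the two paths named above, not the unspecified "similar" files.

```shell
#!/bin/sh
set -eu

# Hypothetical path filter: reads changed file paths on stdin and returns 0
# when at least one of them is history-rewrite related.
touches_history_rewrite() {
  while IFS= read -r path; do
    case "$path" in
      scripts/history-rewrite/*|docs/plans/history_rewrite.md) return 0 ;;
    esac
  done
  return 1
}
```

A CI step could pipe git diff --name-only for the PR range into this function and run the checklist validation only when it returns 0.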