Files

GitHub Actions acea4307ba Enhance documentation and testing plans

- Added references to existing test files in the UI/UX testing plan.
- Updated CI failure remediation plan with improved file paths and clarity.
- Expanded CrowdSec full implementation documentation with detailed configuration steps and scripts.
- Improved CrowdSec testing plan with clearer objectives and expected results.
- Updated current specification documentation with additional context on CVE remediation.
- Enhanced docs-to-issues workflow documentation for better issue tracking.
- Corrected numbering in UI/UX bugfixes specification for clarity.
- Improved WAF testing plan with detailed curl commands and expected results.
- Updated QA reports for CrowdSec implementation and UI/UX testing with detailed results and coverage metrics.
- Fixed rate limit integration test summary with clear identification of issues and resolutions.
- Enhanced rate limit test status report with detailed root causes and next steps for follow-up.

2025-12-14 02:45:24 +00:00

16 KiB

Raw Blame History

CI/CD Failure Remediation Plan

Created: 2025-12-13 Status: Planned Priority: High

Executive Summary

Three GitHub Actions workflows have failed. This document provides root cause analysis and specific remediation steps for each failure.

1. Quality Checks Workflow (`quality-checks.yml`)

1.1 Frontend Test Timeout

File: frontend/src/components/tests/LiveLogViewer.test.tsx Test: "displays blocked requests with special styling" under "Security Mode" Error: Test timed out in 5000ms

Root Cause Analysis

The failing test at line 374 has a race condition between:

The await act(async () => { mockOnSecurityMessage(blockedLog); }) call
The subsequent waitFor assertions

The test attempts to verify multiple DOM elements after sending a security log message:

await waitFor(() => {
  // Use getAllByText since 'WAF' appears both in dropdown option and source badge
  const wafElements = screen.getAllByText('WAF');
  expect(wafElements.length).toBeGreaterThanOrEqual(2); // Option + badge
  expect(screen.getByText('10.0.0.1')).toBeTruthy();
  expect(screen.getByText(/BLOCKED: SQL injection detected/)).toBeTruthy();
  // Block reason is shown in brackets - check for the text content
  expect(screen.getByText(/\[SQL injection detected\]/)).toBeTruthy();
});

Issues identified:

Multiple assertions in single waitFor: The waitFor contains 4 separate assertions. If any one fails, the entire block retries, potentially causing timeout.
Complex regex matching: The regex patterns /BLOCKED: SQL injection detected/ and /\[SQL injection detected\]/ may be matching against DOM elements that are not yet rendered.
State update timing: The act() wrapper is async but the component's state update (setLogs) may not complete before waitFor starts checking.

Recommended Fix

Option A: Increase test timeout (Quick fix)

// Line 371-374 - Add timeout option
it('displays blocked requests with special styling', async () => {
  render(<LiveLogViewer mode="security" />);

  // Wait for connection to establish
  await waitFor(() => expect(screen.getByText('Connected')).toBeTruthy());

  const blockedLog: logsApi.SecurityLogEntry = {
    timestamp: '2025-12-12T10:30:00Z',
    level: 'warn',
    logger: 'http.handlers.waf',
    client_ip: '10.0.0.1',
    method: 'POST',
    uri: '/admin',
    status: 403,
    duration: 0.001,
    size: 0,
    user_agent: 'Attack/1.0',
    host: 'example.com',
    source: 'waf',
    blocked: true,
    block_reason: 'SQL injection detected',
  };

  // Send message inside act to properly handle state updates
  await act(async () => {
    if (mockOnSecurityMessage) {
      mockOnSecurityMessage(blockedLog);
    }
  });

  // Split assertions into separate waitFor calls to isolate failures
  await waitFor(() => {
    expect(screen.getByText('10.0.0.1')).toBeTruthy();
  }, { timeout: 10000 });

  await waitFor(() => {
    expect(screen.getByText(/BLOCKED: SQL injection detected/)).toBeTruthy();
  });

  await waitFor(() => {
    expect(screen.getByText(/\[SQL injection detected\]/)).toBeTruthy();
  });

  // Verify WAF badge appears (separate from dropdown option)
  await waitFor(() => {
    const wafElements = screen.getAllByText('WAF');
    expect(wafElements.length).toBeGreaterThanOrEqual(2);
  });
}, 15000); // Increase overall test timeout

Option B: Use findBy queries (Preferred)

it('displays blocked requests with special styling', async () => {
  render(<LiveLogViewer mode="security" />);

  // Wait for connection
  await screen.findByText('Connected');

  const blockedLog: logsApi.SecurityLogEntry = {
    // ... same as before
  };

  await act(async () => {
    if (mockOnSecurityMessage) {
      mockOnSecurityMessage(blockedLog);
    }
  });

  // Use findBy for async queries - cleaner and more reliable
  await screen.findByText('10.0.0.1');
  await screen.findByText(/BLOCKED: SQL injection detected/);
  await screen.findByText(/\[SQL injection detected\]/);

  // For getAllByText, wrap in waitFor since findAllBy isn't used for counts
  await waitFor(() => {
    const wafElements = screen.getAllByText('WAF');
    expect(wafElements.length).toBeGreaterThanOrEqual(2);
  });
});

1.2 Backend gosec G115 - Integer Overflow Errors

1.2.1 backup_service.go Line 293

File: backend/internal/services/backup_service.go Error: G115: integer overflow conversion uint64 -> int64

Current Code:

func (s *BackupService) GetAvailableSpace() (int64, error) {
    var stat syscall.Statfs_t
    if err := syscall.Statfs(s.BackupDir, &stat); err != nil {
        return 0, fmt.Errorf("failed to get disk space: %w", err)
    }
    // Available blocks * block size = available bytes
    return int64(stat.Bavail) * int64(stat.Bsize), nil  // Line 293
}

Root Cause:

stat.Bavail is uint64 (available blocks)
stat.Bsize is int64 (block size, can be negative on some systems but practically always positive)
Converting uint64 to int64 can overflow if Bavail exceeds math.MaxInt64

Recommended Fix:

import (
    "math"
    // ... other imports
)

func (s *BackupService) GetAvailableSpace() (int64, error) {
    var stat syscall.Statfs_t
    if err := syscall.Statfs(s.BackupDir, &stat); err != nil {
        return 0, fmt.Errorf("failed to get disk space: %w", err)
    }

    // Safe conversion with overflow check
    // Bavail is uint64, Bsize is int64 (but always positive for valid filesystems)
    bavail := stat.Bavail
    bsize := stat.Bsize

    // Check for negative block size (invalid filesystem state)
    if bsize < 0 {
        return 0, fmt.Errorf("invalid block size: %d", bsize)
    }

    // Check if Bavail exceeds int64 max before conversion
    if bavail > uint64(math.MaxInt64) {
        // Cap at max int64 - this represents ~8 exabytes which is sufficient
        return math.MaxInt64, nil
    }

    // Safe to convert now
    availBlocks := int64(bavail)
    blockSize := bsize

    // Check for overflow in multiplication
    if availBlocks > 0 && blockSize > math.MaxInt64/availBlocks {
        return math.MaxInt64, nil
    }

    return availBlocks * blockSize, nil
}

1.2.2 proxy_host_handler.go Lines 216 and 235

File: backend/internal/api/handlers/proxy_host_handler.go Error: G115: integer overflow conversion int -> uint

Current Code (Lines 213-222 for certificate_id, Lines 231-240 for access_list_id):

// Line 216
case int:
    id := uint(t)  // G115: int -> uint overflow
    host.CertificateID = &id

// Line 235
case int:
    id := uint(t)  // G115: int -> uint overflow
    host.AccessListID = &id

Root Cause:

When JSON is unmarshaled into map[string]interface{}, numeric values can become float64 or int
Converting a negative int to uint causes overflow (e.g., -1 becomes 18446744073709551615)
Database IDs should never be negative

Recommended Fix:

Create a helper function and apply to both locations:

// Add at package level or in a shared util package
// safeIntToUint safely converts an int to uint, returning false if negative
func safeIntToUint(i int) (uint, bool) {
    if i < 0 {
        return 0, false
    }
    return uint(i), true
}

// safeFloat64ToUint safely converts a float64 to uint, returning false if negative or fractional
func safeFloat64ToUint(f float64) (uint, bool) {
    if f < 0 || f != float64(uint(f)) {
        return 0, false
    }
    return uint(f), true
}

Updated handler code (Lines 210-245):

if v, ok := payload["certificate_id"]; ok {
    if v == nil {
        host.CertificateID = nil
    } else {
        switch t := v.(type) {
        case float64:
            if id, ok := safeFloat64ToUint(t); ok {
                host.CertificateID = &id
            }
        case int:
            if id, ok := safeIntToUint(t); ok {
                host.CertificateID = &id
            }
        case string:
            if n, err := strconv.ParseUint(t, 10, 32); err == nil {
                id := uint(n)
                host.CertificateID = &id
            }
        }
    }
}
if v, ok := payload["access_list_id"]; ok {
    if v == nil {
        host.AccessListID = nil
    } else {
        switch t := v.(type) {
        case float64:
            if id, ok := safeFloat64ToUint(t); ok {
                host.AccessListID = &id
            }
        case int:
            if id, ok := safeIntToUint(t); ok {
                host.AccessListID = &id
            }
        case string:
            if n, err := strconv.ParseUint(t, 10, 32); err == nil {
                id := uint(n)
                host.AccessListID = &id
            }
        }
    }
}

2. PR Checklist Validation Workflow (`pr-checklist.yml`)

Error: "Validate history-rewrite checklist (conditional)" failed

Root Cause Analysis

The workflow at .github/workflows/pr-checklist.yml validates PRs that touch history-rewrite related files. It checks for three items in the PR body:

preview_removals.sh mention - Pattern: /preview_removals\.sh/i
data/backups mention - Pattern: /data\/?backups/i
explicit non-run of --force - Pattern: /(?:\[\s*[xX]\s*\]\s*)?(?:i will not run|will not run|do not run|don'?t run|won'?t run)\b[^\n]*--force/i

When this check triggers:

The check only runs if the PR modifies files matching:

scripts/history-rewrite/*
docs/plans/history_rewrite.md
Any file containing history-rewrite in the path

Resolution Options

Option A: If PR legitimately touches history-rewrite files

Update the PR description to include all required checklist items from .github/PULL_REQUEST_TEMPLATE/history-rewrite.md:

## Checklist - required for history rewrite PRs

- [x] I have created a **local** backup branch: `backup/history-YYYYMMDD-HHMMSS`
- [x] I have pushed the backup branch to the remote origin
- [x] I have run a dry-run locally: `scripts/history-rewrite/preview_removals.sh --paths '...'`
- [x] I have verified the `data/backups` tarball is present
- [x] I will not run the destructive `--force` step without explicit approval

Option B: If PR doesn't need history-rewrite validation

Ensure the PR doesn't modify files in:

scripts/history-rewrite/
docs/plans/history_rewrite.md
Any files with history-rewrite in the name

Option C: False positive - workflow issue

If the workflow is triggering incorrectly, check the file list detection logic at line 27-28 of the workflow.

3. Benchmark Workflow (`benchmark.yml`)

3.1 "Resource not accessible by integration" Error

Root Cause:

The benchmark-action/github-action-benchmark@v1 action requires write permissions to push benchmark results to the repository. This fails on:

Pull requests from forks (restricted permissions)
PRs where GITHUB_TOKEN doesn't have contents: write permission

Current Workflow Configuration:

permissions:
  contents: write
  deployments: write

The error occurs because:

On PRs, the token may not have write access
The auto-push: true setting tries to push on main branch only, but the action still needs permissions to access the benchmark data

Recommended Fix:

- name: Store Benchmark Result
  uses: benchmark-action/github-action-benchmark@v1
  with:
    name: Go Benchmark
    tool: 'go'
    output-file-path: backend/output.txt
    github-token: ${{ secrets.GITHUB_TOKEN }}
    # Only auto-push on main branch pushes, not PRs
    auto-push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
    alert-threshold: '150%'
    comment-on-alert: true
    fail-on-alert: false
    summary-always: true
    # Skip external data storage on PRs to avoid permission errors
    skip-fetch-gh-pages: ${{ github.event_name == 'pull_request' }}

Alternatively, for fork PRs, consider using pull_request_target event with caution or skip the benchmark action entirely on PRs.

3.2 Performance Regression (1.51x threshold exceeded)

Warning: Performance regression 1.51x (165768 ns/op vs 109674 ns/op) exceeds 1.5x threshold

Benchmark Functions Available:

From backend/internal/api/handlers/benchmark_test.go:

Benchmark	Line
`BenchmarkSecurityHandler_GetStatus`	47
`BenchmarkSecurityHandler_GetStatus_NoSettings`	82
`BenchmarkSecurityHandler_ListDecisions`	105
`BenchmarkSecurityHandler_ListRuleSets`	138
`BenchmarkSecurityHandler_UpsertRuleSet`	171
`BenchmarkSecurityHandler_CreateDecision`	202
`BenchmarkSecurityHandler_GetConfig`	233
`BenchmarkSecurityHandler_UpdateConfig`	266
`BenchmarkSecurityHandler_GetStatus_Parallel`	303
`BenchmarkSecurityHandler_ListDecisions_Parallel`	336
`BenchmarkSecurityHandler_LargeRuleSetContent`	383
`BenchmarkSecurityHandler_ManySettingsLookups`	420

Root Cause Analysis:

The 1.51x regression (165768 ns vs 109674 ns ≈ 56μs increase) likely comes from:

Database operations: Benchmarks use in-memory SQLite. Any additional queries or model changes would show up here.
Parallel benchmark flakiness: BenchmarkSecurityHandler_GetStatus_Parallel and BenchmarkSecurityHandler_ListDecisions_Parallel use shared memory databases which can have contention.
CI environment variability: GitHub Actions runners have variable performance.

Investigation Steps:

Run benchmarks locally to establish baseline:

cd backend && go test -bench=. -benchmem -benchtime=3s ./internal/api/handlers/... -run=^$

Compare with previous commit:

git stash
git checkout HEAD~1
go test -bench=. -benchmem ./internal/api/handlers/... -run=^$ > bench_old.txt
git checkout -
git stash pop
go test -bench=. -benchmem ./internal/api/handlers/... -run=^$ > bench_new.txt
benchstat bench_old.txt bench_new.txt

Check recent changes to:
- internal/api/handlers/security_handler.go
- internal/models/security*.go
- Database query patterns

Recommended Actions:

If real regression:

Profile the affected handler using go test -cpuprofile
Review recent commits for inefficient code
Optimize the specific slow path

If CI flakiness:

Increase alert-threshold to 175% or 200%
Add -benchtime=3s for more stable results
Consider running benchmarks multiple times and averaging

Workflow Fix for Threshold:

- name: Store Benchmark Result
  uses: benchmark-action/github-action-benchmark@v1
  with:
    # ... other options
    # Increase threshold to account for CI variability
    alert-threshold: '175%'
    # Don't fail on alert for benchmarks - just warn
    fail-on-alert: false

Summary of Required Changes

File	Change	Priority
`frontend/src/components/__tests__/LiveLogViewer.test.tsx`	Split waitFor assertions, increase timeout	High
`backend/internal/services/backup_service.go`	Add overflow protection for `GetAvailableSpace`	High
`backend/internal/api/handlers/proxy_host_handler.go`	Add safe int-to-uint conversion helpers	High
`.github/workflows/benchmark.yml`	Add permission guards, adjust threshold	Medium
PR Description	Add history-rewrite checklist items if applicable	Conditional

Implementation Order

Backend gosec fixes - Blocking CI, straightforward fixes
Frontend test timeout - Blocking CI, may need iteration
Benchmark workflow - Non-blocking (fail-on-alert: false), can be addressed after
PR checklist - Context-dependent, may not need code changes

Testing Verification

After implementing fixes:

# Backend
cd backend && go test ./... -v
golangci-lint run  # Should pass gosec G115

# Frontend
cd frontend && npm test

# Full CI simulation
pre-commit run --all-files

16 KiB Raw Blame History

CI/CD Failure Remediation Plan

Executive Summary

1. Quality Checks Workflow (quality-checks.yml)

1.1 Frontend Test Timeout

Root Cause Analysis

Recommended Fix

1.2 Backend gosec G115 - Integer Overflow Errors

1.2.1 backup_service.go Line 293

1.2.2 proxy_host_handler.go Lines 216 and 235

2. PR Checklist Validation Workflow (pr-checklist.yml)

Root Cause Analysis

Resolution Options

3. Benchmark Workflow (benchmark.yml)

3.1 "Resource not accessible by integration" Error

3.2 Performance Regression (1.51x threshold exceeded)

Summary of Required Changes

Implementation Order

Testing Verification

16 KiB

Raw Blame History

1. Quality Checks Workflow (`quality-checks.yml`)

2. PR Checklist Validation Workflow (`pr-checklist.yml`)

3. Benchmark Workflow (`benchmark.yml`)