chore: Enhance documentation for E2E testing:
- Added clarity and structure to README files, including recent updates and getting started sections. - Improved manual verification documentation for CrowdSec authentication, emphasizing expected outputs and success criteria. - Updated debugging guide with detailed output examples and automatic trace capture information. - Refined best practices for E2E tests, focusing on efficient polling, locator strategies, and state management. - Documented triage report for DNS Provider feature tests, highlighting issues fixed and test results before and after improvements. - Revised E2E test writing guide to include when to use specific helper functions and patterns for better test reliability. - Enhanced troubleshooting documentation with clear resolutions for common issues, including timeout and token configuration problems. - Updated tests README to provide quick links and best practices for writing robust tests.
This commit is contained in:
@@ -11,6 +11,7 @@
|
||||
### Issue 1: `rate_limit` handler never appears in running Caddy config
|
||||
|
||||
**Observed symptom** (from CI log):
|
||||
|
||||
```
|
||||
Attempt 10/10: rate_limit handler not found, waiting...
|
||||
✗ rate_limit handler verification failed after 10 attempts
|
||||
@@ -22,6 +23,7 @@ Rate limit enforcement test FAILED
|
||||
#### Code path trace
|
||||
|
||||
The `verify_rate_limit_config` function in `scripts/rate_limit_integration.sh` (lines ~35–58) executes:
|
||||
|
||||
```bash
|
||||
caddy_config=$(curl -s http://localhost:2119/config 2>/dev/null || echo "")
|
||||
if echo "$caddy_config" | grep -q '"handler":"rate_limit"'; then
|
||||
@@ -48,6 +50,7 @@ The handler is absent from Caddy's running config because `ApplyConfig` in `back
|
||||
**Root cause A — silent failure of the security config POST step** (contributing):
|
||||
|
||||
The security config POST step in the script discards stdout only; curl exits 0 for HTTP 4xx without -f flag, so auth failures are invisible:
|
||||
|
||||
```bash
|
||||
# scripts/rate_limit_integration.sh, ~line 248
|
||||
curl -s -X POST -H "Content-Type: application/json" \
|
||||
@@ -55,9 +58,11 @@ curl -s -X POST -H "Content-Type: application/json" \
|
||||
-b ${TMP_COOKIE} \
|
||||
http://localhost:8280/api/v1/security/config >/dev/null
|
||||
```
|
||||
|
||||
No HTTP status check is performed. If this returns 4xx (e.g., `403 Forbidden` because the requesting user lacks the admin role, or `401 Unauthorized` because the cookie was not accepted), the config is never saved to DB, `ApplyConfig` is never called with the rate_limit values, and the handler is never injected.
|
||||
|
||||
The route is protected by `middleware.RequireRole(models.RoleAdmin)` (routes.go:572–573):
|
||||
|
||||
```go
|
||||
securityAdmin := management.Group("/security")
|
||||
securityAdmin.Use(middleware.RequireRole(models.RoleAdmin))
|
||||
@@ -69,6 +74,7 @@ A non-admin authenticated user, or an unauthenticated request, returns `403` sil
|
||||
**Root cause B — warn-and-proceed instead of fail-hard** (amplifier):
|
||||
|
||||
`verify_rate_limit_config` returns `1` on failure, but the calling site in the script treats the failure as non-fatal:
|
||||
|
||||
```bash
|
||||
# scripts/rate_limit_integration.sh, ~line 269
|
||||
if ! verify_rate_limit_config; then
|
||||
@@ -76,11 +82,13 @@ if ! verify_rate_limit_config; then
|
||||
echo "Proceeding with test anyway..."
|
||||
fi
|
||||
```
|
||||
|
||||
The enforcement test that follows is guaranteed to fail when the handler is absent (all requests pass through with HTTP 200, never hitting 429), yet the test proceeds unconditionally. The verification failure should be a hard exit.
|
||||
|
||||
**Root cause C — no response code check for proxy host creation** (contributing):
|
||||
|
||||
The proxy host creation at step 5 checks the status code (`201` vs other), but allows non-201 with a soft log message:
|
||||
|
||||
```bash
|
||||
if [ "$CREATE_STATUS" = "201" ]; then
|
||||
echo "✓ Proxy host created successfully"
|
||||
@@ -88,11 +96,13 @@ else
|
||||
echo " Proxy host may already exist (status: $CREATE_STATUS)"
|
||||
fi
|
||||
```
|
||||
|
||||
If this returns `401` (auth failure), no proxy host is registered. Requests to `http://localhost:8180/get` with `Host: ratelimit.local` then hit Caddy's catch-all route returning HTTP 200 (the Charon frontend), not the backend. No 429 will ever appear regardless of rate limit configuration.
|
||||
|
||||
**Root cause D — `ApplyConfig` failure is swallowed; Caddy not yet ready when config is posted** (primary):
|
||||
|
||||
In `UpdateConfig` (`security_handler.go:289–292`):
|
||||
|
||||
```go
|
||||
if h.caddyManager != nil {
|
||||
if err := h.caddyManager.ApplyConfig(c.Request.Context()); err != nil {
|
||||
@@ -101,6 +111,7 @@ if h.caddyManager != nil {
|
||||
}
|
||||
c.JSON(http.StatusOK, gin.H{"config": payload})
|
||||
```
|
||||
|
||||
If `ApplyConfig` fails (Caddy not yet fully initialized, config validation error), the error is logged as a warning but the HTTP response is still `200 OK`. The test script sees 200, assumes success, and proceeds.
|
||||
|
||||
---
|
||||
@@ -110,11 +121,13 @@ If `ApplyConfig` fails (Caddy not yet fully initialized, config validation error
|
||||
**Observed symptom**: During non-CI Docker builds, the GeoIP download step prints `⚠️ Checksum failed` and creates a `.placeholder` file, but the downloaded `.mmdb` is left on disk alongside the placeholder.
|
||||
|
||||
**Code location**: `Dockerfile`, lines that contain:
|
||||
|
||||
```dockerfile
|
||||
ARG GEOLITE2_COUNTRY_SHA256=aa154fc6bcd712644de232a4abcdd07dac1f801308c0b6f93dbc2b375443da7b
|
||||
```
|
||||
|
||||
**Non-CI verification block** (Dockerfile, local build path):
|
||||
|
||||
```dockerfile
|
||||
if [ -s /app/data/geoip/GeoLite2-Country.mmdb ] && \
|
||||
echo "${GEOLITE2_COUNTRY_SHA256} /app/data/geoip/GeoLite2-Country.mmdb" | sha256sum -c -; then
|
||||
@@ -146,6 +159,7 @@ fi;
|
||||
**Required change**: Capture the HTTP status code from the login response. Fail fast if login returns non-200.
|
||||
|
||||
Exact change — replace:
|
||||
|
||||
```bash
|
||||
curl -s -X POST -H "Content-Type: application/json" \
|
||||
-d '{"email":"ratelimit@example.local","password":"password123"}' \
|
||||
@@ -156,6 +170,7 @@ echo "✓ Authentication complete"
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```bash
|
||||
LOGIN_STATUS=$(curl -s -w "\n%{http_code}" -X POST -H "Content-Type: application/json" \
|
||||
-d '{"email":"ratelimit@example.local","password":"password123"}' \
|
||||
@@ -174,6 +189,7 @@ echo "✓ Authentication complete (HTTP $LOGIN_STATUS)"
|
||||
**Current behavior**: Non-201 responses are treated as "may already exist" and execution continues — including `401`/`403` auth failures.
|
||||
|
||||
Required change — replace:
|
||||
|
||||
```bash
|
||||
if [ "$CREATE_STATUS" = "201" ]; then
|
||||
echo "✓ Proxy host created successfully"
|
||||
@@ -183,6 +199,7 @@ fi
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```bash
|
||||
if [ "$CREATE_STATUS" = "201" ]; then
|
||||
echo "✓ Proxy host created successfully"
|
||||
@@ -201,6 +218,7 @@ fi
|
||||
**Rationale**: Root Cause D is the primary driver of handler-not-found failures. If Caddy's admin API is not yet fully initialized when the security config is POSTed, `ApplyConfig` fails silently (logged as a warning only), the rate_limit handler is never injected into Caddy's running config, and the verification loop times out. The readiness gate ensures Caddy is accepting admin API requests before any config change is attempted.
|
||||
|
||||
**Required change** — insert before the security config POST:
|
||||
|
||||
```bash
|
||||
echo "Waiting for Caddy admin API to be ready..."
|
||||
for i in {1..20}; do
|
||||
@@ -224,6 +242,7 @@ done
|
||||
**Current behavior**: Response is discarded with `>/dev/null`. No status check.
|
||||
|
||||
Required change — replace:
|
||||
|
||||
```bash
|
||||
curl -s -X POST -H "Content-Type: application/json" \
|
||||
-d "${SEC_CFG_PAYLOAD}" \
|
||||
@@ -234,6 +253,7 @@ echo "✓ Rate limiting configured"
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```bash
|
||||
SEC_CONFIG_RESP=$(curl -s -w "\n%{http_code}" -X POST -H "Content-Type: application/json" \
|
||||
-d "${SEC_CFG_PAYLOAD}" \
|
||||
@@ -258,6 +278,7 @@ echo "✓ Rate limiting configured (HTTP $SEC_CONFIG_STATUS)"
|
||||
**Current behavior**: Failed verification logs a warning and continues.
|
||||
|
||||
Required change — replace:
|
||||
|
||||
```bash
|
||||
echo "Waiting for Caddy to apply configuration..."
|
||||
sleep 5
|
||||
@@ -270,6 +291,7 @@ fi
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```bash
|
||||
echo "Waiting for Caddy to apply configuration..."
|
||||
sleep 8
|
||||
@@ -307,6 +329,7 @@ local wait=5 # was: 3
|
||||
#### Change 7 — Use trailing slash on Caddy admin API URL in `verify_rate_limit_config`
|
||||
|
||||
**Location**: `verify_rate_limit_config`, line ~42:
|
||||
|
||||
```bash
|
||||
caddy_config=$(curl -s http://localhost:2119/config 2>/dev/null || echo "")
|
||||
```
|
||||
@@ -314,11 +337,13 @@ caddy_config=$(curl -s http://localhost:2119/config 2>/dev/null || echo "")
|
||||
Caddy's admin API specification defines `GET /config/` (with trailing slash) as the canonical endpoint for the full running config. Omitting the slash works in practice because Caddy does not redirect, but using the canonical form is more correct and avoids any future behavioral change:
|
||||
|
||||
Replace:
|
||||
|
||||
```bash
|
||||
caddy_config=$(curl -s http://localhost:2119/config 2>/dev/null || echo "")
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```bash
|
||||
caddy_config=$(curl -s http://localhost:2119/config/ 2>/dev/null || echo "")
|
||||
```
|
||||
@@ -377,6 +402,7 @@ fi
|
||||
**Important**: Do NOT remove the `ARG GEOLITE2_COUNTRY_SHA256` declaration from the Dockerfile. The `update-geolite2.yml` workflow uses `sed` to update that ARG. If the ARG disappears, the workflow's `sed` command will silently no-op and fail to update the Dockerfile on next run, leaving the stale hash in source while the workflow reports success. Keeping the ARG (even unused) preserves Renovate/workflow compatibility.
|
||||
|
||||
Keep:
|
||||
|
||||
```dockerfile
|
||||
ARG GEOLITE2_COUNTRY_SHA256=aa154fc6bcd712644de232a4abcdd07dac1f801308c0b6f93dbc2b375443da7b
|
||||
```
|
||||
@@ -402,6 +428,7 @@ This ARG is now only referenced by the `update-geolite2.yml` workflow (to know i
|
||||
### Validating Issue 1 fix
|
||||
|
||||
**Step 1 — Build and run the integration test locally:**
|
||||
|
||||
```bash
|
||||
# From /projects/Charon
|
||||
chmod +x scripts/rate_limit_integration.sh
|
||||
@@ -409,6 +436,7 @@ scripts/rate_limit_integration.sh 2>&1 | tee /tmp/ratelimit-test.log
|
||||
```
|
||||
|
||||
**Expected output sequence (key lines)**:
|
||||
|
||||
```
|
||||
✓ Charon API is ready
|
||||
✓ Authentication complete (HTTP 200)
|
||||
@@ -428,16 +456,20 @@ Sending request 3+1 (should return 429 Too Many Requests)...
|
||||
|
||||
**Step 2 — Deliberately break auth to verify the new guard fires:**
|
||||
Temporarily change `password123` in the login curl to a wrong password. The test should now print:
|
||||
|
||||
```
|
||||
✗ Login failed (HTTP 401) — aborting
|
||||
```
|
||||
|
||||
and exit with code 1, rather than proceeding to a confusing 429-enforcement failure.
|
||||
|
||||
**Step 3 — Verify Caddy config contains the handler before enforcement:**
|
||||
|
||||
```bash
|
||||
# After security config step and sleep 8:
|
||||
curl -s http://localhost:2119/config/ | python3 -m json.tool | grep -A2 '"handler": "rate_limit"'
|
||||
```
|
||||
|
||||
Expected: handler block with `"rate_limits"` sub-key containing `"static"` zone.
|
||||
|
||||
**Step 4 — CI validation:** Push to a PR and observe the `Rate Limiting Integration` workflow. The workflow now exits at the first unmissable error rather than proceeding to a deceptive "enforcement test FAILED" message.
|
||||
@@ -445,21 +477,27 @@ Expected: handler block with `"rate_limits"` sub-key containing `"static"` zone.
|
||||
### Validating Issue 2 fix
|
||||
|
||||
**Step 1 — Local build without CI flag:**
|
||||
|
||||
```bash
|
||||
docker build -t charon:geolip-test --build-arg CI=false . 2>&1 | grep -E "GeoIP|GeoLite|checksum|✅|⚠️"
|
||||
```
|
||||
|
||||
Expected: `✅ GeoIP downloaded` (no mention of checksum failure).
|
||||
|
||||
**Step 2 — Verify file is present and readable:**
|
||||
|
||||
```bash
|
||||
docker run --rm charon:geolip-test stat /app/data/geoip/GeoLite2-Country.mmdb
|
||||
```
|
||||
|
||||
Expected: file exists with non-zero size, no `.placeholder` alongside.
|
||||
|
||||
**Step 3 — Confirm ARG still exists for workflow compatibility:**
|
||||
|
||||
```bash
|
||||
grep "GEOLITE2_COUNTRY_SHA256" Dockerfile
|
||||
```
|
||||
|
||||
Expected: `ARG GEOLITE2_COUNTRY_SHA256=<hash>` line is present.
|
||||
|
||||
---
|
||||
|
||||
@@ -37,6 +37,7 @@ Content-Type: application/json
|
||||
```
|
||||
|
||||
**Key design decisions:**
|
||||
|
||||
- **Token storage:** The bot token is stored in `NotificationProvider.Token` (`json:"-"`, encrypted at rest) — never in the URL field. This mirrors the Gotify pattern where secrets are separated from endpoints.
|
||||
- **URL field:** Stores only the `chat_id` (e.g., `987654321`). At dispatch time, the full API URL is constructed dynamically: `https://api.telegram.org/bot` + decryptedToken + `/sendMessage`. The `chat_id` is passed in the POST body alongside the message text. This prevents token leakage via API responses since URL is `json:"url"`.
|
||||
- **SSRF mitigation:** Before dispatching, validate that the constructed URL hostname is exactly `api.telegram.org`. This prevents SSRF if stored data is tampered with.
|
||||
@@ -475,6 +476,7 @@ Request/response schemas are unchanged. The `type` field now accepts `"telegram"
|
||||
Modeled after `tests/settings/email-notification-provider.spec.ts`.
|
||||
|
||||
Test scenarios:
|
||||
|
||||
1. Create a Telegram provider (name, chat_id in URL field, bot token in token field, enable events)
|
||||
2. Verify provider appears in the list
|
||||
3. Edit the Telegram provider (change name, verify token preservation)
|
||||
@@ -611,6 +613,7 @@ Add telegram to the payload matrix test scenarios.
|
||||
**Scope:** Feature flags, service layer, handler layer, all Go unit tests
|
||||
|
||||
**Files changed:**
|
||||
|
||||
- `backend/internal/notifications/feature_flags.go`
|
||||
- `backend/internal/api/handlers/feature_flags_handler.go`
|
||||
- `backend/internal/notifications/router.go`
|
||||
@@ -624,6 +627,7 @@ Add telegram to the payload matrix test scenarios.
|
||||
**Dependencies:** None (self-contained backend change)
|
||||
|
||||
**Validation gates:**
|
||||
|
||||
- `go test ./...` passes
|
||||
- `make lint-fast` passes
|
||||
- Coverage ≥ 85%
|
||||
@@ -636,6 +640,7 @@ Add telegram to the payload matrix test scenarios.
|
||||
**Scope:** Frontend API client, Notifications page, i18n strings, frontend unit tests, Playwright E2E tests
|
||||
|
||||
**Files changed:**
|
||||
|
||||
- `frontend/src/api/notifications.ts`
|
||||
- `frontend/src/pages/Notifications.tsx`
|
||||
- `frontend/src/locales/en/translation.json`
|
||||
@@ -648,6 +653,7 @@ Add telegram to the payload matrix test scenarios.
|
||||
**Dependencies:** PR-1 must be merged first (backend must accept `type: "telegram"`)
|
||||
|
||||
**Validation gates:**
|
||||
|
||||
- `npm test` passes
|
||||
- `npm run type-check` passes
|
||||
- `npx playwright test --project=firefox` passes
|
||||
|
||||
@@ -55,6 +55,7 @@ disabled={testMutation.isPending || (isNew && !isEmail)}
|
||||
**Why it was added:** The backend `Test` handler at `notification_provider_handler.go` (L333-336) requires a saved provider ID for all non-email types. For Gotify/Telegram, the server needs the stored token. For Discord/Webhook, the server still fetches the provider from DB. Without a saved provider, the backend returns `MISSING_PROVIDER_ID`.
|
||||
|
||||
**Why it breaks tests:** Many existing E2E and unit tests click the test button from a **new (unsaved) provider form** using mocked endpoints. With the new guard:
|
||||
|
||||
1. The `<button>` is `disabled` → browser ignores clicks → mocked routes never receive requests
|
||||
2. Even if not disabled, `handleTest()` returns early with a toast instead of calling `testMutation.mutate()`
|
||||
3. Tests that `waitForRequest` on `/providers/test` time out (60s default)
|
||||
@@ -103,6 +104,7 @@ These tests open the "Add Provider" form (no `id`), click `provider-test-btn`, a
|
||||
| 2 | retry split distinguishes retryable and non-retryable failures | L410 | webhook | `provider-test-btn` disabled for new webhook form; `waitForResponse` times out |
|
||||
|
||||
**Tests that should still pass:**
|
||||
|
||||
- `valid payload flows for discord, gotify, and webhook` (L54) — uses `provider-save-btn`, not test button
|
||||
- `malformed payload scenarios` (L158) — API-level tests via `page.request.post`
|
||||
- `missing required fields block submit` (L192) — uses save button
|
||||
@@ -119,6 +121,7 @@ These tests open the "Add Provider" form (no `id`), click `provider-test-btn`, a
|
||||
| 2 | should test telegram notification provider | L265 | Row-level Send Test button; possible accessible name mismatch in WebKit with `title` attribute |
|
||||
|
||||
**Tests that should pass:**
|
||||
|
||||
- Form rendering tests (L25, L65) — UI assertions only
|
||||
- Create telegram provider (L89) — mocked POST
|
||||
- Delete telegram provider (L324) — mocked DELETE + confirm dialog
|
||||
@@ -265,6 +268,7 @@ it('disables test button when provider is new (unsaved) and not email type', asy
|
||||
**File:** `tests/settings/notifications.spec.ts`
|
||||
|
||||
**Strategy:** For tests that click the test button from a new form, restructure the flow to:
|
||||
|
||||
1. First **save** the provider (mocked create → returns id)
|
||||
2. Then **test** from the saved provider row's Send Test button (row buttons are not gated by `isNew`)
|
||||
|
||||
@@ -360,6 +364,7 @@ Same pattern: save first, then test from row.
|
||||
#### Fix 9: "should edit telegram notification provider and preserve token" (L159)
|
||||
|
||||
**Problem:** Uses fragile keyboard navigation to reach the Edit button:
|
||||
|
||||
```typescript
|
||||
await sendTestButton.focus();
|
||||
await page.keyboard.press('Tab');
|
||||
@@ -388,6 +393,7 @@ Or use a structural locator based on the edit icon class.
|
||||
**Probable issue:** The `getByRole('button', { name: /send test/i })` relies on `title` for accessible name. WebKit may not compute accessible name from `title` the same way.
|
||||
|
||||
**Fix (source — preferred):** Add explicit `aria-label` to the row Send Test button in `Notifications.tsx` (L703):
|
||||
|
||||
```tsx
|
||||
<Button
|
||||
variant="secondary"
|
||||
@@ -399,6 +405,7 @@ Or use a structural locator based on the edit icon class.
|
||||
```
|
||||
|
||||
**Fix (test — alternative):** Use structural locator:
|
||||
|
||||
```typescript
|
||||
const sendTestButton = providerRow.locator('button').first();
|
||||
```
|
||||
@@ -469,18 +476,21 @@ Consider adding `aria-label` attributes to all icon-only buttons in the provider
|
||||
**Rationale:** All fixes are tightly coupled to the Telegram feature PR and represent test adaptations to a correct behavioral change. No cross-domain changes. Small total diff.
|
||||
|
||||
### Commit 1: "fix(test): adapt notification tests to save-before-test guard"
|
||||
|
||||
- **Scope:** All unit test and E2E test fixes (Phases 1-3)
|
||||
- **Files:** `Notifications.test.tsx`, `notifications.spec.ts`, `notifications-payload.spec.ts`, `telegram-notification-provider.spec.ts`
|
||||
- **Dependencies:** None
|
||||
- **Validation Gate:** All notification-related tests pass locally on at least one browser
|
||||
|
||||
### Commit 2: "feat(a11y): add aria-labels to notification provider row buttons"
|
||||
|
||||
- **Scope:** Source code accessibility improvement (Phase 4)
|
||||
- **Files:** `Notifications.tsx`
|
||||
- **Dependencies:** Depends on Commit 1 (tests must pass first)
|
||||
- **Validation Gate:** Telegram spec tests pass consistently on WebKit
|
||||
|
||||
### Rollback
|
||||
|
||||
- These are test-only changes (except the optional aria-label). Reverting either commit has zero production impact.
|
||||
- If tests still fail after fixes, the next step is to run with `--debug` and capture trace artifacts.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user