feat: add ManualDNSChallenge component and related hooks for manual DNS challenge management

- Implemented `useManualChallenge`, `useChallengePoll`, and `useManualChallengeMutations` hooks for managing manual DNS challenges.
- Created tests for the `useManualChallenge` hooks to ensure correct fetching and mutation behavior.
- Added `ManualDNSChallenge` component for displaying challenge details and actions.
- Developed end-to-end tests for the Manual DNS Provider feature, covering provider selection, challenge UI, and accessibility compliance.
- Included error handling tests for verification failures and network errors.
2026-01-12 04:01:40 +00:00
parent a199dfd079
commit d7939bed70
132 changed files with 8680 additions and 878 deletions
--- a/docs/plans/custom_dns_plugin_spec.md
+++ b/docs/plans/custom_dns_plugin_spec.md
@@ -5,7 +5,7 @@
 **Estimated Time:** 48-68 hours
 **Author:** Planning Agent
 **Date:** January 8, 2026
-**Last Revised:** January 8, 2026
+**Last Revised:** January 11, 2026
 **Related:** [Phase 5 Custom Plugins Spec](phase5_custom_plugins_spec.md)

 ---
@@ -274,6 +274,71 @@ If Charon is unavailable during a DNS challenge:
 3. **Health check**: Caddy pre-checks Charon availability via `/health` before initiating challenges
 4. **Circuit breaker**: After 5 consecutive failures, Caddy disables the custom provider for 5 minutes

+#### 3.3.5 Concurrent Challenge Handling
+
+To prevent race conditions when multiple certificate requests target the same FQDN simultaneously:
+
+**Database Locking Strategy:**
+```sql
+-- Acquire exclusive lock when creating challenge for FQDN
+BEGIN;
+SELECT * FROM dns_challenges
+  WHERE fqdn = '_acme-challenge.example.com'
+  AND status IN ('created', 'pending', 'verifying')
+  FOR UPDATE NOWAIT;
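+-- If a row is returned, an active challenge already exists: return it (same user)
+-- or CHALLENGE_IN_PROGRESS (different user, or the row is locked and NOWAIT fails).
+-- If no row is returned, create the new challenge. FOR UPDATE locks only existing rows,
+-- so a partial unique index on fqdn (active statuses) is needed to reject a concurrent insert.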
+COMMIT;
+```
+
+**Queueing Behavior:**
+| Scenario | Behavior |
+|----------|----------|
+| No active challenge for FQDN | Create new challenge immediately |
+| Active challenge exists (same user) | Return existing challenge ID |
+| Active challenge exists (different user) | Return `CHALLENGE_IN_PROGRESS` (409) |
+| Active challenge expired/failed | Allow new challenge creation |
+
+**Implementation Requirements:**
+```go
+func (s *ChallengeService) CreateChallenge(ctx context.Context, fqdn string, userID uint) (*Challenge, error) {
+    tx := s.db.Begin()
+    defer tx.Rollback()
+
+    // Attempt to acquire lock on existing active challenges
+    var existing Challenge
+    err := tx.Set("gorm:query_option", "FOR UPDATE NOWAIT").
+        Where("fqdn = ? AND status IN (?)", fqdn, []string{"created", "pending", "verifying"}).
+        First(&existing).Error
+
+    if err == nil {
+        // Active challenge exists
+        if existing.UserID == userID {
+            return &existing, nil // Return existing challenge to same user
+        }
+        return nil, ErrChallengeInProgress // Different user, reject
+    }
+
+    if !errors.Is(err, gorm.ErrRecordNotFound) {
+        return nil, fmt.Errorf("lock acquisition failed: %w", err)
+    }
+
+    // No active challenge, create new one
+    challenge := &Challenge{FQDN: fqdn, UserID: userID, Status: "created"}
+    if err := tx.Create(challenge).Error; err != nil {
+        return nil, err
+    }
+
+    if err := tx.Commit().Error; err != nil {
+        return nil, fmt.Errorf("commit failed: %w", err)
+    }
+    return challenge, nil
+}
+```
+
+**Timeout Handling:**
+- Challenges automatically transition to `expired` after 10 minutes
+- Expired challenges release the "lock" on the FQDN
+- Subsequent requests can then create new challenges
+
 ### 3.4 Database Model Impact

 Current `dns_providers` table schema:
@@ -343,6 +408,65 @@ User provides webhook URLs for create/delete TXT records. Charon POSTs JSON payl
 }
 ```

+#### Security Hardening
+
+**DNS Rebinding Protection:**
+Webhook URLs MUST be validated at both configuration time AND request execution time to prevent DNS rebinding attacks:
+
+```go
+// Configuration-time validation
+func (w *WebhookProvider) ValidateCredentials(creds map[string]string) error {
+    if err := security.ValidateExternalURL(creds["create_url"]); err != nil {
+        return fmt.Errorf("create_url validation failed: %w", err)
+    }
+    // ... validate delete_url
+    return nil
+}
+
+// Execution-time validation (re-validate before each request)
+func (w *WebhookProvider) executeWebhook(ctx context.Context, url string, payload []byte) error {
+    // Re-validate URL to prevent DNS rebinding
+    if err := security.ValidateExternalURL(url); err != nil {
+        return fmt.Errorf("webhook URL failed re-validation: %w", err)
+    }
+    // ... execute request
+}
+```
+
+**Response Size Limit:**
+```go
+const MaxWebhookResponseSize = 1 * 1024 * 1024 // 1MB
+
+// Enforce response size limit
+resp, err := client.Do(req)
+if err != nil {
+    return err
+}
+defer resp.Body.Close()
+
+limitedReader := io.LimitReader(resp.Body, MaxWebhookResponseSize+1)
+body, err := io.ReadAll(limitedReader)
+if err != nil {
+    return fmt.Errorf("reading webhook response: %w", err)
+}
+if len(body) > MaxWebhookResponseSize {
+    return ErrWebhookResponseTooLarge
+}
+```
+
+**TLS Validation:**
+```json
+{
+  "credentials": {
+    "insecure_skip_verify": false
+  }
+}
+```
+
+> ⚠️ **WARNING:** Setting `insecure_skip_verify: true` disables TLS certificate validation. This should ONLY be used in development/testing environments with self-signed certificates. NEVER enable in production.
+
+**Idempotency Requirement:**
+Webhook endpoints MUST support the `request_id` field for request deduplication. Charon will include a unique `request_id` (UUIDv4) in every webhook payload. Webhook implementations SHOULD:
+1. Store processed `request_id` values with a TTL of at least 24 hours
+2. Return cached response for duplicate `request_id` values
+3. Use `request_id` for audit logging correlation
+
 #### Rate Limiting and Circuit Breaker

 To prevent abuse and ensure reliability, webhook plugins enforce:
@@ -352,6 +476,7 @@ To prevent abuse and ensure reliability, webhook plugins enforce:
 | Max calls per minute | 10 | Requests beyond limit return 429 Too Many Requests |
 | Circuit breaker threshold | 5 consecutive failures | Provider disabled for 5 minutes |
 | Circuit breaker reset | Automatic after 5 minutes | First successful call fully resets counter |
+| Max response size | 1MB | Responses exceeding limit return 413 error |

 **Implementation Requirements:**
 ```go
@@ -462,6 +587,125 @@ esac
 3. Timeout prevents resource exhaustion
 4. All executions are audit-logged

+#### Security Requirements (Mandatory)
+
+**Argument Sanitization:**
+All script arguments MUST be validated against a strict allowlist pattern:
+
+```go
+var validArgumentPattern = regexp.MustCompile(`^[a-zA-Z0-9._=-]+$`)
+
+func sanitizeArgument(arg string) (string, error) {
+    if !validArgumentPattern.MatchString(arg) {
+        return "", ErrInvalidScriptArgument
+    }
+    if len(arg) > 1024 {
+        return "", ErrArgumentTooLong
+    }
+    return arg, nil
+}
+
+// Usage
+for i, arg := range args {
+    sanitized, err := sanitizeArgument(arg)
+    if err != nil {
+        return fmt.Errorf("argument %d contains invalid characters: %w", i, err)
+    }
+    args[i] = sanitized
+}
+```
+
+**Symlink Resolution:**
+Path validation MUST use `filepath.EvalSymlinks()` BEFORE checking the allowed directory prefix to prevent symlink escape attacks:
+
+```go
+func validateScriptPath(scriptPath string) error {
+    // CRITICAL: Resolve symlinks FIRST
+    resolvedPath, err := filepath.EvalSymlinks(scriptPath)
+    if err != nil {
+        return fmt.Errorf("failed to resolve script path: %w", err)
+    }
+
+    // Then validate resolved path is within allowed directory
+    absPath, err := filepath.Abs(resolvedPath)
+    if err != nil {
+        return fmt.Errorf("failed to resolve absolute path: %w", err)
+    }
+
+    allowedDir := "/scripts/"
+    if !strings.HasPrefix(absPath, allowedDir) {
+        return ErrScriptPathInvalid
+    }
+
+    return nil
+}
+```
+
+**Resource Limits (MANDATORY):**
+The following rlimits MUST be enforced for all script executions:
+
+| Resource | Limit | Purpose |
+|----------|-------|------|
+| `RLIMIT_NOFILE` | 256 | Prevent file descriptor exhaustion |
+| `RLIMIT_NPROC` | 64 | Prevent fork bombs |
+| `RLIMIT_AS` | 256MB | Prevent memory exhaustion |
+| `RLIMIT_CPU` | 60s | Prevent CPU exhaustion |
+| `RLIMIT_FSIZE` | 10MB | Prevent disk filling |
+
+```go
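+// MANDATORY: apply rlimits to the script process before it executes.
+// Setrlimit affects the calling process, so this must run in the child
+// (for example from a small exec wrapper), never in the Charon server process.
+// RLIMIT_NPROC is not defined in package syscall on Linux; use golang.org/x/sys/unix.
+func setMandatoryResourceLimits() error {
+    limits := []struct {
+        resource int
+        limit    uint64
+    }{
+        {unix.RLIMIT_NOFILE, 256},
+        {unix.RLIMIT_NPROC, 64},
+        {unix.RLIMIT_AS, 256 * 1024 * 1024},
+        {unix.RLIMIT_CPU, 60},
+        {unix.RLIMIT_FSIZE, 10 * 1024 * 1024},
+    }
+
+    for _, l := range limits {
+        if err := unix.Setrlimit(l.resource, &unix.Rlimit{Cur: l.limit, Max: l.limit}); err != nil {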
+            return fmt.Errorf("failed to set rlimit %d: %w", l.resource, err)
+        }
+    }
+    return nil
+}
+```
+
+**Environment Variable Clearing:**
+Inherited environment variables MUST be explicitly cleared before setting script environment:
+
+```go
+func executeScript(ctx context.Context, scriptPath string, args []string, userEnv map[string]string) error {
+    cmd := exec.CommandContext(ctx, scriptPath, args...)
+
+    // CRITICAL: Start with empty environment (clear inherited vars)
+    cmd.Env = []string{}
+
+    // Add only essential system variables
+    cmd.Env = append(cmd.Env,
+        "PATH=/usr/local/bin:/usr/bin:/bin",
+        "HOME=/tmp",
+        "LANG=C.UTF-8",
+        "TZ=UTC",
+    )
+
+    // Add user-provided environment variables (after validation)
+    for key, value := range userEnv {
+        if err := validateEnvVar(key, value); err != nil {
+            return fmt.Errorf("invalid env var %s: %w", key, err)
+        }
+        cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", key, value))
+    }
+
+    // Execute with cleared environment
+    return cmd.Run()
+}
+```
+
 #### Implementation Complexity
 - Backend: ~250 lines (ScriptProvider + executor)
 - Frontend: ~80 lines (form fields)
@@ -491,11 +735,52 @@ RFC 2136 defines a standard protocol for dynamic DNS updates. Supported by BIND,
 ```

 #### TSIG Algorithms Supported
- `hmac-md5` (legacy)
- `hmac-sha1`
- `hmac-sha256` (recommended)
- `hmac-sha384`
- `hmac-sha512`
+
+| Algorithm | Status | Notes |
+|-----------|--------|-------|
+| `hmac-md5` | ⚠️ **DEPRECATED** | Cryptographically weak; will be removed in v2.0 |
+| `hmac-sha1` | Legacy | Avoid for new deployments |
+| `hmac-sha256` | ✅ Recommended | Default for new configurations |
+| `hmac-sha384` | Supported | Higher security, slightly more overhead |
+| `hmac-sha512` | Supported | Highest security |
+
+> ⚠️ **DEPRECATION WARNING:** `hmac-md5` is cryptographically weak and should not be used for new deployments. Support for `hmac-md5` will be removed in Charon v2.0. Migrate to `hmac-sha256` or stronger.
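
The status table above could be enforced when a provider configuration is saved. The sketch below assumes a hypothetical `checkTSIGAlgorithm` helper that normalizes the name, falls back to the recommended default when none is given, and returns a warning string for deprecated or legacy choices.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultTSIGAlgorithm is the default named in the algorithm table.
const DefaultTSIGAlgorithm = "hmac-sha256"

// knownTSIG maps each supported algorithm to a warning ("" means none).
var knownTSIG = map[string]string{
	"hmac-md5":    "hmac-md5 is deprecated and will be removed in Charon v2.0; migrate to hmac-sha256 or stronger",
	"hmac-sha1":   "hmac-sha1 is legacy; avoid for new deployments",
	"hmac-sha256": "",
	"hmac-sha384": "",
	"hmac-sha512": "",
}

// checkTSIGAlgorithm returns the canonical algorithm name, an optional
// warning to surface in the UI or logs, or an error for unknown names.
func checkTSIGAlgorithm(name string) (string, string, error) {
	algo := strings.ToLower(strings.TrimSpace(name))
	if algo == "" {
		return DefaultTSIGAlgorithm, "", nil
	}
	warning, ok := knownTSIG[algo]
	if !ok {
		return "", "", fmt.Errorf("unsupported TSIG algorithm %q", name)
	}
	return algo, warning, nil
}

func main() {
	algo, warning, err := checkTSIGAlgorithm("HMAC-MD5")
	fmt.Println(algo, warning != "", err)
}
```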
+
+**Secure Memory Handling for TSIG Secrets:**
+
+TSIG secrets MUST be handled securely in memory:
+
+```go
+import "github.com/awnumar/memguard"
+
+type RFC2136Provider struct {
+    tsigSecret *memguard.Enclave // Encrypted in memory
+}
+
+func (r *RFC2136Provider) SetTSIGSecret(secret []byte) error {
+    // Store secret in encrypted memory enclave
+    enclave := memguard.NewEnclave(secret)
+
+    // Immediately wipe the source buffer
+    memguard.WipeBytes(secret)
+
+    r.tsigSecret = enclave
+    return nil
+}
+
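+func (r *RFC2136Provider) Cleanup() error {
+    // An Enclave holds only ciphertext and has no Destroy method; drop the reference.
+    // Any LockedBuffer obtained via tsigSecret.Open() must be Destroy()ed after use.
+    r.tsigSecret = nil
+    return nil
+}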
+```
+
+**Requirements:**
+1. TSIG secrets MUST be stored in encrypted memory enclaves when in use
+2. Source buffers containing secrets MUST be wiped immediately after copying
+3. Secrets MUST NOT appear in debug output, stack traces, or core dumps
+4. Provider `Cleanup()` MUST securely destroy all secret material

 #### DNS UPDATE Message Flow
 ```
@@ -600,6 +885,85 @@ No automation - UI shows required TXT record details, user creates manually, cli
 - Polling endpoint for UI updates (10-second interval)
 - Timeout after configurable period

+#### Session Security Requirements
+
+**Challenge-User Binding:**
+Manual challenges MUST be bound to the authenticated user's session:
+
+```go
+type Challenge struct {
+    ID        string    `json:"id"`         // UUIDv4 (cryptographically random)
+    UserID    uint      `json:"user_id"`    // Owner of this challenge
+    SessionID string    `json:"-"`          // Session that created challenge
+    // ... other fields
+}
+
+// Verify challenge ownership before any operation
+func (s *ManualChallengeService) VerifyOwnership(ctx context.Context, challengeID string, userID uint) error {
+    var challenge Challenge
+    if err := s.db.Where("id = ?", challengeID).First(&challenge).Error; err != nil {
+        return ErrChallengeNotFound
+    }
+
+    if challenge.UserID != userID {
+        // Log potential unauthorized access attempt
+        s.auditLog.Warn("unauthorized challenge access attempt",
+            "challenge_id", challengeID,
+            "owner_id", challenge.UserID,
+            "requester_id", userID,
+        )
+        return ErrUnauthorized
+    }
+
+    return nil
+}
+```
+
+**CSRF Protection:**
+All state-changing operations (POST, PUT, DELETE) on manual challenges MUST validate CSRF tokens:
+
+```go
+// Middleware for manual challenge endpoints
+func CSRFProtection(next http.Handler) http.Handler {
+    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        if r.Method == "POST" || r.Method == "PUT" || r.Method == "DELETE" {
+            token := r.Header.Get("X-CSRF-Token")
+            sessionToken := getSessionCSRFToken(r)
+
+            if !secureCompare(token, sessionToken) {
+                http.Error(w, "CSRF token mismatch", http.StatusForbidden)
+                return
+            }
+        }
+        next.ServeHTTP(w, r)
+    })
+}
+```
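
The middleware above relies on a `secureCompare` helper that this section does not define. A minimal sketch using the standard `crypto/subtle` package follows; treating an empty token as a mismatch is an added assumption, so that a missing header never matches a missing session token.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secureCompare reports whether two tokens are equal, taking time that does
// not depend on where the first differing byte is. Empty tokens never match.
func secureCompare(got, want string) bool {
	if got == "" || want == "" {
		return false
	}
	// ConstantTimeCompare returns 1 only when both slices have equal
	// length and equal contents.
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(secureCompare("token-abc", "token-abc"))
	fmt.Println(secureCompare("", ""))
}
```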
+
+**Challenge ID Generation:**
+Challenge IDs MUST use cryptographically random UUIDs (UUIDv4):
+
+```go
+import "github.com/google/uuid"
+
+func generateChallengeID() string {
+    // UUIDv4 uses crypto/rand, providing 122 bits of randomness
+    return uuid.New().String()
+}
+
+// DO NOT use:
+// - Sequential IDs (predictable)
+// - UUIDv1 (contains timestamp/MAC address)
+// - Custom random without proper entropy
+```
+
+**Session Validation on Each Request:**
+| Endpoint | Required Validations |
+|----------|---------------------|
+| `GET /manual-challenge/:id` | Valid session, challenge.user_id == session.user_id |
+| `POST /manual-challenge/:id/verify` | Valid session, CSRF token, challenge ownership |
+| `DELETE /manual-challenge/:id` | Valid session, CSRF token, challenge ownership |
+
 **Note:** Although Charon has existing WebSocket infrastructure (`backend/internal/services/websocket_tracker.go`), polling is chosen for simplicity:
 - Avoids additional WebSocket connection management complexity
 - 10-second polling interval provides acceptable UX for manual workflows
@@ -866,10 +1230,13 @@ All manual challenge and custom plugin endpoints use consistent error codes:
 |------------|-------------|-------------|
 | `CHALLENGE_NOT_FOUND` | 404 | Challenge ID does not exist |
 | `CHALLENGE_EXPIRED` | 410 | Challenge has timed out |
+| `CHALLENGE_IN_PROGRESS` | 409 | Another challenge is currently active for this FQDN |
 | `DNS_NOT_PROPAGATED` | 200 | DNS record not yet found (success: false) |
 | `INVALID_PROVIDER_TYPE` | 400 | Unknown provider type |
+| `INVALID_SCRIPT_ARGUMENT` | 400 | Script argument contains invalid characters (only `[a-zA-Z0-9._=-]` allowed) |
 | `WEBHOOK_TIMEOUT` | 504 | Webhook did not respond in time |
 | `WEBHOOK_RATE_LIMITED` | 429 | Too many webhook calls (>10/min) |
+| `WEBHOOK_RESPONSE_TOO_LARGE` | 413 | Webhook response exceeded 1MB limit |
 | `PROVIDER_CIRCUIT_OPEN` | 503 | Provider disabled due to consecutive failures |
 | `SCRIPT_TIMEOUT` | 504 | Script execution exceeded timeout |
 | `SCRIPT_PATH_INVALID` | 400 | Script path not in allowed directory |
@@ -1264,9 +1631,72 @@ services:
 **Note:** Full seccomp profile customization is out of scope for this feature. Users relying on script plugins in high-security environments should review container security configuration.
 ```

-### 9.4 Audit Logging
+### 9.4 Log Redaction Patterns

-All custom plugin operations MUST be logged:
+Sensitive data MUST be redacted from all logs, including debug logs, error messages, and audit trails.
+
+**Required Redaction Patterns:**
+
+| Field Pattern | Redaction | Example |
+|---------------|-----------|--------|
+| `api_token` | `[REDACTED:api_token]` | `api_token=abc123` → `api_token=[REDACTED:api_token]` |
+| `api_key` | `[REDACTED:api_key]` | `api_key: secret` → `api_key: [REDACTED:api_key]` |
+| `secret` | `[REDACTED:secret]` | `client_secret=xyz` → `client_secret=[REDACTED:secret]` |
+| `password` | `[REDACTED:password]` | `password=abc` → `password=[REDACTED:password]` |
+| `tsig_key_secret` | `[REDACTED:tsig_secret]` | TSIG key value → `[REDACTED:tsig_secret]` |
+| `authorization` | `[REDACTED:auth]` | `Authorization: Bearer ...` → `Authorization: [REDACTED:auth]` |
+| `bearer` | `[REDACTED:bearer]` | Bearer token values → `[REDACTED:bearer]` |
+
+**Implementation:**
+
+```go
+import "regexp"
+
+var sensitivePatterns = []struct {
+    pattern *regexp.Regexp
+    replace string
+}{
+    {regexp.MustCompile(`(?i)(api_token["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:api_token]`},
+    {regexp.MustCompile(`(?i)(api_key["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:api_key]`},
+    {regexp.MustCompile(`(?i)(secret["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:secret]`},
+    {regexp.MustCompile(`(?i)(password["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:password]`},
+    {regexp.MustCompile(`(?i)(tsig_key_secret["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:tsig_secret]`},
+    {regexp.MustCompile(`(?i)(authorization["']?\s*[:=]\s*["']?)(Bearer\s+)?[^"'\s,}]+`), `$1[REDACTED:auth]`},
+    {regexp.MustCompile(`(?i)Bearer\s+[A-Za-z0-9\-_=]+\.?[A-Za-z0-9\-_=]*\.?[A-Za-z0-9\-_=]*`), `Bearer [REDACTED:bearer]`},
+}
+
+func RedactSensitiveData(input string) string {
+    result := input
+    for _, sp := range sensitivePatterns {
+        result = sp.pattern.ReplaceAllString(result, sp.replace)
+    }
+    return result
+}
+
+// Apply to all log output
+func (l *Logger) LogWithRedaction(level, msg string, fields map[string]any) {
+    // Redact message
+    msg = RedactSensitiveData(msg)
+
+    // Redact field values
+    for key, value := range fields {
+        if str, ok := value.(string); ok {
+            fields[key] = RedactSensitiveData(str)
+        }
+    }
+
+    l.underlying.Log(level, msg, fields)
+}
+```
+
+**Enforcement:**
+- All plugin code MUST use the redacting logger
+- Pre-commit hooks SHOULD scan for potential credential logging
+- Security tests MUST verify no secrets appear in logs
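
The patterns above match `key=value` text inside a string. A structured field such as `{"password": "abc"}` carries only `abc` as its value, so value-based matching alone would let it through. A complementary key-based pass is sketched below; `RedactFields` and the list of key fragments are hypothetical, and the function returns a copy so the caller's map is not mutated.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveKeyParts are substrings that mark a field name as sensitive.
// The list mirrors the redaction table and is illustrative only.
var sensitiveKeyParts = []string{"token", "api_key", "secret", "password", "authorization", "bearer"}

// isSensitiveKey reports whether a field name suggests a credential.
func isSensitiveKey(key string) bool {
	k := strings.ToLower(key)
	for _, part := range sensitiveKeyParts {
		if strings.Contains(k, part) {
			return true
		}
	}
	return false
}

// RedactFields returns a copy of fields in which every value stored under a
// sensitive key is masked. Nested maps are processed recursively.
func RedactFields(fields map[string]any) map[string]any {
	out := make(map[string]any, len(fields))
	for k, v := range fields {
		if isSensitiveKey(k) {
			out[k] = "[REDACTED:" + strings.ToLower(k) + "]"
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = RedactFields(nested)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	fields := map[string]any{
		"domain":   "example.com",
		"password": "abc",
		"creds":    map[string]any{"tsig_key_secret": "c2VjcmV0"},
	}
	fmt.Println(RedactFields(fields))
}
```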
+
+### 9.5 Audit Logging
+
+All custom plugin operations MUST be logged (with redaction applied):

 ```go
 type PluginAuditEvent struct {
@@ -1277,8 +1707,8 @@ type PluginAuditEvent struct {
    Domain      string
    Success     bool
    Duration    time.Duration
-    ErrorMsg    string
-    Details     map[string]any // Redacted credentials
+    ErrorMsg    string           // Redacted before logging
+    Details     map[string]any   // Redacted credentials
 }
 ```

@@ -1458,7 +1888,55 @@ func TestWebhookProvider_ValidateCredentials(t *testing.T) {
 | Cancel challenge | User clicks "Cancel Challenge" | Challenge marked as cancelled, UI returns to provider list |
 | Refresh during challenge | User refreshes page during pending challenge | Challenge state persisted, countdown continues from correct time |

-### 11.3 Security Tests
+### 11.3 Additional Required Test Scenarios
+
+#### Webhook Tests
+
+| Scenario | Description | Expected Result |
+|----------|-------------|----------------|
+| Retry exhaustion | Webhook returns 500 for all 3 retry attempts | `WEBHOOK_TIMEOUT` error after final retry |
+| Response too large | Webhook returns >1MB response | `WEBHOOK_RESPONSE_TOO_LARGE` error (413) |
+| DNS rebinding | URL resolves to internal IP on second resolution | Request blocked, `SSRF_DETECTED` error |
+| Idempotency replay | Same `request_id` sent twice | Second request returns cached response |
+
+#### Circuit Breaker Tests
+
+| Scenario | Description | Expected Result |
+|----------|-------------|----------------|
+| Open state transition | 5 consecutive failures | Circuit opens, `PROVIDER_CIRCUIT_OPEN` (503) |
+| Half-open state | Wait 5 minutes after open | Next request allowed (test request) |
+| Reset on success | Successful request in half-open | Circuit fully closes, counter resets |
+| Stay open on failure | Failed request in half-open | Circuit remains open for another 5 minutes |
+
+#### Script Tests
+
+| Scenario | Description | Expected Result |
+|----------|-------------|----------------|
+| Timeout boundary (pass) | Script completes in 59 seconds | Success, output captured |
+| Timeout boundary (fail) | Script runs for 61 seconds | `SCRIPT_TIMEOUT` error (504) |
+| Invalid argument chars | Argument contains `; rm -rf /` | `INVALID_SCRIPT_ARGUMENT` error (400) |
+| Symlink escape | Script path is symlink to `/etc/passwd` | `SCRIPT_PATH_INVALID` error (400) |
+| Resource limit breach | Script tries to fork 100 processes | Script killed, resource limit error |
+
+#### Manual Challenge Tests
+
+| Scenario | Description | Expected Result |
+|----------|-------------|----------------|
+| Concurrent verify race | Two users verify same FQDN simultaneously | Only one succeeds, other gets `CHALLENGE_IN_PROGRESS` |
+| CSRF token mismatch | POST without valid CSRF token | 403 Forbidden |
+| Challenge ownership | User A tries to access User B's challenge | 403 Forbidden, audit log entry |
+| Predictable ID attack | Attempt to enumerate challenge IDs | No information leakage, 404 for non-existent |
+
+#### RFC 2136 Tests
+
+| Scenario | Description | Expected Result |
+|----------|-------------|----------------|
+| Network timeout | DNS server unreachable | Timeout error with retry logic |
+| Connection refused | DNS server port closed | Connection error (distinct from `TSIG_AUTH_FAILED`) |
+| TSIG key mismatch | Wrong TSIG secret configured | `TSIG_AUTH_FAILED` (401) |
+| Update refused | Server rejects update (RCODE `REFUSED`) | Appropriate error message with zone info |
+
+### 11.4 Security Tests

 | Test | Tool | Target |
 |------|------|--------|
@@ -1467,7 +1945,7 @@ func TestWebhookProvider_ValidateCredentials(t *testing.T) {
 | Credential leakage in logs | Log analysis | All providers |
 | TSIG key handling | Memory dump analysis | RFC2136Provider |

-### 11.4 Coverage Requirements
+### 11.5 Coverage Requirements

 - Backend: ≥85% coverage
 - Frontend: ≥85% coverage
@@ -1503,6 +1981,39 @@ func TestWebhookProvider_ValidateCredentials(t *testing.T) {
 | PowerDNS ACME Setup | Self-hosted users | `docs/guides/powerdns-acme-setup.md` |
 | Building Webhook Endpoints | Developers | `docs/guides/webhook-development.md` |

+### 12.4 Operations and Security Documentation (Required)
+
+The following documentation MUST be created as part of implementation:
+
+| Document | Audience | Location | Priority |
+|----------|----------|----------|----------|
+| Custom DNS Plugin Troubleshooting | Support, Users | `docs/troubleshooting/custom-dns-plugins.md` | High |
+| Custom DNS Security Hardening | Security, Admins | `docs/security/custom-dns-hardening.md` | High |
+| Custom DNS Monitoring Guide | Operations | `docs/operations/custom-dns-monitoring.md` | Medium |
+
+**Required Content for `docs/troubleshooting/custom-dns-plugins.md`:**
+- Common error codes and resolutions
+- Webhook debugging checklist
+- Script execution troubleshooting
+- RFC 2136 connection issues
+- Manual challenge timeout scenarios
+- Log analysis procedures
+
+**Required Content for `docs/security/custom-dns-hardening.md`:**
+- Webhook endpoint security best practices
+- Script plugin security checklist
+- TSIG key management procedures
+- Network segmentation recommendations
+- Audit logging configuration
+- Incident response procedures
+
+**Required Content for `docs/operations/custom-dns-monitoring.md`:**
+- Key metrics to monitor (success rate, latency, errors)
+- Alerting thresholds and recommendations
+- Dashboard examples (Grafana/Prometheus)
+- Capacity planning guidelines
+- Runbook templates for common issues
+
 ---

 ## 13. Estimated Effort
@@ -1647,6 +2158,7 @@ zone "example.com" {
 |---------|------|--------|---------|
 | 1.0 | 2026-01-08 | Planning Agent | Initial specification |
 | 1.1 | 2026-01-08 | Planning Agent | Supervisor review: addressed 13 issues (see below) |
+| 1.2 | 2026-01-11 | Planning Agent | Supervisor review: addressed 9 critical/high priority findings (see Section 18) |

 ---

@@ -1690,3 +2202,53 @@ This specification was revised to address all 13 issues identified during Superv
 ---

 *This document has completed Supervisor review and is ready for technical review and stakeholder approval.*
+
+---
+
+## 18. Supervisor Review Summary (v1.2)
+
+This specification was revised on January 11, 2026 to address 9 critical/high priority findings:
+
+### Security Enhancements
+
+| # | Finding | Resolution |
+|---|---------|------------|
+| 1 | Missing concurrent challenge handling | Section 3.3.5 added with database locking (`SELECT ... FOR UPDATE`), queueing behavior, and `CHALLENGE_IN_PROGRESS` error |
+| 2 | Webhook DNS rebinding vulnerability | Section 4.1 updated: URLs validated at both configuration AND execution time |
+| 3 | Missing webhook response size limit | Section 4.1 updated: `MaxWebhookResponseSize = 1MB`, new error code added |
+| 4 | Missing webhook TLS skip option | Section 4.1 updated: `insecure_skip_verify` config with prominent warning |
+| 5 | Webhook idempotency missing | Section 4.1 updated: `request_id` requirement for deduplication |
+| 6 | Script argument sanitization weak | Section 4.2 updated: strict `[a-zA-Z0-9._=-]` pattern, new error code |
+| 7 | Symlink escape vulnerability | Section 4.2 updated: `filepath.EvalSymlinks()` MUST be called before prefix check |
+| 8 | Resource limits optional | Section 4.2 updated: rlimits now MANDATORY with specific values |
+| 9 | Environment variable leakage | Section 4.2 updated: explicit environment clearing before script execution |
+| 10 | RFC 2136 hmac-md5 insecure | Section 4.3 updated: `hmac-md5` marked DEPRECATED with removal warning |
+| 11 | TSIG secret memory exposure | Section 4.3 updated: secure memory handling with memguard pattern |
+| 12 | Manual challenge session binding missing | Section 4.4 updated: challenge-user binding, CSRF validation, UUIDv4 IDs |
+| 13 | Log credential exposure | Section 9.4 added: comprehensive redaction patterns for 7 sensitive fields |
+
+### Error Codes Added (Section 7.3)
+
+| Code | HTTP Status | Description |
+|------|-------------|-------------|
+| `CHALLENGE_IN_PROGRESS` | 409 | Another challenge active for FQDN |
+| `WEBHOOK_RESPONSE_TOO_LARGE` | 413 | Response exceeded 1MB limit |
+| `INVALID_SCRIPT_ARGUMENT` | 400 | Invalid characters in script argument |
+
+### Testing Scenarios Added (Section 11.3)
+
+- Webhook retry exhaustion tests
+- Circuit breaker state transition tests
+- Script timeout boundary tests (59s pass, 61s fail)
+- Manual challenge concurrent verify race condition test
+- RFC 2136 network error tests
+
+### Documentation Requirements Added (Section 12.4)
+
+- `docs/troubleshooting/custom-dns-plugins.md`
+- `docs/security/custom-dns-hardening.md`
+- `docs/operations/custom-dns-monitoring.md`
+
+---
+
+*This document has been updated to address all supervisor review findings from January 11, 2026.*