# Custom DNS Provider Plugin Support - Feature Specification

**Status:** 📋 Planning (Revised)
**Priority:** P2 (Medium)
**Estimated Time:** 48-68 hours
**Author:** Planning Agent
**Date:** January 8, 2026
**Last Revised:** January 11, 2026
**Related:** [Phase 5 Custom Plugins Spec](phase5_custom_plugins_spec.md)

---
## 1. Executive Summary
### Problem Statement
Charon currently supports 10 built-in DNS providers for ACME DNS-01 challenges:
- Cloudflare, Route53, DigitalOcean, Hetzner, DNSimple, Vultr, GoDaddy, Namecheap, Google Cloud DNS, Azure
Users with DNS services not on this list cannot obtain wildcard certificates or use DNS-01 challenges. This limitation affects:
- Organizations using self-hosted DNS (BIND, PowerDNS, Knot DNS)
- Users of regional/niche DNS providers
- Enterprise environments with custom DNS APIs
- Air-gapped or on-premise deployments
### Proposed Solution
Implement multiple extensibility mechanisms that balance ease-of-use with flexibility:
| Option | Target User | Complexity | Automation Level |
|--------|-------------|------------|------------------|
| **A: Webhook Plugin** | DevOps, Integration teams | Medium | Full |
| **B: Script Plugin** | Sysadmins, Power users | Low-Medium | Full |
| **C: RFC 2136 Plugin** | Self-hosted DNS admins | Medium | Full |
| **D: Manual Plugin** | One-off certs, Testing | None | Manual |
### Success Criteria
- Users can obtain certificates using any DNS provider
- At least one plugin option is production-ready within 2 weeks
- Existing built-in providers continue to work unchanged
- 85% test coverage maintained
---
## 2. User Stories
### 2.1 Webhook Plugin (Option A)
> **As a DevOps engineer** with a custom DNS API, I want to provide webhook endpoints so Charon can automate DNS challenges without building a custom integration.

**Acceptance Criteria:**
- I can configure URLs for create/delete TXT record operations
- Charon sends JSON payloads with record details
- I can set custom headers for authentication
- Retry logic handles temporary failures
### 2.2 Script Plugin (Option B)
> **As a system administrator**, I want to run a shell script when Charon needs to create/delete TXT records so I can use my existing DNS automation tools.

**Acceptance Criteria:**
- I can specify a script path inside the container
- Script receives ACTION, FQDN, TOKEN, VALUE as arguments
- Script exit code determines success/failure
- Timeout prevents hung scripts
### 2.3 RFC 2136 Plugin (Option C)
> **As a network engineer** running BIND or PowerDNS, I want to use RFC 2136 Dynamic DNS Updates so Charon integrates with my existing infrastructure.

**Acceptance Criteria:**
- I can configure DNS server address and TSIG key
- Charon sends standards-compliant UPDATE messages
- Zone detection works automatically
- Works with BIND9, PowerDNS, Knot DNS
### 2.4 Manual Plugin (Option D)
> **As a user** with an unsupported provider, I want Charon to show me the required TXT record details so I can create it manually.

**Acceptance Criteria:**
- UI clearly displays the record name and value
- I can copy values with one click
- "Verify" button checks if record exists
- Progress indicator shows timeout countdown
### 2.5 General Stories
> **As an administrator**, I want to see all available DNS provider types (built-in + custom) in a unified list.

> **As a security officer**, I want custom plugin configurations to be validated and logged for audit purposes.
---
## 3. Architecture Analysis
### 3.1 Current Plugin System
Charon already has a well-designed plugin architecture in `backend/pkg/dnsprovider/`:
```
backend/pkg/dnsprovider/
├── plugin.go          # ProviderPlugin interface (13 methods)
├── registry.go        # Thread-safe registry (Global singleton)
├── errors.go          # Custom error types
└── builtin/
    ├── init.go        # Auto-registers 10 built-in providers
    ├── cloudflare.go  # Example: implements ProviderPlugin
    ├── route53.go
    └── ...            # (8 more providers)
```
**Key Interface Methods:**
```go
type ProviderPlugin interface {
	Type() string
	Metadata() ProviderMetadata
	Init() error
	Cleanup() error
	RequiredCredentialFields() []CredentialFieldSpec
	OptionalCredentialFields() []CredentialFieldSpec
	ValidateCredentials(creds map[string]string) error
	TestCredentials(creds map[string]string) error
	SupportsMultiCredential() bool
	BuildCaddyConfig(creds map[string]string) map[string]any
	BuildCaddyConfigForZone(baseDomain string, creds map[string]string) map[string]any
	PropagationTimeout() time.Duration
	PollingInterval() time.Duration
}
```
### 3.2 How Custom Plugins Integrate
The existing architecture supports custom plugins via the registry pattern:
```
┌──────────────────────────────────────────────────────────────────────┐
│                        DNS Provider Registry                         │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│  │ Cloudflare │ │  Route53   │ │  ... (8)   │ │   Custom Plugins   │ │
│  │ (built-in) │ │ (built-in) │ │ (built-in) │ │ ┌────────────────┐ │ │
│  └────────────┘ └────────────┘ └────────────┘ │ │ Webhook Plugin │ │ │
│                                               │ ├────────────────┤ │ │
│                                               │ │ Script Plugin  │ │ │
│                                               │ ├────────────────┤ │ │
│                                               │ │ RFC2136 Plugin │ │ │
│                                               │ ├────────────────┤ │ │
│                                               │ │ Manual Plugin  │ │ │
│                                               │ └────────────────┘ │ │
│                                               └────────────────────┘ │
└───────────────────────────────────┬──────────────────────────────────┘
                    ┌───────────────┴───────────────┐
                    ▼                               ▼
           ┌─────────────────┐             ┌─────────────────┐
           │  DNS Provider   │             │  Caddy Config   │
           │  Service Layer  │             │     Builder     │
           │  (CRUD + Test)  │             │ (TLS Automation)│
           └─────────────────┘             └─────────────────┘
```
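
To make the registry pattern concrete, here is a minimal, self-contained sketch. It is not the contents of `registry.go`: the two-method `Plugin` interface stands in for `ProviderPlugin`, and `Registry`, `Register`, `Get`, `Types`, and `staticPlugin` are hypothetical names. What it illustrates is that built-in and custom plugins share one lookup table keyed by `provider_type`, and that duplicate types are rejected so a custom plugin cannot shadow a built-in one.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Plugin is a hypothetical, trimmed-down stand-in for ProviderPlugin.
type Plugin interface {
	Type() string
	IsBuiltIn() bool
}

// Registry maps provider type strings to plugins; safe for concurrent use.
type Registry struct {
	mu      sync.RWMutex
	plugins map[string]Plugin
}

func NewRegistry() *Registry {
	return &Registry{plugins: make(map[string]Plugin)}
}

// Register adds a plugin and rejects duplicate type names.
func (r *Registry) Register(p Plugin) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.plugins[p.Type()]; exists {
		return fmt.Errorf("provider type %q already registered", p.Type())
	}
	r.plugins[p.Type()] = p
	return nil
}

// Get looks up a plugin by its provider_type value.
func (r *Registry) Get(providerType string) (Plugin, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.plugins[providerType]
	return p, ok
}

// Types lists every registered type (built-in and custom) in sorted order,
// as a unified provider list in the UI would need.
func (r *Registry) Types() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	types := make([]string, 0, len(r.plugins))
	for t := range r.plugins {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

// staticPlugin is a trivial Plugin used only to populate the example.
type staticPlugin struct {
	typ     string
	builtIn bool
}

func (s staticPlugin) Type() string    { return s.typ }
func (s staticPlugin) IsBuiltIn() bool { return s.builtIn }
```
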
### 3.3 Caddy DNS Challenge Integration
Caddy's TLS automation supports custom DNS providers via its module system. For Options A, B, and C, Charon must use one of three approaches:
1. **Use Caddy's `exec` DNS provider** - Caddy calls an external command
2. **Build a custom Caddy module** - Complex, requires Caddy rebuild
3. **Use Charon as a DNS proxy** - Charon handles DNS operations, returns status to Caddy

**Recommended Approach:** Approach 3 (Charon as DNS proxy) for Webhook/Script plugins, and the native Caddy module for RFC 2136.
#### 3.3.1 Charon DNS Proxy Architecture
For Webhook and Script plugins, Charon acts as a DNS challenge proxy between Caddy and the external DNS provider:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     DNS Challenge Flow (Webhook/Script)                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    1. Certificate request    ┌──────────┐                     │
│  │  Caddy   │ ───────────────────────────▶ │   ACME   │                     │
│  │  (TLS)   │ ◀─────────────────────────── │  Server  │                     │
│  └────┬─────┘    2. DNS-01 challenge       └──────────┘                     │
│       │                                                                     │
│       │  3. Create TXT record (via exec module or internal API)             │
│       │  4. POST /internal/dns-challenge/create                             │
│       ▼                                                                     │
│  ┌──────────┐                                                               │
│  │  Charon  │                                                               │
│  │ (Proxy)  │                                                               │
│  └────┬─────┘                                                               │
│       │                                                                     │
│       │  5. Execute plugin (webhook/script)                                 │
│       ▼                                                                     │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                        External DNS Provider                         │   │
│  │             (Webhook endpoint or DNS server via script)              │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
#### 3.3.2 Challenge Lifecycle State Machine
```
                         ┌─────────────┐
                         │   CREATED   │
                         │  (initial)  │
                         └──────┬──────┘
                                │
                     Plugin executes create
                                │
                                ▼
                         ┌─────────────┐
       ┌─────────────────│   PENDING   │─────────────────┐
       │                 │  (awaiting  │                 │
       │                 │ propagation)│                 │
       │                 └──────┬──────┘                 │
       │                        │                        │
Timeout (10 min)        DNS check passes           Plugin error
       │                        │                        │
       ▼                        ▼                        ▼
┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│   EXPIRED   │          │  VERIFYING  │          │   FAILED    │
│             │          │             │          │             │
└─────────────┘          └──────┬──────┘          └─────────────┘
                    ┌───────────┴───────────┐
                    │                       │
              ACME success            ACME failure
                    │                       │
                    ▼                       ▼
             ┌─────────────┐         ┌─────────────┐
             │  VERIFIED   │         │   FAILED    │
             │  (success)  │         │             │
             └─────────────┘         └─────────────┘
```
**State Definitions:**
| State | Description | Next States | TTL |
|-------|-------------|-------------|-----|
| `CREATED` | Challenge record created, plugin not yet executed | PENDING, FAILED | - |
| `PENDING` | Plugin executed, waiting for DNS propagation | VERIFYING, EXPIRED, FAILED | 10 min |
| `VERIFYING` | DNS record found, ACME validation in progress | VERIFIED, FAILED | 2 min |
| `VERIFIED` | Challenge completed successfully | (terminal) | 24h cleanup |
| `EXPIRED` | Timeout waiting for DNS propagation | (terminal) | 24h cleanup |
| `FAILED` | Plugin error or ACME validation failure | (terminal) | 24h cleanup |
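
The state table can be enforced with an explicit transition map, so that an invalid update (for example `pending` straight to `verified`) is rejected rather than silently written. This is a sketch that assumes states are stored in lowercase, as in the SQL in §3.3.5; `ChallengeState`, `Transition`, and `IsTerminal` are hypothetical names.

```go
package main

import "fmt"

// ChallengeState mirrors the states in the table (stored lowercase).
type ChallengeState string

const (
	StateCreated   ChallengeState = "created"
	StatePending   ChallengeState = "pending"
	StateVerifying ChallengeState = "verifying"
	StateVerified  ChallengeState = "verified"
	StateExpired   ChallengeState = "expired"
	StateFailed    ChallengeState = "failed"
)

// allowedTransitions encodes the "Next States" column; terminal states
// have no outgoing edges.
var allowedTransitions = map[ChallengeState][]ChallengeState{
	StateCreated:   {StatePending, StateFailed},
	StatePending:   {StateVerifying, StateExpired, StateFailed},
	StateVerifying: {StateVerified, StateFailed},
}

// Transition returns an error unless from -> to is a permitted edge.
func Transition(from, to ChallengeState) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid challenge transition %s -> %s", from, to)
}

// IsTerminal reports whether a state has no outgoing transitions
// (unknown states are also reported as terminal).
func IsTerminal(s ChallengeState) bool {
	return len(allowedTransitions[s]) == 0
}
```
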
#### 3.3.3 Caddy Communication
Charon exposes an internal API for Caddy to delegate DNS challenge operations:
```
POST /internal/dns-challenge/create
{
"provider_id": "uuid",
"fqdn": "_acme-challenge.example.com",
"value": "token-value"
}
Response: {"challenge_id": "uuid", "status": "pending"}
DELETE /internal/dns-challenge/{challenge_id}
Response: {"status": "deleted"}
```
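
A sketch of request handling for the create endpoint, assuming the JSON field names shown above. `createChallengeHandler`, `validateCreateRequest`, and the injected `create` function are hypothetical; mapping `ErrChallengeInProgress` to HTTP 409 follows the queueing table in §3.3.5, and the 4 KiB body limit is an arbitrary assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"strings"
)

// ErrChallengeInProgress signals another user's active challenge for the FQDN.
var ErrChallengeInProgress = errors.New("challenge already in progress")

// CreateChallengeRequest is the body of POST /internal/dns-challenge/create.
type CreateChallengeRequest struct {
	ProviderID string `json:"provider_id"`
	FQDN       string `json:"fqdn"`
	Value      string `json:"value"`
}

// CreateChallengeResponse is the documented response body.
type CreateChallengeResponse struct {
	ChallengeID string `json:"challenge_id"`
	Status      string `json:"status"`
}

// validateCreateRequest rejects malformed requests before any plugin runs.
func validateCreateRequest(req CreateChallengeRequest) error {
	switch {
	case req.ProviderID == "":
		return errors.New("provider_id is required")
	case !strings.HasPrefix(req.FQDN, "_acme-challenge."):
		return errors.New("fqdn must carry the _acme-challenge label")
	case req.Value == "":
		return errors.New("value is required")
	}
	return nil
}

// createChallengeHandler decodes and validates the body, then delegates to
// create, a hypothetical hook into the challenge service returning the new ID.
func createChallengeHandler(create func(CreateChallengeRequest) (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var req CreateChallengeRequest
		if err := json.NewDecoder(http.MaxBytesReader(w, r.Body, 4096)).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if err := validateCreateRequest(req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		id, err := create(req)
		switch {
		case errors.Is(err, ErrChallengeInProgress):
			http.Error(w, err.Error(), http.StatusConflict)
			return
		case err != nil:
			http.Error(w, "challenge creation failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(CreateChallengeResponse{ChallengeID: id, Status: "pending"})
	}
}
```
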
#### 3.3.4 Error Handling When Charon is Unavailable
If Charon is unavailable during a DNS challenge:
1. **Caddy retry**: Caddy's built-in retry mechanism (3 attempts, exponential backoff)
2. **Graceful degradation**: If Charon remains unavailable, Caddy logs error and fails certificate issuance
3. **Health check**: Caddy pre-checks Charon availability via `/health` before initiating challenges
4. **Circuit breaker**: After 5 consecutive failures, Caddy disables the custom provider for 5 minutes
#### 3.3.5 Concurrent Challenge Handling
To prevent race conditions when multiple certificate requests target the same FQDN simultaneously:
**Database Locking Strategy:**
```sql
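-- Acquire exclusive lock when creating challenge for FQDN
-- (FOR UPDATE NOWAIT is PostgreSQL / MySQL 8 syntax; SQLite has no row locks)
BEGIN;
SELECT * FROM dns_challenges
WHERE fqdn = '_acme-challenge.example.com'
  AND status IN ('created', 'pending', 'verifying')
FOR UPDATE NOWAIT;
-- If lock acquired and no active challenge exists, create new challenge
-- Otherwise, return CHALLENGE_IN_PROGRESS error
COMMIT;

-- Row locks only cover rows that already exist: two transactions that both
-- find no active challenge could both INSERT. A partial unique index
-- (supported by PostgreSQL and SQLite) makes the second INSERT fail instead,
-- and on SQLite it is the only guarantee.
CREATE UNIQUE INDEX idx_dns_challenges_active_fqdn
    ON dns_challenges (fqdn)
    WHERE status IN ('created', 'pending', 'verifying');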
```
**Queueing Behavior:**
| Scenario | Behavior |
|----------|----------|
| No active challenge for FQDN | Create new challenge immediately |
| Active challenge exists (same user) | Return existing challenge ID |
| Active challenge exists (different user) | Return `CHALLENGE_IN_PROGRESS` (409) |
| Active challenge expired/failed | Allow new challenge creation |

**Implementation Requirements:**
```go
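import (
	"context"
	"errors"
	"fmt"

	"gorm.io/gorm"
	"gorm.io/gorm/clause"
)

func (s *ChallengeService) CreateChallenge(ctx context.Context, fqdn string, userID uint) (*Challenge, error) {
	tx := s.db.WithContext(ctx).Begin()
	if tx.Error != nil {
		return nil, tx.Error
	}
	defer tx.Rollback() // harmless after a successful Commit

	// Lock any active challenge row for this FQDN (SELECT ... FOR UPDATE NOWAIT).
	// The v1-style Set("gorm:query_option", ...) has no effect in GORM v2,
	// so the lock is expressed as a clause.
	var existing Challenge
	err := tx.Clauses(clause.Locking{Strength: "UPDATE", Options: "NOWAIT"}).
		Where("fqdn = ? AND status IN ?", fqdn, []string{"created", "pending", "verifying"}).
		First(&existing).Error
	if err == nil {
		// Active challenge exists
		if existing.UserID == userID {
			return &existing, nil // Return existing challenge to same user
		}
		return nil, ErrChallengeInProgress // Different user, reject
	}
	if !errors.Is(err, gorm.ErrRecordNotFound) {
		return nil, fmt.Errorf("lock acquisition failed: %w", err)
	}

	// No active challenge. A concurrent transaction can also reach this point,
	// so a unique constraint on active FQDNs must reject the second INSERT.
	// With gorm.Config{TranslateError: true} that surfaces as gorm.ErrDuplicatedKey.
	challenge := &Challenge{FQDN: fqdn, UserID: userID, Status: "created"}
	if err := tx.Create(challenge).Error; err != nil {
		if errors.Is(err, gorm.ErrDuplicatedKey) {
			return nil, ErrChallengeInProgress
		}
		return nil, err
	}
	if err := tx.Commit().Error; err != nil {
		return nil, fmt.Errorf("commit failed: %w", err)
	}
	return challenge, nil
}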
```
**Timeout Handling:**
- Challenges automatically transition to `expired` after 10 minutes
- Expired challenges release the "lock" on the FQDN
- Subsequent requests can then create new challenges
### 3.4 Database Model Impact
Current `dns_providers` table schema:
```sql
CREATE TABLE dns_providers (
id INTEGER PRIMARY KEY,
uuid VARCHAR(36) UNIQUE,
name VARCHAR(255) NOT NULL,
provider_type VARCHAR(50) NOT NULL, -- 'cloudflare', 'webhook', 'script', etc.
enabled BOOLEAN DEFAULT TRUE,
is_default BOOLEAN DEFAULT FALSE,
credentials_encrypted TEXT, -- Encrypted JSON blob
key_version INTEGER DEFAULT 1,
propagation_timeout INTEGER DEFAULT 120,
polling_interval INTEGER DEFAULT 5,
-- ... statistics fields
);
```
Custom plugins will use the same table with different `provider_type` values and plugin-specific credentials.

---
## 4. Proposed Solutions
### 4.1 Option A: Generic Webhook Plugin
#### Overview
User provides webhook URLs for create/delete TXT records. Charon POSTs JSON payloads with record details.
#### Configuration
```json
{
"name": "My Webhook DNS",
"provider_type": "webhook",
"credentials": {
"create_url": "https://api.example.com/dns/txt/create",
"delete_url": "https://api.example.com/dns/txt/delete",
"auth_header": "X-API-Key",
"auth_value": "secret-token-here",
"timeout_seconds": "30",
"retry_count": "3"
}
}
```
#### Request Payload (Sent to Webhook)
```json
{
"action": "create",
"fqdn": "_acme-challenge.example.com",
"domain": "example.com",
"subdomain": "_acme-challenge",
"value": "gZrH7wL9t3kM2nP4...",
"ttl": 300,
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-01-08T15:30:00Z"
}
```
#### Expected Response
```json
{
"success": true,
"message": "TXT record created",
"record_id": "optional-id-for-deletion"
}
```
#### Security Hardening
**DNS Rebinding Protection:**
Webhook URLs MUST be validated at both configuration time AND request execution time. Re-validation narrows, but does not close, the DNS rebinding window (the hostname can resolve to a different address between the check and the connection), so the HTTP client SHOULD also reject non-public addresses when it connects:
```go
// Configuration-time validation
func (w *WebhookProvider) ValidateCredentials(creds map[string]string) error {
	for _, key := range []string{"create_url", "delete_url"} {
		if err := security.ValidateExternalURL(creds[key]); err != nil {
			return fmt.Errorf("%s validation failed: %w", key, err)
		}
	}
	return nil
}

// Execution-time validation (re-validate before each request)
func (w *WebhookProvider) executeWebhook(ctx context.Context, url string, payload []byte) error {
	// Re-validate in case the URL or its DNS answer has changed since configuration
	if err := security.ValidateExternalURL(url); err != nil {
		return fmt.Errorf("webhook URL failed re-validation: %w", err)
	}
	// ... execute request
}
```
**Response Size Limit:**
```go
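const MaxWebhookResponseSize = 1 * 1024 * 1024 // 1 MiB

// readWebhookResponse enforces the response size limit. Reading one byte past
// the limit distinguishes "exactly at the limit" from "too large".
func readWebhookResponse(client *http.Client, req *http.Request) ([]byte, error) {
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(io.LimitReader(resp.Body, MaxWebhookResponseSize+1))
	if err != nil {
		return nil, fmt.Errorf("reading webhook response: %w", err)
	}
	if len(body) > MaxWebhookResponseSize {
		return nil, ErrWebhookResponseTooLarge
	}
	return body, nil
}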
```
**TLS Validation:**
```json
{
"credentials": {
"insecure_skip_verify": false
}
}
```
> ⚠️ **WARNING:** Setting `insecure_skip_verify: true` disables TLS certificate validation. This should ONLY be used in development/testing environments with self-signed certificates. NEVER enable in production.

**Idempotency Requirement:**
Webhook endpoints MUST accept the `request_id` field for request deduplication. Charon includes a `request_id` (UUIDv4) in every webhook payload, generating one per create/delete operation and reusing it when that operation is retried; a fresh ID per attempt would make retries indistinguishable from new requests. Webhook implementations SHOULD:
1. Store processed `request_id` values with a TTL of at least 24 hours
2. Return cached response for duplicate `request_id` values
3. Use `request_id` for audit logging correlation
#### Rate Limiting and Circuit Breaker
To prevent abuse and ensure reliability, webhook plugins enforce:
| Limit | Value | Behavior |
|-------|-------|----------|
| Max calls per minute | 10 | Requests beyond limit return 429 Too Many Requests |
| Circuit breaker threshold | 5 consecutive failures | Provider disabled for 5 minutes |
| Circuit breaker reset | Automatic after 5 minutes | First successful call fully resets counter |
| Max response size | 1MB | Responses exceeding limit return 413 error |

**Implementation Requirements:**
```go
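type WebhookRateLimiter struct {
	mu               sync.Mutex // guards all fields; webhook calls may run concurrently
	callsPerMinute   int        // calls made in the current one-minute window (limit 10)
	consecutiveFails int        // consecutive failures (circuit opens at 5)
	disabledUntil    time.Time  // circuit breaker: calls are rejected before this time
}

func (w *WebhookProvider) executeWithRateLimit(ctx context.Context, req *WebhookRequest) error {
	w.rateLimiter.mu.Lock()
	circuitOpen := time.Now().Before(w.rateLimiter.disabledUntil)
	w.rateLimiter.mu.Unlock()
	if circuitOpen {
		return ErrProviderCircuitOpen
	}
	// ... execute webhook with rate limiting
}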
```
#### Pros
- Works with any HTTP-capable system
- No changes to Charon or Caddy required; the user only exposes an HTTP endpoint
- Supports complex authentication (headers, query params)
- Can integrate with existing automation (Terraform, Ansible AWX, etc.)
#### Cons
- User must implement and host webhook endpoint
- Network latency adds to propagation time
- Debugging requires access to both Charon and webhook logs
- Security: webhook credentials stored in Charon
#### Implementation Complexity
- Backend: ~200 lines (WebhookProvider implementation)
- Frontend: ~100 lines (form fields)
- Tests: ~150 lines
---
### 4.2 Option B: Custom Script Plugin
#### Overview
User provides path to shell script inside container. Script receives ACTION, FQDN, TOKEN, VALUE as arguments.
#### Configuration
```json
{
"name": "My Script DNS",
"provider_type": "script",
"credentials": {
"script_path": "/scripts/dns-update.sh",
"timeout_seconds": "60",
"env_vars": "DNS_SERVER=ns1.example.com,API_KEY=${API_KEY}"
}
}
```
#### Script Interface
```bash
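#!/bin/bash
# Called by Charon for DNS-01 challenge
# Arguments:
#   $1 = ACTION: "create" or "delete"
#   $2 = FQDN:   "_acme-challenge.example.com"
#   $3 = TOKEN:  Challenge token (for identification)
#   $4 = VALUE:  TXT record value to set
# Requires DNS_SERVER in the environment (see env_vars above).
set -euo pipefail

ACTION="$1"
FQDN="$2"
TOKEN="$3"
VALUE="$4"
: "${DNS_SERVER:?DNS_SERVER must be set}"

case "$ACTION" in
  create)
    # Add this challenge's TXT value; other values at the same name are kept
    nsupdate <<EOF
server ${DNS_SERVER}
update add ${FQDN} 300 TXT "${VALUE}"
send
EOF
    ;;
  delete)
    # Delete only this value: wildcard and apex certificates share the same
    # _acme-challenge name, so deleting every TXT record could break a
    # concurrent challenge
    nsupdate <<EOF
server ${DNS_SERVER}
update delete ${FQDN} TXT "${VALUE}"
send
EOF
    ;;
  *)
    echo "unknown action: ${ACTION}" >&2
    exit 2
    ;;
esac
# Exit code: 0 = success, non-zero = failure (set -e propagates nsupdate errors)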
```
#### Pros
- Maximum flexibility - any tool/language can be used
- Direct access to host system (if volume-mounted)
- Familiar paradigm for sysadmins
- Can leverage existing scripts/tooling
#### Cons
- **Security Risk:** Script execution in container context
- Harder to debug than API calls
- Script must be mounted into container
- No automatic retries (must implement in script)
- Sandboxing limits capability
#### Security Mitigations
1. Script must be in allowlisted directory (`/scripts/`)
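2. Scripts run with restricted permissions (no network by default; note that scripts which call out to a DNS server, like the `nsupdate` example above, require network access)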
3. Timeout prevents resource exhaustion
4. All executions are audit-logged
#### Security Requirements (Mandatory)
**Argument Sanitization:**
All script arguments MUST be validated against a strict allowlist pattern. The pattern must accept `_`, because both the `_acme-challenge` label and base64url-encoded TXT values contain underscores:
```go
var validArgumentPattern = regexp.MustCompile(`^[A-Za-z0-9_.=-]+$`)

func sanitizeArgument(arg string) (string, error) {
	if len(arg) > 1024 {
		return "", ErrArgumentTooLong
	}
	if !validArgumentPattern.MatchString(arg) {
		return "", ErrInvalidScriptArgument
	}
	return arg, nil
}

// Usage
for i, arg := range args {
	sanitized, err := sanitizeArgument(arg)
	if err != nil {
		return fmt.Errorf("argument %d rejected: %w", i, err)
	}
	args[i] = sanitized
}
```
**Symlink Resolution:**
Path validation MUST use `filepath.EvalSymlinks()` BEFORE checking the allowed directory prefix to prevent symlink escape attacks:
```go
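// validateScriptPath returns the resolved absolute path. Callers MUST execute
// this returned path rather than the configured one, so that a symlink changed
// after validation cannot redirect execution.
func validateScriptPath(scriptPath string) (string, error) {
	// CRITICAL: Resolve symlinks FIRST
	resolvedPath, err := filepath.EvalSymlinks(scriptPath)
	if err != nil {
		return "", fmt.Errorf("failed to resolve script path: %w", err)
	}
	// Then validate resolved path is within allowed directory
	absPath, err := filepath.Abs(resolvedPath)
	if err != nil {
		return "", fmt.Errorf("failed to resolve absolute path: %w", err)
	}
	// The trailing slash matters: it rejects siblings such as /scripts-evil/
	const allowedDir = "/scripts/"
	if !strings.HasPrefix(absPath, allowedDir) {
		return "", ErrScriptPathInvalid
	}
	return absPath, nil
}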
```
**Resource Limits (MANDATORY):**
The following rlimits MUST be enforced for all script executions:
| Resource | Limit | Purpose |
|----------|-------|------|
| `RLIMIT_NOFILE` | 256 | Prevent file descriptor exhaustion |
| `RLIMIT_NPROC` | 64 | Prevent fork bombs |
| `RLIMIT_AS` | 256MB | Prevent memory exhaustion |
| `RLIMIT_CPU` | 60s | Prevent CPU exhaustion |
| `RLIMIT_FSIZE` | 10MB | Prevent disk filling |
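These limits MUST apply to the script process only. `setrlimit(2)` changes the limits of the calling process, so calling it inside the Charon server would cap Charon itself (and `RLIMIT_CPU` would eventually terminate it). Instead, the limits are set in a short-lived helper process that then replaces itself with the script via `execve(2)`, which preserves resource limits. Because `RLIMIT_NPROC` counts all processes of the real user ID, scripts SHOULD also run under a dedicated unprivileged UID.
```go
import (
	"fmt"

	"golang.org/x/sys/unix"
)

// MANDATORY: runs in the helper process (for example, the Charon binary
// re-executed in a hypothetical script-runner mode), never in the server.
// On success unix.Exec does not return and the script inherits these limits.
func execWithLimits(scriptPath string, args, env []string) error {
	limits := []struct {
		resource int
		limit    uint64
	}{
		{unix.RLIMIT_NOFILE, 256},
		{unix.RLIMIT_NPROC, 64},
		{unix.RLIMIT_AS, 256 * 1024 * 1024},
		{unix.RLIMIT_CPU, 60},
		{unix.RLIMIT_FSIZE, 10 * 1024 * 1024},
	}
	for _, l := range limits {
		if err := unix.Setrlimit(l.resource, &unix.Rlimit{Cur: l.limit, Max: l.limit}); err != nil {
			return fmt.Errorf("failed to set rlimit %d: %w", l.resource, err)
		}
	}
	// argv[0] is conventionally the program path itself
	return unix.Exec(scriptPath, append([]string{scriptPath}, args...), env)
}
```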
**Environment Variable Clearing:**
Inherited environment variables MUST be explicitly cleared before setting script environment:
```go
func executeScript(ctx context.Context, scriptPath string, args []string, userEnv map[string]string) error {
	cmd := exec.CommandContext(ctx, scriptPath, args...)
	// CRITICAL: Start with an empty environment. A nil Env would make the
	// child inherit Charon's environment; a non-nil empty slice does not.
	cmd.Env = []string{}
	// Add only essential system variables
	cmd.Env = append(cmd.Env,
		"PATH=/usr/local/bin:/usr/bin:/bin",
		"HOME=/tmp",
		"LANG=C.UTF-8",
		"TZ=UTC",
	)
	// Add user-provided environment variables (after validation)
	for key, value := range userEnv {
		if err := validateEnvVar(key, value); err != nil {
			return fmt.Errorf("invalid env var %s: %w", key, err)
		}
		cmd.Env = append(cmd.Env, key+"="+value)
	}
	// Execute with the cleared environment; ctx should carry the script timeout
	return cmd.Run()
}
```
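
The `env_vars` credential in the configuration example is a single comma-separated string, and `validateEnvVar` is referenced above without being defined. One possible sketch of both follows; the reserved-name list, the uppercase-only naming rule, and the choice not to expand `${...}` references are assumptions, and `parseEnvVars` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var envKeyPattern = regexp.MustCompile(`^[A-Z_][A-Z0-9_]*$`)

// reservedEnvKeys are set by Charon itself or change how programs load and
// run, so user configuration must not override them.
var reservedEnvKeys = map[string]bool{
	"PATH": true, "HOME": true, "LANG": true, "TZ": true, "IFS": true,
	"LD_PRELOAD": true, "LD_LIBRARY_PATH": true, "BASH_ENV": true, "ENV": true,
}

// validateEnvVar is one possible implementation of the helper used above.
func validateEnvVar(key, value string) error {
	if !envKeyPattern.MatchString(key) {
		return fmt.Errorf("invalid variable name %q", key)
	}
	if reservedEnvKeys[key] {
		return fmt.Errorf("variable %q is reserved", key)
	}
	if strings.ContainsAny(value, "\x00\n") {
		return fmt.Errorf("value for %q contains NUL or newline", key)
	}
	return nil
}

// parseEnvVars splits "K1=V1,K2=V2"; values therefore cannot contain commas.
// ${...} references are kept literally, not expanded.
func parseEnvVars(spec string) (map[string]string, error) {
	env := make(map[string]string)
	if strings.TrimSpace(spec) == "" {
		return env, nil
	}
	for _, pair := range strings.Split(spec, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			return nil, fmt.Errorf("entry %q is not KEY=VALUE", pair)
		}
		if err := validateEnvVar(key, value); err != nil {
			return nil, err
		}
		if _, dup := env[key]; dup {
			return nil, fmt.Errorf("duplicate variable %q", key)
		}
		env[key] = value
	}
	return env, nil
}
```
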
#### Implementation Complexity
- Backend: ~250 lines (ScriptProvider + executor)
- Frontend: ~80 lines (form fields)
- Tests: ~200 lines (including security tests)
---
### 4.3 Option C: RFC 2136 (Dynamic DNS Update) Plugin
#### Overview
RFC 2136 defines a standard protocol for dynamic DNS updates. Supported by BIND, PowerDNS, Knot DNS, and many self-hosted DNS servers.
#### Configuration
```json
{
"name": "My BIND Server",
"provider_type": "rfc2136",
"credentials": {
"nameserver": "ns1.example.com",
"port": "53",
"tsig_key_name": "acme-update-key",
"tsig_key_secret": "base64-encoded-secret",
"tsig_algorithm": "hmac-sha256",
"zone": "example.com"
}
}
```
#### TSIG Algorithms Supported
| Algorithm | Status | Notes |
|-----------|--------|-------|
| `hmac-md5` | ⚠️ **DEPRECATED** | Cryptographically weak; will be removed in v2.0 |
| `hmac-sha1` | Legacy | Avoid for new deployments |
| `hmac-sha256` | ✅ Recommended | Default for new configurations |
| `hmac-sha384` | Supported | Higher security, slightly more overhead |
| `hmac-sha512` | Supported | Highest security |
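
Validating `tsig_algorithm` against this table, and surfacing rather than rejecting the deprecated and legacy choices, could look like the sketch below. `validateTSIG` is a hypothetical name; the empty-means-`hmac-sha256` default follows the credential schema in §6.3, and accepting a trailing dot (the form algorithm names take on the wire) is an assumption.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

type tsigAlgorithmStatus int

const (
	tsigRecommended tsigAlgorithmStatus = iota
	tsigSupported
	tsigLegacy
	tsigDeprecated
)

var tsigAlgorithms = map[string]tsigAlgorithmStatus{
	"hmac-md5":    tsigDeprecated,
	"hmac-sha1":   tsigLegacy,
	"hmac-sha256": tsigRecommended,
	"hmac-sha384": tsigSupported,
	"hmac-sha512": tsigSupported,
}

// validateTSIG checks the algorithm and that the secret is non-empty base64.
// A non-empty warning flags deprecated or legacy algorithms without blocking
// existing configurations.
func validateTSIG(algorithm, secret string) (warning string, err error) {
	alg := strings.ToLower(strings.TrimSuffix(algorithm, "."))
	if alg == "" {
		alg = "hmac-sha256" // documented default
	}
	status, ok := tsigAlgorithms[alg]
	if !ok {
		return "", fmt.Errorf("unsupported TSIG algorithm %q", algorithm)
	}
	if _, decodeErr := base64.StdEncoding.DecodeString(secret); decodeErr != nil || secret == "" {
		return "", fmt.Errorf("tsig_key_secret must be non-empty base64")
	}
	switch status {
	case tsigDeprecated:
		warning = alg + " is deprecated and will be removed in v2.0; migrate to hmac-sha256 or stronger"
	case tsigLegacy:
		warning = alg + " is legacy; avoid it for new deployments"
	}
	return warning, nil
}
```
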
> ⚠️ **DEPRECATION WARNING:** `hmac-md5` is cryptographically weak and should not be used for new deployments. Support for `hmac-md5` will be removed in Charon v2.0. Migrate to `hmac-sha256` or stronger.

**Secure Memory Handling for TSIG Secrets:**
TSIG secrets MUST be handled securely in memory:
```go
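import (
	"errors"

	"github.com/awnumar/memguard"
)

type RFC2136Provider struct {
	tsigSecret *memguard.Enclave // Encrypted in memory
}

func (r *RFC2136Provider) SetTSIGSecret(secret []byte) error {
	if len(secret) == 0 {
		return errors.New("empty TSIG secret")
	}
	// NewEnclave copies the secret into an encrypted enclave and wipes the
	// source; the explicit wipe documents the requirement
	r.tsigSecret = memguard.NewEnclave(secret)
	memguard.WipeBytes(secret)
	return nil
}

// withTSIGSecret decrypts the secret only for the duration of fn
func (r *RFC2136Provider) withTSIGSecret(fn func(secret []byte) error) error {
	if r.tsigSecret == nil {
		return errors.New("TSIG secret not set")
	}
	buf, err := r.tsigSecret.Open()
	if err != nil {
		return err
	}
	defer buf.Destroy() // wipe the decrypted copy
	return fn(buf.Bytes())
}

func (r *RFC2136Provider) Cleanup() error {
	// Enclave has no Destroy method (LockedBuffer does). Dropping the reference
	// leaves only ciphertext; memguard.Purge() at shutdown re-keys the session.
	r.tsigSecret = nil
	return nil
}
```
**Requirements:**
1. TSIG secrets MUST be stored in encrypted memory enclaves when in use
2. Source buffers containing secrets MUST be wiped immediately after copying
3. Secrets MUST NOT appear in debug output, stack traces, or core dumps
4. Provider `Cleanup()` MUST release all secret material: drop enclave references and `Destroy()` any `LockedBuffer` opened from them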
#### DNS UPDATE Message Flow
```
┌──────────┐                     ┌──────────────┐
│  Charon  │                     │  DNS Server  │
│          │     DNS UPDATE      │ (BIND, etc.) │
│          │ ──────────────────▶ │              │
│          │     TSIG-signed     │              │
│          │                     │              │
│          │      RESPONSE       │              │
│          │ ◀────────────────── │              │
│          │  NOERROR / REFUSED  │              │
└──────────┘                     └──────────────┘
```
#### Caddy Integration
Caddy has a native RFC 2136 module: [caddy-dns/rfc2136](https://github.com/caddy-dns/rfc2136)
**DECISION:** Charon WILL ship with the RFC 2136 Caddy module pre-built in the Docker image. Users do NOT need to rebuild Caddy.
The Charon plugin would:
1. Store TSIG credentials encrypted
2. Generate Caddy config with proper RFC 2136 settings
3. Validate credentials by attempting a test query

**Dockerfile Addition (Phase 2):**
```dockerfile
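# Build Caddy with RFC 2136 module
FROM caddy:builder AS caddy-builder
RUN xcaddy build \
    --with github.com/caddy-dns/rfc2136

# Later, in Charon's final image stage (destination path depends on that image):
COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy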
```
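
The second step in the plugin's list, generating Caddy config from stored credentials, might look like the following sketch. The provider field names (`key_name`, `key_alg`, `key`, `server`) are an assumption based on the caddy-dns/rfc2136 module's configuration and must be checked against the module version actually bundled; `buildRFC2136CaddyConfig` is a hypothetical name. `net.JoinHostPort` is used so IPv6 nameservers are bracketed correctly.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// buildRFC2136CaddyConfig maps stored credentials onto a Caddy DNS provider
// object. Field names are assumed, not confirmed, for caddy-dns/rfc2136.
func buildRFC2136CaddyConfig(creds map[string]string) (map[string]any, error) {
	for _, k := range []string{"nameserver", "tsig_key_name", "tsig_key_secret"} {
		if strings.TrimSpace(creds[k]) == "" {
			return nil, fmt.Errorf("missing required credential %q", k)
		}
	}
	port := creds["port"]
	if port == "" {
		port = "53" // documented default
	}
	alg := creds["tsig_algorithm"]
	if alg == "" {
		alg = "hmac-sha256" // documented default
	}
	return map[string]any{
		"name":     "rfc2136",
		"key_name": creds["tsig_key_name"],
		"key_alg":  alg,
		"key":      creds["tsig_key_secret"],
		"server":   net.JoinHostPort(creds["nameserver"], port),
	}, nil
}
```
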
#### Pros
- Industry-standard protocol
- No custom server-side code needed
- Works with popular DNS servers (BIND9, PowerDNS, Knot)
- Secure with TSIG authentication
- Native Caddy module available
#### Cons
- Requires DNS server configuration for TSIG keys
- More complex setup than webhook
- Zone configuration required
- Firewall rules may need updating (TCP/UDP 53)
#### Implementation Complexity
- Backend: ~180 lines (RFC2136Provider)
- Frontend: ~120 lines (TSIG configuration form)
- Tests: ~150 lines
- Requires: Caddy rebuild with `caddy-dns/rfc2136` module
---
### 4.4 Option D: Manual/External Plugin
#### Overview
No automation - UI shows required TXT record details, user creates manually, clicks "Verify" when done.
#### UI Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ Manual DNS Challenge                                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ To obtain a certificate for *.example.com, create the following     │
│ TXT record at your DNS provider:                                    │
│                                                                     │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ Record Name:  _acme-challenge.example.com            [📋 Copy] │  │
│ ├────────────────────────────────────────────────────────────────┤  │
│ │ Record Value: gZrH7wL9t3kM2nP4qX5yR8sT...            [📋 Copy] │  │
│ ├────────────────────────────────────────────────────────────────┤  │
│ │ TTL: 300 (5 minutes)                                           │  │
│ └────────────────────────────────────────────────────────────────┘  │
│                                                                     │
│ ⏱️ Time remaining: 4:32                                             │
│ [━━━━━━━━━━━━━━━━━━━━━░░░░░░░░░░] 68%                               │
│                                                                     │
│ [Check DNS Now]  [I've Created the Record - Verify]                 │
│                                                                     │
│ Record not yet propagated. Last check: 10 seconds ago               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
#### Configuration
```json
{
"name": "Manual DNS",
"provider_type": "manual",
"credentials": {
"timeout_minutes": "10",
"polling_interval_seconds": "30"
}
}
```
#### Technical Implementation
- Store challenge details in session/database
- Background job periodically queries DNS
- Polling endpoint for UI updates (10-second interval)
- Timeout after configurable period
#### Session Security Requirements
**Challenge-User Binding:**
Manual challenges MUST be bound to the authenticated user's session:
```go
type Challenge struct {
	ID        string `json:"id"`      // UUIDv4 (cryptographically random)
	UserID    uint   `json:"user_id"` // Owner of this challenge
	SessionID string `json:"-"`       // Session that created challenge
	// ... other fields
}

// Verify challenge ownership before any operation
func (s *ManualChallengeService) VerifyOwnership(ctx context.Context, challengeID string, userID uint) error {
	var challenge Challenge
	err := s.db.WithContext(ctx).Where("id = ?", challengeID).First(&challenge).Error
	if errors.Is(err, gorm.ErrRecordNotFound) {
		return ErrChallengeNotFound
	}
	if err != nil {
		// A database failure is not the same as "not found"
		return fmt.Errorf("loading challenge %s: %w", challengeID, err)
	}
	if challenge.UserID != userID {
		// Log potential unauthorized access attempt
		s.auditLog.Warn("unauthorized challenge access attempt",
			"challenge_id", challengeID,
			"owner_id", challenge.UserID,
			"requester_id", userID,
		)
		return ErrUnauthorized
	}
	return nil
}
```
**CSRF Protection:**
All state-changing operations (POST, PUT, DELETE) on manual challenges MUST validate CSRF tokens:
```go
import (
	"crypto/subtle"
	"net/http"
)

// Middleware for manual challenge endpoints
func CSRFProtection(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodPost, http.MethodPut, http.MethodDelete:
			token := r.Header.Get("X-CSRF-Token")
			sessionToken := getSessionCSRFToken(r)
			// Reject empty tokens explicitly: two empty strings compare equal
			if token == "" || sessionToken == "" ||
				subtle.ConstantTimeCompare([]byte(token), []byte(sessionToken)) != 1 {
				http.Error(w, "CSRF token mismatch", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}
```
**Challenge ID Generation:**
Challenge IDs MUST use cryptographically random UUIDs (UUIDv4):
```go
import "github.com/google/uuid"

func generateChallengeID() string {
	// UUIDv4 uses crypto/rand, providing 122 bits of randomness
	return uuid.New().String()
}

// DO NOT use:
// - Sequential IDs (predictable)
// - UUIDv1 (contains timestamp/MAC address)
// - Custom random without proper entropy
```
**Session Validation on Each Request:**
| Endpoint | Required Validations |
|----------|---------------------|
| `GET /manual-challenge/:id` | Valid session, challenge.user_id == session.user_id |
| `POST /manual-challenge/:id/verify` | Valid session, CSRF token, challenge ownership |
| `DELETE /manual-challenge/:id` | Valid session, CSRF token, challenge ownership |

**Note:** Although Charon has existing WebSocket infrastructure (`backend/internal/services/websocket_tracker.go`), polling is chosen for simplicity:
- Avoids additional WebSocket connection management complexity
- 10-second polling interval provides acceptable UX for manual workflows
- Reduces frontend state management burden

**Polling Endpoint:**
```
GET /api/v1/dns-providers/:id/manual-challenge/:challengeId/poll
Response (every 10s):
{
"status": "pending|verified|expired|failed",
"dns_propagated": false,
"time_remaining_seconds": 432,
"last_check_at": "2026-01-08T15:35:00Z"
}
```
#### Pros
- Works with ANY DNS provider
- No integration required
- Good for testing/development
- One-off certificate issuance
#### Cons
- User must manually intervene
- Time-sensitive (ACME challenge timeout)
- Not suitable for automated renewals
- Doesn't scale for multiple certificates
#### Implementation Complexity
- Backend: ~150 lines (ManualProvider + verification endpoint)
- Frontend: ~300 lines (interactive UI with copy/verify)
- Tests: ~100 lines
---
## 5. Recommended Approach
### Phase 1: Manual Plugin (1 week)
**Rationale:** Unblocks all users immediately. Lowest risk, highest immediate value.
Deliverables:
- ManualProvider implementation
- Interactive challenge UI
- DNS verification endpoint
- User documentation
### Phase 2: RFC 2136 Plugin (1 week)
**Rationale:** Standards-based, serves self-hosted DNS users. Caddy module already exists.
Deliverables:
- RFC2136Provider implementation
- TSIG credential storage
- Caddy module integration documentation
- BIND9/PowerDNS setup guides
### Phase 3: Webhook Plugin (1 week)
**Rationale:** Most flexible option for custom integrations. Medium complexity.
Deliverables:
- WebhookProvider implementation
- Configurable retry logic
- Request/response logging
- Example webhook implementations (Node.js, Python)
---
## Future Work
### Phase 4: Script Plugin (Conditional)
> **Go/No-Go Gate:** Phase 4 only proceeds if >20 user requests are received via GitHub issues requesting script plugin functionality. Track via label `feature:script-plugin`.

**Rationale:** Power-user feature with significant security implications. Implement only if demand warrants the additional security review and maintenance burden.
Deliverables:
- ScriptProvider implementation
- Security sandbox
- Example scripts for common scenarios
### Implementation Order Justification
```
User Value
│ ★ Manual Plugin (Phase 1)
│ - Unblocks everyone immediately
│ - Lowest implementation risk
│ ★ RFC 2136 Plugin (Phase 2)
│ - Self-hosted DNS is common need
│ - Industry standard
│ ★ Webhook Plugin (Phase 3)
│ - Flexible for edge cases
│ - Integration-focused teams
│ ○ Script Plugin (Phase 4)
│ - Power users only
│ - Security concerns
└────────────────────────────────▶ Implementation Effort
```
---
## 6. Database Schema Changes
### 6.1 No New Tables Required
The existing `dns_providers` table schema supports custom plugins. The `provider_type` column accepts new values, and `credentials_encrypted` stores plugin-specific configuration.
### 6.2 Provider Type Enumeration
Expand the allowed `provider_type` values:
```go
// backend/pkg/dnsprovider/types.go
const (
// Built-in providers
TypeCloudflare = "cloudflare"
TypeRoute53 = "route53"
// ... existing providers
// Custom plugins
TypeWebhook = "webhook"
TypeScript = "script"
TypeRFC2136 = "rfc2136"
TypeManual = "manual"
)
```
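The types endpoint (section 7.4) reports `is_built_in` and `category` for each type, so the backend needs a way to classify a `provider_type`. One possible sketch uses a set lookup keyed on the constants above. The `IsCustomPlugin` helper is hypothetical, and the built-in list is cut down to two entries.

```go
package main

// Provider type identifiers, mirroring the constants above
// (built-in list abbreviated for this sketch).
const (
	TypeCloudflare = "cloudflare"
	TypeRoute53    = "route53"

	TypeWebhook = "webhook"
	TypeScript  = "script"
	TypeRFC2136 = "rfc2136"
	TypeManual  = "manual"
)

// customPluginTypes is the set of provider types backed by custom plugins.
var customPluginTypes = map[string]bool{
	TypeWebhook: true,
	TypeScript:  true,
	TypeRFC2136: true,
	TypeManual:  true,
}

// IsCustomPlugin reports whether providerType is a custom plugin type,
// i.e. whether the types endpoint should return "is_built_in": false.
func IsCustomPlugin(providerType string) bool {
	return customPluginTypes[providerType]
}
```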
### 6.3 Credential Schemas Per Plugin Type
#### Webhook Credentials
```json
{
"create_url": "string (required)",
"delete_url": "string (required)",
"auth_header": "string (optional)",
"auth_value": "string (optional, encrypted)",
"content_type": "string (default: application/json)",
"timeout_seconds": "integer (default: 30)",
"retry_count": "integer (default: 3)",
"custom_headers": "object (optional)"
}
```
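The following sketch decodes the webhook credential JSON and applies the documented defaults. It assumes the field names above map one-to-one onto a Go struct. `retry_count` is a pointer so that an explicit `0` (no retries) is not overwritten by the default of 3. That distinction, the `map[string]string` type for `custom_headers`, and the `ParseWebhookCredentials` name are all assumptions. URL checks are left to the shared SSRF validator described in section 9.2.

```go
package main

import (
	"encoding/json"
	"errors"
)

// WebhookCredentials mirrors the webhook credential schema.
type WebhookCredentials struct {
	CreateURL      string            `json:"create_url"`
	DeleteURL      string            `json:"delete_url"`
	AuthHeader     string            `json:"auth_header,omitempty"`
	AuthValue      string            `json:"auth_value,omitempty"`
	ContentType    string            `json:"content_type,omitempty"`
	TimeoutSeconds int               `json:"timeout_seconds,omitempty"`
	RetryCount     *int              `json:"retry_count,omitempty"` // nil = not set
	CustomHeaders  map[string]string `json:"custom_headers,omitempty"`
}

// ParseWebhookCredentials decodes raw JSON, checks required fields,
// and fills in the schema defaults.
func ParseWebhookCredentials(raw []byte) (WebhookCredentials, error) {
	var c WebhookCredentials
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, err
	}
	if c.CreateURL == "" || c.DeleteURL == "" {
		return c, errors.New("create_url and delete_url are required")
	}
	if c.ContentType == "" {
		c.ContentType = "application/json"
	}
	if c.TimeoutSeconds <= 0 {
		c.TimeoutSeconds = 30
	}
	if c.RetryCount == nil {
		defaultRetries := 3
		c.RetryCount = &defaultRetries
	}
	return c, nil
}
```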
#### Script Credentials
```json
{
"script_path": "string (required)",
"timeout_seconds": "integer (default: 60)",
"working_directory": "string (optional)",
"env_vars": "string (optional, KEY=VALUE format)"
}
```
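The schema does not specify how multiple `env_vars` entries are separated. The sketch below assumes one `KEY=VALUE` pair per line. It also assumes that the variables the executor sets itself (`PATH`, `HOME`, `LANG`, from section 9.3) must not be overridden. `ParseEnvVars` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Conventional shell variable names: letter or underscore, then alphanumerics.
var envKeyPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// reservedEnvKeys are set by the script executor and may not be overridden (assumption).
var reservedEnvKeys = map[string]bool{"PATH": true, "HOME": true, "LANG": true}

// ParseEnvVars parses newline-separated KEY=VALUE pairs into a slice
// suitable for appending to exec.Cmd.Env. Blank lines are ignored.
func ParseEnvVars(raw string) ([]string, error) {
	var env []string
	for i, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || !envKeyPattern.MatchString(key) {
			return nil, fmt.Errorf("env_vars line %d: expected KEY=VALUE", i+1)
		}
		if reservedEnvKeys[key] {
			return nil, fmt.Errorf("env_vars line %d: %s cannot be overridden", i+1, key)
		}
		env = append(env, key+"="+value)
	}
	return env, nil
}
```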
#### RFC 2136 Credentials
```json
{
"nameserver": "string (required)",
"port": "integer (default: 53)",
"tsig_key_name": "string (required)",
"tsig_key_secret": "string (required, encrypted)",
"tsig_algorithm": "string (default: hmac-sha256)",
"zone": "string (optional, auto-detect)"
}
```
#### Manual Credentials
```json
{
"timeout_minutes": "integer (default: 10)",
"polling_interval_seconds": "integer (default: 30)"
}
```
### 6.4 Challenge Cleanup Mechanism
Challenges are cleaned up via Charon's existing scheduled task infrastructure (using `robfig/cron/v3`, same pattern as `backup_service.go`):
```go
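import (
	"fmt"
	"log" // stand-in for Charon's logger
	"time"
)

// Cleanup job runs hourly, at minute 0
func (s *ManualChallengeService) scheduleCleanup() error {
	if _, err := s.cron.AddFunc("0 * * * *", s.cleanupExpiredChallenges); err != nil {
		return fmt.Errorf("scheduling challenge cleanup: %w", err)
	}
	return nil
}

func (s *ManualChallengeService) cleanupExpiredChallenges() {
	// Mark challenges still in "pending" state after 24 hours as "expired"
	cutoff := time.Now().Add(-24 * time.Hour)
	if err := s.db.Model(&Challenge{}).
		Where("status = ? AND created_at < ?", "pending", cutoff).
		Update("status", "expired").Error; err != nil {
		log.Printf("challenge cleanup: expiring stale challenges: %v", err)
	}

	// Hard delete challenge records older than 7 days
	deleteCutoff := time.Now().Add(-7 * 24 * time.Hour)
	if err := s.db.Where("created_at < ?", deleteCutoff).Delete(&Challenge{}).Error; err != nil {
		log.Printf("challenge cleanup: deleting old challenges: %v", err)
	}
}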
```
**Cleanup Schedule:**
| Condition | Action | Frequency |
|-----------|--------|-----------|
| `pending` status > 24 hours | Mark as `expired` | Hourly |
| Any challenge > 7 days old | Hard delete | Hourly |
---
## 7. API Design
### 7.1 Existing Endpoints (No Changes)
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/dns-providers` | List all providers |
| POST | `/api/v1/dns-providers` | Create provider |
| GET | `/api/v1/dns-providers/:id` | Get provider |
| PUT | `/api/v1/dns-providers/:id` | Update provider |
| DELETE | `/api/v1/dns-providers/:id` | Delete provider |
| POST | `/api/v1/dns-providers/:id/test` | Test credentials |
| GET | `/api/v1/dns-providers/types` | List provider types |
### 7.2 New Endpoints
#### Manual Challenge Status
```
GET /api/v1/dns-providers/:id/manual-challenge/:challengeId
```
Response:
```json
{
"id": "challenge-uuid",
"status": "pending|verified|expired|failed",
"fqdn": "_acme-challenge.example.com",
"value": "gZrH7wL9t3kM2nP4...",
"created_at": "2026-01-08T15:30:00Z",
"expires_at": "2026-01-08T15:40:00Z",
"last_check_at": "2026-01-08T15:35:00Z",
"dns_propagated": false
}
```
#### Manual Challenge Verification Trigger
```
POST /api/v1/dns-providers/:id/manual-challenge/:challengeId/verify
```
Response:
```json
{
"success": true,
"dns_found": true,
"message": "TXT record verified successfully"
}
```
### 7.3 Error Response Codes
All manual challenge and custom plugin endpoints use consistent error codes:
| Error Code | HTTP Status | Description |
|------------|-------------|-------------|
| `CHALLENGE_NOT_FOUND` | 404 | Challenge ID does not exist |
| `CHALLENGE_EXPIRED` | 410 | Challenge has timed out |
| `CHALLENGE_IN_PROGRESS` | 409 | Another challenge is currently active for this FQDN |
| `DNS_NOT_PROPAGATED` | 200 | DNS record not yet found (success: false) |
| `INVALID_PROVIDER_TYPE` | 400 | Unknown provider type |
| `INVALID_SCRIPT_ARGUMENT` | 400 | Script argument contains invalid characters (only `[a-zA-Z0-9._=-]` allowed) |
| `WEBHOOK_TIMEOUT` | 504 | Webhook did not respond in time |
| `WEBHOOK_RATE_LIMITED` | 429 | Too many webhook calls (>10/min) |
| `WEBHOOK_RESPONSE_TOO_LARGE` | 413 | Webhook response exceeded 1MB limit |
| `PROVIDER_CIRCUIT_OPEN` | 503 | Provider disabled due to consecutive failures |
| `SCRIPT_TIMEOUT` | 504 | Script execution exceeded timeout |
| `SCRIPT_PATH_INVALID` | 400 | Script path not in allowed directory |
| `TSIG_AUTH_FAILED` | 401 | RFC 2136 TSIG authentication failed |
**Error Response Format:**
```json
{
"success": false,
"error": {
"code": "CHALLENGE_EXPIRED",
"message": "Challenge timed out after 10 minutes",
"details": {
"challenge_id": "uuid",
"expired_at": "2026-01-08T15:40:00Z"
}
}
}
```
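The sketch below gives Go types for this error envelope, together with the code-to-status mapping from the table above. The type names, `NewErrorResponse`, and the fallback to 500 for unknown codes are assumptions.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// APIError is the "error" object in the response envelope.
type APIError struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

// ErrorResponse is the top-level error envelope.
type ErrorResponse struct {
	Success bool     `json:"success"`
	Error   APIError `json:"error"`
}

// httpStatusFor maps error codes from the table in section 7.3 to HTTP statuses.
var httpStatusFor = map[string]int{
	"CHALLENGE_NOT_FOUND":        http.StatusNotFound,
	"CHALLENGE_EXPIRED":          http.StatusGone,
	"CHALLENGE_IN_PROGRESS":      http.StatusConflict,
	"DNS_NOT_PROPAGATED":         http.StatusOK,
	"INVALID_PROVIDER_TYPE":      http.StatusBadRequest,
	"INVALID_SCRIPT_ARGUMENT":    http.StatusBadRequest,
	"WEBHOOK_TIMEOUT":            http.StatusGatewayTimeout,
	"WEBHOOK_RATE_LIMITED":       http.StatusTooManyRequests,
	"WEBHOOK_RESPONSE_TOO_LARGE": http.StatusRequestEntityTooLarge,
	"PROVIDER_CIRCUIT_OPEN":      http.StatusServiceUnavailable,
	"SCRIPT_TIMEOUT":             http.StatusGatewayTimeout,
	"SCRIPT_PATH_INVALID":        http.StatusBadRequest,
	"TSIG_AUTH_FAILED":           http.StatusUnauthorized,
}

// NewErrorResponse returns the HTTP status and JSON body for an error code.
func NewErrorResponse(code, message string, details map[string]any) (int, []byte, error) {
	status, ok := httpStatusFor[code]
	if !ok {
		status = http.StatusInternalServerError
	}
	body, err := json.Marshal(ErrorResponse{
		Success: false,
		Error:   APIError{Code: code, Message: message, Details: details},
	})
	return status, body, err
}
```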
### 7.4 Updated Types Endpoint Response
The existing `/api/v1/dns-providers/types` endpoint will include custom plugins:
```json
{
"types": [
{
"type": "cloudflare",
"name": "Cloudflare",
"is_built_in": true,
"fields": [...]
},
{
"type": "webhook",
"name": "Webhook (Generic)",
"is_built_in": false,
"category": "custom",
"fields": [
{"name": "create_url", "label": "Create Record URL", "type": "text", "required": true},
{"name": "delete_url", "label": "Delete Record URL", "type": "text", "required": true},
{"name": "auth_header", "label": "Auth Header Name", "type": "text", "required": false},
{"name": "auth_value", "label": "Auth Header Value", "type": "password", "required": false}
]
},
{
"type": "rfc2136",
"name": "RFC 2136 (Dynamic DNS)",
"is_built_in": false,
"category": "custom",
"fields": [
{"name": "nameserver", "label": "DNS Server", "type": "text", "required": true},
{"name": "tsig_key_name", "label": "TSIG Key Name", "type": "text", "required": true},
{"name": "tsig_key_secret", "label": "TSIG Secret", "type": "password", "required": true},
{"name": "tsig_algorithm", "label": "TSIG Algorithm", "type": "select", "options": [...]}
]
},
{
"type": "manual",
"name": "Manual (No Automation)",
"is_built_in": false,
"category": "custom",
"fields": [
{"name": "timeout_minutes", "label": "Challenge Timeout (minutes)", "type": "number", "default": "10"}
]
}
]
}
```
---
## 8. Frontend UI Mockups
### 8.1 Provider Type Selection (Updated)
```
┌─────────────────────────────────────────────────────────────────────┐
│ Add DNS Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Select Provider Type: │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ BUILT-IN PROVIDERS ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ ☁️ Cloudflare│ │ 🔶 Route53 │ │ 💧 Digital │ │ 🔷 Azure │││
│ │ │ │ │ │ │ Ocean │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ 🌐 Google │ │ 🟠 Hetzner │ │ 📛 GoDaddy │ │ 🔵 Namecheap│││
│ │ │ Cloud DNS │ │ │ │ │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ CUSTOM INTEGRATIONS ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ 🔗 Webhook │ │ 📜 Script │ │ 📡 RFC 2136 │ │ ✋ Manual │││
│ │ │ (HTTP) │ │ (Shell) │ │ (DDNS) │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ [Cancel] [Next →] │
└─────────────────────────────────────────────────────────────────────┘
```
### 8.2 Webhook Configuration Form
```
┌─────────────────────────────────────────────────────────────────────┐
│ Configure Webhook Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Provider Name: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ My Custom DNS Webhook ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Create Record URL: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ https://api.example.com/dns/create ││
│ └─────────────────────────────────────────────────────────────────┘│
│ Charon will POST JSON with record details │
│ │
│ Delete Record URL: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ https://api.example.com/dns/delete ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ── Authentication (Optional) ──────────────────────────────────────│
│ │
│ Header Name: Header Value: │
│ ┌───────────────────┐ ┌───────────────────────────────┐ │
│ │ X-API-Key │ │ •••••••••••••• │ │
│ └───────────────────┘ └───────────────────────────────┘ │
│ │
│ ── Advanced Settings ──────────────────────────────────────────────│
│ │
│ Timeout (seconds): [30 ▼] Retry Count: [3 ▼] │
│ │
│ │
│ [Test Connection] [Cancel] [Save Provider] │
└─────────────────────────────────────────────────────────────────────┘
```
### 8.3 RFC 2136 Configuration Form
```
┌─────────────────────────────────────────────────────────────────────┐
│ Configure RFC 2136 Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Provider Name: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Internal BIND Server ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ DNS Server: * Port: │
│ ┌─────────────────────────────────────┐ ┌─────────────────────────┐│
│ │ ns1.internal.example.com │ │ 53 ││
│ └─────────────────────────────────────┘ └─────────────────────────┘│
│ │
│ ── TSIG Authentication ────────────────────────────────────────────│
│ │
│ Key Name: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ acme-update-key.example.com ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Key Secret: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ •••••••••••••••••••••••••••••••• ││
│ └─────────────────────────────────────────────────────────────────┘│
│ Base64-encoded TSIG secret │
│ │
│ Algorithm: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ HMAC-SHA256 (Recommended) ▼ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Zone (optional - auto-detected if empty): │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ [Test Connection] [Cancel] [Save Provider] │
└─────────────────────────────────────────────────────────────────────┘
```
### 8.4 Manual Challenge UI
```
┌─────────────────────────────────────────────────────────────────────┐
│ 🔐 Manual DNS Challenge │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Certificate Request: *.example.com │
│ Provider: Manual DNS (example-manual) │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ 📋 CREATE THIS TXT RECORD AT YOUR DNS PROVIDER ││
│ │ ││
│ │ Record Name: ││
│ │ ┌──────────────────────────────────────────────────┐ ┌──────┐││
│ │ │ _acme-challenge.example.com │ │ Copy │││
│ │ └──────────────────────────────────────────────────┘ └──────┘││
│ │ ││
│ │ Record Type: TXT ││
│ │ ││
│ │ Record Value: ││
│ │ ┌──────────────────────────────────────────────────┐ ┌──────┐││
│ │ │ gZrH7wL9t3kM2nP4qX5yR8sT0uV1wZ2aB3cD4eF5gH6iJ7 │ │ Copy │││
│ │ └──────────────────────────────────────────────────┘ └──────┘││
│ │ ││
│ │ TTL: 300 seconds (5 minutes) ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ ⏱️ Time Remaining: 7:23 ││
│ │ [━━━━━━━━━━━━━━━━━░░░░░░░░░░░░░░░] 52% ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Status: ⏳ Waiting for DNS propagation... │
│ Last checked: 15 seconds ago │
│ │
│ ┌─────────────────────┐ ┌────────────────────────────────────────┐│
│ │ 🔍 Check DNS Now │ │ ✅ I've Created the Record - Verify ││
│ └─────────────────────┘ └────────────────────────────────────────┘│
│ │
│ [Cancel Challenge] │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 9. Security Considerations
### 9.1 Threat Model
| Threat | Risk Level | Mitigation |
|--------|------------|------------|
| Credential theft from database | High | AES-256-GCM encryption at rest, key rotation |
| Webhook URL SSRF | High | URL validation, internal IP blocking |
| Script path traversal | Critical | Allowlist `/scripts/` directory only |
| Script command injection | Critical | Sanitize all arguments, no shell expansion |
| TSIG key exposure in logs | Medium | Redact secrets in all logs |
| DNS cache poisoning | Low | TSIG authentication for RFC 2136 |
| Webhook response injection | Low | Strict JSON parsing, no eval |
### 9.2 SSRF Prevention for Webhooks
Webhook URL validation MUST use Charon's existing centralized SSRF protection in `backend/internal/security/url_validator.go`:
```go
// backend/internal/services/webhook_provider.go
import (
	"fmt"

	"github.com/Wikid82/charon/backend/internal/security"
)
func (w *WebhookProvider) validateWebhookURL(urlStr string) error {
// Use existing centralized SSRF validation
// This validates:
// - HTTPS scheme required (production)
// - DNS resolution with timeout
// - All resolved IPs checked against private/reserved ranges
// - Cloud metadata endpoints blocked (169.254.169.254)
// - IPv4-mapped IPv6 bypass prevention
_, err := security.ValidateExternalURL(urlStr)
if err != nil {
return fmt.Errorf("webhook URL validation failed: %w", err)
}
return nil
}
```
**Existing `security.ValidateExternalURL()` provides:**
- RFC 1918 private network blocking (10.x, 172.16.x, 192.168.x)
- Loopback blocking (127.x.x.x, ::1) unless `WithAllowLocalhost()` option
- Link-local blocking (169.254.x.x, fe80::) including cloud metadata
- Reserved range blocking (0.x.x.x, 240.x.x.x)
- IPv6 unique local blocking (fc00::)
- IPv4-mapped IPv6 bypass prevention (::ffff:192.168.1.1)
- Hostname length validation (RFC 1035, max 253 chars)
- Suspicious pattern detection (..)
- Port range validation with privileged port blocking
**DO NOT** duplicate SSRF validation logic. Reference the existing implementation.
### 9.3 Script Execution Security
```go
// backend/internal/services/script_provider.go
import (
	"context"
	"errors"
	"os"
	"os/exec"
	"path/filepath"
	"regexp"
	"strings"
	"syscall"
	"time"
)

const allowedScriptDir = "/scripts/"

// Arguments may only contain [a-zA-Z0-9._=-] (see INVALID_SCRIPT_ARGUMENT)
var safeScriptArg = regexp.MustCompile(`^[a-zA-Z0-9._=-]+$`)

func executeScript(scriptPath string, args []string) error {
	// 1. Validate script path
	absPath, err := filepath.Abs(scriptPath)
	if err != nil {
		return errors.New("invalid script path")
	}
	// EvalSymlinks resolves links so a symlink inside /scripts/ cannot
	// point outside it (e.g. to /etc/passwd)
	realPath, err := filepath.EvalSymlinks(absPath)
	if err != nil {
		return errors.New("invalid script path")
	}
	if !strings.HasPrefix(realPath, allowedScriptDir) {
		return errors.New("script must be in /scripts/ directory")
	}

	// 2. Verify script is a regular file with at least one execute bit set
	info, err := os.Stat(realPath)
	if err != nil || !info.Mode().IsRegular() || info.Mode().Perm()&0o111 == 0 {
		return errors.New("invalid script path")
	}

	// 3. Reject arguments outside the allowed character set. exec does not
	//    invoke a shell, so this is defense-in-depth for the script itself.
	for _, arg := range args {
		if !safeScriptArg.MatchString(arg) {
			return errors.New("invalid script argument")
		}
	}

	// 4. Create restricted command with timeout wrapper (defense-in-depth)
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	// 'timeout' kills the script at 55s, before the 60s context deadline
	cmd := exec.CommandContext(ctx, "timeout", "--signal=KILL", "55s", realPath)
	cmd.Args = append(cmd.Args, args...)
	cmd.Dir = allowedScriptDir

	// 5. Minimal but functional environment
	cmd.Env = []string{
		"PATH=/usr/local/bin:/usr/bin:/bin",
		"HOME=/tmp",
		"LANG=C.UTF-8",
	}

	// 6. Run as the nobody user (requires root or CAP_SETUID/CAP_SETGID)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid: 65534,
			Gid: 65534,
		},
	}

	// 7. Apply resource limits
	setResourceLimits(cmd)

	// 8. Capture output for logging
	output, err := cmd.CombinedOutput()

	// 9. Audit log (ExitCode returns -1 if the process never started)
	logScriptExecution(scriptPath, args, cmd.ProcessState.ExitCode(), output)
	return err
}

// setResourceLimits applies rlimits to prevent resource exhaustion.
// Placeholder: these are set via prlimit(2) or container security context.
func setResourceLimits(cmd *exec.Cmd) {
	// RLIMIT_NOFILE: Max open file descriptors (prevent fd exhaustion)
	// RLIMIT_NPROC: Max processes (prevent fork bombs)
	// RLIMIT_AS: Max address space (prevent memory exhaustion)
	//
	// Recommended values:
	// - NOFILE: 256
	// - NPROC: 64
	// - AS: 256MB
	//
	// Implementation note: In containerized deployments, these limits
	// should be enforced via container security context (securityContext
	// in Kubernetes, --ulimit in Docker) for stronger isolation.
}
```
**Security Layers (Defense-in-Depth):**
| Layer | Protection | Implementation |
|-------|------------|----------------|
| 1. Path validation | Restrict to `/scripts/` | `filepath.Abs()` + prefix check |
| 2. Timeout | Prevent hung scripts | `context.WithTimeout` + `timeout` command |
| 3. Resource limits | Prevent resource exhaustion | `rlimit` (NOFILE=256, NPROC=64, AS=256MB) |
| 4. Minimal environment | Reduce attack surface | Explicit `PATH`, no secrets |
| 5. Non-root execution | Limit privilege | `nobody` user (UID 65534) |
| 6. Container isolation | Strongest isolation | seccomp profile (see below) |
| 7. Audit logging | Forensics | All executions logged |
**Container Security (seccomp profile):**
For production deployments, scripts run within Charon's container which should have a restrictive seccomp profile. Document this requirement:
```yaml
# docker-compose.yml (recommended)
services:
charon:
security_opt:
- seccomp:seccomp-profile.json # Or use default Docker profile
# Alternative: Use --cap-drop=ALL --cap-add=<minimal>
```
**Note:** Full seccomp profile customization is out of scope for this feature. Users relying on script plugins in high-security environments should review container security configuration.
### 9.4 Log Redaction Patterns
Sensitive data MUST be redacted from all logs, including debug logs, error messages, and audit trails.
**Required Redaction Patterns:**
| Field Pattern | Redaction | Example |
|---------------|-----------|--------|
| `api_token` | `[REDACTED:api_token]` | `api_token=abc123` → `api_token=[REDACTED:api_token]` |
| `api_key` | `[REDACTED:api_key]` | `X-API-Key: secret` → `X-API-Key: [REDACTED:api_key]` |
| `secret` | `[REDACTED:secret]` | `client_secret=xyz` → `client_secret=[REDACTED:secret]` |
| `password` | `[REDACTED:password]` | `password=abc` → `password=[REDACTED:password]` |
| `tsig_key_secret` | `[REDACTED:tsig_secret]` | TSIG key value → `[REDACTED:tsig_secret]` |
| `authorization` | `[REDACTED:auth]` | `Authorization: Bearer ...` → `Authorization: [REDACTED:auth]` |
| `bearer` | `[REDACTED:bearer]` | Bearer token values → `[REDACTED:bearer]` |
**Implementation:**
```go
import (
	"regexp"
	"strings"
)

// Each pattern captures the key and separator in $1 and replaces only the value.
// api[_-]?token / api[_-]?key also match header-style names such as X-API-Key.
var sensitivePatterns = []struct {
	pattern *regexp.Regexp
	replace string
}{
	{regexp.MustCompile(`(?i)(api[_-]?token["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:api_token]`},
	{regexp.MustCompile(`(?i)(api[_-]?key["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:api_key]`},
	{regexp.MustCompile(`(?i)(secret["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:secret]`},
	{regexp.MustCompile(`(?i)(password["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:password]`},
	{regexp.MustCompile(`(?i)(tsig_key_secret["']?\s*[:=]\s*["']?)[^"'\s,}]+`), `$1[REDACTED:tsig_secret]`},
	{regexp.MustCompile(`(?i)(authorization["']?\s*[:=]\s*["']?)(Bearer\s+)?[^"'\s,}]+`), `$1[REDACTED:auth]`},
	{regexp.MustCompile(`(?i)Bearer\s+[A-Za-z0-9\-_=]+\.?[A-Za-z0-9\-_=]*\.?[A-Za-z0-9\-_=]*`), `Bearer [REDACTED:bearer]`},
}

// RedactSensitiveData applies every pattern, in order, to input.
func RedactSensitiveData(input string) string {
	result := input
	for _, sp := range sensitivePatterns {
		result = sp.pattern.ReplaceAllString(result, sp.replace)
	}
	return result
}

// LogWithRedaction is applied to all log output. A bare field value such as
// fields["password"] = "abc" contains no key for the patterns to match, so each
// value is redacted as "key=value" and the key prefix is stripped again.
// A new map is built so the caller's fields are not modified.
func (l *Logger) LogWithRedaction(level, msg string, fields map[string]any) {
	redacted := make(map[string]any, len(fields))
	for key, value := range fields {
		if str, ok := value.(string); ok {
			prefix := key + "="
			value = strings.TrimPrefix(RedactSensitiveData(prefix+str), prefix)
		}
		redacted[key] = value
	}
	l.underlying.Log(level, RedactSensitiveData(msg), redacted)
}
```
**Enforcement:**
- All plugin code MUST use the redacting logger
- Pre-commit hooks SHOULD scan for potential credential logging
- Security tests MUST verify no secrets appear in logs
### 9.5 Audit Logging
All custom plugin operations MUST be logged (with redaction applied):
```go
type PluginAuditEvent struct {
Timestamp time.Time
PluginType string // "webhook", "script", "rfc2136", "manual"
Action string // "create_record", "delete_record", "verify"
ProviderID uint
Domain string
Success bool
Duration time.Duration
ErrorMsg string // Redacted before logging
Details map[string]any // Redacted credentials
}
```
---
## 10. Implementation Phases
### Phase 1: Manual Plugin (Week 1)
| Task | Hours | Owner |
|------|-------|-------|
| ManualProvider implementation | 4 | Backend |
| Manual challenge data model | 2 | Backend |
| Challenge verification endpoint | 3 | Backend |
| Polling endpoint (10s interval) | 2 | Backend |
| Manual challenge UI component | 6 | Frontend |
| Challenge cleanup scheduled task | 2 | Backend |
| Unit tests | 4 | QA |
| Integration tests | 3 | QA |
| i18n translation keys | 2 | Frontend |
| Documentation | 2 | Docs |
| **Total** | **30** | |
| **With 20% buffer** | **36** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/manual.go`
- [ ] `backend/internal/services/manual_challenge_service.go`
- [ ] `frontend/src/components/ManualDNSChallenge.tsx`
- [ ] API endpoints for challenge lifecycle (including `/poll`)
- [ ] Translation keys in `frontend/src/locales/*/translation.json`:
- `dnsProvider.manual.title`
- `dnsProvider.manual.instructions`
- `dnsProvider.manual.recordName`
- `dnsProvider.manual.recordValue`
- `dnsProvider.manual.copyButton`
- `dnsProvider.manual.verifyButton`
- `dnsProvider.manual.checkDnsButton`
- `dnsProvider.manual.timeRemaining`
- `dnsProvider.manual.status.pending`
- `dnsProvider.manual.status.verified`
- `dnsProvider.manual.status.expired`
- `dnsProvider.manual.status.failed`
- `dnsProvider.manual.errors.*`
- [ ] User guide: `docs/features/manual-dns-challenge.md`
### Phase 2: RFC 2136 Plugin (Week 2)
| Task | Hours | Owner |
|------|-------|-------|
| RFC2136Provider implementation | 4 | Backend |
| TSIG credential validation | 3 | Backend |
| Caddy module integration research | 2 | Backend |
| **Dockerfile update (xcaddy + rfc2136)** | 2 | DevOps |
| RFC 2136 form UI | 4 | Frontend |
| i18n translation keys | 1 | Frontend |
| Unit tests | 3 | QA |
| Integration tests (with BIND container) | 4 | QA |
| Documentation + BIND setup guide | 3 | Docs |
| **Total** | **26** | |
| **With 20% buffer** | **31** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/rfc2136.go`
- [ ] Caddy config generation for RFC 2136
- [ ] **Dockerfile modification:**
  ```dockerfile
  # Multi-stage build: Caddy with RFC 2136 module
  FROM caddy:2-builder AS caddy-builder
  RUN xcaddy build \
      --with github.com/caddy-dns/rfc2136

  # In Charon's final runtime stage (after its FROM line, not shown),
  # copy the custom Caddy binary over the stock one
  COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy
  ```
- [ ] `frontend/src/components/RFC2136Form.tsx`
- [ ] Translation keys for RFC 2136 provider
- [ ] User guide: `docs/features/rfc2136-dns.md`
- [ ] BIND9 setup guide: `docs/guides/bind9-acme-setup.md`
### Phase 3: Webhook Plugin (Week 3)
| Task | Hours | Owner |
|------|-------|-------|
| WebhookProvider implementation | 5 | Backend |
| HTTP client with retry logic | 3 | Backend |
| Rate limiting + circuit breaker | 3 | Backend |
| SSRF validation (use existing) | 1 | Backend |
| Webhook form UI | 4 | Frontend |
| i18n translation keys | 1 | Frontend |
| Unit tests | 3 | QA |
| Integration tests (mock webhook server) | 3 | QA |
| Security tests (SSRF) | 2 | QA |
| Example webhook implementations | 2 | Docs |
| Documentation | 2 | Docs |
| **Total** | **29** | |
| **With 20% buffer** | **35** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/webhook.go`
- [ ] `backend/internal/services/webhook_client.go`
- [ ] `frontend/src/components/WebhookForm.tsx`
- [ ] Translation keys for Webhook provider
- [ ] Example: `examples/webhook-server/nodejs/`
- [ ] Example: `examples/webhook-server/python/`
- [ ] User guide: `docs/features/webhook-dns.md`
### Phase 4: Script Plugin (Week 4, Optional)
| Task | Hours | Owner |
|------|-------|-------|
| ScriptProvider implementation | 4 | Backend |
| Secure execution sandbox | 4 | Backend |
| Security review | 3 | Security |
| Script form UI | 3 | Frontend |
| Unit tests | 3 | QA |
| Security tests | 4 | QA |
| Example scripts | 2 | Docs |
| Documentation | 2 | Docs |
| **Total** | **25** | |
| **With 20% buffer** | **30** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/script.go`
- [ ] `backend/internal/services/script_executor.go`
- [ ] `frontend/src/components/ScriptForm.tsx`
- [ ] Example: `examples/scripts/nsupdate.sh`
- [ ] Example: `examples/scripts/cloudns.sh`
- [ ] User guide: `docs/features/script-dns.md`
- [ ] Security guide: `docs/guides/script-plugin-security.md`
---
## 11. Testing Strategy
### 11.1 Unit Tests
Each provider requires tests for:
- Credential validation
- Config generation
- Error handling
- Timeout behavior
```go
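// backend/pkg/dnsprovider/custom/webhook_test.go
import "testing"

func TestWebhookProvider_ValidateCredentials(t *testing.T) {
	// Valid cases need a resolvable host: ValidateExternalURL performs DNS
	// resolution and flags ".." patterns, so "https://..." placeholders would fail.
	tests := []struct {
		name    string
		creds   map[string]string
		wantErr bool
	}{
		{"valid with auth", map[string]string{"create_url": "https://example.com/dns/create", "delete_url": "https://example.com/dns/delete", "auth_header": "X-Key", "auth_value": "secret"}, false},
		{"valid without auth", map[string]string{"create_url": "https://example.com/dns/create", "delete_url": "https://example.com/dns/delete"}, false},
		{"missing create_url", map[string]string{"delete_url": "https://example.com/dns/delete"}, true},
		{"http not allowed", map[string]string{"create_url": "http://example.com/dns/create", "delete_url": "http://example.com/dns/delete"}, true},
		{"internal IP blocked", map[string]string{"create_url": "https://192.168.1.1/dns", "delete_url": "https://192.168.1.1/dns"}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Assumes a ValidateCredentials(map[string]string) error method (hypothetical signature)
			err := (&WebhookProvider{}).ValidateCredentials(tt.creds)
			if (err != nil) != tt.wantErr {
				t.Errorf("ValidateCredentials() error = %v, wantErr %v", err, tt.wantErr)
			}
		})
	}
}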
```
### 11.2 Integration Tests
| Test Scenario | Components | Method |
|---------------|------------|--------|
| Manual challenge flow | Backend + Frontend | E2E with Playwright |
| RFC 2136 with BIND9 | Backend + BIND container | Docker Compose |
| Webhook with mock server | Backend + Mock HTTP | httptest |
| Script execution | Backend + Test scripts | Isolated container |
#### Manual Plugin E2E Scenarios (Playwright)
| Scenario | Description | Expected Result |
|----------|-------------|-----------------|
| Countdown timeout | User does not create DNS record | UI shows "Expired" after timeout, challenge marked expired |
| Copy buttons | User clicks "Copy" for record name/value | Values copied to clipboard, toast notification shown |
| DNS propagation success | User creates record, clicks "Verify" | After retries, status changes to "Verified" |
| DNS propagation failure | User creates wrong record | After max retries, shows "DNS record not found" |
| Cancel challenge | User clicks "Cancel Challenge" | Challenge marked as cancelled, UI returns to provider list |
| Refresh during challenge | User refreshes page during pending challenge | Challenge state persisted, countdown continues from correct time |
### 11.3 Additional Required Test Scenarios
#### Webhook Tests
| Scenario | Description | Expected Result |
|----------|-------------|----------------|
| Retry exhaustion | Webhook returns 500 for all 3 retry attempts | `WEBHOOK_TIMEOUT` error after final retry |
| Response too large | Webhook returns >1MB response | `WEBHOOK_RESPONSE_TOO_LARGE` error (413) |
| DNS rebinding | URL resolves to internal IP on second resolution | Request blocked, `SSRF_DETECTED` error |
| Idempotency replay | Same `request_id` sent twice | Second request returns cached response |
#### Circuit Breaker Tests
| Scenario | Description | Expected Result |
|----------|-------------|----------------|
| Open state transition | 5 consecutive failures | Circuit opens, `PROVIDER_CIRCUIT_OPEN` (503) |
| Half-open state | Wait 5 minutes after open | Next request allowed (test request) |
| Reset on success | Successful request in half-open | Circuit fully closes, counter resets |
| Stay open on failure | Failed request in half-open | Circuit remains open for another 5 minutes |
#### Script Tests
| Scenario | Description | Expected Result |
|----------|-------------|----------------|
| Timeout boundary (pass) | Script completes in 54 seconds (under the 55s `timeout` wrapper) | Success, output captured |
| Timeout boundary (fail) | Script runs for 56 seconds | `SCRIPT_TIMEOUT` error (504) |
| Invalid argument chars | Argument contains `; rm -rf /` | `INVALID_SCRIPT_ARGUMENT` error (400) |
| Symlink escape | Script path is symlink to `/etc/passwd` | `SCRIPT_PATH_INVALID` error (400) |
| Resource limit breach | Script tries to fork 100 processes | Script killed, resource limit error |
#### Manual Challenge Tests
| Scenario | Description | Expected Result |
|----------|-------------|----------------|
| Concurrent verify race | Two users verify same FQDN simultaneously | Only one succeeds, other gets `CHALLENGE_IN_PROGRESS` |
| CSRF token mismatch | POST without valid CSRF token | 403 Forbidden |
| Challenge ownership | User A tries to access User B's challenge | 403 Forbidden, audit log entry |
| Predictable ID attack | Attempt to enumerate challenge IDs | No information leakage, 404 for non-existent |
#### RFC 2136 Tests
| Scenario | Description | Expected Result |
|----------|-------------|----------------|
| Network timeout | DNS server unreachable | Timeout error with retry logic |
| Connection refused | DNS server port closed | Connection error (distinct from `TSIG_AUTH_FAILED`) |
| TSIG key mismatch | Wrong TSIG secret configured | `TSIG_AUTH_FAILED` (401) |
| Update refused | Server rejects the update (e.g. `REFUSED` rcode) | Appropriate error message with zone info |
### 11.4 Security Tests
| Test | Tool | Target |
|------|------|--------|
| SSRF in webhook URLs | Custom test suite | WebhookProvider |
| Path traversal in scripts | Custom test suite | ScriptProvider |
| Credential leakage in logs | Log analysis | All providers |
| TSIG key handling | Memory dump analysis | RFC2136Provider |
### 11.5 Coverage Requirements
- Backend: ≥85% coverage
- Frontend: ≥85% coverage
- New provider code: ≥90% coverage
---
## 12. Documentation Requirements
### 12.1 User Documentation
| Document | Audience | Location |
|----------|----------|----------|
| Custom DNS Providers Overview | All users | `docs/features/custom-dns-providers.md` |
| Manual DNS Challenge Guide | Beginners | `docs/features/manual-dns-challenge.md` |
| RFC 2136 Setup Guide | Self-hosted DNS admins | `docs/features/rfc2136-dns.md` |
| Webhook Integration Guide | DevOps teams | `docs/features/webhook-dns.md` |
| Script Plugin Guide | Power users | `docs/features/script-dns.md` |
### 12.2 Technical Documentation
| Document | Audience | Location |
|----------|----------|----------|
| Custom Plugin Architecture | Contributors | `docs/development/custom-plugin-architecture.md` |
| Webhook API Specification | Integration devs | `docs/api/webhook-dns-api.md` |
| RFC 2136 Protocol Details | Network engineers | `docs/technical/rfc2136-implementation.md` |
### 12.3 Setup Guides
| Guide | Audience | Location |
|-------|----------|----------|
| BIND9 ACME Setup | Self-hosted users | `docs/guides/bind9-acme-setup.md` |
| PowerDNS ACME Setup | Self-hosted users | `docs/guides/powerdns-acme-setup.md` |
| Building Webhook Endpoints | Developers | `docs/guides/webhook-development.md` |
### 12.4 Operations and Security Documentation (Required)
The following documentation MUST be created as part of implementation:
| Document | Audience | Location | Priority |
|----------|----------|----------|----------|
| Custom DNS Plugin Troubleshooting | Support, Users | `docs/troubleshooting/custom-dns-plugins.md` | High |
| Custom DNS Security Hardening | Security, Admins | `docs/security/custom-dns-hardening.md` | High |
| Custom DNS Monitoring Guide | Operations | `docs/operations/custom-dns-monitoring.md` | Medium |
**Required Content for `docs/troubleshooting/custom-dns-plugins.md`:**
- Common error codes and resolutions
- Webhook debugging checklist
- Script execution troubleshooting
- RFC 2136 connection issues
- Manual challenge timeout scenarios
- Log analysis procedures
**Required Content for `docs/security/custom-dns-hardening.md`:**
- Webhook endpoint security best practices
- Script plugin security checklist
- TSIG key management procedures
- Network segmentation recommendations
- Audit logging configuration
- Incident response procedures
**Required Content for `docs/operations/custom-dns-monitoring.md`:**
- Key metrics to monitor (success rate, latency, errors)
- Alerting thresholds and recommendations
- Dashboard examples (Grafana/Prometheus)
- Capacity planning guidelines
- Runbook templates for common issues
---
## 13. Estimated Effort
### Summary by Phase
| Phase | Description | Hours | Hours (with 20% buffer) | Calendar |
|-------|-------------|-------|-------------------------|----------|
| 1 | Manual Plugin | 27 | 32 | 1 week |
| 2 | RFC 2136 Plugin | 23 | 28 | 1 week |
| 3 | Webhook Plugin | 25 | 30 | 1 week |
| **Total (Phases 1-3)** | **Core Features** | **75** | **90** | **3 weeks** |
| 4 | Script Plugin (Future) | 25 | 30 | 1 week |
| **Total (All Phases)** | **Including Future** | **100** | **120** | **4 weeks** |
**Note:** Phase 4 (Script Plugin) is conditional on community demand (>20 GitHub issues). See "Future Work" section.
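The buffered column is each raw estimate multiplied by 1.2 and rounded to the nearest hour (27 × 1.2 = 32.4 → 32, 23 × 1.2 = 27.6 → 28, 25 × 1.2 = 30). The small sketch below reproduces that arithmetic so the table can be rechecked when estimates change. It uses integer math to avoid floating-point rounding surprises. `bufferedHours` and `totalBuffered` are hypothetical helpers, not part of Charon.

```go
package main

// bufferPercent is the contingency applied to every raw estimate.
const bufferPercent = 20

// bufferedHours adds the buffer and rounds half up to whole hours.
// Integer math keeps 27h -> 32h and 23h -> 28h exact.
func bufferedHours(raw int) int {
	return (raw*(100+bufferPercent) + 50) / 100
}

// totalBuffered sums the buffered value of each phase.
func totalBuffered(raw []int) int {
	total := 0
	for _, h := range raw {
		total += bufferedHours(h)
	}
	return total
}
```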
### Effort by Role
| Role | Phase 1 | Phase 2 | Phase 3 | Phase 4* | Total |
|------|---------|---------|---------|----------|-------|
| Backend | 11h | 11h | 12h | 8h | 42h |
| Frontend | 8h | 5h | 5h | 3h | 21h |
| QA | 7h | 7h | 8h | 7h | 29h |
| Docs | 2h | 3h | 4h | 4h | 13h |
| DevOps | 0h | 2h | 0h | 0h | 2h |
| Security | 0h | 0h | 1h | 3h | 4h |
\*Phase 4 effort is conditional on community demand.
### MVP (Minimum Viable Product)
**MVP = Phase 1 (Manual Plugin)**
- Time: 32 hours / 1 week (with buffer)
- Unblocks: All users with unsupported DNS providers
- Risk: Low
---
## 14. Decisions and Open Questions
### Decisions Made
1. **Caddy Module Strategy for RFC 2136**
**DECIDED: Option B — RFC 2136 module will be included in Charon's Caddy build.**
Rationale: Best user experience. Users should not need to rebuild Caddy themselves. The Dockerfile will be updated in Phase 2 to use xcaddy with the `github.com/caddy-dns/rfc2136` module.
### Must Decide Before Implementation
1. **Script Plugin Security Model**
- Should scripts run in a separate container/sandbox?
- What environment variables should be available?
- Should we allow network access from scripts?
- **Recommendation:** No network by default, minimal env, document risks
2. **Manual Challenge Persistence**
- Store challenge details in database or session?
- How long to retain completed challenges?
- **Recommendation:** Database with 24-hour TTL cleanup (see Section 6.4)
3. **Webhook Retry Strategy**
- Exponential backoff vs. fixed interval?
- Max retries before failure?
- **Recommendation:** Exponential backoff (1s, 2s, 4s), max 3 retries
### Nice to Decide
1. **UI Location for Custom Plugins**
- Same page as built-in providers?
- Separate "Custom Integrations" section?
- **Recommendation:** Same page, grouped by category
2. **Telemetry for Custom Plugins**
- Should we track usage of custom plugin types?
- Privacy considerations?
- **Recommendation:** Opt-in anonymous usage stats
3. **Plugin Marketplace (Future)**
- Community-contributed webhook templates?
- Pre-configured RFC 2136 profiles?
- **Recommendation:** Defer to Phase 5+
---
## 15. Appendix
### A. Related Documents
- [Phase 5 Custom Plugins Spec](phase5_custom_plugins_spec.md) - Go plugin architecture (external .so files)
- [DNS Challenge Backend Research](dns_challenge_backend_research.md) - Original DNS-01 implementation notes
- [DNS Challenge Future Features](dns_challenge_future_features.md) - Roadmap context
### B. External References
- [RFC 2136: Dynamic Updates in DNS](https://datatracker.ietf.org/doc/html/rfc2136)
- [RFC 2845: TSIG Authentication](https://datatracker.ietf.org/doc/html/rfc2845)
- [Caddy DNS Challenge Docs](https://caddyserver.com/docs/automatic-https#dns-challenge)
- [Let's Encrypt DNS-01 Challenge](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge)
### C. Example Webhook Payload
```json
{
"action": "create",
"fqdn": "_acme-challenge.example.com",
"domain": "example.com",
"subdomain": "_acme-challenge",
"value": "gZrH7wL9t3kM2nP4qX5yR8sT0uV1wZ2aB3cD4eF5gH6iJ7kL",
"ttl": 300,
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-01-08T15:30:00Z",
"charon_version": "1.2.0",
"certificate_domains": ["*.example.com", "example.com"]
}
```
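On the receiving side, a webhook endpoint could decode this payload and use `request_id` to make deliveries idempotent, as Section 18 (finding 5) requires. The sketch below assumes a Go receiver, and the type and function names are hypothetical. The in-memory `dedupe` set is unbounded, so a production receiver would expire old IDs or persist them.

```go
package main

import (
	"encoding/json"
	"sync"
	"time"
)

// WebhookPayload mirrors the example payload; field tags match its JSON keys.
type WebhookPayload struct {
	Action             string    `json:"action"`
	FQDN               string    `json:"fqdn"`
	Domain             string    `json:"domain"`
	Subdomain          string    `json:"subdomain"`
	Value              string    `json:"value"`
	TTL                int       `json:"ttl"`
	RequestID          string    `json:"request_id"`
	Timestamp          time.Time `json:"timestamp"` // RFC 3339, e.g. 2026-01-08T15:30:00Z
	CharonVersion      string    `json:"charon_version"`
	CertificateDomains []string  `json:"certificate_domains"`
}

// parsePayload decodes a webhook request body.
func parsePayload(body []byte) (WebhookPayload, error) {
	var p WebhookPayload
	err := json.Unmarshal(body, &p)
	return p, err
}

// dedupe remembers request IDs that have already been applied, so a
// retried delivery can be acknowledged without being applied twice.
type dedupe struct {
	mu   sync.Mutex
	seen map[string]bool
}

// firstDelivery reports whether id is new, recording it if so.
func (d *dedupe) firstDelivery(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen == nil {
		d.seen = make(map[string]bool)
	}
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	return true
}
```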
### D. Example BIND9 TSIG Configuration
```zone
// /etc/bind/named.conf.local
key "acme-update-key" {
algorithm hmac-sha256;
secret "base64-encoded-secret-here==";
};
zone "example.com" {
type master;
file "/var/lib/bind/db.example.com";
update-policy {
grant acme-update-key name _acme-challenge.example.com. TXT;
};
};
```
---
## 16. Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-08 | Planning Agent | Initial specification |
| 1.1 | 2026-01-08 | Planning Agent | Supervisor review: addressed 13 issues (see below) |
| 1.2 | 2026-01-11 | Planning Agent | Supervisor review: addressed 9 critical/high priority findings (see Section 18) |
---
## 17. Supervisor Review Summary (v1.1)
This specification was revised to address all 13 issues identified during Supervisor review:
### Critical Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 1 | SSRF Duplication | Section 9.2 updated to reference existing `security.ValidateExternalURL()` in `backend/internal/security/url_validator.go` |
| 2 | Script Security Insufficient | Section 9.3 enhanced with rlimit enforcement, seccomp documentation, minimal PATH, and `timeout` command |
| 3 | Missing Caddy Integration Detail | Added Section 3.3.1-3.3.4 with sequence diagram, state machine, error handling, and communication protocol |
### High Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 4 | RFC 2136 Caddy Module | Section 4.3 updated with DECISION; Phase 2 includes Dockerfile deliverable |
| 5 | WebSocket vs Polling | Section 4.4 updated; chose polling (10s interval) with rationale; polling endpoint added to API |
| 6 | Webhook Rate Limiting | Section 4.1 updated with rate limits (10/min) and circuit breaker (5 failures → 5 min disable) |
### Medium Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 7 | Phase 4 Scope Creep | Phase 4 moved to "Future Work" section with explicit Go/No-Go gate (>20 GitHub issues) |
| 8 | Missing Error Codes | Section 7.3 added with comprehensive error code table |
| 9 | Time Estimates Buffer | Section 13 updated: Phase 1→32h, Phase 2→28h, Phase 3→30h (all +20%) |
| 10 | Open Question #1 | Section 14 changed to "Decisions and Open Questions"; Option B confirmed as DECIDED |
### Low Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 11 | i18n Keys | Phase 1 deliverables updated with translation keys for `frontend/src/locales/*/translation.json` |
| 12 | E2E Test Scenarios | Section 11.2 expanded with Manual Plugin E2E scenarios table |
| 13 | Cleanup Mechanism | Section 6.4 added with cron-based cleanup using existing `robfig/cron/v3` pattern |
---
*This document has completed Supervisor review and is ready for technical review and stakeholder approval.*
---
## 18. Supervisor Review Summary (v1.2)
This specification was revised on January 11, 2026 to address the critical/high priority findings itemized below:
### Security Enhancements
| # | Finding | Resolution |
|---|---------|------------|
| 1 | Missing concurrent challenge handling | Section 3.3.5 added with database locking (`SELECT ... FOR UPDATE`), queueing behavior, and `CHALLENGE_IN_PROGRESS` error |
| 2 | Webhook DNS rebinding vulnerability | Section 4.1 updated: URLs validated at both configuration AND execution time |
| 3 | Missing webhook response size limit | Section 4.1 updated: `MaxWebhookResponseSize = 1MB`, new error code added |
| 4 | Missing webhook TLS skip option | Section 4.1 updated: `insecure_skip_verify` config with prominent warning |
| 5 | Webhook idempotency missing | Section 4.1 updated: `request_id` requirement for deduplication |
| 6 | Script argument sanitization weak | Section 4.2 updated: strict `[a-zA-Z0-9._=-]` pattern, new error code |
| 7 | Symlink escape vulnerability | Section 4.2 updated: `filepath.EvalSymlinks()` MUST be called before prefix check |
| 8 | Resource limits optional | Section 4.2 updated: rlimits now MANDATORY with specific values |
| 9 | Environment variable leakage | Section 4.2 updated: explicit environment clearing before script execution |
| 10 | RFC 2136 hmac-md5 insecure | Section 4.3 updated: `hmac-md5` marked DEPRECATED with removal warning |
| 11 | TSIG secret memory exposure | Section 4.3 updated: secure memory handling with memguard pattern |
| 12 | Manual challenge session binding missing | Section 4.4 updated: challenge-user binding, CSRF validation, UUIDv4 IDs |
| 13 | Log credential exposure | Section 9.4 added: comprehensive redaction patterns for 7 sensitive fields |
### Error Codes Added (Section 7.3)
| Code | HTTP Status | Description |
|------|-------------|-------------|
| `CHALLENGE_IN_PROGRESS` | 409 | Another challenge active for FQDN |
| `WEBHOOK_RESPONSE_TOO_LARGE` | 413 | Response exceeded 1MB limit |
| `INVALID_SCRIPT_ARGUMENT` | 400 | Invalid characters in script argument |
### Testing Scenarios Added (Section 11.3)
- Webhook retry exhaustion tests
- Circuit breaker state transition tests
- Script timeout boundary tests (59s pass, 61s fail)
- Manual challenge concurrent verify race condition test
- RFC 2136 network error tests
### Documentation Requirements Added (Section 12.4)
- `docs/troubleshooting/custom-dns-plugins.md`
- `docs/security/custom-dns-hardening.md`
- `docs/operations/custom-dns-monitoring.md`
---
*This document has been updated to address all supervisor review findings from January 11, 2026.*