# Custom DNS Provider Plugin Support - Feature Specification
**Status:** 📋 Planning (Revised)\
**Priority:** P2 (Medium)\
**Estimated Time:** 48-68 hours\
**Author:** Planning Agent\
**Date:** January 8, 2026\
**Last Revised:** January 8, 2026\
**Related:** [Phase 5 Custom Plugins Spec](phase5_custom_plugins_spec.md)

---
## 1. Executive Summary
### Problem Statement
Charon currently supports 10 built-in DNS providers for ACME DNS-01 challenges:
- Cloudflare, Route53, DigitalOcean, Hetzner, DNSimple, Vultr, GoDaddy, Namecheap, Google Cloud DNS, Azure
Users with DNS services not on this list cannot obtain wildcard certificates or use DNS-01 challenges. This limitation affects:
- Organizations using self-hosted DNS (BIND, PowerDNS, Knot DNS)
- Users of regional/niche DNS providers
- Enterprise environments with custom DNS APIs
- Air-gapped or on-premise deployments
### Proposed Solution
Implement multiple extensibility mechanisms that balance ease-of-use with flexibility:
| Option | Target User | Complexity | Automation Level |
|--------|-------------|------------|------------------|
| **A: Webhook Plugin** | DevOps, Integration teams | Medium | Full |
| **B: Script Plugin** | Sysadmins, Power users | Low-Medium | Full |
| **C: RFC 2136 Plugin** | Self-hosted DNS admins | Medium | Full |
| **D: Manual Plugin** | One-off certs, Testing | None | Manual |
### Success Criteria
- Users can obtain certificates using any DNS provider
- At least one plugin option is production-ready within 2 weeks
- Existing built-in providers continue to work unchanged
- 85% test coverage maintained
---
## 2. User Stories
### 2.1 Webhook Plugin (Option A)
> **As a DevOps engineer** with a custom DNS API, I want to provide webhook endpoints so Charon can automate DNS challenges without building a custom integration.

**Acceptance Criteria:**
- I can configure URLs for create/delete TXT record operations
- Charon sends JSON payloads with record details
- I can set custom headers for authentication
- Retry logic handles temporary failures
### 2.2 Script Plugin (Option B)
> **As a system administrator**, I want to run a shell script when Charon needs to create/delete TXT records so I can use my existing DNS automation tools.

**Acceptance Criteria:**
- I can specify a script path inside the container
- Script receives ACTION, FQDN, TOKEN, VALUE as positional arguments
- Script exit code determines success/failure
- Timeout prevents hung scripts
### 2.3 RFC 2136 Plugin (Option C)
> **As a network engineer** running BIND or PowerDNS, I want to use RFC 2136 Dynamic DNS Updates so Charon integrates with my existing infrastructure.

**Acceptance Criteria:**
- I can configure DNS server address and TSIG key
- Charon sends standards-compliant UPDATE messages
- Zone detection works automatically
- Works with BIND9, PowerDNS, Knot DNS
### 2.4 Manual Plugin (Option D)
> **As a user** with an unsupported provider, I want Charon to show me the required TXT record details so I can create it manually.

**Acceptance Criteria:**
- UI clearly displays the record name and value
- I can copy values with one click
- "Verify" button checks if record exists
- Progress indicator shows timeout countdown
### 2.5 General Stories
> **As an administrator**, I want to see all available DNS provider types (built-in + custom) in a unified list.

> **As a security officer**, I want custom plugin configurations to be validated and logged for audit purposes.
---
## 3. Architecture Analysis
### 3.1 Current Plugin System
Charon already has a well-designed plugin architecture in `backend/pkg/dnsprovider/`:
```
backend/pkg/dnsprovider/
├── plugin.go          # ProviderPlugin interface (13 methods)
├── registry.go        # Thread-safe registry (Global singleton)
├── errors.go          # Custom error types
└── builtin/
    ├── init.go        # Auto-registers 10 built-in providers
    ├── cloudflare.go  # Example: implements ProviderPlugin
    ├── route53.go
    └── ... (8 more providers)
```
**Key Interface Methods:**
```go
type ProviderPlugin interface {
	// Identity and lifecycle
	Type() string
	Metadata() ProviderMetadata
	Init() error
	Cleanup() error

	// Credential schema and validation
	RequiredCredentialFields() []CredentialFieldSpec
	OptionalCredentialFields() []CredentialFieldSpec
	ValidateCredentials(creds map[string]string) error
	TestCredentials(creds map[string]string) error
	SupportsMultiCredential() bool

	// Caddy configuration generation
	BuildCaddyConfig(creds map[string]string) map[string]any
	BuildCaddyConfigForZone(baseDomain string, creds map[string]string) map[string]any

	// DNS propagation tuning
	PropagationTimeout() time.Duration
	PollingInterval() time.Duration
}
```
### 3.2 How Custom Plugins Integrate
The existing architecture supports custom plugins via the registry pattern:
```
┌──────────────────────────────────────────────────────────────────────┐
│                        DNS Provider Registry                         │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────────────────┐ │
│ │ Cloudflare │ │  Route53   │ │  ... (8)   │ │   Custom Plugins    │ │
│ │ (built-in) │ │ (built-in) │ │ (built-in) │ │ ┌─────────────────┐ │ │
│ └────────────┘ └────────────┘ └────────────┘ │ │ Webhook Plugin  │ │ │
│                                              │ ├─────────────────┤ │ │
│                                              │ │ Script Plugin   │ │ │
│                                              │ ├─────────────────┤ │ │
│                                              │ │ RFC2136 Plugin  │ │ │
│                                              │ ├─────────────────┤ │ │
│                                              │ │ Manual Plugin   │ │ │
│                                              │ └─────────────────┘ │ │
│                                              └─────────────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
                   ┌───────────────┴───────────────┐
                   ▼                               ▼
          ┌─────────────────┐             ┌─────────────────┐
          │  DNS Provider   │             │  Caddy Config   │
          │  Service Layer  │             │     Builder     │
          │  (CRUD + Test)  │             │ (TLS Automation)│
          └─────────────────┘             └─────────────────┘
```
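The registry pattern can be sketched as follows. This is a minimal illustration, not Charon's `registry.go`. The trimmed `Plugin` interface and the names `NewRegistry`, `Register`, `Get` and `Types` are hypothetical. A real registry would store full `ProviderPlugin` values.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Plugin is a hypothetical, trimmed stand-in for ProviderPlugin; only Type() is
// needed to demonstrate registration.
type Plugin interface {
	Type() string
}

// Registry is a hypothetical thread-safe registry keyed by provider type.
type Registry struct {
	mu      sync.RWMutex
	plugins map[string]Plugin
}

func NewRegistry() *Registry {
	return &Registry{plugins: make(map[string]Plugin)}
}

// Register adds a plugin and rejects duplicate type identifiers.
func (r *Registry) Register(p Plugin) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.plugins[p.Type()]; exists {
		return fmt.Errorf("provider type %q already registered", p.Type())
	}
	r.plugins[p.Type()] = p
	return nil
}

// Get looks up a plugin by type; ok is false for unknown types.
func (r *Registry) Get(providerType string) (Plugin, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.plugins[providerType]
	return p, ok
}

// Types returns all registered type identifiers in sorted order.
func (r *Registry) Types() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	types := make([]string, 0, len(r.plugins))
	for t := range r.plugins {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

// staticPlugin is a toy plugin whose type is its own string value.
type staticPlugin string

func (s staticPlugin) Type() string { return string(s) }

func main() {
	reg := NewRegistry()
	for _, t := range []string{"cloudflare", "webhook", "manual"} {
		if err := reg.Register(staticPlugin(t)); err != nil {
			fmt.Println("register:", err)
		}
	}
	fmt.Println(reg.Types())                          // [cloudflare manual webhook]
	fmt.Println(reg.Register(staticPlugin("manual"))) // duplicate type is rejected
}
```

Built-in and custom plugins go through the same `Register` call. The rest of Charon therefore only ever deals with provider type strings.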
### 3.3 Caddy DNS Challenge Integration
Caddy's TLS automation supports custom DNS providers via its module system. For Options A, B, C, we need to either:
1. **Use Caddy's `exec` DNS provider** - Caddy calls an external command
2. **Build a custom Caddy module** - Complex, requires Caddy rebuild
3. **Use Charon as a DNS proxy** - Charon handles DNS operations, returns status to Caddy
**Recommended Approach:** Option 3 (Charon as DNS proxy) for Webhook/Script plugins, native Caddy module for RFC 2136.
#### 3.3.1 Charon DNS Proxy Architecture
For Webhook and Script plugins, Charon acts as a DNS challenge proxy between Caddy and the external DNS provider:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     DNS Challenge Flow (Webhook/Script)                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐  1. Certificate  ┌──────────┐  2. DNS-01 Challenge            │
│  │  Caddy   │ ───────────────▶ │   ACME   │ ◀───────────────────            │
│  │  (TLS)   │                  │  Server  │                                 │
│  └────┬─────┘                  └──────────┘                                 │
│       │                                                                     │
│       │ 3. Create TXT record                                                │
│       │    (via exec module or                                              │
│       │    internal API)                                                    │
│       ▼                                                                     │
│  ┌──────────┐  4. POST /internal/dns-challenge                              │
│  │  Charon  │                                                               │
│  │ (Proxy)  │                                                               │
│  └────┬─────┘                                                               │
│       │                                                                     │
│       │ 5. Execute plugin (webhook/script)                                  │
│       ▼                                                                     │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                        External DNS Provider                         │   │
│  │             (Webhook endpoint or DNS server via script)              │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
#### 3.3.2 Challenge Lifecycle State Machine
```
                       ┌─────────────┐
                       │   CREATED   │
                       │  (initial)  │
                       └──────┬──────┘
                              │
                   Plugin executes create
                              │
                              ▼
                       ┌─────────────┐
       ┌───────────────│   PENDING   │───────────────┐
       │               │  (awaiting  │               │
       │               │ propagation)│               │
       │               └──────┬──────┘               │
       │                      │                      │
Timeout (10 min)      DNS check passes         Plugin error
       │                      │                      │
       ▼                      ▼                      ▼
┌─────────────┐        ┌─────────────┐        ┌─────────────┐
│   EXPIRED   │        │  VERIFYING  │        │   FAILED    │
│             │        │             │        │             │
└─────────────┘        └──────┬──────┘        └─────────────┘
                  ┌───────────┴───────────┐
                  │                       │
            ACME success            ACME failure
                  │                       │
                  ▼                       ▼
           ┌─────────────┐         ┌─────────────┐
           │  VERIFIED   │         │   FAILED    │
           │  (success)  │         │             │
           └─────────────┘         └─────────────┘
```
**State Definitions:**
| State | Description | Next States | TTL |
|-------|-------------|-------------|-----|
| `CREATED` | Challenge record created, plugin not yet executed | PENDING, FAILED | - |
| `PENDING` | Plugin executed, waiting for DNS propagation | VERIFYING, EXPIRED, FAILED | 10 min |
| `VERIFYING` | DNS record found, ACME validation in progress | VERIFIED, FAILED | 2 min |
| `VERIFIED` | Challenge completed successfully | (terminal) | 24h cleanup |
| `EXPIRED` | Timeout waiting for DNS propagation | (terminal) | 24h cleanup |
| `FAILED` | Plugin error or ACME validation failure | (terminal) | 24h cleanup |
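A small lookup table can enforce the transition rules in the "Next States" column above. This is a minimal sketch. `ChallengeState`, `Transition` and `IsTerminal` are hypothetical names, not existing Charon code.

```go
package main

import "fmt"

// ChallengeState mirrors the states in the table above.
type ChallengeState string

const (
	StateCreated   ChallengeState = "CREATED"
	StatePending   ChallengeState = "PENDING"
	StateVerifying ChallengeState = "VERIFYING"
	StateVerified  ChallengeState = "VERIFIED"
	StateExpired   ChallengeState = "EXPIRED"
	StateFailed    ChallengeState = "FAILED"
)

// allowedTransitions encodes the "Next States" column; terminal states have no entry.
var allowedTransitions = map[ChallengeState][]ChallengeState{
	StateCreated:   {StatePending, StateFailed},
	StatePending:   {StateVerifying, StateExpired, StateFailed},
	StateVerifying: {StateVerified, StateFailed},
}

// IsTerminal reports whether no further transitions are allowed
// (unknown states are also treated as terminal).
func IsTerminal(s ChallengeState) bool {
	return len(allowedTransitions[s]) == 0
}

// Transition returns an error when moving from -> to is not listed in the table.
func Transition(from, to ChallengeState) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid challenge transition %s -> %s", from, to)
}

func main() {
	fmt.Println(Transition(StatePending, StateVerifying)) // <nil>
	fmt.Println(Transition(StateVerified, StatePending))  // invalid challenge transition VERIFIED -> PENDING
	fmt.Println(IsTerminal(StateExpired))                 // true
}
```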
#### 3.3.3 Caddy Communication
Charon exposes an internal API for Caddy to delegate DNS challenge operations:
```
POST /internal/dns-challenge/create
{
  "provider_id": "uuid",
  "fqdn": "_acme-challenge.example.com",
  "value": "token-value"
}
Response: {"challenge_id": "uuid", "status": "pending"}

DELETE /internal/dns-challenge/{challenge_id}
Response: {"status": "deleted"}
```
#### 3.3.4 Error Handling When Charon is Unavailable
If Charon is unavailable during a DNS challenge:
1. **Caddy retry**: Caddy's built-in retry mechanism (3 attempts, exponential backoff)
2. **Graceful degradation**: If Charon remains unavailable, Caddy logs error and fails certificate issuance
3. **Health check**: Caddy pre-checks Charon availability via `/health` before initiating challenges
4. **Circuit breaker**: After 5 consecutive failures, Caddy disables the custom provider for 5 minutes
### 3.4 Database Model Impact
Current `dns_providers` table schema:
```sql
CREATE TABLE dns_providers (
    id INTEGER PRIMARY KEY,
    uuid VARCHAR(36) UNIQUE,
    name VARCHAR(255) NOT NULL,
    provider_type VARCHAR(50) NOT NULL,  -- 'cloudflare', 'webhook', 'script', etc.
    enabled BOOLEAN DEFAULT TRUE,
    is_default BOOLEAN DEFAULT FALSE,
    credentials_encrypted TEXT,          -- Encrypted JSON blob
    key_version INTEGER DEFAULT 1,
    propagation_timeout INTEGER DEFAULT 120,
    polling_interval INTEGER DEFAULT 5,
    -- ... statistics fields
);
```
Custom plugins will use the same table with different `provider_type` values and plugin-specific credentials.
---
## 4. Proposed Solutions
### 4.1 Option A: Generic Webhook Plugin
#### Overview
User provides webhook URLs for create/delete TXT records. Charon POSTs JSON payloads with record details.
#### Configuration
```json
{
  "name": "My Webhook DNS",
  "provider_type": "webhook",
  "credentials": {
    "create_url": "https://api.example.com/dns/txt/create",
    "delete_url": "https://api.example.com/dns/txt/delete",
    "auth_header": "X-API-Key",
    "auth_value": "secret-token-here",
    "timeout_seconds": "30",
    "retry_count": "3"
  }
}
```
#### Request Payload (Sent to Webhook)
```json
{
  "action": "create",
  "fqdn": "_acme-challenge.example.com",
  "domain": "example.com",
  "subdomain": "_acme-challenge",
  "value": "gZrH7wL9t3kM2nP4...",
  "ttl": 300,
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2026-01-08T15:30:00Z"
}
```
#### Expected Response
```json
{
  "success": true,
  "message": "TXT record created",
  "record_id": "optional-id-for-deletion"
}
```
#### Rate Limiting and Circuit Breaker
To prevent abuse and ensure reliability, webhook plugins enforce:
| Limit | Value | Behavior |
|-------|-------|----------|
| Max calls per minute | 10 | Requests beyond limit return 429 Too Many Requests |
| Circuit breaker threshold | 5 consecutive failures | Provider disabled for 5 minutes |
| Circuit breaker reset | Automatic after 5 minutes | First successful call fully resets counter |
**Implementation Requirements:**
```go
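type WebhookRateLimiter struct {
	mu               sync.Mutex // Guards the fields below across concurrent challenges
	callsPerMinute   int        // Max 10
	consecutiveFails int        // Track failures; the first success resets this
	disabledUntil    time.Time  // Circuit breaker: provider disabled until this time
}

func (w *WebhookProvider) executeWithRateLimit(ctx context.Context, req *WebhookRequest) error {
	w.rateLimiter.mu.Lock()
	circuitOpen := time.Now().Before(w.rateLimiter.disabledUntil)
	w.rateLimiter.mu.Unlock()
	if circuitOpen {
		return ErrProviderCircuitOpen
	}
	// ... execute webhook with rate limiting
}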
```
#### Pros
- Works with any HTTP-capable system
- No Charon or Caddy code changes required; the user only needs to expose an HTTP endpoint
- Supports complex authentication (headers, query params)
- Can integrate with existing automation (Terraform, Ansible AWX, etc.)
#### Cons
- User must implement and host webhook endpoint
- Network latency adds to propagation time
- Debugging requires access to both Charon and webhook logs
- Security: webhook credentials stored in Charon
#### Implementation Complexity
- Backend: ~200 lines (WebhookProvider implementation)
- Frontend: ~100 lines (form fields)
- Tests: ~150 lines
---
### 4.2 Option B: Custom Script Plugin
#### Overview
User provides the path to a shell script inside the container. The script receives ACTION, FQDN, TOKEN, VALUE as positional arguments.
#### Configuration
```json
{
  "name": "My Script DNS",
  "provider_type": "script",
  "credentials": {
    "script_path": "/scripts/dns-update.sh",
    "timeout_seconds": "60",
    "env_vars": "DNS_SERVER=ns1.example.com,API_KEY=${API_KEY}"
  }
}
```
#### Script Interface
```bash
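#!/bin/bash
# Called by Charon for DNS-01 challenge
# Arguments:
#   $1 = ACTION: "create" or "delete"
#   $2 = FQDN:   "_acme-challenge.example.com"
#   $3 = TOKEN:  Challenge token (for identification)
#   $4 = VALUE:  TXT record value to set
# Environment:
#   DNS_SERVER: target nameserver (supplied via env_vars)
set -euo pipefail

ACTION="$1"
FQDN="$2"
TOKEN="$3"
VALUE="$4"
: "${DNS_SERVER:?DNS_SERVER must be set via env_vars}"

case "$ACTION" in
  create)
    # Create TXT record
    nsupdate <<EOF
server ${DNS_SERVER}
update add ${FQDN} 300 TXT "${VALUE}"
send
EOF
    ;;
  delete)
    # Delete only this challenge's value; other TXT records at the same name
    # (e.g. for a wildcard + apex certificate) are left intact
    nsupdate <<EOF
server ${DNS_SERVER}
update delete ${FQDN} TXT "${VALUE}"
send
EOF
    ;;
  *)
    echo "unknown action: ${ACTION}" >&2
    exit 2
    ;;
esac
# Exit code: 0 = success, non-zero = failure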
```
#### Pros
- Maximum flexibility - any tool/language can be used
- Direct access to host system (if volume-mounted)
- Familiar paradigm for sysadmins
- Can leverage existing scripts/tooling
#### Cons
- **Security Risk:** Script execution in container context
- Harder to debug than API calls
- Script must be mounted into container
- No automatic retries (must implement in script)
- Sandboxing limits capability
#### Security Mitigations
1. Script must be in allowlisted directory (`/scripts/`)
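2. Scripts run with restricted permissions. Network access is off by default, so scripts that call remote DNS servers (such as the `nsupdate` example) must be explicitly granted it.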
3. Timeout prevents resource exhaustion
4. All executions are audit-logged
#### Implementation Complexity
- Backend: ~250 lines (ScriptProvider + executor)
- Frontend: ~80 lines (form fields)
- Tests: ~200 lines (including security tests)
---
### 4.3 Option C: RFC 2136 (Dynamic DNS Update) Plugin
#### Overview
RFC 2136 defines a standard protocol for dynamic DNS updates. Supported by BIND, PowerDNS, Knot DNS, and many self-hosted DNS servers.
#### Configuration
```json
{
  "name": "My BIND Server",
  "provider_type": "rfc2136",
  "credentials": {
    "nameserver": "ns1.example.com",
    "port": "53",
    "tsig_key_name": "acme-update-key",
    "tsig_key_secret": "base64-encoded-secret",
    "tsig_algorithm": "hmac-sha256",
    "zone": "example.com"
  }
}
```
#### TSIG Algorithms Supported
- `hmac-md5` (legacy)
- `hmac-sha1`
- `hmac-sha256` (recommended)
- `hmac-sha384`
- `hmac-sha512`
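Before storing RFC 2136 credentials, Charon can normalize the algorithm and check that the secret is valid base64. The mapping below uses the fully qualified TSIG algorithm names defined by the TSIG RFCs (RFC 8945). `validateTSIG` itself is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// tsigAlgorithms maps the config values above to fully qualified TSIG algorithm names.
var tsigAlgorithms = map[string]string{
	"hmac-md5":    "hmac-md5.sig-alg.reg.int.",
	"hmac-sha1":   "hmac-sha1.",
	"hmac-sha256": "hmac-sha256.",
	"hmac-sha384": "hmac-sha384.",
	"hmac-sha512": "hmac-sha512.",
}

// validateTSIG defaults the algorithm to hmac-sha256, rejects unknown algorithms,
// and checks that the secret is non-empty standard base64.
func validateTSIG(keyName, secret, algorithm string) (string, error) {
	if strings.TrimSpace(keyName) == "" {
		return "", fmt.Errorf("tsig_key_name is required")
	}
	if algorithm == "" {
		algorithm = "hmac-sha256"
	}
	fqAlg, ok := tsigAlgorithms[strings.ToLower(algorithm)]
	if !ok {
		return "", fmt.Errorf("unsupported tsig_algorithm %q", algorithm)
	}
	raw, err := base64.StdEncoding.DecodeString(secret)
	if err != nil || len(raw) == 0 {
		return "", fmt.Errorf("tsig_key_secret must be non-empty base64")
	}
	return fqAlg, nil
}

func main() {
	alg, err := validateTSIG("acme-update-key", "c2VjcmV0LWtleS1ieXRlcw==", "")
	fmt.Println(alg, err) // hmac-sha256. <nil>
}
```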
#### DNS UPDATE Message Flow
```
┌──────────┐                    ┌──────────────┐
│          │    DNS UPDATE      │  DNS Server  │
│  Charon  │ ─────────────────▶ │ (BIND, etc)  │
│          │    TSIG-signed     │              │
│          │                    │              │
│          │     RESPONSE       │              │
│          │ ◀───────────────── │              │
│          │  NOERROR/REFUSED   │              │
│          │                    │              │
└──────────┘                    └──────────────┘
```
#### Caddy Integration
Caddy has a native RFC 2136 module: [caddy-dns/rfc2136](https://github.com/caddy-dns/rfc2136)
**DECISION:** Charon WILL ship with the RFC 2136 Caddy module pre-built in the Docker image. Users do NOT need to rebuild Caddy.
The Charon plugin would:
1. Store TSIG credentials encrypted
2. Generate Caddy config with proper RFC 2136 settings
3. Validate credentials by attempting a test query
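For step 2, a `BuildCaddyConfig` implementation might emit a provider block like the one below. The block is meant to sit under the ACME issuer's `challenges.dns.provider` key. The field names (`key_name`, `key_alg`, `key`, `server`) reflect our reading of the caddy-dns/rfc2136 module and should be checked against the pinned module version. The helper name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// buildRFC2136CaddyConfig turns stored credentials into a Caddy DNS provider block.
// Field names are assumptions to be verified against caddy-dns/rfc2136.
func buildRFC2136CaddyConfig(creds map[string]string) map[string]any {
	port := creds["port"]
	if port == "" {
		port = "53"
	}
	alg := creds["tsig_algorithm"]
	if alg == "" {
		alg = "hmac-sha256"
	}
	return map[string]any{
		"name":     "rfc2136",
		"key_name": creds["tsig_key_name"],
		"key_alg":  alg,
		"key":      creds["tsig_key_secret"],
		// JoinHostPort adds brackets for IPv6 nameserver addresses.
		"server": net.JoinHostPort(creds["nameserver"], port),
	}
}

func main() {
	cfg := buildRFC2136CaddyConfig(map[string]string{
		"nameserver":      "ns1.example.com",
		"tsig_key_name":   "acme-update-key",
		"tsig_key_secret": "c2VjcmV0",
		"tsig_algorithm":  "hmac-sha256",
	})
	out, _ := json.Marshal(cfg)
	fmt.Println(string(out))
	// {"key":"c2VjcmV0","key_alg":"hmac-sha256","key_name":"acme-update-key","name":"rfc2136","server":"ns1.example.com:53"}
}
```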
**Dockerfile Addition (Phase 2):**
```dockerfile
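# Build Caddy with RFC 2136 module
FROM caddy:builder AS caddy-builder
RUN xcaddy build \
    --with github.com/caddy-dns/rfc2136

# In Charon's final image stage, replace the Caddy binary with the custom build:
# COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy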
```
#### Pros
- Industry-standard protocol
- No custom server-side code needed
- Works with popular DNS servers (BIND9, PowerDNS, Knot)
- Secure with TSIG authentication
- Native Caddy module available
#### Cons
- Requires DNS server configuration for TSIG keys
- More complex setup than webhook
- Zone configuration required
- Firewall rules may need updating (TCP/UDP 53)
#### Implementation Complexity
- Backend: ~180 lines (RFC2136Provider)
- Frontend: ~120 lines (TSIG configuration form)
- Tests: ~150 lines
- Requires: the `caddy-dns/rfc2136` module compiled into Charon's Docker image (no user-side Caddy rebuild)
---
### 4.4 Option D: Manual/External Plugin
#### Overview
No automation - UI shows required TXT record details, user creates manually, clicks "Verify" when done.
#### UI Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ Manual DNS Challenge │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ To obtain a certificate for *.example.com, create the following │
│ TXT record at your DNS provider: │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Record Name: _acme-challenge.example.com [📋 Copy] │ │
│ ├────────────────────────────────────────────────────────────────┤ │
│ │ Record Value: gZrH7wL9t3kM2nP4qX5yR8sT... [📋 Copy] │ │
│ ├────────────────────────────────────────────────────────────────┤ │
│ │ TTL: 300 (5 minutes) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ⏱️ Time remaining: 4:32 │
│ [━━━━━━━━━━━━━━━━━━━━━░░░░░░░░░░] 68% │
│ │
│ [Check DNS Now] [I've Created the Record - Verify] │
│ │
│ Record not yet propagated. Last check: 10 seconds ago │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
#### Configuration
```json
{
  "name": "Manual DNS",
  "provider_type": "manual",
  "credentials": {
    "timeout_minutes": "10",
    "polling_interval_seconds": "30"
  }
}
```
#### Technical Implementation
- Store challenge details in session/database
- Background job queries DNS every `polling_interval_seconds` (default: 30)
- Polling endpoint for UI updates (10-second interval)
- Timeout after configurable period
**Note:** Although Charon has existing WebSocket infrastructure (`backend/internal/services/websocket_tracker.go`), polling is chosen for simplicity:
- Avoids additional WebSocket connection management complexity
- 10-second polling interval provides acceptable UX for manual workflows
- Reduces frontend state management burden
**Polling Endpoint:**
```
GET /api/v1/dns-providers/:id/manual-challenge/:challengeId/poll

Response (every 10s):
{
  "status": "pending|verified|expired|failed",
  "dns_propagated": false,
  "time_remaining_seconds": 432,
  "last_check_at": "2026-01-08T15:35:00Z"
}
```
#### Pros
- Works with ANY DNS provider
- No integration required
- Good for testing/development
- One-off certificate issuance
#### Cons
- User must manually intervene
- Time-sensitive (ACME challenge timeout)
- Not suitable for automated renewals
- Doesn't scale for multiple certificates
#### Implementation Complexity
- Backend: ~150 lines (ManualProvider + verification endpoint)
- Frontend: ~300 lines (interactive UI with copy/verify)
- Tests: ~100 lines
---
## 5. Recommended Approach
### Phase 1: Manual Plugin (1 week)
**Rationale:** Unblocks all users immediately. Lowest risk, highest immediate value.
Deliverables:
- ManualProvider implementation
- Interactive challenge UI
- DNS verification endpoint
- User documentation
### Phase 2: RFC 2136 Plugin (1 week)
**Rationale:** Standards-based, serves self-hosted DNS users. Caddy module already exists.
Deliverables:
- RFC2136Provider implementation
- TSIG credential storage
- Caddy module integration documentation
- BIND9/PowerDNS setup guides
### Phase 3: Webhook Plugin (1 week)
**Rationale:** Most flexible option for custom integrations. Medium complexity.
Deliverables:
- WebhookProvider implementation
- Configurable retry logic
- Request/response logging
- Example webhook implementations (Node.js, Python)
---
## Future Work
### Phase 4: Script Plugin (Conditional)
> **Go/No-Go Gate:** Phase 4 only proceeds if >20 user requests are received via GitHub issues requesting script plugin functionality. Track via label `feature:script-plugin`.

**Rationale:** Power-user feature with significant security implications. Implement only if demand warrants the additional security review and maintenance burden.
Deliverables:
- ScriptProvider implementation
- Security sandbox
- Example scripts for common scenarios
### Implementation Order Justification
```
User Value
│
│  ★ Manual Plugin (Phase 1)
│    - Unblocks everyone immediately
│    - Lowest implementation risk
│
│        ★ RFC 2136 Plugin (Phase 2)
│          - Self-hosted DNS is common need
│          - Industry standard
│
│              ★ Webhook Plugin (Phase 3)
│                - Flexible for edge cases
│                - Integration-focused teams
│
│                    ○ Script Plugin (Phase 4)
│                      - Power users only
│                      - Security concerns
└────────────────────────────────────────────────▶ Implementation Effort
```
---
## 6. Database Schema Changes
### 6.1 No New Tables Required
The existing `dns_providers` table schema supports custom plugins. The `provider_type` column accepts new values, and `credentials_encrypted` stores plugin-specific configuration.
### 6.2 Provider Type Enumeration
Expand the allowed `provider_type` values:
```go
// backend/pkg/dnsprovider/types.go
const (
	// Built-in providers
	TypeCloudflare = "cloudflare"
	TypeRoute53    = "route53"
	// ... existing providers

	// Custom plugins
	TypeWebhook = "webhook"
	TypeScript  = "script"
	TypeRFC2136 = "rfc2136"
	TypeManual  = "manual"
)
```
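Validating a requested `provider_type` against these constants might look like the sketch below. `ClassifyProviderType` is a hypothetical helper. Only two built-ins are listed; the other eight would be added alongside the existing providers.

```go
package main

import "fmt"

const (
	TypeCloudflare = "cloudflare"
	TypeRoute53    = "route53"
	TypeWebhook    = "webhook"
	TypeScript     = "script"
	TypeRFC2136    = "rfc2136"
	TypeManual     = "manual"
)

// Only two built-ins are listed here; the remaining eight would be added the same way.
var builtInTypes = map[string]bool{TypeCloudflare: true, TypeRoute53: true}

var customTypes = map[string]bool{TypeWebhook: true, TypeScript: true, TypeRFC2136: true, TypeManual: true}

// ClassifyProviderType reports whether a type is built in (is_built_in in the
// /types response) and rejects unknown values (INVALID_PROVIDER_TYPE).
func ClassifyProviderType(t string) (isBuiltIn bool, err error) {
	switch {
	case builtInTypes[t]:
		return true, nil
	case customTypes[t]:
		return false, nil
	default:
		return false, fmt.Errorf("INVALID_PROVIDER_TYPE: unknown provider type %q", t)
	}
}

func main() {
	for _, t := range []string{"cloudflare", "rfc2136", "bind"} {
		builtIn, err := ClassifyProviderType(t)
		fmt.Println(t, builtIn, err)
	}
}
```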
### 6.3 Credential Schemas Per Plugin Type
#### Webhook Credentials
```json
{
  "create_url": "string (required)",
  "delete_url": "string (required)",
  "auth_header": "string (optional)",
  "auth_value": "string (optional, encrypted)",
  "content_type": "string (default: application/json)",
  "timeout_seconds": "integer (default: 30)",
  "retry_count": "integer (default: 3)",
  "custom_headers": "object (optional)"
}
```
#### Script Credentials
```json
{
  "script_path": "string (required)",
  "timeout_seconds": "integer (default: 60)",
  "working_directory": "string (optional)",
  "env_vars": "string (optional, KEY=VALUE format)"
}
```
#### RFC 2136 Credentials
```json
{
  "nameserver": "string (required)",
  "port": "integer (default: 53)",
  "tsig_key_name": "string (required)",
  "tsig_key_secret": "string (required, encrypted)",
  "tsig_algorithm": "string (default: hmac-sha256)",
  "zone": "string (optional, auto-detect)"
}
```
#### Manual Credentials
```json
{
  "timeout_minutes": "integer (default: 10)",
  "polling_interval_seconds": "integer (default: 30)"
}
```
### 6.4 Challenge Cleanup Mechanism
Challenges are cleaned up via Charon's existing scheduled task infrastructure (using `robfig/cron/v3`, same pattern as `backup_service.go`):
```go
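// Cleanup job runs hourly (standard 5-field cron spec: minute 0 of every hour)
func (s *ManualChallengeService) scheduleCleanup() error {
	_, err := s.cron.AddFunc("0 * * * *", s.cleanupExpiredChallenges)
	return err
}

func (s *ManualChallengeService) cleanupExpiredChallenges() {
	// Backstop: challenges normally expire after their own timeout (10 min by default);
	// anything still in "pending" state after 24 hours is marked "expired" here
	cutoff := time.Now().Add(-24 * time.Hour)
	if err := s.db.Model(&Challenge{}).
		Where("status = ? AND created_at < ?", "pending", cutoff).
		Update("status", "expired").Error; err != nil {
		log.Printf("challenge cleanup: mark expired: %v", err)
	}

	// Hard delete any challenge older than 7 days, regardless of status
	deleteCutoff := time.Now().Add(-7 * 24 * time.Hour)
	if err := s.db.Where("created_at < ?", deleteCutoff).Delete(&Challenge{}).Error; err != nil {
		log.Printf("challenge cleanup: delete old records: %v", err)
	}
}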
```
**Cleanup Schedule:**
| Condition | Action | Frequency |
|-----------|--------|-----------|
| `pending` status > 24 hours | Mark as `expired` | Hourly |
| Any challenge > 7 days old | Hard delete | Hourly |
---
## 7. API Design
### 7.1 Existing Endpoints (No Changes)
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/dns-providers` | List all providers |
| POST | `/api/v1/dns-providers` | Create provider |
| GET | `/api/v1/dns-providers/:id` | Get provider |
| PUT | `/api/v1/dns-providers/:id` | Update provider |
| DELETE | `/api/v1/dns-providers/:id` | Delete provider |
| POST | `/api/v1/dns-providers/:id/test` | Test credentials |
| GET | `/api/v1/dns-providers/types` | List provider types |
### 7.2 New Endpoints
#### Manual Challenge Status
```
GET /api/v1/dns-providers/:id/manual-challenge/:challengeId
```
Response:
```json
{
  "id": "challenge-uuid",
  "status": "pending|verified|expired|failed",
  "fqdn": "_acme-challenge.example.com",
  "value": "gZrH7wL9t3kM2nP4...",
  "created_at": "2026-01-08T15:30:00Z",
  "expires_at": "2026-01-08T15:40:00Z",
  "last_check_at": "2026-01-08T15:35:00Z",
  "dns_propagated": false
}
```
#### Manual Challenge Verification Trigger
```
POST /api/v1/dns-providers/:id/manual-challenge/:challengeId/verify
```
Response:
```json
{
  "success": true,
  "dns_found": true,
  "message": "TXT record verified successfully"
}
```
### 7.3 Error Response Codes
All manual challenge and custom plugin endpoints use consistent error codes:
| Error Code | HTTP Status | Description |
|------------|-------------|-------------|
| `CHALLENGE_NOT_FOUND` | 404 | Challenge ID does not exist |
| `CHALLENGE_EXPIRED` | 410 | Challenge has timed out |
| `DNS_NOT_PROPAGATED` | 200 | DNS record not yet found (success: false) |
| `INVALID_PROVIDER_TYPE` | 400 | Unknown provider type |
| `WEBHOOK_TIMEOUT` | 504 | Webhook did not respond in time |
| `WEBHOOK_RATE_LIMITED` | 429 | Too many webhook calls (>10/min) |
| `PROVIDER_CIRCUIT_OPEN` | 503 | Provider disabled due to consecutive failures |
| `SCRIPT_TIMEOUT` | 504 | Script execution exceeded timeout |
| `SCRIPT_PATH_INVALID` | 400 | Script path not in allowed directory |
| `TSIG_AUTH_FAILED` | 401 | RFC 2136 TSIG authentication failed |
**Error Response Format:**
```json
{
  "success": false,
  "error": {
    "code": "CHALLENGE_EXPIRED",
    "message": "Challenge timed out after 10 minutes",
    "details": {
      "challenge_id": "uuid",
      "expired_at": "2026-01-08T15:40:00Z"
    }
  }
}
```
### 7.4 Updated Types Endpoint Response
The existing `/api/v1/dns-providers/types` endpoint will include custom plugins:
```json
{
  "types": [
    {
      "type": "cloudflare",
      "name": "Cloudflare",
      "is_built_in": true,
      "fields": [...]
    },
    {
      "type": "webhook",
      "name": "Webhook (Generic)",
      "is_built_in": false,
      "category": "custom",
      "fields": [
        {"name": "create_url", "label": "Create Record URL", "type": "text", "required": true},
        {"name": "delete_url", "label": "Delete Record URL", "type": "text", "required": true},
        {"name": "auth_header", "label": "Auth Header Name", "type": "text", "required": false},
        {"name": "auth_value", "label": "Auth Header Value", "type": "password", "required": false}
      ]
    },
    {
      "type": "rfc2136",
      "name": "RFC 2136 (Dynamic DNS)",
      "is_built_in": false,
      "category": "custom",
      "fields": [
        {"name": "nameserver", "label": "DNS Server", "type": "text", "required": true},
        {"name": "tsig_key_name", "label": "TSIG Key Name", "type": "text", "required": true},
        {"name": "tsig_key_secret", "label": "TSIG Secret", "type": "password", "required": true},
        {"name": "tsig_algorithm", "label": "TSIG Algorithm", "type": "select", "options": [...]}
      ]
    },
    {
      "type": "manual",
      "name": "Manual (No Automation)",
      "is_built_in": false,
      "category": "custom",
      "fields": [
        {"name": "timeout_minutes", "label": "Challenge Timeout (minutes)", "type": "number", "default": "10"}
      ]
    }
  ]
}
```
---
## 8. Frontend UI Mockups
### 8.1 Provider Type Selection (Updated)
```
┌─────────────────────────────────────────────────────────────────────┐
│ Add DNS Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Select Provider Type: │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ BUILT-IN PROVIDERS ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ ☁️ Cloudflare│ │ 🔶 Route53 │ │ 💧 Digital │ │ 🔷 Azure │││
│ │ │ │ │ │ │ Ocean │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ 🌐 Google │ │ 🟠 Hetzner │ │ 📛 GoDaddy │ │ 🔵 Namecheap│││
│ │ │ Cloud DNS │ │ │ │ │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ CUSTOM INTEGRATIONS ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐││
│ │ │ 🔗 Webhook │ │ 📜 Script │ │ 📡 RFC 2136 │ │ ✋ Manual │││
│ │ │ (HTTP) │ │ (Shell) │ │ (DDNS) │ │ │││
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ [Cancel] [Next →] │
└─────────────────────────────────────────────────────────────────────┘
```
### 8.2 Webhook Configuration Form
```
┌─────────────────────────────────────────────────────────────────────┐
│ Configure Webhook Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Provider Name: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ My Custom DNS Webhook ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Create Record URL: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ https://api.example.com/dns/create ││
│ └─────────────────────────────────────────────────────────────────┘│
│ Charon will POST JSON with record details │
│ │
│ Delete Record URL: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ https://api.example.com/dns/delete ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ── Authentication (Optional) ──────────────────────────────────────│
│ │
│ Header Name: Header Value: │
│ ┌───────────────────┐ ┌───────────────────────────────┐ │
│ │ X-API-Key │ │ •••••••••••••• │ │
│ └───────────────────┘ └───────────────────────────────┘ │
│ │
│ ── Advanced Settings ──────────────────────────────────────────────│
│ │
│ Timeout (seconds): [30 ▼] Retry Count: [3 ▼] │
│ │
│ │
│ [Test Connection] [Cancel] [Save Provider] │
└─────────────────────────────────────────────────────────────────────┘
```
### 8.3 RFC 2136 Configuration Form
```
┌─────────────────────────────────────────────────────────────────────┐
│ Configure RFC 2136 Provider │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Provider Name: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Internal BIND Server ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ DNS Server: * Port: │
│ ┌─────────────────────────────────────┐ ┌─────────────────────────┐│
│ │ ns1.internal.example.com │ │ 53 ││
│ └─────────────────────────────────────┘ └─────────────────────────┘│
│ │
│ ── TSIG Authentication ────────────────────────────────────────────│
│ │
│ Key Name: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ acme-update-key.example.com ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Key Secret: * │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ •••••••••••••••••••••••••••••••• ││
│ └─────────────────────────────────────────────────────────────────┘│
│ Base64-encoded TSIG secret │
│ │
│ Algorithm: │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ HMAC-SHA256 (Recommended) ▼ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Zone (optional - auto-detected if empty): │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ [Test Connection] [Cancel] [Save Provider] │
└─────────────────────────────────────────────────────────────────────┘
```
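
The form's fields could be checked server-side as part of Phase 2's "TSIG credential validation" task. The following is a minimal sketch under stated assumptions, not Charon's implementation: `RFC2136Config`, `validateRFC2136Config`, and the SHA-2-only algorithm allowlist are hypothetical, and an empty zone is accepted because the form treats it as auto-detect.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// RFC2136Config mirrors the fields of the RFC 2136 form (hypothetical type).
type RFC2136Config struct {
	Server    string
	Port      int
	KeyName   string
	KeySecret string // base64-encoded TSIG secret
	Algorithm string // e.g. "hmac-sha256"
	Zone      string // optional; empty means auto-detect
}

// allowedTSIGAlgorithms is an assumed allowlist of SHA-2 based TSIG algorithms.
var allowedTSIGAlgorithms = map[string]bool{
	"hmac-sha224": true,
	"hmac-sha256": true,
	"hmac-sha384": true,
	"hmac-sha512": true,
}

// validateRFC2136Config rejects missing fields, out-of-range ports,
// non-base64 secrets, and algorithms outside the allowlist.
func validateRFC2136Config(c RFC2136Config) error {
	if strings.TrimSpace(c.Server) == "" {
		return errors.New("dns server is required")
	}
	if c.Port < 1 || c.Port > 65535 {
		return fmt.Errorf("port %d out of range 1-65535", c.Port)
	}
	if strings.TrimSpace(c.KeyName) == "" {
		return errors.New("tsig key name is required")
	}
	secret, err := base64.StdEncoding.DecodeString(c.KeySecret)
	if err != nil {
		return fmt.Errorf("tsig secret is not valid base64: %w", err)
	}
	if len(secret) == 0 {
		return errors.New("tsig secret is empty")
	}
	// The UI label "HMAC-SHA256 (Recommended)" maps to "hmac-sha256".
	if !allowedTSIGAlgorithms[strings.ToLower(c.Algorithm)] {
		return fmt.Errorf("unsupported tsig algorithm %q", c.Algorithm)
	}
	return nil
}
```

Rejecting `hmac-md5` and `hmac-sha1` is a design choice for the sketch; BIND still accepts both, so the real allowlist is a policy decision.
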
### 8.4 Manual Challenge UI
```
┌─────────────────────────────────────────────────────────────────────┐
│ 🔐 Manual DNS Challenge │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Certificate Request: *.example.com │
│ Provider: Manual DNS (example-manual) │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ 📋 CREATE THIS TXT RECORD AT YOUR DNS PROVIDER ││
│ │ ││
│ │ Record Name: ││
│ │ ┌──────────────────────────────────────────────────┐ ┌──────┐││
│ │ │ _acme-challenge.example.com │ │ Copy │││
│ │ └──────────────────────────────────────────────────┘ └──────┘││
│ │ ││
│ │ Record Type: TXT ││
│ │ ││
│ │ Record Value: ││
│ │ ┌──────────────────────────────────────────────────┐ ┌──────┐││
│ │ │ gZrH7wL9t3kM2nP4qX5yR8sT0uV1wZ2aB3cD4eF5gH6iJ7 │ │ Copy │││
│ │ └──────────────────────────────────────────────────┘ └──────┘││
│ │ ││
│ │ TTL: 300 seconds (5 minutes) ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ ⏱️ Time Remaining: 7:23 ││
│ │ [━━━━━━━━━━━━━━━━━░░░░░░░░░░░░░░░] 52% ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │
│ Status: ⏳ Waiting for DNS propagation... │
│ Last checked: 15 seconds ago │
│ │
│ ┌─────────────────────┐ ┌────────────────────────────────────────┐│
│ │ 🔍 Check DNS Now │ │ ✅ I've Created the Record - Verify ││
│ └─────────────────────┘ └────────────────────────────────────────┘│
│ │
│ [Cancel Challenge] │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 9. Security Considerations
### 9.1 Threat Model
| Threat | Risk Level | Mitigation |
|--------|------------|------------|
| Credential theft from database | High | AES-256-GCM encryption at rest, key rotation |
| Webhook URL SSRF | High | URL validation, internal IP blocking |
| Script path traversal | Critical | Allowlist `/scripts/` directory only |
| Script command injection | Critical | Sanitize all arguments, no shell expansion |
| TSIG key exposure in logs | Medium | Redact secrets in all logs |
| DNS cache poisoning | Low | TSIG authentication for RFC 2136 |
| Webhook response injection | Low | Strict JSON parsing, no eval |
### 9.2 SSRF Prevention for Webhooks
Webhook URL validation MUST use Charon's existing centralized SSRF protection in `backend/internal/security/url_validator.go`:
```go
// backend/internal/services/webhook_provider.go
import (
	"fmt"
	"github.com/Wikid82/charon/backend/internal/security"
)
func (w *WebhookProvider) validateWebhookURL(urlStr string) error {
// Use existing centralized SSRF validation
// This validates:
// - HTTPS scheme required (production)
// - DNS resolution with timeout
// - All resolved IPs checked against private/reserved ranges
// - Cloud metadata endpoints blocked (169.254.169.254)
// - IPv4-mapped IPv6 bypass prevention
_, err := security.ValidateExternalURL(urlStr)
if err != nil {
return fmt.Errorf("webhook URL validation failed: %w", err)
}
return nil
}
```
**Existing `security.ValidateExternalURL()` provides:**
- RFC 1918 private network blocking (10.x, 172.16.x, 192.168.x)
- Loopback blocking (127.x.x.x, ::1) unless the `WithAllowLocalhost()` option is set
- Link-local blocking (169.254.x.x, fe80::) including cloud metadata
- Reserved range blocking (0.x.x.x, 240.x.x.x)
- IPv6 unique local blocking (fc00::)
- IPv4-mapped IPv6 bypass prevention (::ffff:192.168.1.1)
- Hostname length validation (RFC 1035, max 253 chars)
- Suspicious pattern detection (..)
- Port range validation with privileged port blocking
**DO NOT** duplicate SSRF validation logic. Reference the existing implementation.
### 9.3 Script Execution Security
```go
// backend/internal/services/script_provider.go
import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"syscall"
	"time"
)
const allowedScriptDir = "/scripts/"
func executeScript(scriptPath string, args []string) error {
	// 1. Validate script path. filepath.Abs also cleans the path, so
	//    "/scripts/../etc/passwd" becomes "/etc/passwd" and fails the prefix
	//    check; EvalSymlinks stops a symlink in /scripts/ from escaping it
	//    (this assumes /scripts itself is not a symlink).
	absPath, err := filepath.Abs(scriptPath)
	if err != nil {
		return fmt.Errorf("invalid script path: %w", err)
	}
	absPath, err = filepath.EvalSymlinks(absPath)
	if err != nil {
		return fmt.Errorf("invalid script path: %w", err)
	}
	if !strings.HasPrefix(absPath, allowedScriptDir) {
		return errors.New("script must be in /scripts/ directory")
	}
	// 2. Verify script is a regular file with an execute bit set
	info, err := os.Stat(absPath)
	if err != nil || !info.Mode().IsRegular() || info.Mode().Perm()&0o111 == 0 {
		return errors.New("invalid script path: not an executable file")
	}
	// 3. Create restricted command with timeout wrapper (defense-in-depth).
	//    exec does not invoke a shell, so args are passed without expansion.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	// Use 'timeout' command as additional safeguard against hung processes
	cmd := exec.CommandContext(ctx, "timeout", "--signal=KILL", "55s", absPath)
	cmd.Args = append(cmd.Args, args...)
	cmd.Dir = allowedScriptDir
	// 4. Minimal but functional environment (nothing inherited from Charon)
	cmd.Env = []string{
		"PATH=/usr/local/bin:/usr/bin:/bin",
		"HOME=/tmp",
		"LANG=C.UTF-8",
	}
	// 5. Drop to the nobody user. Switching credentials requires Charon
	//    itself to run as root or hold CAP_SETUID and CAP_SETGID.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid: 65534, // nobody user
			Gid: 65534,
		},
	}
	// Apply resource limits (see setResourceLimits below)
	setResourceLimits(cmd)
	// 6. Capture output for logging
	output, err := cmd.CombinedOutput()
	// 7. Audit log; ExitCode returns -1 if the process did not start
	//    or was killed by a signal
	logScriptExecution(scriptPath, args, cmd.ProcessState.ExitCode(), output)
	return err
}
// setResourceLimits is a placeholder for rlimit enforcement.
// syscall.SysProcAttr has no rlimit field, so os/exec cannot set limits on
// the child directly. Options: wrap the command with prlimit(1), call
// prlimit(2) on the child PID after Start (racy), or rely on the container.
func setResourceLimits(cmd *exec.Cmd) {
	// RLIMIT_NOFILE: Max open file descriptors (prevent fd exhaustion)
	// RLIMIT_NPROC: Max processes (prevent fork bombs)
	// RLIMIT_AS: Max address space (prevent memory exhaustion)
	//
	// Recommended values:
	// - NOFILE: 256
	// - NPROC: 64
	// - AS: 256MB
	//
	// Implementation note: In containerized deployments, these limits
	// should be enforced via container security context (securityContext
	// in Kubernetes, --ulimit in Docker) for stronger isolation.
}
```
**Security Layers (Defense-in-Depth):**
| Layer | Protection | Implementation |
|-------|------------|----------------|
| 1. Path validation | Restrict to `/scripts/` | `filepath.Abs()` + prefix check |
| 2. Timeout | Prevent hung scripts | `context.WithTimeout` + `timeout` command |
| 3. Resource limits | Prevent resource exhaustion | `rlimit` (NOFILE=256, NPROC=64, AS=256MB) |
| 4. Minimal environment | Reduce attack surface | Explicit `PATH`, no secrets |
| 5. Non-root execution | Limit privilege | `nobody` user (UID 65534) |
| 6. Container isolation | Strongest isolation | seccomp profile (see below) |
| 7. Audit logging | Forensics | All executions logged |
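
The `setResourceLimits` stub above leaves layer 3 to the container runtime. One way to apply the listed values per script is to prepend util-linux `prlimit(1)`, which sets the limits and then executes the remaining command line, so the script and its children inherit them. This sketch assumes `prlimit` is installed in the Charon image; `limitedArgv` and the constant names are hypothetical.

```go
package main

import "strconv"

// Limits from the "Resource limits" row of the table (hypothetical names).
const (
	scriptMaxOpenFiles = 256
	scriptMaxProcs     = 64
	scriptMaxAddrSpace = 256 << 20 // 256 MiB, in bytes
)

// limitedArgv builds an argv that runs the script under prlimit(1) and
// timeout(1). A single value such as --nofile=256 sets both the soft and
// the hard limit; "--" ends prlimit's own options.
func limitedArgv(absPath string, args []string) []string {
	argv := []string{
		"prlimit",
		"--nofile=" + strconv.Itoa(scriptMaxOpenFiles),
		"--nproc=" + strconv.Itoa(scriptMaxProcs),
		"--as=" + strconv.Itoa(scriptMaxAddrSpace),
		"--",
		"timeout", "--signal=KILL", "55s", absPath,
	}
	return append(argv, args...)
}
```

The result would be started with `exec.CommandContext(ctx, argv[0], argv[1:]...)`. Note that `RLIMIT_NPROC` is counted per real UID, so every script running as `nobody` shares the same 64-process budget.
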
**Container Security (seccomp profile):**
In production, scripts run inside Charon's container, which should use a restrictive seccomp profile. Document this requirement, for example:
```yaml
# docker-compose.yml (recommended)
services:
charon:
security_opt:
      - seccomp=seccomp-profile.json  # Or use default Docker profile
# Alternative: Use --cap-drop=ALL --cap-add=<minimal>
```
**Note:** Full seccomp profile customization is out of scope for this feature. Users relying on script plugins in high-security environments should review container security configuration.
### 9.4 Audit Logging
All custom plugin operations MUST be logged:
```go
import "time"
// PluginAuditEvent records one custom plugin operation.
type PluginAuditEvent struct {
    Timestamp  time.Time
    PluginType string         // "webhook", "script", "rfc2136", "manual"
    Action     string         // "create_record", "delete_record", "verify"
    ProviderID uint
    Domain     string
    Success    bool
    Duration   time.Duration
    ErrorMsg   string
    Details    map[string]any // Extra context; credential values must be redacted before logging
}
```
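
The `Details` field must not carry raw credentials into the log. A minimal sketch, assuming redaction by key name; the fragment list and `redactDetails` are hypothetical, and nested maps are not walked.

```go
package main

import "strings"

// sensitiveKeyFragments lists substrings that mark a Details key as secret.
// The list is an assumption for illustration, not an exhaustive policy.
var sensitiveKeyFragments = []string{"secret", "password", "token", "auth_value", "api_key"}

// redactDetails returns a copy of details with the values of sensitive keys
// replaced, so the caller's map is never mutated. Only top-level keys are
// inspected.
func redactDetails(details map[string]any) map[string]any {
	out := make(map[string]any, len(details))
	for k, v := range details {
		lk := strings.ToLower(k)
		redacted := false
		for _, frag := range sensitiveKeyFragments {
			if strings.Contains(lk, frag) {
				redacted = true
				break
			}
		}
		if redacted {
			out[k] = "[REDACTED]"
		} else {
			out[k] = v
		}
	}
	return out
}
```

Copying rather than redacting in place keeps the original values available to the provider that is still using them.
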
---
## 10. Implementation Phases
### Phase 1: Manual Plugin (Week 1)
| Task | Hours | Owner |
|------|-------|-------|
| ManualProvider implementation | 4 | Backend |
| Manual challenge data model | 2 | Backend |
| Challenge verification endpoint | 3 | Backend |
| Polling endpoint (10s interval) | 2 | Backend |
| Manual challenge UI component | 6 | Frontend |
| Challenge cleanup scheduled task | 2 | Backend |
| Unit tests | 4 | QA |
| Integration tests | 3 | QA |
| i18n translation keys | 2 | Frontend |
| Documentation | 2 | Docs |
| **Total** | **32** | |
| **With 20% buffer** | **32** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/manual.go`
- [ ] `backend/internal/services/manual_challenge_service.go`
- [ ] `frontend/src/components/ManualDNSChallenge.tsx`
- [ ] API endpoints for challenge lifecycle (including `/poll`)
- [ ] Translation keys in `frontend/src/locales/*/translation.json`:
- `dnsProvider.manual.title`
- `dnsProvider.manual.instructions`
- `dnsProvider.manual.recordName`
- `dnsProvider.manual.recordValue`
- `dnsProvider.manual.copyButton`
- `dnsProvider.manual.verifyButton`
- `dnsProvider.manual.checkDnsButton`
- `dnsProvider.manual.timeRemaining`
- `dnsProvider.manual.status.pending`
- `dnsProvider.manual.status.verified`
- `dnsProvider.manual.status.expired`
- `dnsProvider.manual.status.failed`
- `dnsProvider.manual.errors.*`
- [ ] User guide: `docs/features/manual-dns-challenge.md`
### Phase 2: RFC 2136 Plugin (Week 2)
| Task | Hours | Owner |
|------|-------|-------|
| RFC2136Provider implementation | 4 | Backend |
| TSIG credential validation | 3 | Backend |
| Caddy module integration research | 2 | Backend |
| **Dockerfile update (xcaddy + rfc2136)** | 2 | DevOps |
| RFC 2136 form UI | 4 | Frontend |
| i18n translation keys | 1 | Frontend |
| Unit tests | 3 | QA |
| Integration tests (with BIND container) | 4 | QA |
| Documentation + BIND setup guide | 3 | Docs |
| **Total** | **28** | |
| **With 20% buffer** | **28** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/rfc2136.go`
- [ ] Caddy config generation for RFC 2136
- [ ] **Dockerfile modification:**
```dockerfile
# Multi-stage build: Caddy with RFC 2136 module
FROM caddy:2-builder AS caddy-builder
RUN xcaddy build \
--with github.com/caddy-dns/rfc2136
# Copy custom Caddy binary to final image
COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy
```
- [ ] `frontend/src/components/RFC2136Form.tsx`
- [ ] Translation keys for RFC 2136 provider
- [ ] User guide: `docs/features/rfc2136-dns.md`
- [ ] BIND9 setup guide: `docs/guides/bind9-acme-setup.md`
### Phase 3: Webhook Plugin (Week 3)
| Task | Hours | Owner |
|------|-------|-------|
| WebhookProvider implementation | 5 | Backend |
| HTTP client with retry logic | 3 | Backend |
| Rate limiting + circuit breaker | 3 | Backend |
| SSRF validation (use existing) | 1 | Backend |
| Webhook form UI | 4 | Frontend |
| i18n translation keys | 1 | Frontend |
| Unit tests | 3 | QA |
| Integration tests (mock webhook server) | 3 | QA |
| Security tests (SSRF) | 2 | QA |
| Example webhook implementations | 2 | Docs |
| Documentation | 2 | Docs |
| **Total** | **30** | |
| **With 20% buffer** | **30** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/webhook.go`
- [ ] `backend/internal/services/webhook_client.go`
- [ ] `frontend/src/components/WebhookForm.tsx`
- [ ] Translation keys for Webhook provider
- [ ] Example: `examples/webhook-server/nodejs/`
- [ ] Example: `examples/webhook-server/python/`
- [ ] User guide: `docs/features/webhook-dns.md`
### Phase 4: Script Plugin (Week 4, Optional)
| Task | Hours | Owner |
|------|-------|-------|
| ScriptProvider implementation | 4 | Backend |
| Secure execution sandbox | 4 | Backend |
| Security review | 3 | Security |
| Script form UI | 3 | Frontend |
| Unit tests | 3 | QA |
| Security tests | 4 | QA |
| Example scripts | 2 | Docs |
| Documentation | 2 | Docs |
| **Total** | **25** | |
| **With 20% buffer** | **30** | |
**Deliverables:**
- [ ] `backend/pkg/dnsprovider/custom/script.go`
- [ ] `backend/internal/services/script_executor.go`
- [ ] `frontend/src/components/ScriptForm.tsx`
- [ ] Example: `examples/scripts/nsupdate.sh`
- [ ] Example: `examples/scripts/cloudns.sh`
- [ ] User guide: `docs/features/script-dns.md`
- [ ] Security guide: `docs/guides/script-plugin-security.md`
---
## 11. Testing Strategy
### 11.1 Unit Tests
Each provider requires tests for:
- Credential validation
- Config generation
- Error handling
- Timeout behavior
```go
// backend/pkg/dnsprovider/custom/webhook_test.go
func TestWebhookProvider_ValidateCredentials(t *testing.T) {
tests := []struct {
name string
creds map[string]string
wantErr bool
}{
{"valid with auth", map[string]string{"create_url": "https://...", "delete_url": "https://...", "auth_header": "X-Key", "auth_value": "secret"}, false},
{"valid without auth", map[string]string{"create_url": "https://...", "delete_url": "https://..."}, false},
{"missing create_url", map[string]string{"delete_url": "https://..."}, true},
{"http not allowed", map[string]string{"create_url": "http://...", "delete_url": "http://..."}, true},
{"internal IP blocked", map[string]string{"create_url": "https://192.168.1.1/dns", "delete_url": "https://192.168.1.1/dns"}, true},
}
// ...
}
```
### 11.2 Integration Tests
| Test Scenario | Components | Method |
|---------------|------------|--------|
| Manual challenge flow | Backend + Frontend | E2E with Playwright |
| RFC 2136 with BIND9 | Backend + BIND container | Docker Compose |
| Webhook with mock server | Backend + Mock HTTP | httptest |
| Script execution | Backend + Test scripts | Isolated container |
#### Manual Plugin E2E Scenarios (Playwright)
| Scenario | Description | Expected Result |
|----------|-------------|-----------------|
| Countdown timeout | User does not create DNS record | UI shows "Expired" after timeout, challenge marked expired |
| Copy buttons | User clicks "Copy" for record name/value | Values copied to clipboard, toast notification shown |
| DNS propagation success | User creates record, clicks "Verify" | After retries, status changes to "Verified" |
| DNS propagation failure | User creates wrong record | After max retries, shows "DNS record not found" |
| Cancel challenge | User clicks "Cancel Challenge" | Challenge marked as cancelled, UI returns to provider list |
| Refresh during challenge | User refreshes page during pending challenge | Challenge state persisted, countdown continues from correct time |
### 11.3 Security Tests
| Test | Tool | Target |
|------|------|--------|
| SSRF in webhook URLs | Custom test suite | WebhookProvider |
| Path traversal in scripts | Custom test suite | ScriptProvider |
| Credential leakage in logs | Log analysis | All providers |
| TSIG key handling | Memory dump analysis | RFC2136Provider |
### 11.4 Coverage Requirements
- Backend: ≥85% coverage
- Frontend: ≥85% coverage
- New provider code: ≥90% coverage
---
## 12. Documentation Requirements
### 12.1 User Documentation
| Document | Audience | Location |
|----------|----------|----------|
| Custom DNS Providers Overview | All users | `docs/features/custom-dns-providers.md` |
| Manual DNS Challenge Guide | Beginners | `docs/features/manual-dns-challenge.md` |
| RFC 2136 Setup Guide | Self-hosted DNS admins | `docs/features/rfc2136-dns.md` |
| Webhook Integration Guide | DevOps teams | `docs/features/webhook-dns.md` |
| Script Plugin Guide | Power users | `docs/features/script-dns.md` |
### 12.2 Technical Documentation
| Document | Audience | Location |
|----------|----------|----------|
| Custom Plugin Architecture | Contributors | `docs/development/custom-plugin-architecture.md` |
| Webhook API Specification | Integration devs | `docs/api/webhook-dns-api.md` |
| RFC 2136 Protocol Details | Network engineers | `docs/technical/rfc2136-implementation.md` |
### 12.3 Setup Guides
| Guide | Audience | Location |
|-------|----------|----------|
| BIND9 ACME Setup | Self-hosted users | `docs/guides/bind9-acme-setup.md` |
| PowerDNS ACME Setup | Self-hosted users | `docs/guides/powerdns-acme-setup.md` |
| Building Webhook Endpoints | Developers | `docs/guides/webhook-development.md` |
---
## 13. Estimated Effort
### Summary by Phase
| Phase | Description | Hours | Hours (with 20% buffer) | Calendar |
|-------|-------------|-------|-------------------------|----------|
| 1 | Manual Plugin | 27 | 32 | 1 week |
| 2 | RFC 2136 Plugin | 23 | 28 | 1 week |
| 3 | Webhook Plugin | 25 | 30 | 1 week |
| **Total (Phases 1-3)** | **Core Features** | **75** | **90** | **3 weeks** |
| 4 | Script Plugin (Future) | 25 | 30 | 1 week |
| **Total (All Phases)** | **Including Future** | **100** | **120** | **4 weeks** |
**Note:** Phase 4 (Script Plugin) is conditional on community demand (>20 GitHub issues). See "Future Work" section.
### Effort by Role
| Role | Phase 1 | Phase 2 | Phase 3 | Phase 4* | Total |
|------|---------|---------|---------|----------|-------|
| Backend | 11h | 11h | 12h | 8h | 42h |
| Frontend | 8h | 5h | 5h | 3h | 21h |
| QA | 7h | 7h | 8h | 7h | 29h |
| Docs | 2h | 3h | 4h | 4h | 13h |
| DevOps | 0h | 2h | 0h | 0h | 2h |
| Security | 0h | 0h | 1h | 3h | 4h |
*Phase 4 effort is conditional
### MVP (Minimum Viable Product)
**MVP = Phase 1 (Manual Plugin)**
- Time: 32 hours / 1 week (with buffer)
- Unblocks: All users with unsupported DNS providers
- Risk: Low
---
## 14. Decisions and Open Questions
### Decisions Made
1. **Caddy Module Strategy for RFC 2136**
**DECIDED: Option B — RFC 2136 module will be included in Charon's Caddy build.**
Rationale: Best user experience. Users should not need to rebuild Caddy themselves. The Dockerfile will be updated in Phase 2 to use xcaddy with the `github.com/caddy-dns/rfc2136` module.
### Must Decide Before Implementation
2. **Script Plugin Security Model**
- Should scripts run in a separate container/sandbox?
- What environment variables should be available?
- Should we allow network access from scripts?
- **Recommendation:** No network by default, minimal env, document risks
3. **Manual Challenge Persistence**
- Store challenge details in database or session?
- How long to retain completed challenges?
- **Recommendation:** Database with 24-hour TTL cleanup (see Section 6.4)
4. **Webhook Retry Strategy**
- Exponential backoff vs. fixed interval?
- Max retries before failure?
- **Recommendation:** Exponential backoff (1s, 2s, 4s), max 3 retries
### Nice to Decide
5. **UI Location for Custom Plugins**
- Same page as built-in providers?
- Separate "Custom Integrations" section?
- **Recommendation:** Same page, grouped by category
6. **Telemetry for Custom Plugins**
- Should we track usage of custom plugin types?
- Privacy considerations?
- **Recommendation:** Opt-in anonymous usage stats
7. **Plugin Marketplace (Future)**
- Community-contributed webhook templates?
- Pre-configured RFC 2136 profiles?
- **Recommendation:** Defer to Phase 5+
---
## 15. Appendix
### A. Related Documents
- [Phase 5 Custom Plugins Spec](phase5_custom_plugins_spec.md) - Go plugin architecture (external .so files)
- [DNS Challenge Backend Research](dns_challenge_backend_research.md) - Original DNS-01 implementation notes
- [DNS Challenge Future Features](dns_challenge_future_features.md) - Roadmap context
### B. External References
- [RFC 2136: Dynamic Updates in DNS](https://datatracker.ietf.org/doc/html/rfc2136)
- [RFC 2845: TSIG Authentication](https://datatracker.ietf.org/doc/html/rfc2845)
- [Caddy DNS Challenge Docs](https://caddyserver.com/docs/automatic-https#dns-challenge)
- [Let's Encrypt DNS-01 Challenge](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge)
### C. Example Webhook Payload
```json
{
"action": "create",
"fqdn": "_acme-challenge.example.com",
"domain": "example.com",
"subdomain": "_acme-challenge",
"value": "gZrH7wL9t3kM2nP4qX5yR8sT0uV1wZ2aB3cD4eF5gH6iJ7kL",
"ttl": 300,
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-01-08T15:30:00Z",
"charon_version": "1.2.0",
"certificate_domains": ["*.example.com", "example.com"]
}
```
### D. Example BIND9 TSIG Configuration
```zone
// /etc/bind/named.conf.local
key "acme-update-key" {
algorithm hmac-sha256;
secret "base64-encoded-secret-here==";
};
zone "example.com" {
type master;
file "/var/lib/bind/db.example.com";
update-policy {
grant acme-update-key name _acme-challenge.example.com. TXT;
};
};
```
---
## 16. Revision History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-08 | Planning Agent | Initial specification |
| 1.1 | 2026-01-08 | Planning Agent | Supervisor review: addressed 13 issues (see below) |
---
## 17. Supervisor Review Summary
This specification was revised to address all 13 issues identified during Supervisor review:
### Critical Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 1 | SSRF Duplication | Section 9.2 updated to reference existing `security.ValidateExternalURL()` in `backend/internal/security/url_validator.go` |
| 2 | Script Security Insufficient | Section 9.3 enhanced with rlimit enforcement, seccomp documentation, minimal PATH, and `timeout` command |
| 3 | Missing Caddy Integration Detail | Added Section 3.3.1-3.3.4 with sequence diagram, state machine, error handling, and communication protocol |
### High Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 4 | RFC 2136 Caddy Module | Section 4.3 updated with DECISION; Phase 2 includes Dockerfile deliverable |
| 5 | WebSocket vs Polling | Section 4.4 updated; chose polling (10s interval) with rationale; polling endpoint added to API |
| 6 | Webhook Rate Limiting | Section 4.1 updated with rate limits (10/min) and circuit breaker (5 failures → 5 min disable) |
### Medium Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 7 | Phase 4 Scope Creep | Phase 4 moved to "Future Work" section with explicit Go/No-Go gate (>20 GitHub issues) |
| 8 | Missing Error Codes | Section 7.3 added with comprehensive error code table |
| 9 | Time Estimates Buffer | Section 13 updated: Phase 1→32h, Phase 2→28h, Phase 3→30h (all +20%) |
| 10 | Open Question #1 | Section 14 changed to "Decisions and Open Questions"; Option B confirmed as DECIDED |
### Low Severity Issues (Fixed)
| # | Issue | Resolution |
|---|-------|------------|
| 11 | i18n Keys | Phase 1 deliverables updated with translation keys for `frontend/src/locales/*/translation.json` |
| 12 | E2E Test Scenarios | Section 11.2 expanded with Manual Plugin E2E scenarios table |
| 13 | Cleanup Mechanism | Section 6.4 added with cron-based cleanup using existing `robfig/cron/v3` pattern |
---
*This document has completed Supervisor review and is ready for technical review and stakeholder approval.*