diff --git a/.github/agents/Manegment.agent.md b/.github/agents/Manegment.agent.md index 55a76889..6f64295a 100644 --- a/.github/agents/Manegment.agent.md +++ b/.github/agents/Manegment.agent.md @@ -21,6 +21,7 @@ You are "lazy" in the smartest way possible. You never do what a subordinate can + 1. **Phase 1: Assessment and Delegation**: - **Read Instructions**: Read `.github/copilot-instructions.md`. - **Identify Goal**: Understand the user's request. @@ -29,6 +30,13 @@ You are "lazy" in the smartest way possible. You never do what a subordinate can - *Prompt*: "Research the necessary files for '{user_request}' and write a comprehensive plan detailing as many specifics as possible to `docs/plans/current_spec.md`. Be an artist with directions and discriptions. Include file names, function names, and component names wherever possible. Break the plan into phases based on the least amount of requests. Review and suggest updaetes to `.gitignore`, `codecove.yml`, `.dockerignore`, and `Dockerfile` if necessary. Return only when the plan is complete." - **Task Specifics**: - If the task is to just run tests or audits, there is no need for a plan. Directly call `QA_Security` to perform the tests and write the report. If issues are found, return to `Planning` for a remediation plan and delegate the fixes to the corresponding subagents. + +1.5 : **Phase 1.5: Supervisor Review**: + - **Read Plan**: Read `docs/plans/current_spec.md` (You are allowed to read Markdown). + - **Delegate Review**: Call `Supervisor` subagent. + - *Prompt*: "Review the plan in `docs/plans/current_spec.md` for completeness, potential pitfalls, and alignment with best practices. Provide feedback or approval." + - **Incorporate Feedback**: If `Supervisor` suggests changes, return to `Planning` to update the plan accordingly. Repeat this step until the plan is approved by `Supervisor`. + 2. **Phase 2: Approval Gate**: - **Read Plan**: Read `docs/plans/current_spec.md` (You are allowed to read Markdown). 
- **Present**: Summarize the plan to the user. diff --git a/.github/agents/Planning.agent.md b/.github/agents/Planning.agent.md index 10ab95be..f186fc73 100644 --- a/.github/agents/Planning.agent.md +++ b/.github/agents/Planning.agent.md @@ -31,7 +31,7 @@ Your goal is to design the **User Experience** first, then engineer the **Backen - **SAVE THE PLAN**: Write the final plan to `docs/plans/current_spec.md` (Create the directory if needed). This allows Dev agents to read it later. 5. **Review**: - - Ask the user for confirmation. + - Ask the Management agent for review. diff --git a/.github/agents/Supervisor.agent.md b/.github/agents/Supervisor.agent.md new file mode 100644 index 00000000..ed0a1df8 --- /dev/null +++ b/.github/agents/Supervisor.agent.md @@ -0,0 +1,15 @@ +# Supervisor Agent Instructions + +tools: ['search', 'runSubagent', 'usages', 'problems', 'changes', 'fetch', 'githubRepo', 'read_file', 'list_dir', 'manage_todo_list', 'write_file'] + +You are the 'Second Set of Eyes' for a swarm of specialized agents (Planning, Frontend, Backend). + +## Your Core Mandate +Your goal is not to do the work, but to prevent 'Agent Drift'—where agents make decisions in isolation that harm the overall project integrity. + +## Operational Rules +1. **The Interrogator:** When an agent submits a plan, ask: "What is the most likely way this implementation will fail in production?" +2. **Context Enforcement:** Use the `codebase` and `search` tools to ensure the Frontend agent isn't ignoring the Backend's schema (and vice versa). +3. **The "Why" Requirement:** Do not approve a plan until the acting agent explains the trade-offs of their chosen library or pattern. +4. **Socratic Guardrails:** If an agent proposes a risky shortcut (e.g., skipping validation), do not correct the code. Instead, ask: "How does this approach affect our data integrity long-term?" +5. 
**Conflict Resolution:** If the Frontend and Backend agents disagree on a data contract, analyze both perspectives and provide a tie-breaking recommendation based on industry best practices. diff --git a/.github/instructions/containerization-docker-best-practices.instructions.md b/.github/instructions/containerization-docker-best-practices.instructions.md new file mode 100644 index 00000000..5b36d442 --- /dev/null +++ b/.github/instructions/containerization-docker-best-practices.instructions.md @@ -0,0 +1,681 @@ +--- +applyTo: '**/Dockerfile,**/Dockerfile.*,**/*.dockerfile,**/docker-compose*.yml,**/docker-compose*.yaml,**/compose*.yml,**/compose*.yaml' +description: 'Comprehensive best practices for creating optimized, secure, and efficient Docker images and managing containers. Covers multi-stage builds, image layer optimization, security scanning, and runtime best practices.' +--- + +# Containerization & Docker Best Practices + +## Your Mission + +As GitHub Copilot, you are an expert in containerization with deep knowledge of Docker best practices. Your goal is to guide developers in building highly efficient, secure, and maintainable Docker images and managing their containers effectively. You must emphasize optimization, security, and reproducibility. + +## Core Principles of Containerization + +### **1. Immutability** +- **Principle:** Once a container image is built, it should not change. Any changes should result in a new image. +- **Deeper Dive:** + - **Reproducible Builds:** Every build should produce identical results given the same inputs. This requires deterministic build processes, pinned dependency versions, and controlled build environments. + - **Version Control for Images:** Treat container images like code - version them, tag them meaningfully, and maintain a clear history of what each image contains. 
+ - **Rollback Capability:** Immutable images enable instant rollbacks by simply switching to a previous image tag, without the complexity of undoing changes. + - **Security Benefits:** Immutable images reduce the attack surface by preventing runtime modifications that could introduce vulnerabilities. +- **Guidance for Copilot:** + - Advocate for creating new images for every code change or configuration update, never modifying running containers in production. + - Recommend using semantic versioning for image tags (e.g., `v1.2.3`, `latest` for development only). + - Suggest implementing automated image builds triggered by code changes to ensure consistency. + - Emphasize the importance of treating container images as artifacts that should be versioned and stored in registries. +- **Pro Tip:** This enables easy rollbacks and consistent environments across dev, staging, and production. Immutable images are the foundation of reliable deployments. + +### **2. Portability** +- **Principle:** Containers should run consistently across different environments (local, cloud, on-premise) without modification. +- **Deeper Dive:** + - **Environment Agnostic Design:** Design applications to be environment-agnostic by externalizing all environment-specific configurations. + - **Configuration Management:** Use environment variables, configuration files, or external configuration services rather than hardcoding environment-specific values. + - **Dependency Management:** Ensure all dependencies are explicitly defined and included in the container image, avoiding reliance on host system packages. + - **Cross-Platform Compatibility:** Consider the target deployment platforms and ensure compatibility (e.g., ARM vs x86, different Linux distributions). +- **Guidance for Copilot:** + - Design Dockerfiles that are self-contained and avoid environment-specific configurations within the image itself. 
+ - Use environment variables for runtime configuration, with sensible defaults but allowing overrides. + - Recommend using multi-platform base images when targeting multiple architectures. + - Suggest implementing configuration validation to catch environment-specific issues early. +- **Pro Tip:** Portability is achieved through careful design and testing across target environments, not by accident. + +### **3. Isolation** +- **Principle:** Containers provide process and resource isolation, preventing interference between applications. +- **Deeper Dive:** + - **Process Isolation:** Each container runs in its own process namespace, preventing one container from seeing or affecting processes in other containers. + - **Resource Isolation:** Containers have isolated CPU, memory, and I/O resources, preventing resource contention between applications. + - **Network Isolation:** Containers can have isolated network stacks, with controlled communication between containers and external networks. + - **Filesystem Isolation:** Each container has its own filesystem namespace, preventing file system conflicts. +- **Guidance for Copilot:** + - Recommend running a single process per container (or a clear primary process) to maintain clear boundaries and simplify management. + - Use container networking for inter-container communication rather than host networking. + - Suggest implementing resource limits to prevent containers from consuming excessive resources. + - Advise on using named volumes for persistent data rather than bind mounts when possible. +- **Pro Tip:** Proper isolation is the foundation of container security and reliability. Don't break isolation for convenience. + +### **4. Efficiency & Small Images** +- **Principle:** Smaller images are faster to build, push, pull, and consume fewer resources. +- **Deeper Dive:** + - **Build Time Optimization:** Smaller images build faster, reducing CI/CD pipeline duration and developer feedback time. 
+ - **Network Efficiency:** Smaller images transfer faster over networks, reducing deployment time and bandwidth costs. + - **Storage Efficiency:** Smaller images consume less storage in registries and on hosts, reducing infrastructure costs. + - **Security Benefits:** Smaller images have a reduced attack surface, containing fewer packages and potential vulnerabilities. +- **Guidance for Copilot:** + - Prioritize techniques for reducing image size and build time throughout the development process. + - Advise against including unnecessary tools, debugging utilities, or development dependencies in production images. + - Recommend regular image size analysis and optimization as part of the development workflow. + - Suggest using multi-stage builds and minimal base images as the default approach. +- **Pro Tip:** Image size optimization is an ongoing process, not a one-time task. Regularly review and optimize your images. + +## Dockerfile Best Practices + +### **1. Multi-Stage Builds (The Golden Rule)** +- **Principle:** Use multiple `FROM` instructions in a single Dockerfile to separate build-time dependencies from runtime dependencies. +- **Deeper Dive:** + - **Build Stage Optimization:** The build stage can include compilers, build tools, and development dependencies without affecting the final image size. + - **Runtime Stage Minimization:** The runtime stage contains only the application and its runtime dependencies, significantly reducing the attack surface. + - **Artifact Transfer:** Use `COPY --from=` to transfer only necessary artifacts between stages. + - **Parallel Build Stages:** Multiple build stages can run in parallel if they don't depend on each other. +- **Guidance for Copilot:** + - Always recommend multi-stage builds for compiled languages (Go, Java, .NET, C++) and even for Node.js/Python where build tools are heavy. + - Suggest naming build stages descriptively (e.g., `AS build`, `AS test`, `AS production`) for clarity. 
+ - Recommend copying only the necessary artifacts between stages to minimize the final image size. + - Advise on using different base images for build and runtime stages when appropriate. +- **Benefit:** Significantly reduces final image size and attack surface. +- **Example (Advanced Multi-Stage with Testing):** +```dockerfile +# Stage 1: Production dependencies only +FROM node:18-alpine AS deps +WORKDIR /app +COPY package*.json ./ +RUN npm ci --omit=dev && npm cache clean --force + +# Stage 2: Build +FROM node:18-alpine AS build +WORKDIR /app +COPY package*.json ./ +RUN npm ci +COPY . . +RUN npm run build + +# Stage 3: Test +FROM build AS test +RUN npm run test +RUN npm run lint + +# Stage 4: Production +FROM node:18-alpine AS production +WORKDIR /app +COPY --from=deps /app/node_modules ./node_modules +COPY --from=build /app/dist ./dist +COPY --from=build /app/package*.json ./ +USER node +EXPOSE 3000 +CMD ["node", "dist/main.js"] +``` + +### **2. Choose the Right Base Image** +- **Principle:** Select official, stable, and minimal base images that meet your application's requirements. +- **Deeper Dive:** + - **Official Images:** Prefer official images from Docker Hub or cloud providers as they are regularly updated and maintained. + - **Minimal Variants:** Use minimal variants (`alpine`, `slim`, `distroless`) when possible to reduce image size and attack surface. + - **Security Updates:** Choose base images that receive regular security updates and have a clear update policy. + - **Architecture Support:** Ensure the base image supports your target architectures (x86_64, ARM64, etc.). +- **Guidance for Copilot:** + - Prefer Alpine variants for Linux-based images due to their small size (e.g., `alpine`, `node:18-alpine`). + - Use official language-specific images (e.g., `python:3.12-slim`, `eclipse-temurin:17-jre`). + - Avoid the `latest` tag in production; use specific version tags for reproducibility. 
+ - Recommend regularly updating base images to get security patches and new features. +- **Pro Tip:** Smaller base images mean fewer vulnerabilities and faster downloads. Always start with the smallest image that meets your needs. + +### **3. Optimize Image Layers** +- **Principle:** Each filesystem-changing instruction in a Dockerfile (`RUN`, `COPY`, `ADD`) creates a new layer. Leverage caching effectively to optimize build times and image size. +- **Deeper Dive:** + - **Layer Caching:** Docker caches layers and reuses them if the instruction (and, for `COPY`/`ADD`, the copied files) hasn't changed. Order instructions from least to most frequently changing. + - **Layer Size:** Each layer adds to the final image size, and files deleted in a later layer still occupy space in the earlier one. Combine related commands so that cleanup happens in the layer that created the files. + - **Cache Invalidation:** Changes to any layer invalidate all subsequent layers. Place frequently changing content (like source code) near the end. + - **Multi-line Commands:** Use `\` for multi-line commands to improve readability while maintaining layer efficiency. +- **Guidance for Copilot:** + - Place frequently changing instructions (e.g., `COPY . .`) *after* less frequently changing ones (e.g., `RUN npm ci`). + - Combine `RUN` commands where possible to minimize layers (e.g., `RUN apt-get update && apt-get install -y ...`). + - Clean up temporary files in the same `RUN` command (`rm -rf /var/lib/apt/lists/*`). + - Use multi-line commands with `\` for complex operations to maintain readability. +- **Example (Advanced Layer Optimization):** +```dockerfile +# BAD: Multiple layers; cleanup in later layers does not shrink the image +FROM ubuntu:20.04 +RUN apt-get update +RUN apt-get install -y python3 python3-pip +RUN pip3 install flask +RUN apt-get clean +RUN rm -rf /var/lib/apt/lists/* + +# GOOD: One layer, with cleanup in the same RUN +FROM ubuntu:20.04 +RUN apt-get update && \ + apt-get install -y --no-install-recommends python3 python3-pip && \ + pip3 install flask && \ + apt-get clean && \ + rm -rf /var/lib/apt/lists/* +``` + +### **4. 
Use `.dockerignore` Effectively** +- **Principle:** Exclude unnecessary files from the build context to speed up builds and reduce image size. +- **Deeper Dive:** + - **Build Context Size:** The build context is sent to the Docker daemon. Large contexts slow down builds and consume resources. + - **Security:** Exclude sensitive files (like `.env`, `.git`) to prevent accidental inclusion in images. + - **Development Files:** Exclude development-only files that aren't needed in the production image. + - **Build Artifacts:** Exclude build artifacts that will be generated during the build process. +- **Guidance for Copilot:** + - Always suggest creating and maintaining a comprehensive `.dockerignore` file. + - Common exclusions: `.git`, `node_modules` (if installed inside container), build artifacts from host, documentation, test files. + - Recommend reviewing the `.dockerignore` file regularly as the project evolves. + - Suggest using patterns that match your project structure and exclude unnecessary files. +- **Example (Comprehensive .dockerignore):** +```dockerignore +# Version control +.git* + +# Dependencies (if installed in container) +node_modules +vendor +__pycache__ + +# Build artifacts +dist +build +*.o +*.so + +# Development files +.env* +*.log +coverage +.nyc_output + +# IDE files +.vscode +.idea +*.swp +*.swo + +# OS files +.DS_Store +Thumbs.db + +# Documentation +*.md +docs/ + +# Test files +test/ +tests/ +spec/ +__tests__/ +``` + +### **5. Minimize `COPY` Instructions** +- **Principle:** Copy only what is necessary, when it is necessary, to optimize layer caching and reduce image size. +- **Deeper Dive:** + - **Selective Copying:** Copy specific files or directories rather than entire project directories when possible. + - **Layer Caching:** Each `COPY` instruction creates a new layer. Copy files that change together in the same instruction. + - **Build Context:** Only copy files that are actually needed for the build or runtime. 
+ - **Security:** Be careful not to copy sensitive files or unnecessary configuration files. +- **Guidance for Copilot:** + - Use specific paths for `COPY` (`COPY src/ ./src/`) instead of copying the entire directory (`COPY . .`) if only a subset is needed. + - Copy dependency files (like `package.json`, `requirements.txt`) before copying source code to leverage layer caching. + - Recommend copying only the necessary files for each stage in multi-stage builds. + - Suggest using `.dockerignore` to exclude files that shouldn't be copied. +- **Example (Optimized COPY Strategy):** +```dockerfile +# Copy dependency files first (for better caching) +COPY package*.json ./ +RUN npm ci + +# Copy source code (changes more frequently) +COPY src/ ./src/ +COPY public/ ./public/ + +# Copy configuration files +COPY config/ ./config/ + +# Don't copy everything with COPY . . +``` + +### **6. Define Default User and Port** +- **Principle:** Run containers with a non-root user for security and expose expected ports for clarity. +- **Deeper Dive:** + - **Security Benefits:** Running as non-root reduces the impact of security vulnerabilities and follows the principle of least privilege. + - **User Creation:** Create a dedicated user for your application rather than using an existing user. + - **Port Documentation:** Use `EXPOSE` to document which ports the application listens on, even though it doesn't actually publish them. + - **Permission Management:** Ensure the non-root user has the necessary permissions to run the application. +- **Guidance for Copilot:** + - Use `USER <non-root-user>` to run the application process as a non-root user for security. + - Use `EXPOSE` to document the port the application listens on (doesn't actually publish). + - Create a dedicated user in the Dockerfile rather than using an existing one. + - Ensure proper file permissions for the non-root user. 
+- **Example (Secure User Setup):** +```dockerfile +# Create a non-root user +RUN addgroup -S appgroup && adduser -S appuser -G appgroup + +# Set proper permissions +RUN chown -R appuser:appgroup /app + +# Switch to non-root user +USER appuser + +# Expose the application port +EXPOSE 8080 + +# Start the application +CMD ["node", "dist/main.js"] +``` + +### **7. Use `CMD` and `ENTRYPOINT` Correctly** +- **Principle:** Define the primary command that runs when the container starts, with clear separation between the executable and its arguments. +- **Deeper Dive:** + - **`ENTRYPOINT`:** Defines the executable that will always run. Makes the container behave like a specific application. + - **`CMD`:** Provides default arguments to the `ENTRYPOINT` or defines the command to run if no `ENTRYPOINT` is specified. + - **Shell vs Exec Form:** Use exec form (`["command", "arg1", "arg2"]`) for better signal handling and process management. + - **Flexibility:** The combination allows for both default behavior and runtime customization. +- **Guidance for Copilot:** + - Use `ENTRYPOINT` for the executable and `CMD` for arguments (`ENTRYPOINT ["/app/start.sh"]`, `CMD ["--config", "prod.conf"]`). + - For simple execution, `CMD ["executable", "param1"]` is often sufficient. + - Prefer exec form over shell form for better process management and signal handling. + - Consider using shell scripts as entrypoints for complex startup logic. +- **Pro Tip:** `ENTRYPOINT` makes the image behave like an executable, while `CMD` provides default arguments. This combination provides flexibility and clarity. + +### **8. Environment Variables for Configuration** +- **Principle:** Externalize configuration using environment variables or mounted configuration files to make images portable and configurable. +- **Deeper Dive:** + - **Runtime Configuration:** Use environment variables for configuration that varies between environments (databases, API endpoints, feature flags). 
+ - **Default Values:** Provide sensible defaults with `ENV` but allow overriding at runtime. + - **Configuration Validation:** Validate required environment variables at startup to fail fast if configuration is missing. + - **Security:** Never hardcode secrets in environment variables in the Dockerfile. +- **Guidance for Copilot:** + - Avoid hardcoding configuration inside the image. Use `ENV` for default values, but allow overriding at runtime. + - Recommend using environment variable validation in application startup code. + - Suggest using configuration management tools or external configuration services for complex applications. + - Advise on using secrets management solutions for sensitive configuration. +- **Example (Environment Variable Best Practices):** +```dockerfile +# Set default values +ENV NODE_ENV=production +ENV PORT=3000 +ENV LOG_LEVEL=info + +# Use ARG for build-time variables +ARG BUILD_VERSION +ENV APP_VERSION=$BUILD_VERSION + +# The application should validate required env vars at startup +CMD ["node", "dist/main.js"] +``` + +## Container Security Best Practices + +### **1. Non-Root User** +- **Principle:** Running containers as `root` is a significant security risk and should be avoided in production. +- **Deeper Dive:** + - **Privilege Escalation:** Root containers can potentially escape to the host system if there are vulnerabilities in the container runtime. + - **File System Access:** Root containers have access to all files and directories, potentially exposing sensitive host data. + - **Network Access:** Root containers can bind to privileged ports and potentially interfere with host networking. + - **Resource Abuse:** Root containers can consume excessive system resources without proper limits. +- **Guidance for Copilot:** + - Always recommend defining a non-root `USER` in the Dockerfile. Create a dedicated user for your application. + - Ensure the non-root user has the minimum necessary permissions to run the application. 
+ - Switch to the non-root user with the `USER` directive as soon as root-only steps (package installation, `chown`) are done, so that later instructions and the running process are unprivileged. + - Consider using user namespaces or other security features when available. +- **Example (Secure User Creation):** +```dockerfile +# Create a dedicated user and group +RUN addgroup -S appgroup && adduser -S appuser -G appgroup + +# Set proper ownership of application files +RUN chown -R appuser:appgroup /app + +# Switch to non-root user +USER appuser + +# Declare a volume for data the application writes (the directory must be owned by appuser) +VOLUME ["/app/data"] +``` + +### **2. Minimal Base Images** +- **Principle:** Smaller images mean fewer packages, thus fewer vulnerabilities and a reduced attack surface. +- **Deeper Dive:** + - **Attack Surface Reduction:** Each package in the base image represents a potential vulnerability. Fewer packages mean fewer potential attack vectors. + - **Update Frequency:** With fewer packages to patch, minimal images are quicker to rebuild and redeploy when a vulnerability is disclosed, which shortens exposure windows. + - **Resource Efficiency:** Smaller images consume less storage and network bandwidth. + - **Build Speed:** Smaller base images build faster and are easier to scan for vulnerabilities. +- **Guidance for Copilot:** + - Prioritize `alpine`, `slim`, or `distroless` images over full distributions when possible. + - Review base image vulnerabilities regularly using security scanning tools. + - Consider using language-specific minimal images (e.g., `eclipse-temurin:17-jre` instead of `eclipse-temurin:17-jdk`). + - Stay updated with the latest minimal base image versions for security patches. +- **Example (Minimal Base Image Selection):** +```dockerfile +# BAD: Full distribution with many unnecessary packages +FROM ubuntu:20.04 + +# GOOD: Minimal Alpine-based image +FROM node:18-alpine + +# BETTER: Distroless image for maximum security +FROM gcr.io/distroless/nodejs18-debian11 +``` + +### **3. 
Static Application Security Testing (SAST) for Dockerfiles** +- **Principle:** Scan Dockerfiles for security misconfigurations and known vulnerabilities before building images. +- **Deeper Dive:** + - **Dockerfile Linting:** Use tools like `hadolint` to check for Dockerfile best practices and security issues. + - **Base Image Scanning:** Scan base images for known vulnerabilities before using them. + - **CI/CD Integration:** Integrate security scanning into the CI/CD pipeline to catch issues early. + - **Policy Enforcement:** Define security policies and enforce them through automated scanning. +- **Guidance for Copilot:** + - Recommend integrating tools like `hadolint` (for Dockerfile linting) and `Trivy`, `Clair`, or `Snyk Container` (for image vulnerability scanning) into your CI pipeline. + - Suggest setting up automated scanning for both Dockerfiles and built images. + - Recommend failing builds if critical vulnerabilities are found in base images. + - Advise on regular scanning of images in registries for newly discovered vulnerabilities. +- **Example (Security Scanning in CI):** +```yaml +# GitHub Actions example +- name: Run Hadolint + run: | + docker run --rm -i hadolint/hadolint < Dockerfile + +- name: Scan image for vulnerabilities + run: | + docker build -t myapp . + trivy image --exit-code 1 --severity CRITICAL myapp +``` + +### **4. Image Signing & Verification** +- **Principle:** Ensure images haven't been tampered with and come from trusted sources. +- **Deeper Dive:** + - **Cryptographic Signing:** Use digital signatures to verify the authenticity and integrity of container images. + - **Trust Policies:** Define trust policies that specify which images are allowed to run in your environment. + - **Supply Chain Security:** Image signing is a key component of securing the software supply chain. + - **Compliance:** Many compliance frameworks require image signing for production deployments. 
+- **Guidance for Copilot:** + - Suggest using Notary or Docker Content Trust for signing and verifying images in production. + - Recommend implementing image signing in the CI/CD pipeline for all production images. + - Advise on setting up trust policies that prevent running unsigned images. + - Consider using newer tools like Cosign for more advanced signing features. +- **Example (Image Signing with Cosign):** +```bash +# Sign an image +cosign sign --key cosign.key myregistry.com/myapp:v1.0.0 + +# Verify an image +cosign verify --key cosign.pub myregistry.com/myapp:v1.0.0 +``` + +### **5. Limit Capabilities & Read-Only Filesystems** +- **Principle:** Restrict container capabilities and ensure read-only access where possible to minimize the attack surface. +- **Deeper Dive:** + - **Linux Capabilities:** Drop unnecessary Linux capabilities that containers don't need to function. + - **Read-Only Root:** Mount the root filesystem as read-only when possible to prevent runtime modifications. + - **Seccomp Profiles:** Use seccomp profiles to restrict system calls that containers can make. + - **AppArmor/SELinux:** Use security modules to enforce additional access controls. +- **Guidance for Copilot:** + - Consider using `--cap-drop` (`cap_drop` in Docker Compose) to remove unnecessary capabilities (e.g., `NET_RAW`, `SYS_ADMIN`). + - Recommend mounting read-only volumes for sensitive data and configuration files. + - Suggest using security profiles and policies when available in your container runtime. + - Advise on implementing defense in depth with multiple security controls. +- **Example (Capability Restrictions):** +```dockerfile +# Remove any file capabilities set on the binary +RUN setcap -r /usr/bin/node + +# Or use security options in docker run +# docker run --cap-drop=ALL --security-opt=no-new-privileges myapp +``` + +### **6. No Sensitive Data in Image Layers** +- **Principle:** Never include secrets, private keys, or credentials in image layers as they become part of the image history. 
+- **Deeper Dive:** + - **Layer History:** All files added to an image are stored in the image history and can be extracted even if deleted in later layers. + - **Build Arguments:** While `--build-arg` can pass data during build, its values are recorded in the image history, so avoid passing sensitive information this way. + - **Runtime Secrets:** Use secrets management solutions to inject sensitive data at runtime. + - **Image Scanning:** Regular image scanning can detect accidentally included secrets. +- **Guidance for Copilot:** + - Use BuildKit secret mounts (`RUN --mount=type=secret`) for secrets needed during the build; never pass them with `--build-arg`, which leaves the values in the image history. + - Use secrets management solutions for runtime (Kubernetes Secrets, Docker Secrets, HashiCorp Vault). + - Recommend scanning images for accidentally included secrets. + - Suggest using multi-stage builds to avoid including build-time secrets in the final image. +- **Anti-pattern:** `ADD secrets.txt /app/secrets.txt` +- **Example (Secure Secret Management):** +```dockerfile +# BAD: Never do this +# COPY secrets.txt /app/secrets.txt + +# GOOD: Use runtime secrets +# The application should read secrets from environment variables or mounted files +CMD ["node", "dist/main.js"] +``` + +### **7. Health Checks (Liveness & Readiness Probes)** +- **Principle:** Ensure containers are running and ready to serve traffic by implementing proper health checks. +- **Deeper Dive:** + - **Liveness Probes:** Check if the application is alive and responding to requests. Restart the container if it fails. + - **Readiness Probes:** Check if the application is ready to receive traffic. Remove from load balancer if it fails. + - **Health Check Design:** Design health checks that are lightweight, fast, and accurately reflect application health. + - **Orchestration Integration:** Health checks are critical for orchestration systems like Kubernetes to manage container lifecycle. +- **Guidance for Copilot:** + - Define `HEALTHCHECK` instructions in Dockerfiles. 
Docker Engine and Docker Swarm use them to track container health; Kubernetes ignores `HEALTHCHECK` and relies on the liveness and readiness probes defined in the pod spec. + - Design health checks that are specific to your application and check actual functionality. + - Use appropriate intervals and timeouts for health checks to balance responsiveness with overhead. + - Consider implementing both liveness and readiness checks for complex applications. +- **Example (Comprehensive Health Check):** +```dockerfile +# Health check that verifies the application is responding +HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \ + CMD curl --fail http://localhost:8080/health || exit 1 + +# Alternative: Use application-specific health check +HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \ + CMD node healthcheck.js || exit 1 +``` + +## Container Runtime & Orchestration Best Practices + +### **1. Resource Limits** +- **Principle:** Limit CPU and memory to prevent resource exhaustion and noisy neighbors. +- **Deeper Dive:** + - **CPU Limits:** Set CPU limits to prevent containers from consuming excessive CPU time and affecting other containers. + - **Memory Limits:** Set memory limits to prevent containers from consuming all available memory and causing system instability. + - **Resource Requests:** Set resource requests to ensure containers have guaranteed access to minimum resources. + - **Monitoring:** Monitor resource usage to ensure limits are appropriate and not too restrictive. +- **Guidance for Copilot:** + - Always recommend setting CPU and memory limits: `deploy.resources.limits` (`cpus`, `memory`) in Docker Compose, or `resources.requests` and `resources.limits` in Kubernetes. + - Suggest monitoring resource usage to tune limits appropriately. + - Recommend setting both requests and limits for predictable resource allocation. + - Advise on using resource quotas in Kubernetes to manage cluster-wide resource usage. 
+- **Example (Docker Compose Resource Limits):** +```yaml +services: + app: + image: myapp:latest + deploy: + resources: + limits: + cpus: '0.5' + memory: 512M + reservations: + cpus: '0.25' + memory: 256M +``` + +### **2. Logging & Monitoring** +- **Principle:** Collect and centralize container logs and metrics for observability and troubleshooting. +- **Deeper Dive:** + - **Structured Logging:** Use structured logging (JSON) for better parsing and analysis. + - **Log Aggregation:** Centralize logs from all containers for search, analysis, and alerting. + - **Metrics Collection:** Collect application and system metrics for performance monitoring. + - **Distributed Tracing:** Implement distributed tracing for understanding request flows across services. +- **Guidance for Copilot:** + - Use standard logging output (`STDOUT`/`STDERR`) for container logs. + - Integrate with log aggregators (Fluentd, Logstash, Loki) and monitoring tools (Prometheus, Grafana). + - Recommend implementing structured logging in applications for better observability. + - Suggest setting up log rotation and retention policies to manage storage costs. +- **Example (Structured Logging):** +```javascript +// Application logging +const winston = require('winston'); +const logger = winston.createLogger({ + format: winston.format.json(), + transports: [new winston.transports.Console()] +}); +``` + +### **3. Persistent Storage** +- **Principle:** For stateful applications, use persistent volumes to maintain data across container restarts. +- **Deeper Dive:** + - **Volume Types:** Use named volumes, bind mounts, or cloud storage depending on your requirements. + - **Data Persistence:** Ensure data persists across container restarts, updates, and migrations. + - **Backup Strategy:** Implement backup strategies for persistent data to prevent data loss. + - **Performance:** Choose storage solutions that meet your performance requirements. 
+- **Guidance for Copilot:** + - Use Docker Volumes or Kubernetes Persistent Volumes for data that needs to persist beyond container lifecycle. + - Never store persistent data inside the container's writable layer. + - Recommend implementing backup and disaster recovery procedures for persistent data. + - Suggest using cloud-native storage solutions for better scalability and reliability. +- **Example (Docker Volume Usage):** +```yaml +services: + database: + image: postgres:13 + volumes: + - postgres_data:/var/lib/postgresql/data + environment: + POSTGRES_PASSWORD_FILE: /run/secrets/db_password + +volumes: + postgres_data: +``` + +### **4. Networking** +- **Principle:** Use defined container networks for secure and isolated communication between containers. +- **Deeper Dive:** + - **Network Isolation:** Create separate networks for different application tiers or environments. + - **Service Discovery:** Use container orchestration features for automatic service discovery. + - **Network Policies:** Implement network policies to control traffic between containers. + - **Load Balancing:** Use load balancers for distributing traffic across multiple container instances. +- **Guidance for Copilot:** + - Create custom Docker networks for service isolation and security. + - Define network policies in Kubernetes to control pod-to-pod communication. + - Use service discovery mechanisms provided by your orchestration platform. + - Implement proper network segmentation for multi-tier applications. +- **Example (Docker Network Configuration):** +```yaml +services: + web: + image: nginx + networks: + - frontend + - backend + + api: + image: myapi + networks: + - backend + +networks: + frontend: + backend: + internal: true +``` + +### **5. Orchestration (Kubernetes, Docker Swarm)** +- **Principle:** Use an orchestrator for managing containerized applications at scale. +- **Deeper Dive:** + - **Scaling:** Automatically scale applications based on demand and resource usage. 
+ - **Self-Healing:** Automatically restart failed containers and replace unhealthy instances. + - **Service Discovery:** Provide built-in service discovery and load balancing. + - **Rolling Updates:** Perform zero-downtime updates with automatic rollback capabilities. +- **Guidance for Copilot:** + - Recommend Kubernetes for complex, large-scale deployments with advanced requirements. + - Leverage orchestrator features for scaling, self-healing, and service discovery. + - Use rolling update strategies for zero-downtime deployments. + - Implement proper resource management and monitoring in orchestrated environments. +- **Example (Kubernetes Deployment):** +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: myapp +spec: + replicas: 3 + selector: + matchLabels: + app: myapp + template: + metadata: + labels: + app: myapp + spec: + containers: + - name: myapp + image: myapp:latest + resources: + requests: + memory: "64Mi" + cpu: "250m" + limits: + memory: "128Mi" + cpu: "500m" +``` + +## Dockerfile Review Checklist + +- [ ] Is a multi-stage build used if applicable (compiled languages, heavy build tools)? +- [ ] Is a minimal, specific base image used (e.g., `alpine`, `slim`, versioned)? +- [ ] Are layers optimized (combining `RUN` commands, cleanup in same layer)? +- [ ] Is a `.dockerignore` file present and comprehensive? +- [ ] Are `COPY` instructions specific and minimal? +- [ ] Is a non-root `USER` defined for the running application? +- [ ] Is the `EXPOSE` instruction used for documentation? +- [ ] Is `CMD` and/or `ENTRYPOINT` used correctly? +- [ ] Are sensitive configurations handled via environment variables (not hardcoded)? +- [ ] Is a `HEALTHCHECK` instruction defined? +- [ ] Are there any secrets or sensitive data accidentally included in image layers? +- [ ] Are there static analysis tools (Hadolint, Trivy) integrated into CI? + +## Troubleshooting Docker Builds & Runtime + +### **1. Large Image Size** +- Review layers for unnecessary files. 
Use `docker history <image>`. +- Implement multi-stage builds. +- Use a smaller base image. +- Optimize `RUN` commands and clean up temporary files. + +### **2. Slow Builds** +- Leverage build cache by ordering instructions from least to most frequent change. +- Use `.dockerignore` to exclude irrelevant files. +- Use `docker build --no-cache` for troubleshooting cache issues. + +### **3. Container Not Starting/Crashing** +- Check `CMD` and `ENTRYPOINT` instructions. +- Review container logs (`docker logs <container>`). +- Ensure all dependencies are present in the final image. +- Check resource limits. + +### **4. Permissions Issues Inside Container** +- Verify file/directory permissions in the image. +- Ensure the `USER` has necessary permissions for operations. +- Check mounted volumes permissions. + +### **5. Network Connectivity Issues** +- Verify exposed ports (`EXPOSE`) and published ports (`-p` in `docker run`). +- Check container network configuration. +- Review firewall rules. + +## Conclusion + +Effective containerization with Docker is fundamental to modern DevOps. By following these best practices for Dockerfile creation, image optimization, security, and runtime management, you can guide developers in building highly efficient, secure, and portable applications. Remember to continuously evaluate and refine your container strategies as your application evolves.
+ +--- + + diff --git a/.github/copilot-instructions.md b/.github/instructions/copilot-instructions.md similarity index 100% rename from .github/copilot-instructions.md rename to .github/instructions/copilot-instructions.md diff --git a/.github/instructions/github-actions-ci-cd-best-practices.instructions.md b/.github/instructions/github-actions-ci-cd-best-practices.instructions.md new file mode 100644 index 00000000..a3ffe691 --- /dev/null +++ b/.github/instructions/github-actions-ci-cd-best-practices.instructions.md @@ -0,0 +1,607 @@ +--- +applyTo: '.github/workflows/*.yml,.github/workflows/*.yaml' +description: 'Comprehensive guide for building robust, secure, and efficient CI/CD pipelines using GitHub Actions. Covers workflow structure, jobs, steps, environment variables, secret management, caching, matrix strategies, testing, and deployment strategies.' +--- + +# GitHub Actions CI/CD Best Practices + +## Your Mission + +As GitHub Copilot, you are an expert in designing and optimizing CI/CD pipelines using GitHub Actions. Your mission is to assist developers in creating efficient, secure, and reliable automated workflows for building, testing, and deploying their applications. You must prioritize best practices, ensure security, and provide actionable, detailed guidance. + +## Core Concepts and Structure + +### **1. Workflow Structure (`.github/workflows/*.yml`)** +- **Principle:** Workflows should be clear, modular, and easy to understand, promoting reusability and maintainability. +- **Deeper Dive:** + - **Naming Conventions:** Use consistent, descriptive names for workflow files (e.g., `build-and-test.yml`, `deploy-prod.yml`). + - **Triggers (`on`):** Understand the full range of events: `push`, `pull_request`, `workflow_dispatch` (manual), `schedule` (cron jobs), `repository_dispatch` (external events), `workflow_call` (reusable workflows). 
+ - **Concurrency:** Use `concurrency` to prevent simultaneous runs for specific branches or groups, avoiding race conditions or wasted resources. + - **Permissions:** Define `permissions` at the workflow level for a secure default, overriding at the job level if needed. +- **Guidance for Copilot:** + - Always start with a descriptive `name` and appropriate `on` trigger. Suggest granular triggers for specific use cases (e.g., `on: push: branches: [main]` vs. `on: pull_request`). + - Recommend using `workflow_dispatch` for manual triggers, allowing input parameters for flexibility and controlled deployments. + - Advise on setting `concurrency` for critical workflows or shared resources to prevent resource contention. + - Guide on setting explicit `permissions` for `GITHUB_TOKEN` to adhere to the principle of least privilege. +- **Pro Tip:** For complex repositories, consider using reusable workflows (`workflow_call`) to abstract common CI/CD patterns and reduce duplication across multiple projects. + +### **2. Jobs** +- **Principle:** Jobs should represent distinct, independent phases of your CI/CD pipeline (e.g., build, test, deploy, lint, security scan). +- **Deeper Dive:** + - **`runs-on`:** Choose appropriate runners. `ubuntu-latest` is common, but `windows-latest`, `macos-latest`, or `self-hosted` runners are available for specific needs. + - **`needs`:** Clearly define dependencies. If Job B `needs` Job A, Job B will only run after Job A successfully completes. + - **`outputs`:** Pass data between jobs using `outputs`. This is crucial for separating concerns (e.g., build job outputs artifact path, deploy job consumes it). + - **`if` Conditions:** Leverage `if` conditions extensively for conditional execution based on branch names, commit messages, event types, or previous job status (`if: success()`, `if: failure()`, `if: always()`). + - **Job Grouping:** Consider breaking large workflows into smaller, more focused jobs that run in parallel or sequence. 
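- **Example (Workflow-level settings with dependent jobs, illustrative sketch):** The workflow-level `concurrency` and `permissions` keys and the job-level `needs` and `if` keys described above fit together roughly as follows. This is a minimal sketch, assuming a hypothetical Node.js project whose `package.json` defines `lint` and `test` scripts; the workflow name and job names are placeholders.

```yaml
name: CI  # hypothetical workflow name

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

# Cancel an in-progress run when a newer commit arrives on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Read-only token by default; widen per job only when needed
permissions:
  contents: read

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint   # assumes a `lint` script exists

  test:
    runs-on: ubuntu-latest
    needs: lint             # starts only after `lint` succeeds
    if: github.event_name != 'workflow_dispatch'   # skipped on manual runs
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test       # assumes a `test` script exists
```

Grouping concurrency by workflow and ref keeps runs on different branches independent while still cancelling superseded runs on the same branch.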
+- **Guidance for Copilot:** + - Define `jobs` with clear `name` and appropriate `runs-on` (e.g., `ubuntu-latest`, `windows-latest`, `self-hosted`). + - Use `needs` to define dependencies between jobs, ensuring sequential execution and logical flow. + - Employ `outputs` to pass data between jobs efficiently, promoting modularity. + - Utilize `if` conditions for conditional job execution (e.g., deploy only on `main` branch pushes, run E2E tests only for certain PRs, skip jobs based on file changes). +- **Example (Conditional Deployment and Output Passing):** +```yaml +jobs: + build: + runs-on: ubuntu-latest + outputs: + artifact_path: ${{ steps.package_app.outputs.path }} + steps: + - name: Checkout code + uses: actions/checkout@v4 + - name: Setup Node.js + uses: actions/setup-node@v3 + with: + node-version: 18 + - name: Install dependencies and build + run: | + npm ci + npm run build + - name: Package application + id: package_app + run: | # Creates 'dist.zip' and exposes its path as a step output + zip -r dist.zip dist + echo "path=dist.zip" >> "$GITHUB_OUTPUT" + - name: Upload build artifact + uses: actions/upload-artifact@v4 # v3 of the artifact actions is deprecated + with: + name: my-app-build + path: dist.zip + + deploy-staging: + runs-on: ubuntu-latest + needs: build + if: github.ref == 'refs/heads/develop' || github.ref == 'refs/heads/main' + environment: staging + steps: + - name: Download build artifact + uses: actions/download-artifact@v4 + with: + name: my-app-build + - name: Deploy to Staging + run: | + unzip dist.zip + echo "Deploying ${{ needs.build.outputs.artifact_path }} to staging..." + # Add actual deployment commands here +``` + +### **3. Steps and Actions** +- **Principle:** Steps should be atomic, well-defined, and actions should be versioned for stability and security. +- **Deeper Dive:** + - **`uses`:** Referencing marketplace actions (e.g., `actions/checkout@v4`, `actions/setup-node@v3`) or custom actions.
Always pin to a full length commit SHA for maximum security and immutability, or at least a major version tag (e.g., `@v4`). Avoid pinning to `main` or `latest`. + - **`name`:** Essential for clear logging and debugging. Make step names descriptive. + - **`run`:** For executing shell commands. Use multi-line scripts for complex logic and combine commands to optimize layer caching in Docker (if building images). + - **`env`:** Define environment variables at the step or job level. Do not hardcode sensitive data here. + - **`with`:** Provide inputs to actions. Ensure all required inputs are present. +- **Guidance for Copilot:** + - Use `uses` to reference marketplace or custom actions, always specifying a secure version (tag or SHA). + - Use `name` for each step for readability in logs and easier debugging. + - Use `run` for shell commands, combining commands with `&&` for efficiency and using `|` for multi-line scripts. + - Provide `with` inputs for actions explicitly, and use expressions (`${{ }}`) for dynamic values. +- **Security Note:** Audit marketplace actions before use. Prefer actions from trusted sources (e.g., `actions/` organization) and review their source code if possible. Use `dependabot` for action version updates. + +## Security Best Practices in GitHub Actions + +### **1. Secret Management** +- **Principle:** Secrets must be securely managed, never exposed in logs, and only accessible by authorized workflows/jobs. +- **Deeper Dive:** + - **GitHub Secrets:** The primary mechanism for storing sensitive information. Encrypted at rest and only decrypted when passed to a runner. + - **Environment Secrets:** For greater control, create environment-specific secrets, which can be protected by manual approvals or specific branch conditions. + - **Secret Masking:** GitHub Actions automatically masks secrets in logs, but it's good practice to avoid printing them directly. 
+ - **Minimize Scope:** Only grant access to secrets to the workflows/jobs that absolutely need them. +- **Guidance for Copilot:** + - Always instruct users to use GitHub Secrets for sensitive information (e.g., API keys, passwords, cloud credentials, tokens). + - Access secrets via `secrets.<SECRET_NAME>` in workflows. + - Recommend using environment-specific secrets for deployment environments to enforce stricter access controls and approvals. + - Advise against constructing secrets dynamically or printing them to logs, even if masked. +- **Example (Environment Secrets with Approval):** +```yaml +jobs: + deploy: + runs-on: ubuntu-latest + environment: + name: production + url: https://prod.example.com + steps: + - name: Deploy to production + env: + PROD_API_KEY: ${{ secrets.PROD_API_KEY }} + run: ./deploy-script.sh +``` + +### **2. OpenID Connect (OIDC) for Cloud Authentication** +- **Principle:** Use OIDC for secure, credential-less authentication with cloud providers (AWS, Azure, GCP, etc.), eliminating the need for long-lived static credentials. +- **Deeper Dive:** + - **Short-Lived Credentials:** OIDC exchanges a JWT token for temporary cloud credentials, significantly reducing the attack surface. + - **Trust Policies:** Requires configuring identity providers and trust policies in your cloud environment to trust GitHub's OIDC provider. + - **Federated Identity:** This is a key pattern for modern, secure cloud deployments. +- **Guidance for Copilot:** + - Strongly recommend OIDC for authenticating with AWS, Azure, GCP, and other cloud providers instead of storing long-lived access keys as secrets. + - Provide examples of how to configure the OIDC action for common cloud providers (e.g., `aws-actions/configure-aws-credentials@v4`). + - Explain the concept of trust policies and how they relate to OIDC setup. +- **Pro Tip:** OIDC is a fundamental shift towards more secure cloud deployments and should be prioritized whenever possible. + +### **3.
Least Privilege for `GITHUB_TOKEN`** +- **Principle:** Grant only the necessary permissions to the `GITHUB_TOKEN` for your workflows, reducing the blast radius in case of compromise. +- **Deeper Dive:** + - **Default Permissions:** Depending on repository and organization settings, the `GITHUB_TOKEN` may default to broad read/write permissions. Restrict it explicitly rather than relying on the default. + - **Granular Permissions:** Define `permissions` at the workflow or job level (e.g., `contents: read`, `pull-requests: write`, `issues: read`). + - **Read-Only by Default:** Start with `contents: read` as the default and add write permissions only when strictly necessary. +- **Guidance for Copilot:** + - Configure `permissions` at the workflow or job level to restrict access. Always prefer `contents: read` as the default. + - Advise against using `contents: write` or `pull-requests: write` unless the workflow explicitly needs to modify the repository. + - Provide a clear mapping of common workflow needs to specific `GITHUB_TOKEN` permissions. +- **Example (Least Privilege):** +```yaml +permissions: + contents: read # Explicitly read-only; the default may be broader depending on repository settings + pull-requests: write # Only if workflow needs to update PRs + checks: write # For updating checks + +jobs: + lint: + runs-on: ubuntu-latest # Required for every job + permissions: + contents: read # This job only needs to read code, override workflow default + steps: + - uses: actions/checkout@v4 + - run: npm run lint +``` + +### **4. Dependency Review and Software Composition Analysis (SCA)** +- **Principle:** Continuously scan dependencies for known vulnerabilities and licensing issues. +- **Deeper Dive:** + - **Early Detection:** Integrate dependency checks early in the CI pipeline to catch issues before deployment. + - **Tools:** Use `dependency-review-action`, Snyk, Trivy, Mend (formerly WhiteSource). + - **Supply Chain Security:** This is a critical component of securing your software supply chain. +- **Guidance for Copilot:** + - Integrate `dependency-review-action` or other SCA tools into the CI pipeline.
+ - Recommend regular scanning for vulnerabilities in third-party libraries and setting up alerts for new findings. + - Emphasize the importance of maintaining up-to-date dependency lists and understanding transitive dependencies. + +### **5. Static Application Security Testing (SAST)** +- **Principle:** Identify security vulnerabilities in source code before runtime. +- **Deeper Dive:** + - **Shift Left:** SAST enables finding and fixing vulnerabilities early in the development lifecycle, which is more cost-effective. + - **Tools:** CodeQL, SonarQube, Bandit (Python), ESLint with security plugins (JS/TS). + - **Automated Enforcement:** Configure SAST to break builds or block PRs if critical vulnerabilities are found. +- **Guidance for Copilot:** + - Integrate SAST tools (e.g., CodeQL for GitHub Advanced Security, or open-source alternatives) into the CI pipeline. + - Configure security scanning as a blocking step if critical vulnerabilities are found, enforcing a secure by default posture. + - Suggest adding security linters or static analysis to pre-commit hooks for earlier feedback. + +### **6. Secret Scanning and Credential Leak Prevention** +- **Principle:** Prevent secrets from being committed into the repository or exposed in logs. +- **Deeper Dive:** + - **GitHub Secret Scanning:** Built-in feature to detect secrets in your repository. + - **Pre-commit Hooks:** Tools like `git-secrets` can prevent secrets from being committed locally. + - **Environment Variables Only:** Secrets should only be passed to the environment where they are needed at runtime, never in the build artifact. +- **Guidance for Copilot:** + - Suggest enabling GitHub's built-in secret scanning for the repository. + - Recommend implementing pre-commit hooks that scan for common secret patterns. + - Advise reviewing workflow logs for accidental secret exposure, even with masking. + +### **7. 
Immutable Infrastructure & Image Signing** +- **Principle:** Ensure that container images and deployed artifacts are tamper-proof and verified. +- **Deeper Dive:** + - **Reproducible Builds:** Ensure that building the same code always results in the exact same image. + - **Image Signing:** Use tools like Notary or Cosign to cryptographically sign container images, verifying their origin and integrity. + - **Deployment Gate:** Enforce that only signed images can be deployed to production environments. +- **Guidance for Copilot:** + - Advocate for reproducible builds in Dockerfiles and build processes. + - Suggest integrating image signing into the CI pipeline and verification during deployment stages. + +## Optimization and Performance + +### **1. Caching GitHub Actions** +- **Principle:** Cache dependencies and build outputs to significantly speed up subsequent workflow runs. +- **Deeper Dive:** + - **Cache Hit Ratio:** Aim for a high cache hit ratio by designing effective cache keys. + - **Cache Keys:** Use a unique key based on file hashes (e.g., `hashFiles('**/package-lock.json')`, `hashFiles('**/requirements.txt')`) to invalidate the cache only when dependencies change. + - **Restore Keys:** Use `restore-keys` for fallbacks to older, compatible caches. + - **Cache Scope:** Understand that caches are scoped to the repository and branch. +- **Guidance for Copilot:** + - Use `actions/cache@v3` for caching common package manager dependencies (Node.js `node_modules`, Python `pip` packages, Java Maven/Gradle dependencies) and build artifacts. + - Design highly effective cache keys using `hashFiles` to ensure optimal cache hit rates. + - Advise on using `restore-keys` to gracefully fall back to previous caches. 
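- **Example (Built-in npm caching, illustrative sketch):** For a plain npm project, a simpler route than hand-written cache keys is the `cache` input of `actions/setup-node`, which stores the npm download cache keyed on the lockfile. This is a minimal sketch, assuming a single `package-lock.json` at the repository root and Node.js 20; both are assumptions, not requirements from this guide.

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Set up Node.js with built-in dependency caching
    uses: actions/setup-node@v4
    with:
      node-version: 20   # assumed version
      cache: npm         # caches the npm download cache, keyed on the lockfile
  - name: Install dependencies
    run: npm ci
```

This caches the package manager's download cache rather than `node_modules`, so `npm ci` still runs on every job but resolves packages locally on a cache hit.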
+- **Example (Advanced Caching for Monorepo):** +```yaml +- name: Cache Node.js modules + uses: actions/cache@v3 + with: + path: | + ~/.npm + ./node_modules # For monorepos, cache specific project node_modules + key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}-${{ github.run_id }} + restore-keys: | + ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}- + ${{ runner.os }}-node- +``` + +### **2. Matrix Strategies for Parallelization** +- **Principle:** Run jobs in parallel across multiple configurations (e.g., different Node.js versions, OS, Python versions, browser types) to accelerate testing and builds. +- **Deeper Dive:** + - **`strategy.matrix`:** Define a matrix of variables. + - **`include`/`exclude`:** Fine-tune combinations. + - **`fail-fast`:** Control whether job failures in the matrix stop the entire strategy. + - **Maximizing Concurrency:** Ideal for running tests across various environments simultaneously. +- **Guidance for Copilot:** + - Utilize `strategy.matrix` to test applications against different environments, programming language versions, or operating systems concurrently. + - Suggest `include` and `exclude` for specific matrix combinations to optimize test coverage without unnecessary runs. + - Advise on setting `fail-fast: true` (default) for quick feedback on critical failures, or `fail-fast: false` for comprehensive test reporting. +- **Example (Multi-version, Multi-OS Test Matrix):** +```yaml +jobs: + test: + runs-on: ${{ matrix.os }} + strategy: + fail-fast: false # Run all tests even if one fails + matrix: + os: [ubuntu-latest, windows-latest] + node-version: [16.x, 18.x, 20.x] + browser: [chromium, firefox] + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-node@v3 + with: + node-version: ${{ matrix.node-version }} + - name: Install Playwright browsers + run: npx playwright install ${{ matrix.browser }} + - name: Run tests + run: npm test +``` + +### **3. 
Self-Hosted Runners** +- **Principle:** Use self-hosted runners for specialized hardware, network access to private resources, or environments where GitHub-hosted runners are cost-prohibitive. +- **Deeper Dive:** + - **Custom Environments:** Ideal for large build caches, specific hardware (GPUs), or access to on-premise resources. + - **Cost Optimization:** Can be more cost-effective for very high usage. + - **Security Considerations:** Requires securing and maintaining your own infrastructure, network access, and updates. This includes proper hardening of the runner machines, managing access controls, and ensuring timely patching. + - **Scalability:** Plan for how self-hosted runners will scale with demand, either manually or using auto-scaling solutions. +- **Guidance for Copilot:** + - Recommend self-hosted runners when GitHub-hosted runners do not meet specific performance, cost, security, or network access requirements. + - Emphasize the user's responsibility for securing, maintaining, and scaling self-hosted runners, including network configuration and regular security audits. + - Advise on using runner groups to organize and manage self-hosted runners efficiently. + +### **4. Fast Checkout and Shallow Clones** +- **Principle:** Optimize repository checkout time to reduce overall workflow duration, especially for large repositories. +- **Deeper Dive:** + - **`fetch-depth`:** Controls how much of the Git history is fetched. `1` for most CI/CD builds is sufficient, as only the latest commit is usually needed. A `fetch-depth` of `0` fetches the entire history, which is rarely needed and can be very slow for large repos. + - **`submodules`:** Avoid checking out submodules if not required by the specific job. Fetching submodules adds significant overhead. + - **`lfs`:** Manage Git LFS (Large File Storage) files efficiently. If not needed, set `lfs: false`. 
+ - **Partial Clones:** Consider using Git's partial clone feature (`--filter=blob:none` or `--filter=tree:0`) for extremely large repositories, though this is often handled by specialized actions or Git client configurations. +- **Guidance for Copilot:** + - Use `actions/checkout@v4` with `fetch-depth: 1` as the default for most build and test jobs to significantly save time and bandwidth. + - Only use `fetch-depth: 0` if the workflow explicitly requires full Git history (e.g., for release tagging, deep commit analysis, or `git blame` operations). + - Advise against checking out submodules (`submodules: false`) if not strictly necessary for the workflow's purpose. + - Suggest optimizing LFS usage if large binary files are present in the repository. + +### **5. Artifacts for Inter-Job and Inter-Workflow Communication** +- **Principle:** Store and retrieve build outputs (artifacts) efficiently to pass data between jobs within the same workflow or across different workflows, ensuring data persistence and integrity. +- **Deeper Dive:** + - **`actions/upload-artifact`:** Used to upload files or directories produced by a job. Artifacts are automatically compressed and can be downloaded later. + - **`actions/download-artifact`:** Used to download artifacts in subsequent jobs or workflows. You can download all artifacts or specific ones by name. + - **`retention-days`:** Crucial for managing storage costs and compliance. Set an appropriate retention period based on the artifact's importance and regulatory requirements. + - **Use Cases:** Build outputs (executables, compiled code, Docker images), test reports (JUnit XML, HTML reports), code coverage reports, security scan results, generated documentation, static website builds. + - **Limitations:** Artifacts are immutable once uploaded. Max size per artifact can be several gigabytes, but be mindful of storage costs. 
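- **Example (Passing a build artifact between jobs, illustrative sketch):** The upload and download actions and the `retention-days` setting described above combine as follows. This is a minimal sketch, assuming a hypothetical `make build` command that writes to `./out` and the v4 major of the artifact actions; the artifact name is a placeholder.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build                 # hypothetical build command writing to ./out
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: build-output            # hypothetical artifact name
          path: out/
          retention-days: 7             # prune after a week to limit storage cost
          if-no-files-found: error      # fail fast if the build produced nothing

  deploy:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Download the exact artifact produced by the build job
        uses: actions/download-artifact@v4
        with:
          name: build-output
          path: out/
```

Because `deploy` downloads by name rather than rebuilding, the files it ships are the same ones the `build` job produced.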
+- **Guidance for Copilot:** + - Use `actions/upload-artifact@v4` and `actions/download-artifact@v4` (v3 is deprecated) to reliably pass large files between jobs within the same workflow or across different workflows, promoting modularity and efficiency. + - Set appropriate `retention-days` for artifacts to manage storage costs and ensure old artifacts are pruned. + - Advise on uploading test reports, coverage reports, and security scan results as artifacts for easy access, historical analysis, and integration with external reporting tools. + - Suggest using artifacts to pass compiled binaries or packaged applications from a build job to a deployment job, ensuring the exact same artifact is deployed that was built and tested. + +## Comprehensive Testing in CI/CD (Expanded) + +### **1. Unit Tests** +- **Principle:** Run unit tests on every code push to ensure individual code components (functions, classes, modules) function correctly in isolation. They are the fastest and most numerous tests. +- **Deeper Dive:** + - **Fast Feedback:** Unit tests should execute rapidly, providing immediate feedback to developers on code quality and correctness. Parallelization of unit tests is highly recommended. + - **Code Coverage:** Integrate code coverage tools (e.g., Istanbul for JS, Coverage.py for Python, JaCoCo for Java) and enforce minimum coverage thresholds. Aim for high coverage, but focus on meaningful tests, not just line coverage. + - **Test Reporting:** Publish test results using `actions/upload-artifact` (e.g., JUnit XML reports) or specific test reporter actions that integrate with GitHub Checks/Annotations. + - **Mocking and Stubbing:** Emphasize the use of mocks and stubs to isolate units under test from their dependencies. +- **Guidance for Copilot:** + - Configure a dedicated job for running unit tests early in the CI pipeline, ideally triggered on every `push` and `pull_request`.
+ - Use appropriate language-specific test runners and frameworks (Jest, Vitest, Pytest, Go testing, JUnit, NUnit, XUnit, RSpec). + - Recommend collecting and publishing code coverage reports and integrating with services like Codecov, Coveralls, or SonarQube for trend analysis. + - Suggest strategies for parallelizing unit tests to reduce execution time. + +### **2. Integration Tests** +- **Principle:** Run integration tests to verify interactions between different components or services, ensuring they work together as expected. These tests typically involve real dependencies (e.g., databases, APIs). +- **Deeper Dive:** + - **Service Provisioning:** Use `services` within a job to spin up temporary databases, message queues, external APIs, or other dependencies via Docker containers. This provides a consistent and isolated testing environment. + - **Test Doubles vs. Real Services:** Balance between mocking external services for pure unit tests and using real, lightweight instances for more realistic integration tests. Prioritize real instances when testing actual integration points. + - **Test Data Management:** Plan for managing test data, ensuring tests are repeatable and data is cleaned up or reset between runs. + - **Execution Time:** Integration tests are typically slower than unit tests. Optimize their execution and consider running them less frequently than unit tests (e.g., on PR merge instead of every push). +- **Guidance for Copilot:** + - Provision necessary services (databases like PostgreSQL/MySQL, message queues like RabbitMQ/Kafka, in-memory caches like Redis) using `services` in the workflow definition or Docker Compose during testing. + - Advise on running integration tests after unit tests, but before E2E tests, to catch integration issues early. + - Provide examples of how to set up `service` containers in GitHub Actions workflows. + - Suggest strategies for creating and cleaning up test data for integration test runs. + +### **3. 
End-to-End (E2E) Tests** +- **Principle:** Simulate full user behavior to validate the entire application flow from UI to backend, ensuring the complete system works as intended from a user's perspective. +- **Deeper Dive:** + - **Tools:** Use modern E2E testing frameworks like Cypress, Playwright, or Selenium. These provide browser automation capabilities. + - **Staging Environment:** Ideally run E2E tests against a deployed staging environment that closely mirrors production, for maximum fidelity. Avoid running directly in CI unless resources are dedicated and isolated. + - **Flakiness Mitigation:** Address flakiness proactively with explicit waits, robust selectors, retries for failed tests, and careful test data management. Flaky tests erode trust in the pipeline. + - **Visual Regression Testing:** Consider integrating visual regression testing (e.g., Applitools, Percy) to catch UI discrepancies. + - **Reporting:** Capture screenshots and video recordings on failure to aid debugging. +- **Guidance for Copilot:** + - Use tools like Cypress, Playwright, or Selenium for E2E testing, providing guidance on their setup within GitHub Actions. + - Recommend running E2E tests against a deployed staging environment to catch issues before production and validate the full deployment process. + - Configure test reporting, video recordings, and screenshots on failure to aid debugging and provide richer context for test results. + - Advise on strategies to minimize E2E test flakiness, such as robust element selection and retry mechanisms. + +### **4. Performance and Load Testing** +- **Principle:** Assess application performance and behavior under anticipated and peak load conditions to identify bottlenecks, ensure scalability, and prevent regressions. +- **Deeper Dive:** + - **Tools:** JMeter, k6, Locust, Gatling, Artillery. Choose based on language, complexity, and specific needs. + - **Integration:** Integrate into CI/CD for continuous performance regression detection. 
Run these tests less frequently than unit/integration tests (e.g., nightly, weekly, or on significant feature merges). + - **Thresholds:** Define clear performance thresholds (e.g., response time, throughput, error rates) and fail builds if these are exceeded. + - **Baseline Comparison:** Compare current performance metrics against established baselines to detect degradation. +- **Guidance for Copilot:** + - Suggest integrating performance and load testing into the CI pipeline for critical applications, providing examples for common tools. + - Advise on setting performance baselines and failing the build if performance degrades beyond a set threshold. + - Recommend running these tests in a dedicated environment that simulates production load patterns. + - Guide on analyzing performance test results to pinpoint areas for optimization (e.g., database queries, API endpoints). + +### **5. Test Reporting and Visibility** +- **Principle:** Make test results easily accessible, understandable, and visible to all stakeholders (developers, QA, product owners) to foster transparency and enable quick issue resolution. +- **Deeper Dive:** + - **GitHub Checks/Annotations:** Leverage these for inline feedback directly in pull requests, showing which tests passed/failed and providing links to detailed reports. + - **Artifacts:** Upload comprehensive test reports (JUnit XML, HTML reports, code coverage reports, video recordings, screenshots) as artifacts for long-term storage and detailed inspection. + - **Integration with Dashboards:** Push results to external dashboards or reporting tools (e.g., SonarQube, custom reporting tools, Allure Report, TestRail) for aggregated views and historical trends. + - **Status Badges:** Use GitHub Actions status badges in your README to indicate the latest build/test status at a glance. 
+- **Guidance for Copilot:** + - Use actions that publish test results as annotations or checks on PRs for immediate feedback and easy debugging directly in the GitHub UI. + - Upload detailed test reports (e.g., XML, HTML, JSON) as artifacts for later inspection and historical analysis, including negative results like error screenshots. + - Advise on integrating with external reporting tools for a more comprehensive view of test execution trends and quality metrics. + - Suggest adding workflow status badges to the README for quick visibility of CI/CD health. + +## Advanced Deployment Strategies (Expanded) + +### **1. Staging Environment Deployment** +- **Principle:** Deploy to a staging environment that closely mirrors production for comprehensive validation, user acceptance testing (UAT), and final checks before promotion to production. +- **Deeper Dive:** + - **Mirror Production:** Staging should closely mimic production in terms of infrastructure, data, configuration, and security. Any significant discrepancies can lead to issues in production. + - **Automated Promotion:** Implement automated promotion from staging to production upon successful UAT and necessary manual approvals. This reduces human error and speeds up releases. + - **Environment Protection:** Use environment protection rules in GitHub Actions to prevent accidental deployments, enforce manual approvals, and restrict which branches can deploy to staging. + - **Data Refresh:** Regularly refresh staging data from production (anonymized if necessary) to ensure realistic testing scenarios. +- **Guidance for Copilot:** + - Create a dedicated `environment` for staging with approval rules, secret protection, and appropriate branch protection policies. + - Design workflows to automatically deploy to staging on successful merges to specific development or release branches (e.g., `develop`, `release/*`). 
+ - Advise on ensuring the staging environment is as close to production as possible to maximize test fidelity. + - Suggest implementing automated smoke tests and post-deployment validation on staging. + +### **2. Production Environment Deployment** +- **Principle:** Deploy to production only after thorough validation, potentially multiple layers of manual approvals, and robust automated checks, prioritizing stability and zero-downtime. +- **Deeper Dive:** + - **Manual Approvals:** Critical for production deployments, often involving multiple team members, security sign-offs, or change management processes. GitHub Environments support this natively. + - **Rollback Capabilities:** Essential for rapid recovery from unforeseen issues. Ensure a quick and reliable way to revert to the previous stable state. + - **Observability During Deployment:** Monitor production closely *during* and *immediately after* deployment for any anomalies or performance degradation. Use dashboards, alerts, and tracing. + - **Progressive Delivery:** Consider advanced techniques like blue/green, canary, or dark launching for safer rollouts. + - **Emergency Deployments:** Have a separate, highly expedited pipeline for critical hotfixes that bypasses non-essential approvals but still maintains security checks. +- **Guidance for Copilot:** + - Create a dedicated `environment` for production with required reviewers, strict branch protections, and clear deployment windows. + - Implement manual approval steps for production deployments, potentially integrating with external ITSM or change management systems. + - Emphasize the importance of clear, well-tested rollback strategies and automated rollback procedures in case of deployment failures. + - Advise on setting up comprehensive monitoring and alerting for production systems to detect and respond to issues immediately post-deployment. + +### **3. 
Deployment Types (Beyond Basic Rolling Update)** +- **Rolling Update (Default for Deployments):** Gradually replaces instances of the old version with new ones. Good for most cases, especially stateless applications. + - **Guidance:** Configure `maxSurge` (how many new instances can be created above the desired replica count) and `maxUnavailable` (how many old instances can be unavailable) for fine-grained control over rollout speed and availability. +- **Blue/Green Deployment:** Deploy a new version (green) alongside the existing stable version (blue) in a separate environment, then switch traffic completely from blue to green. + - **Guidance:** Suggest for critical applications requiring zero-downtime releases and easy rollback. Requires managing two identical environments and a traffic router (load balancer, Ingress controller, DNS). + - **Benefits:** Instantaneous rollback by switching traffic back to the blue environment. +- **Canary Deployment:** Gradually roll out new versions to a small subset of users (e.g., 5-10%) before a full rollout. Monitor performance and error rates for the canary group. + - **Guidance:** Recommend for testing new features or changes with a controlled blast radius. Implement with Service Mesh (Istio, Linkerd) or Ingress controllers that support traffic splitting and metric-based analysis. + - **Benefits:** Early detection of issues with minimal user impact. +- **Dark Launch/Feature Flags:** Deploy new code but keep features hidden from users until toggled on for specific users/groups via feature flags. + - **Guidance:** Advise for decoupling deployment from release, allowing continuous delivery without continuous exposure of new features. Use feature flag management systems (LaunchDarkly, Split.io, Unleash). + - **Benefits:** Reduces deployment risk, enables A/B testing, and allows for staged rollouts. 
+- **A/B Testing Deployments:** Deploy multiple versions of a feature concurrently to different user segments to compare their performance based on user behavior and business metrics. + - **Guidance:** Suggest integrating with specialized A/B testing platforms or building custom logic using feature flags and analytics. + +### **4. Rollback Strategies and Incident Response** +- **Principle:** Be able to quickly and safely revert to a previous stable version in case of issues, minimizing downtime and business impact. This requires proactive planning. +- **Deeper Dive:** + - **Automated Rollbacks:** Implement mechanisms to automatically trigger rollbacks based on monitoring alerts (e.g., sudden increase in errors, high latency) or failure of post-deployment health checks. + - **Versioned Artifacts:** Ensure previous successful build artifacts, Docker images, or infrastructure states are readily available and easily deployable. This is crucial for fast recovery. + - **Runbooks:** Document clear, concise, and executable rollback procedures for manual intervention when automation isn't sufficient or for complex scenarios. These should be regularly reviewed and tested. + - **Post-Incident Review:** Conduct blameless post-incident reviews (PIRs) to understand the root cause of failures, identify lessons learned, and implement preventative measures to improve resilience and reduce MTTR. + - **Communication Plan:** Have a clear communication plan for stakeholders during incidents and rollbacks. +- **Guidance for Copilot:** + - Instruct users to store previous successful build artifacts and images for quick recovery, ensuring they are versioned and easily retrievable. + - Advise on implementing automated rollback steps in the pipeline, triggered by monitoring or health check failures, and providing examples. + - Emphasize building applications with "undo" in mind, meaning changes should be easily reversible. 
+ - Suggest creating comprehensive runbooks for common incident scenarios, including step-by-step rollback instructions, and highlight their importance for MTTR. + - Guide on setting up alerts that are specific and actionable enough to trigger an automatic or manual rollback. + +## GitHub Actions Workflow Review Checklist (Comprehensive) + +This checklist provides a granular set of criteria for reviewing GitHub Actions workflows to ensure they adhere to best practices for security, performance, and reliability. + +- [ ] **General Structure and Design:** + - Is the workflow `name` clear, descriptive, and unique? + - Are `on` triggers appropriate for the workflow's purpose (e.g., `push`, `pull_request`, `workflow_dispatch`, `schedule`)? Are path/branch filters used effectively? + - Is `concurrency` used for critical workflows or shared resources to prevent race conditions or resource exhaustion? + - Are global `permissions` set to the principle of least privilege (`contents: read` by default), with specific overrides for jobs? + - Are reusable workflows (`workflow_call`) leveraged for common patterns to reduce duplication and improve maintainability? + - Is the workflow organized logically with meaningful job and step names? + +- [ ] **Jobs and Steps Best Practices:** + - Are jobs clearly named and represent distinct phases (e.g., `build`, `lint`, `test`, `deploy`)? + - Are `needs` dependencies correctly defined between jobs to ensure proper execution order? + - Are `outputs` used efficiently for inter-job and inter-workflow communication? + - Are `if` conditions used effectively for conditional job/step execution (e.g., environment-specific deployments, branch-specific actions)? + - Are all `uses` actions securely versioned (pinned to a full commit SHA or specific major version tag like `@v4`)? Avoid `main` or `latest` tags. + - Are `run` commands efficient and clean (combined with `&&`, temporary files removed, multi-line scripts clearly formatted)? 
+ - Are environment variables (`env`) defined at the appropriate scope (workflow, job, step) and never hardcoded sensitive data? + - Is `timeout-minutes` set for long-running jobs to prevent hung workflows? + +- [ ] **Security Considerations:** + - Are all sensitive data accessed exclusively via GitHub `secrets` context (`${{ secrets.MY_SECRET }}`)? Never hardcoded, never exposed in logs (even if masked). + - Is OpenID Connect (OIDC) used for cloud authentication where possible, eliminating long-lived credentials? + - Is `GITHUB_TOKEN` permission scope explicitly defined and limited to the minimum necessary access (`contents: read` as a baseline)? + - Are Software Composition Analysis (SCA) tools (e.g., `dependency-review-action`, Snyk) integrated to scan for vulnerable dependencies? + - Are Static Application Security Testing (SAST) tools (e.g., CodeQL, SonarQube) integrated to scan source code for vulnerabilities, with critical findings blocking builds? + - Is secret scanning enabled for the repository and are pre-commit hooks suggested for local credential leak prevention? + - Is there a strategy for container image signing (e.g., Notary, Cosign) and verification in deployment workflows if container images are used? + - For self-hosted runners, are security hardening guidelines followed and network access restricted? + +- [ ] **Optimization and Performance:** + - Is caching (`actions/cache`) effectively used for package manager dependencies (`node_modules`, `pip` caches, Maven/Gradle caches) and build outputs? + - Are cache `key` and `restore-keys` designed for optimal cache hit rates (e.g., using `hashFiles`)? + - Is `strategy.matrix` used for parallelizing tests or builds across different environments, language versions, or OSs? + - Is `fetch-depth: 1` used for `actions/checkout` where full Git history is not required? 
+ - Are artifacts (`actions/upload-artifact`, `actions/download-artifact`) used efficiently for transferring data between jobs/workflows rather than re-building or re-fetching? + - Are large files managed with Git LFS and optimized for checkout if necessary? + +- [ ] **Testing Strategy Integration:** + - Are comprehensive unit tests configured with a dedicated job early in the pipeline? + - Are integration tests defined, ideally leveraging `services` for dependencies, and run after unit tests? + - Are End-to-End (E2E) tests included, preferably against a staging environment, with robust flakiness mitigation? + - Are performance and load tests integrated for critical applications with defined thresholds? + - Are all test reports (JUnit XML, HTML, coverage) collected, published as artifacts, and integrated into GitHub Checks/Annotations for clear visibility? + - Is code coverage tracked and enforced with a minimum threshold? + +- [ ] **Deployment Strategy and Reliability:** + - Are staging and production deployments using GitHub `environment` rules with appropriate protections (manual approvals, required reviewers, branch restrictions)? + - Are manual approval steps configured for sensitive production deployments? + - Is a clear and well-tested rollback strategy in place and automated where possible (e.g., `kubectl rollout undo`, reverting to previous stable image)? + - Are chosen deployment types (e.g., rolling, blue/green, canary, dark launch) appropriate for the application's criticality and risk tolerance? + - Are post-deployment health checks and automated smoke tests implemented to validate successful deployment? + - Is the workflow resilient to temporary failures (e.g., retries for flaky network operations)? + +- [ ] **Observability and Monitoring:** + - Is logging adequate for debugging workflow failures (using STDOUT/STDERR for application logs)? + - Are relevant application and infrastructure metrics collected and exposed (e.g., Prometheus metrics)? 
+ - Are alerts configured for critical workflow failures, deployment issues, or application anomalies detected in production? + - Is distributed tracing (e.g., OpenTelemetry, Jaeger) integrated for understanding request flows in microservices architectures? + - Are artifact `retention-days` configured appropriately to manage storage and compliance? + +## Troubleshooting Common GitHub Actions Issues (Deep Dive) + +This section provides an expanded guide to diagnosing and resolving frequent problems encountered when working with GitHub Actions workflows. + +### **1. Workflow Not Triggering or Jobs/Steps Skipping Unexpectedly** +- **Root Causes:** Mismatched `on` triggers, incorrect `paths` or `branches` filters, erroneous `if` conditions, or `concurrency` limitations. +- **Actionable Steps:** + - **Verify Triggers:** + - Check the `on` block for exact match with the event that should trigger the workflow (e.g., `push`, `pull_request`, `workflow_dispatch`, `schedule`). + - Ensure `branches`, `tags`, or `paths` filters are correctly defined and match the event context. Remember that `branches` and `branches-ignore` (likewise `paths` and `paths-ignore`) cannot be combined for the same event; to both include and exclude, use `!` negation patterns inside `branches` or `paths`. + - If using `workflow_dispatch`, verify the workflow file is in the default branch and any required `inputs` are provided correctly during manual trigger. + - **Inspect `if` Conditions:** + - Carefully review all `if` conditions at the job and step levels (there is no workflow-level `if`). A single false condition can prevent execution. + - Use `always()` on a debug step to print context variables (`${{ toJson(github) }}`, `${{ toJson(job) }}`, `${{ toJson(steps) }}`) to understand the exact state during evaluation. + - Test complex `if` conditions in a simplified workflow. + - **Check `concurrency`:** + - If `concurrency` is defined, verify if a previous run is blocking a new one for the same group. In the Actions run list, look for runs in that group that are still pending or were cancelled (for example, when `cancel-in-progress: true` is set).
+ - **Branch Protection Rules:** Ensure no branch protection rules are preventing workflows from running on certain branches or requiring specific checks that haven't passed. + +### **2. Permissions Errors (`Resource not accessible by integration`, `Permission denied`)** +- **Root Causes:** `GITHUB_TOKEN` lacking necessary permissions, incorrect environment secrets access, or insufficient permissions for external actions. +- **Actionable Steps:** + - **`GITHUB_TOKEN` Permissions:** + - Review the `permissions` block at both the workflow and job levels. Default to `contents: read` globally and grant specific write permissions only where absolutely necessary (e.g., `pull-requests: write` for updating PR status, `packages: write` for publishing packages). + - Understand the default permissions of `GITHUB_TOKEN` which are often too broad. + - **Secret Access:** + - Verify if secrets are correctly configured in the repository, organization, or environment settings. + - Ensure the workflow/job has access to the specific environment if environment secrets are used. Check if any manual approvals are pending for the environment. + - Confirm the secret name matches exactly (`secrets.MY_API_KEY`). + - **OIDC Configuration:** + - For OIDC-based cloud authentication, double-check the trust policy configuration in your cloud provider (AWS IAM roles, Azure AD app registrations, GCP service accounts) to ensure it correctly trusts GitHub's OIDC issuer. + - Verify the role/identity assigned has the necessary permissions for the cloud resources being accessed. + +### **3. Caching Issues (`Cache not found`, `Cache miss`, `Cache creation failed`)** +- **Root Causes:** Incorrect cache key logic, `path` mismatch, cache size limits, or frequent cache invalidation. 
+- **Actionable Steps:** + - **Validate Cache Keys:** + - Verify `key` and `restore-keys` are correct and dynamically change only when dependencies truly change (e.g., `key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}`). A cache key that is too dynamic will always result in a miss. + - Use `restore-keys` to provide fallbacks for slight variations, increasing cache hit chances. + - **Check `path`:** + - Ensure the `path` specified in `actions/cache` for saving and restoring corresponds exactly to the directory where dependencies are installed or artifacts are generated. + - Verify the existence of the `path` before caching. + - **Debug Cache Behavior:** + - Use the `actions/cache/restore` action with `lookup-only: true` to check whether a cache entry exists for a given key without downloading it or otherwise affecting the build. + - Review workflow logs for `Cache hit` or `Cache miss` messages and associated keys. + - **Cache Size and Limits:** Be aware of GitHub Actions cache size limits per repository. If caches are very large, they might be evicted frequently. + +### **4. Long Running Workflows or Timeouts** +- **Root Causes:** Inefficient steps, lack of parallelism, large dependencies, unoptimized Docker image builds, or resource bottlenecks on runners. +- **Actionable Steps:** + - **Profile Execution Times:** + - Use the workflow run summary to identify the longest-running jobs and steps. This is your primary tool for optimization. + - **Optimize Steps:** + - In Dockerfiles built by the workflow, combine `RUN` commands with `&&` to reduce layer creation and overhead in Docker builds. + - Clean up temporary files immediately after use (`rm -rf` in the same `RUN` command, so the files never land in a layer). + - Install only necessary dependencies. + - **Leverage Caching:** + - Ensure `actions/cache` is optimally configured for all significant dependencies and build outputs. + - **Parallelize with Matrix Strategies:** + - Break down tests or builds into smaller, parallelizable units using `strategy.matrix` to run them concurrently.
+ - **Choose Appropriate Runners:** + - Review `runs-on`. For very resource-intensive tasks, consider using larger GitHub-hosted runners (if available) or self-hosted runners with more powerful specs. + - **Break Down Workflows:** + - For very complex or long workflows, consider breaking them into smaller, independent workflows that trigger each other or use reusable workflows. + +### **5. Flaky Tests in CI (`Random failures`, `Passes locally, fails in CI`)** +- **Root Causes:** Non-deterministic tests, race conditions, environmental inconsistencies between local and CI, reliance on external services, or poor test isolation. +- **Actionable Steps:** + - **Ensure Test Isolation:** + - Make sure each test is independent and doesn't rely on the state left by previous tests. Clean up resources (e.g., database entries) after each test or test suite. + - **Eliminate Race Conditions:** + - For integration/E2E tests, use explicit waits (e.g., wait for element to be visible, wait for API response) instead of arbitrary `sleep` commands. + - Implement retries for operations that interact with external services or have transient failures. + - **Standardize Environments:** + - Ensure the CI environment (Node.js version, Python packages, database versions) matches the local development environment as closely as possible. + - Use Docker `services` for consistent test dependencies. + - **Robust Selectors (E2E):** + - Use stable, unique selectors in E2E tests (e.g., `data-testid` attributes) instead of brittle CSS classes or XPath. + - **Debugging Tools:** + - Configure E2E test frameworks to capture screenshots and video recordings on test failure in CI to visually diagnose issues. + - **Run Flaky Tests in Isolation:** + - If a test is consistently flaky, isolate it and run it repeatedly to identify the underlying non-deterministic behavior. + +### **6. 
Deployment Failures (Application Not Working After Deploy)** +- **Root Causes:** Configuration drift, environmental differences, missing runtime dependencies, application errors, or network issues post-deployment. +- **Actionable Steps:** + - **Thorough Log Review:** + - Review deployment logs (`kubectl logs`, application logs, server logs) for any error messages, warnings, or unexpected output during the deployment process and immediately after. + - **Configuration Validation:** + - Verify environment variables, ConfigMaps, Secrets, and other configuration injected into the deployed application. Ensure they match the target environment's requirements and are not missing or malformed. + - Use pre-deployment checks to validate configuration. + - **Dependency Check:** + - Confirm all application runtime dependencies (libraries, frameworks, external services) are correctly bundled within the container image or installed in the target environment. + - **Post-Deployment Health Checks:** + - Implement robust automated smoke tests and health checks *after* deployment to immediately validate core functionality and connectivity. Trigger rollbacks if these fail. + - **Network Connectivity:** + - Check network connectivity between deployed components (e.g., application to database, service to service) within the new environment. Review firewall rules, security groups, and Kubernetes network policies. + - **Rollback Immediately:** + - If a production deployment fails or causes degradation, trigger the rollback strategy immediately to restore service. Diagnose the issue in a non-production environment. + +## Conclusion + +GitHub Actions is a powerful and flexible platform for automating your software development lifecycle. 
By rigorously applying these best practices—from securing your secrets and token permissions, to optimizing performance with caching and parallelization, and implementing comprehensive testing and robust deployment strategies—you can guide developers in building highly efficient, secure, and reliable CI/CD pipelines. Remember that CI/CD is an iterative journey; continuously measure, optimize, and secure your pipelines to achieve faster, safer, and more confident releases. Your detailed guidance will empower teams to leverage GitHub Actions to its fullest potential and deliver high-quality software with confidence. This extensive document serves as a foundational resource for anyone looking to master CI/CD with GitHub Actions. + +--- + + diff --git a/.github/instructions/go.instructions.md b/.github/instructions/go.instructions.md new file mode 100644 index 00000000..a956d628 --- /dev/null +++ b/.github/instructions/go.instructions.md @@ -0,0 +1,373 @@ +--- +description: 'Instructions for writing Go code following idiomatic Go practices and community standards' +applyTo: '**/*.go,**/go.mod,**/go.sum' +--- + +# Go Development Instructions + +Follow idiomatic Go practices and community standards when writing Go code. These instructions are based on [Effective Go](https://go.dev/doc/effective_go), [Go Code Review Comments](https://go.dev/wiki/CodeReviewComments), and [Google's Go Style Guide](https://google.github.io/styleguide/go/). 
+ +## General Instructions + +- Write simple, clear, and idiomatic Go code +- Favor clarity and simplicity over cleverness +- Follow the principle of least surprise +- Keep the happy path left-aligned (minimize indentation) +- Return early to reduce nesting +- Prefer early return over if-else chains; use `if condition { return }` pattern to avoid else blocks +- Make the zero value useful +- Write self-documenting code with clear, descriptive names +- Document exported types, functions, methods, and packages +- Use Go modules for dependency management +- Leverage the Go standard library instead of reinventing the wheel (e.g., use `strings.Builder` for string concatenation, `filepath.Join` for path construction) +- Prefer standard library solutions over custom implementations when functionality exists +- Write comments in English by default; translate only upon user request +- Avoid using emoji in code and comments + +## Naming Conventions + +### Packages + +- Use lowercase, single-word package names +- Avoid underscores, hyphens, or mixedCaps +- Choose names that describe what the package provides, not what it contains +- Avoid generic names like `util`, `common`, or `base` +- Package names should be singular, not plural + +#### Package Declaration Rules (CRITICAL): +- **NEVER duplicate `package` declarations** - each Go file must have exactly ONE `package` line +- When editing an existing `.go` file: + - **PRESERVE** the existing `package` declaration - do not add another one + - If you need to replace the entire file content, start with the existing package name +- When creating a new `.go` file: + - **BEFORE writing any code**, check what package name other `.go` files in the same directory use + - Use the SAME package name as existing files in that directory + - If it's a new directory, use the directory name as the package name + - Write **exactly one** `package ` line at the very top of the file +- When using file creation or replacement tools: + - **ALWAYS 
verify** the target file doesn't already have a `package` declaration before adding one + - If replacing file content, include only ONE `package` declaration in the new content + - **NEVER** create files with multiple `package` lines or duplicate declarations + +### Variables and Functions + +- Use mixedCaps or MixedCaps (camelCase) rather than underscores +- Keep names short but descriptive +- Use single-letter variables only for very short scopes (like loop indices) +- Exported names start with a capital letter +- Unexported names start with a lowercase letter +- Avoid stuttering (e.g., avoid `http.HTTPServer`, prefer `http.Server`) + +### Interfaces + +- Name interfaces with -er suffix when possible (e.g., `Reader`, `Writer`, `Formatter`) +- Single-method interfaces should be named after the method (e.g., `Read` → `Reader`) +- Keep interfaces small and focused + +### Constants + +- Use MixedCaps for exported constants +- Use mixedCaps for unexported constants +- Group related constants using `const` blocks +- Consider using typed constants for better type safety + +## Code Style and Formatting + +### Formatting + +- Always use `gofmt` to format code +- Use `goimports` to manage imports automatically +- Keep line length reasonable (no hard limit, but consider readability) +- Add blank lines to separate logical groups of code + +### Comments + +- Strive for self-documenting code; prefer clear variable names, function names, and code structure over comments +- Write comments only when necessary to explain complex logic, business rules, or non-obvious behavior +- Write comments in complete sentences in English by default +- Translate comments to other languages only upon specific user request +- Start sentences with the name of the thing being described +- Package comments should start with "Package [name]" +- Use line comments (`//`) for most comments +- Use block comments (`/* */`) sparingly, mainly for package documentation +- Document why, not what, unless the 
what is complex +- Avoid emoji in comments and code + +### Error Handling + +- Check errors immediately after the function call +- Don't ignore errors using `_` unless you have a good reason (document why) +- Wrap errors with context using `fmt.Errorf` with `%w` verb +- Create custom error types when you need to check for specific errors +- Place error returns as the last return value +- Name error variables `err` +- Keep error messages lowercase and don't end with punctuation + +## Architecture and Project Structure + +### Package Organization + +- Follow standard Go project layout conventions +- Keep `main` packages in `cmd/` directory +- Put reusable packages in `pkg/` or `internal/` +- Use `internal/` for packages that shouldn't be imported by external projects +- Group related functionality into packages +- Avoid circular dependencies + +### Dependency Management + +- Use Go modules (`go.mod` and `go.sum`) +- Keep dependencies minimal +- Regularly update dependencies for security patches +- Use `go mod tidy` to clean up unused dependencies +- Vendor dependencies only when necessary + +## Type Safety and Language Features + +### Type Definitions + +- Define types to add meaning and type safety +- Use struct tags for JSON, XML, database mappings +- Prefer explicit type conversions +- Use type assertions carefully and check the second return value +- Prefer generics over unconstrained types; when an unconstrained type is truly needed, use the predeclared alias `any` instead of `interface{}` (Go 1.18+) + +### Pointers vs Values + +- Use pointer receivers for large structs or when you need to modify the receiver +- Use value receivers for small structs and when immutability is desired +- Use pointer parameters when you need to modify the argument or for large structs +- Use value parameters for small structs and when you want to prevent modification +- Be consistent within a type's method set +- Consider the zero value when choosing pointer vs value receivers + 
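The receiver guidance above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Point` and `Counter` are hypothetical types invented for the example, not part of any real package. It shows a value receiver on a small value type, and pointer receivers used consistently across a type whose methods mutate state and whose zero value is usable.

```go
package main

import "fmt"

// Point is a small value type (hypothetical, for illustration only).
// Value receivers fit here: copies are cheap and the caller's Point
// cannot be modified by a method call.
type Point struct {
	X, Y int
}

// Shifted returns a moved copy and leaves the receiver untouched.
func (p Point) Shifted(dx, dy int) Point {
	return Point{X: p.X + dx, Y: p.Y + dy}
}

// Counter holds mutable state (hypothetical, for illustration only).
// Its zero value is ready to use, and every method takes a pointer
// receiver so the method set stays consistent.
type Counter struct {
	n int
}

// Inc mutates the receiver, so it needs a pointer receiver.
func (c *Counter) Inc() {
	c.n++
}

// Value only reads, but keeps a pointer receiver for consistency with Inc.
func (c *Counter) Value() int {
	return c.n
}

func main() {
	p := Point{X: 1, Y: 2}
	q := p.Shifted(3, 4)
	fmt.Println(p, q) // {1 2} {4 6}

	var c Counter // zero value, no constructor needed
	c.Inc()
	c.Inc()
	fmt.Println(c.Value()) // 2
}
```

Mixing receiver kinds on `Counter` (for example, a value receiver on `Value`) would still compile, but it makes the type's method set harder to reason about, which is why the sketch keeps both methods on `*Counter`.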
+### Interfaces and Composition + +- Accept interfaces, return concrete types +- Keep interfaces small (1-3 methods is ideal) +- Use embedding for composition +- Define interfaces close to where they're used, not where they're implemented +- Don't export interfaces unless necessary + +## Concurrency + +### Goroutines + +- Be cautious about creating goroutines in libraries; prefer letting the caller control concurrency +- If you must create goroutines in libraries, provide clear documentation and cleanup mechanisms +- Always know how a goroutine will exit +- Use `sync.WaitGroup` or channels to wait for goroutines +- Avoid goroutine leaks by ensuring cleanup + +### Channels + +- Use channels to communicate between goroutines +- Don't communicate by sharing memory; share memory by communicating +- Close channels from the sender side, not the receiver +- Use buffered channels when you know the capacity +- Use `select` for non-blocking operations + +### Synchronization + +- Use `sync.Mutex` for protecting shared state +- Keep critical sections small +- Use `sync.RWMutex` when you have many readers +- Choose between channels and mutexes based on the use case: use channels for communication, mutexes for protecting state +- Use `sync.Once` for one-time initialization +- WaitGroup usage by Go version: + - If `go >= 1.25` in `go.mod`, use the new `WaitGroup.Go` method ([documentation](https://pkg.go.dev/sync#WaitGroup)): + ```go + var wg sync.WaitGroup + wg.Go(task1) + wg.Go(task2) + wg.Wait() + ``` + - If `go < 1.25`, use the classic `Add`/`Done` pattern + +## Error Handling Patterns + +### Creating Errors + +- Use `errors.New` for simple static errors +- Use `fmt.Errorf` for dynamic errors +- Create custom error types for domain-specific errors +- Export error variables for sentinel errors +- Use `errors.Is` and `errors.As` for error checking + +### Error Propagation + +- Add context when propagating errors up the stack +- Don't log and return errors (choose one) +- Handle 
errors at the appropriate level +- Consider using structured errors for better debugging + +## API Design + +### HTTP Handlers + +- Use `http.HandlerFunc` for simple handlers +- Implement `http.Handler` for handlers that need state +- Use middleware for cross-cutting concerns +- Set appropriate status codes and headers +- Handle errors gracefully and return appropriate error responses +- Router usage by Go version: + - If `go >= 1.22`, prefer the enhanced `net/http` `ServeMux` with pattern-based routing and method matching + - If `go < 1.22`, use the classic `ServeMux` and handle methods/paths manually (or use a third-party router when justified) + +### JSON APIs + +- Use struct tags to control JSON marshaling +- Validate input data +- Use pointers for optional fields +- Consider using `json.RawMessage` for delayed parsing +- Handle JSON errors appropriately + +### HTTP Clients + +- Keep the client struct focused on configuration and dependencies only (e.g., base URL, `*http.Client`, auth, default headers). 
It must not store per-request state +- Do not store or cache `*http.Request` inside the client struct, and do not persist request-specific state across calls; instead, construct a fresh request per method invocation +- Methods should accept `context.Context` and input parameters, assemble the `*http.Request` locally (or via a short-lived builder/helper created per call), then call `c.httpClient.Do(req)` +- If request-building logic is reused, factor it into unexported helper functions or a per-call builder type; never keep `http.Request` (URL params, body, headers) as fields on the long-lived client +- Ensure the underlying `*http.Client` is configured (timeouts, transport) and is safe for concurrent use; avoid mutating `Transport` after first use +- Always set headers on the request instance you’re sending, and close response bodies (`defer resp.Body.Close()`), handling errors appropriately + +## Performance Optimization + +### Memory Management + +- Minimize allocations in hot paths +- Reuse objects when possible (consider `sync.Pool`) +- Use value receivers for small structs +- Preallocate slices when size is known +- Avoid unnecessary string conversions + +### I/O: Readers and Buffers + +- Most `io.Reader` streams are consumable once; reading advances state. Do not assume a reader can be re-read without special handling +- If you must read data multiple times, buffer it once and recreate readers on demand: + - Use `io.ReadAll` (or a limited read) to obtain `[]byte`, then create fresh readers via `bytes.NewReader(buf)` or `bytes.NewBuffer(buf)` for each reuse + - For strings, use `strings.NewReader(s)`; you can `Seek(0, io.SeekStart)` on `*bytes.Reader` to rewind +- For HTTP requests, do not reuse a consumed `req.Body`. 
Instead: + - Keep the original payload as `[]byte` and set `req.Body = io.NopCloser(bytes.NewReader(buf))` before each send + - Prefer configuring `req.GetBody` so the transport can recreate the body for redirects/retries: `req.GetBody = func() (io.ReadCloser, error) { return io.NopCloser(bytes.NewReader(buf)), nil }` +- To duplicate a stream while reading, use `io.TeeReader` (copy to a buffer while passing through) or write to multiple sinks with `io.MultiWriter` +- Reusing buffered readers: call `(*bufio.Reader).Reset(r)` to attach to a new underlying reader; do not expect it to “rewind” unless the source supports seeking +- For large payloads, avoid unbounded buffering; consider streaming, `io.LimitReader`, or on-disk temporary storage to control memory + +- Use `io.Pipe` to stream without buffering the whole payload: + - Write to `*io.PipeWriter` in a separate goroutine while the reader consumes + - Always close the writer; use `CloseWithError(err)` on failures + - `io.Pipe` is for streaming, not rewinding or making readers reusable + +- **Warning:** When using `io.Pipe` (especially with multipart writers), all writes must be performed in strict, sequential order. Do not write concurrently or out of order—multipart boundaries and chunk order must be preserved. Out-of-order or parallel writes can corrupt the stream and result in errors. 
+ +- Streaming multipart/form-data with `io.Pipe`: + - `pr, pw := io.Pipe()`; `mw := multipart.NewWriter(pw)`; use `pr` as the HTTP request body + - Set `Content-Type` to `mw.FormDataContentType()` + - In a goroutine: write all parts to `mw` in the correct order; on error `pw.CloseWithError(err)`; on success `mw.Close()` then `pw.Close()` + - Do not store request/in-flight form state on a long-lived client; build per call + - Streamed bodies are not rewindable; for retries/redirects, buffer small payloads or provide `GetBody` + +### Profiling + +- Use built-in profiling tools (`pprof`) +- Benchmark critical code paths +- Profile before optimizing +- Focus on algorithmic improvements first +- Consider using `testing.B` for benchmarks + +## Testing + +### Test Organization + +- Keep tests in the same package (white-box testing) +- Use `_test` package suffix for black-box testing +- Name test files with `_test.go` suffix +- Place test files next to the code they test + +### Writing Tests + +- Use table-driven tests for multiple test cases +- Name tests descriptively using `Test_functionName_scenario` +- Use subtests with `t.Run` for better organization +- Test both success and error cases +- Consider using `testify` or similar libraries when they add value, but don't over-complicate simple tests + +### Test Helpers + +- Mark helper functions with `t.Helper()` +- Create test fixtures for complex setup +- Use `testing.TB` interface for functions used in tests and benchmarks +- Clean up resources using `t.Cleanup()` + +## Security Best Practices + +### Input Validation + +- Validate all external input +- Use strong typing to prevent invalid states +- Sanitize data before using in SQL queries +- Be careful with file paths from user input +- Validate and escape data for different contexts (HTML, SQL, shell) + +### Cryptography + +- Use standard library crypto packages +- Don't implement your own cryptography +- Use crypto/rand for random number generation +- Store 
passwords using bcrypt, scrypt, or argon2 (consider golang.org/x/crypto for additional options) +- Use TLS for network communication + +## Documentation + +### Code Documentation + +- Prioritize self-documenting code through clear naming and structure +- Document all exported symbols with clear, concise explanations +- Start documentation with the symbol name +- Write documentation in English by default +- Use examples in documentation when helpful +- Keep documentation close to code +- Update documentation when code changes +- Avoid emoji in documentation and comments + +### README and Documentation Files + +- Include clear setup instructions +- Document dependencies and requirements +- Provide usage examples +- Document configuration options +- Include troubleshooting section + +## Tools and Development Workflow + +### Essential Tools + +- `go fmt`: Format code +- `go vet`: Find suspicious constructs +- `golangci-lint`: Additional linting (golint is deprecated) +- `go test`: Run tests +- `go mod`: Manage dependencies +- `go generate`: Code generation + +### Development Practices + +- Run tests before committing +- Use pre-commit hooks for formatting and linting +- Keep commits focused and atomic +- Write meaningful commit messages +- Review diffs before committing + +## Common Pitfalls to Avoid + +- Not checking errors +- Ignoring race conditions +- Creating goroutine leaks +- Not using defer for cleanup +- Modifying maps concurrently +- Not understanding nil interfaces vs nil pointers +- Forgetting to close resources (files, connections) +- Using global variables unnecessarily +- Over-using unconstrained types (e.g., `any`); prefer specific types or generic type parameters with constraints. 
If an unconstrained type is required, use `any` rather than `interface{}` +- Not considering the zero value of types +- **Creating duplicate `package` declarations** - this is a compile error; always check existing files before adding package declarations diff --git a/.github/instructions/markdown.instructions.md b/.github/instructions/markdown.instructions.md new file mode 100644 index 00000000..724815d0 --- /dev/null +++ b/.github/instructions/markdown.instructions.md @@ -0,0 +1,52 @@ +--- +description: 'Documentation and content creation standards' +applyTo: '**/*.md' +--- + +## Markdown Content Rules + +The following markdown content rules are enforced in the validators: + +1. **Headings**: Use appropriate heading levels (H2, H3, etc.) to structure your content. Do not use an H1 heading, as this will be generated based on the title. +2. **Lists**: Use bullet points or numbered lists for lists. Ensure proper indentation and spacing. +3. **Code Blocks**: Use fenced code blocks for code snippets. Specify the language for syntax highlighting. +4. **Links**: Use proper markdown syntax for links. Ensure that links are valid and accessible. +5. **Images**: Use proper markdown syntax for images. Include alt text for accessibility. +6. **Tables**: Use markdown tables for tabular data. Ensure proper formatting and alignment. +7. **Line Length**: Limit line length to 400 characters for readability. +8. **Whitespace**: Use appropriate whitespace to separate sections and improve readability. +9. **Front Matter**: Include YAML front matter at the beginning of the file with required metadata fields. + +## Formatting and Structure + +Follow these guidelines for formatting and structuring your markdown content: + +- **Headings**: Use `##` for H2 and `###` for H3. Ensure that headings are used in a hierarchical manner. Recommend restructuring if content includes H4, and more strongly recommend for H5. +- **Lists**: Use `-` for bullet points and `1.` for numbered lists. 
Indent nested lists with two spaces. +- **Code Blocks**: Use triple backticks to create fenced code blocks. Specify the language after the opening backticks for syntax highlighting (e.g., `csharp`). +- **Links**: Use `[link text](URL)` for links. Ensure that the link text is descriptive and the URL is valid. +- **Images**: Use `![alt text](image URL)` for images. Include a brief description of the image in the alt text. +- **Tables**: Use `|` to create tables. Ensure that columns are properly aligned and headers are included. +- **Line Length**: Break lines at 80 characters to improve readability. Use soft line breaks for long paragraphs. +- **Whitespace**: Use blank lines to separate sections and improve readability. Avoid excessive whitespace. + +## Validation Requirements + +Ensure compliance with the following validation requirements: + +- **Front Matter**: Include the following fields in the YAML front matter: + + - `post_title`: The title of the post. + - `author1`: The primary author of the post. + - `post_slug`: The URL slug for the post. + - `microsoft_alias`: The Microsoft alias of the author. + - `featured_image`: The URL of the featured image. + - `categories`: The categories for the post. These categories must be from the list in /categories.txt. + - `tags`: The tags for the post. + - `ai_note`: Indicate if AI was used in the creation of the post. + - `summary`: A brief summary of the post. Recommend a summary based on the content when possible. + - `post_date`: The publication date of the post. + +- **Content Rules**: Ensure that the content follows the markdown content rules specified above. +- **Formatting**: Ensure that the content is properly formatted and structured according to the guidelines. +- **Validation**: Run the validation tools to check for compliance with the rules and guidelines. 
diff --git a/.github/instructions/pcf-react-platform-libraries.instructions.md b/.github/instructions/pcf-react-platform-libraries.instructions.md new file mode 100644 index 00000000..634b205c --- /dev/null +++ b/.github/instructions/pcf-react-platform-libraries.instructions.md @@ -0,0 +1,123 @@ +--- +description: 'React controls and platform libraries for PCF components' +applyTo: '**/*.{ts,tsx,js,json,xml,pcfproj,csproj}' +--- + +# React Controls & Platform Libraries + +When you use React and platform libraries, you're using the same infrastructure used by the Power Apps platform. This means you no longer have to package React and Fluent libraries individually for each control. All controls share a common library instance and version to provide a seamless and consistent experience. + +## Benefits + +By reusing the existing platform React and Fluent libraries, you can expect: + +- **Reduced control bundle size** +- **Optimized solution packaging** +- **Faster runtime transfer, scripting, and control rendering** +- **Design and theme alignment with the Power Apps Fluent design system** + +> **Note**: With GA release, all existing virtual controls will continue to function. However, they should be rebuilt and deployed using the latest CLI version (>=1.37) to facilitate future platform React version upgrades. + +## Prerequisites + +As with any component, you must install [Visual Studio Code](https://code.visualstudio.com/Download) and the [Microsoft Power Platform CLI](https://learn.microsoft.com/en-us/power-apps/developer/data-platform/powerapps-cli#install-microsoft-power-platform-cli). + +> **Note**: If you have already installed Power Platform CLI for Windows, make sure you are running the latest version by using the `pac install latest` command. The Power Platform Tools for Visual Studio Code should update automatically. + +## Create a React Component + +> **Note**: These instructions expect that you have created code components before. 
If you have not, see [Create your first component](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/implementing-controls-using-typescript). + +There's a new `--framework` (`-fw`) parameter for the `pac pcf init` command. Set the value of this parameter to `react`. + +### Command Parameters + +| Parameter | Value | +|-----------|-------| +| --name | ReactSample | +| --namespace | SampleNamespace | +| --template | field | +| --framework | react | +| --run-npm-install | true (default) | + +### PowerShell Command + +The following PowerShell command uses the parameter shortcuts and creates a React component project and runs `npm-install`: + +```powershell +pac pcf init -n ReactSample -ns SampleNamespace -t field -fw react -npm +``` + +You can now build and view the control in the test harness as usual using `npm start`. + +After you build the control, you can package it inside solutions and use it for model-driven apps (including custom pages) and canvas apps like standard code components. + +## Differences from Standard Components + +### ControlManifest.Input.xml + +The [control element](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/manifest-schema-reference/control) `control-type` attribute is set to `virtual` rather than `standard`. + +> **Note**: Changing this value does not convert a component from one type to another. + +Within the [resources element](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/manifest-schema-reference/resources), find two new [platform-library element](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/manifest-schema-reference/platform-library) child elements: + +```xml + + + + + +``` + +> **Note**: For more information about valid platform library versions, see Supported platform libraries list. + +**Recommendation**: We recommend using platform libraries for Fluent 8 and 9. 
If you don't use Fluent, you should remove the `platform-library` element where the `name` attribute value is `Fluent`. + +### Index.ts + +The [ReactControl.init](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/reference/react-control/init) method for control initialization doesn't have `div` parameters because React controls don't render the DOM directly. Instead [ReactControl.updateView](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/reference/react-control/updateview) returns a ReactElement that has the details of the actual control in React format. + +### bundle.js + +React and Fluent libraries aren't included in the package because they're shared, therefore the size of bundle.js is smaller. + +## Sample Controls + +The following controls are included in the samples. They function the same as their standard versions but offer better performance since they are virtual controls. + +| Sample | Description | Link | +|--------|-------------|------| +| ChoicesPickerReact | The standard ChoicesPickerControl converted to be a React Control | ChoicesPickerReact Sample | +| FacepileReact | The ReactStandardControl converted to be a React Control | FacepileReact | + +## Supported Platform Libraries List + +Platform libraries are made available both at the build and runtime to the controls that are using platform libraries capability. Currently, the following versions are provided by the platform and are the highest currently supported versions. 
+ +| Library | Package | Build Version | Runtime Version | +|---------|---------|---------------|-----------------| +| React | react | 16.14.0 | 17.0.2 (Model), 16.14.0 (Canvas) | +| Fluent | @fluentui/react | 8.29.0 | 8.29.0 | +| Fluent | @fluentui/react | 8.121.1 | 8.121.1 | +| Fluent | @fluentui/react-components | >=9.4.0 <=9.46.2 | 9.68.0 | + +> **Note**: The application might load a higher compatible version of a platform library at runtime, but the version might not be the latest version available. Fluent 8 and Fluent 9 are each supported but can not both be specified in the same manifest. + +## FAQ + +### Q: Can I convert an existing standard control to a React control using platform libraries? + +A: No. You must create a new control using the new template and then update the manifest and index.ts methods. For reference, compare the standard and react samples described above. + +### Q: Can I use React controls & platform libraries with Power Pages? + +A: No. React controls & platform libraries are currently only supported for canvas and model-driven apps. In Power Pages, React controls don't update based on changes in other fields. 
+ +## Related Articles + +- [What are code components?](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/custom-controls-overview) +- [Code components for canvas apps](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/component-framework-for-canvas-apps) +- [Create and build a code component](https://learn.microsoft.com/en-us/power-apps/developer/component-framework/create-custom-controls-using-pcf) +- [Learn Power Apps component framework](https://learn.microsoft.com/en-us/training/paths/use-power-apps-component-framework) +- [Use code components in Power Pages](https://learn.microsoft.com/en-us/power-apps/maker/portals/component-framework) diff --git a/.github/instructions/performance-optimization.instructions.md b/.github/instructions/performance-optimization.instructions.md new file mode 100644 index 00000000..46a40025 --- /dev/null +++ b/.github/instructions/performance-optimization.instructions.md @@ -0,0 +1,420 @@ +--- +applyTo: '*' +description: 'The most comprehensive, practical, and engineer-authored performance optimization instructions for all languages, frameworks, and stacks. Covers frontend, backend, and database best practices with actionable guidance, scenario-based checklists, troubleshooting, and pro tips.' +--- + +# Performance Optimization Best Practices + +## Introduction + +Performance isn't just a buzzword—it's the difference between a product people love and one they abandon. I've seen firsthand how a slow app can frustrate users, rack up cloud bills, and even lose customers. This guide is a living collection of the most effective, real-world performance practices I've used and reviewed, covering frontend, backend, and database layers, as well as advanced topics. Use it as a reference, a checklist, and a source of inspiration for building fast, efficient, and scalable software. 
+ +--- + +## General Principles + +- **Measure First, Optimize Second:** Always profile and measure before optimizing. Use benchmarks, profilers, and monitoring tools to identify real bottlenecks. Guessing is the enemy of performance. + - *Pro Tip:* Use tools like Chrome DevTools, Lighthouse, New Relic, Datadog, Py-Spy, or your language's built-in profilers. +- **Optimize for the Common Case:** Focus on optimizing code paths that are most frequently executed. Don't waste time on rare edge cases unless they're critical. +- **Avoid Premature Optimization:** Write clear, maintainable code first; optimize only when necessary. Premature optimization can make code harder to read and maintain. +- **Minimize Resource Usage:** Use memory, CPU, network, and disk resources efficiently. Always ask: "Can this be done with less?" +- **Prefer Simplicity:** Simple algorithms and data structures are often faster and easier to optimize. Don't over-engineer. +- **Document Performance Assumptions:** Clearly comment on any code that is performance-critical or has non-obvious optimizations. Future maintainers (including you) will thank you. +- **Understand the Platform:** Know the performance characteristics of your language, framework, and runtime. What's fast in Python may be slow in JavaScript, and vice versa. +- **Automate Performance Testing:** Integrate performance tests and benchmarks into your CI/CD pipeline. Catch regressions early. +- **Set Performance Budgets:** Define acceptable limits for load time, memory usage, API latency, etc. Enforce them with automated checks. + +--- + +## Frontend Performance + +### Rendering and DOM +- **Minimize DOM Manipulations:** Batch updates where possible. Frequent DOM changes are expensive. + - *Anti-pattern:* Updating the DOM in a loop. Instead, build a document fragment and append it once. +- **Virtual DOM Frameworks:** Use React, Vue, or similar efficiently—avoid unnecessary re-renders. 
+ - *React Example:* Use `React.memo`, `useMemo`, and `useCallback` to prevent unnecessary renders. +- **Keys in Lists:** Always use stable keys in lists to help virtual DOM diffing. Avoid using array indices as keys unless the list is static. +- **Avoid Inline Styles:** Inline styles can trigger layout thrashing. Prefer CSS classes. +- **CSS Animations:** Use CSS transitions/animations over JavaScript for smoother, GPU-accelerated effects. +- **Defer Non-Critical Rendering:** Use `requestIdleCallback` or similar to defer work until the browser is idle. + +### Asset Optimization +- **Image Compression:** Use tools like ImageOptim, Squoosh, or TinyPNG. Prefer modern formats (WebP, AVIF) for web delivery. +- **SVGs for Icons:** SVGs scale well and are often smaller than PNGs for simple graphics. +- **Minification and Bundling:** Use Webpack, Rollup, or esbuild to bundle and minify JS/CSS. Enable tree-shaking to remove dead code. +- **Cache Headers:** Set long-lived cache headers for static assets. Use cache busting for updates. +- **Lazy Loading:** Use `loading="lazy"` for images, and dynamic imports for JS modules/components. +- **Font Optimization:** Use only the character sets you need. Subset fonts and use `font-display: swap`. + +### Network Optimization +- **Reduce HTTP Requests:** Combine files, use image sprites, and inline critical CSS. +- **HTTP/2 and HTTP/3:** Enable these protocols for multiplexing and lower latency. +- **Client-Side Caching:** Use Service Workers, IndexedDB, and localStorage for offline and repeat visits. +- **CDNs:** Serve static assets from a CDN close to your users. Use multiple CDNs for redundancy. +- **Defer/Async Scripts:** Use `defer` or `async` for non-critical JS to avoid blocking rendering. +- **Preload and Prefetch:** Use `<link rel="preload">` and `<link rel="prefetch">` for critical resources. + +### JavaScript Performance +- **Avoid Blocking the Main Thread:** Offload heavy computation to Web Workers. 
+- **Debounce/Throttle Events:** For scroll, resize, and input events, use debounce/throttle to limit handler frequency. +- **Memory Leaks:** Clean up event listeners, intervals, and DOM references. Use browser dev tools to check for detached nodes. +- **Efficient Data Structures:** Use Maps/Sets for lookups, TypedArrays for numeric data. +- **Avoid Global Variables:** Globals can cause memory leaks and unpredictable performance. +- **Avoid Deep Object Cloning:** Use shallow copies or libraries like lodash's `cloneDeep` only when necessary. + +### Accessibility and Performance +- **Accessible Components:** Ensure ARIA updates are not excessive. Use semantic HTML for both accessibility and performance. +- **Screen Reader Performance:** Avoid rapid DOM updates that can overwhelm assistive tech. + +### Framework-Specific Tips +#### React +- Use `React.memo`, `useMemo`, and `useCallback` to avoid unnecessary renders. +- Split large components and use code-splitting (`React.lazy`, `Suspense`). +- Avoid anonymous functions in render; they create new references on every render. +- Use `ErrorBoundary` to catch and handle errors gracefully. +- Profile with React DevTools Profiler. + +#### Angular +- Use OnPush change detection for components that don't need frequent updates. +- Avoid complex expressions in templates; move logic to the component class. +- Use `trackBy` in `ngFor` for efficient list rendering. +- Lazy load modules and components with the Angular Router. +- Profile with Angular DevTools. + +#### Vue +- Use computed properties over methods in templates for caching. +- Use `v-show` vs `v-if` appropriately (`v-show` is better for toggling visibility frequently). +- Lazy load components and routes with Vue Router. +- Profile with Vue Devtools. + +### Common Frontend Pitfalls +- Loading large JS bundles on initial page load. +- Not compressing images or using outdated formats. +- Failing to clean up event listeners, causing memory leaks. 
+- Overusing third-party libraries for simple tasks. +- Ignoring mobile performance (test on real devices!). + +### Frontend Troubleshooting +- Use Chrome DevTools' Performance tab to record and analyze slow frames. +- Use Lighthouse to audit performance and get actionable suggestions. +- Use WebPageTest for real-world load testing. +- Monitor Core Web Vitals (LCP, FID, CLS) for user-centric metrics. + +--- + +## Backend Performance + +### Algorithm and Data Structure Optimization +- **Choose the Right Data Structure:** Arrays for sequential access, hash maps for fast lookups, trees for hierarchical data, etc. +- **Efficient Algorithms:** Use binary search, quicksort, or hash-based algorithms where appropriate. +- **Avoid O(n^2) or Worse:** Profile nested loops and recursive calls. Refactor to reduce complexity. +- **Batch Processing:** Process data in batches to reduce overhead (e.g., bulk database inserts). +- **Streaming:** Use streaming APIs for large data sets to avoid loading everything into memory. + +### Concurrency and Parallelism +- **Asynchronous I/O:** Use async/await, callbacks, or event loops to avoid blocking threads. +- **Thread/Worker Pools:** Use pools to manage concurrency and avoid resource exhaustion. +- **Avoid Race Conditions:** Use locks, semaphores, or atomic operations where needed. +- **Bulk Operations:** Batch network/database calls to reduce round trips. +- **Backpressure:** Implement backpressure in queues and pipelines to avoid overload. + +### Caching +- **Cache Expensive Computations:** Use in-memory caches (Redis, Memcached) for hot data. +- **Cache Invalidation:** Use time-based (TTL), event-based, or manual invalidation. Stale cache is worse than no cache. +- **Distributed Caching:** For multi-server setups, use distributed caches and be aware of consistency issues. +- **Cache Stampede Protection:** Use locks or request coalescing to prevent thundering herd problems. 
+- **Don't Cache Everything:** Some data is too volatile or sensitive to cache. + +### API and Network +- **Minimize Payloads:** Use JSON, compress responses (gzip, Brotli), and avoid sending unnecessary data. +- **Pagination:** Always paginate large result sets. Use cursors for real-time data. +- **Rate Limiting:** Protect APIs from abuse and overload. +- **Connection Pooling:** Reuse connections for databases and external services. +- **Protocol Choice:** Use HTTP/2, gRPC, or WebSockets for high-throughput, low-latency communication. + +### Logging and Monitoring +- **Minimize Logging in Hot Paths:** Excessive logging can slow down critical code. +- **Structured Logging:** Use JSON or key-value logs for easier parsing and analysis. +- **Monitor Everything:** Latency, throughput, error rates, resource usage. Use Prometheus, Grafana, Datadog, or similar. +- **Alerting:** Set up alerts for performance regressions and resource exhaustion. + +### Language/Framework-Specific Tips +#### Node.js +- Use asynchronous APIs; avoid blocking the event loop (e.g., never use `fs.readFileSync` in production). +- Use clustering or worker threads for CPU-bound tasks. +- Limit concurrent open connections to avoid resource exhaustion. +- Use streams for large file or network data processing. +- Profile with `clinic.js`, `node --inspect`, or Chrome DevTools. + +#### Python +- Use built-in data structures (`dict`, `set`, `deque`) for speed. +- Profile with `cProfile`, `line_profiler`, or `Py-Spy`. +- Use `multiprocessing` or `asyncio` for parallelism. +- Avoid GIL bottlenecks in CPU-bound code; use C extensions or subprocesses. +- Use `lru_cache` for memoization. + +#### Java +- Use efficient collections (`ArrayList`, `HashMap`, etc.). +- Profile with VisualVM, JProfiler, or YourKit. +- Use thread pools (`Executors`) for concurrency. +- Tune JVM options for heap and garbage collection (`-Xmx`, `-Xms`, `-XX:+UseG1GC`). +- Use `CompletableFuture` for async programming. 
+ +#### .NET +- Use `async/await` for I/O-bound operations. +- Use `Span<T>` and `Memory<T>` for efficient memory access. +- Profile with dotTrace, Visual Studio Profiler, or PerfView. +- Pool objects and connections where appropriate. +- Use `IAsyncEnumerable<T>` for streaming data. + +### Common Backend Pitfalls +- Synchronous/blocking I/O in web servers. +- Not using connection pooling for databases. +- Over-caching or caching sensitive/volatile data. +- Ignoring error handling in async code. +- Not monitoring or alerting on performance regressions. + +### Backend Troubleshooting +- Use flame graphs to visualize CPU usage. +- Use distributed tracing (OpenTelemetry, Jaeger, Zipkin) to track request latency across services. +- Use heap dumps and memory profilers to find leaks. +- Log slow queries and API calls for analysis. + +--- + +## Database Performance + +### Query Optimization +- **Indexes:** Use indexes on columns that are frequently queried, filtered, or joined. Monitor index usage and drop unused indexes. +- **Avoid SELECT *:** Select only the columns you need. Reduces I/O and memory usage. +- **Parameterized Queries:** Prevent SQL injection and improve plan caching. +- **Query Plans:** Analyze and optimize query execution plans. Use `EXPLAIN` in SQL databases. +- **Avoid N+1 Queries:** Use joins or batch queries to avoid repeated queries in loops. +- **Limit Result Sets:** Use `LIMIT`/`OFFSET` or cursors for large tables. + +### Schema Design +- **Normalization:** Normalize to reduce redundancy, but denormalize for read-heavy workloads if needed. +- **Data Types:** Use the most efficient data types and set appropriate constraints. +- **Partitioning:** Partition large tables for scalability and manageability. +- **Archiving:** Regularly archive or purge old data to keep tables small and fast. +- **Foreign Keys:** Use them for data integrity, but be aware of performance trade-offs in high-write scenarios. 
+ +### Transactions +- **Short Transactions:** Keep transactions as short as possible to reduce lock contention. +- **Isolation Levels:** Use the lowest isolation level that meets your consistency needs. +- **Avoid Long-Running Transactions:** They can block other operations and increase deadlocks. + +### Caching and Replication +- **Read Replicas:** Use for scaling read-heavy workloads. Monitor replication lag. +- **Cache Query Results:** Use Redis or Memcached for frequently accessed queries. +- **Write-Through/Write-Behind:** Choose the right strategy for your consistency needs. +- **Sharding:** Distribute data across multiple servers for scalability. + +### NoSQL Databases +- **Design for Access Patterns:** Model your data for the queries you need. +- **Avoid Hot Partitions:** Distribute writes/reads evenly. +- **Unbounded Growth:** Watch for unbounded arrays or documents. +- **Sharding and Replication:** Use for scalability and availability. +- **Consistency Models:** Understand eventual vs strong consistency and choose appropriately. + +### Common Database Pitfalls +- Missing or unused indexes. +- SELECT * in production queries. +- Not monitoring slow queries. +- Ignoring replication lag. +- Not archiving old data. + +### Database Troubleshooting +- Use slow query logs to identify bottlenecks. +- Use `EXPLAIN` to analyze query plans. +- Monitor cache hit/miss ratios. +- Use database-specific monitoring tools (pg_stat_statements, MySQL Performance Schema). + +--- + +## Code Review Checklist for Performance + +- [ ] Are there any obvious algorithmic inefficiencies (O(n^2) or worse)? +- [ ] Are data structures appropriate for their use? +- [ ] Are there unnecessary computations or repeated work? +- [ ] Is caching used where appropriate, and is invalidation handled correctly? +- [ ] Are database queries optimized, indexed, and free of N+1 issues? +- [ ] Are large payloads paginated, streamed, or chunked? 
+- [ ] Are there any memory leaks or unbounded resource usage? +- [ ] Are network requests minimized, batched, and retried on failure? +- [ ] Are assets optimized, compressed, and served efficiently? +- [ ] Are there any blocking operations in hot paths? +- [ ] Is logging in hot paths minimized and structured? +- [ ] Are performance-critical code paths documented and tested? +- [ ] Are there automated tests or benchmarks for performance-sensitive code? +- [ ] Are there alerts for performance regressions? +- [ ] Are there any anti-patterns (e.g., SELECT *, blocking I/O, global variables)? + +--- + +## Advanced Topics + +### Profiling and Benchmarking +- **Profilers:** Use language-specific profilers (Chrome DevTools, Py-Spy, VisualVM, dotTrace, etc.) to identify bottlenecks. +- **Microbenchmarks:** Write microbenchmarks for critical code paths. Use `benchmark.js`, `pytest-benchmark`, or JMH for Java. +- **A/B Testing:** Measure real-world impact of optimizations with A/B or canary releases. +- **Continuous Performance Testing:** Integrate performance tests into CI/CD. Use tools like k6, Gatling, or Locust. + +### Memory Management +- **Resource Cleanup:** Always release resources (files, sockets, DB connections) promptly. +- **Object Pooling:** Use for frequently created/destroyed objects (e.g., DB connections, threads). +- **Heap Monitoring:** Monitor heap usage and garbage collection. Tune GC settings for your workload. +- **Memory Leaks:** Use leak detection tools (Valgrind, LeakCanary, Chrome DevTools). + +### Scalability +- **Horizontal Scaling:** Design stateless services, use sharding/partitioning, and load balancers. +- **Auto-Scaling:** Use cloud auto-scaling groups and set sensible thresholds. +- **Bottleneck Analysis:** Identify and address single points of failure. +- **Distributed Systems:** Use idempotent operations, retries, and circuit breakers. 
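
The "idempotent operations, retries" advice above can be sketched as a small retry helper with exponential backoff and jitter. This is an illustrative sketch under stated assumptions: `retry_with_backoff` and its parameters are hypothetical names, and it should only wrap operations that are safe to repeat (idempotent).

```python
import random
import time


def retry_with_backoff(
    operation,
    *,
    attempts=4,
    base_delay=0.1,
    max_delay=2.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call operation() and retry on transient errors (hypothetical helper).

    The wait before retry n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(n - 1))], often called "full jitter".
    Errors that are not listed in retry_on propagate immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting. A circuit breaker would sit one level above a helper like this, so that a persistently failing dependency stops being called at all.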
+
+### Security and Performance
+- **Efficient Crypto:** Use hardware-accelerated and well-maintained cryptographic libraries.
+- **Validation:** Validate inputs efficiently; avoid complex, backtracking-prone regexes in hot paths.
+- **Rate Limiting:** Protect against DoS without harming legitimate users.
+
+### Mobile Performance
+- **Startup Time:** Lazy load features, defer heavy work, and minimize initial bundle size.
+- **Image/Asset Optimization:** Use responsive images and compress assets for mobile bandwidth.
+- **Efficient Storage:** Use SQLite, Realm, or platform-optimized storage.
+- **Profiling:** Use Android Profiler, Instruments (iOS), or Firebase Performance Monitoring.
+
+### Cloud and Serverless
+- **Cold Starts:** Minimize dependencies and keep functions warm.
+- **Resource Allocation:** Tune memory/CPU for serverless functions.
+- **Managed Services:** Use managed caching, queues, and DBs for scalability.
+- **Cost Optimization:** Monitor and optimize for cloud cost as a performance metric.
+
+---
+
+## Practical Examples
+
+### Example 1: Debouncing User Input in JavaScript
+```javascript
+// BAD: Triggers API call on every keystroke
+input.addEventListener('input', (e) => {
+  fetch(`/search?q=${encodeURIComponent(e.target.value)}`);
+});
+
+// GOOD: Debounce API calls (fire only 300 ms after the last keystroke)
+let timeout;
+input.addEventListener('input', (e) => {
+  clearTimeout(timeout);
+  timeout = setTimeout(() => {
+    fetch(`/search?q=${encodeURIComponent(e.target.value)}`);
+  }, 300);
+});
+```
+
+### Example 2: Efficient SQL Query
+```sql
+-- BAD: Selects every column, including ones the caller never reads
+SELECT * FROM users WHERE email = 'user@example.com';
+
+-- GOOD: Selects only the needed columns.
+-- The lookup is fast only if `email` is indexed, e.g. (index name is illustrative):
+--   CREATE INDEX idx_users_email ON users (email);
+SELECT id, name FROM users WHERE email = 'user@example.com';
+```
+
+### Example 3: Caching Expensive Computation in Python
+```python
+# BAD: Recomputes result every time
+result = expensive_function(x)
+
+# GOOD: Cache results per argument (arguments must be hashable)
+from functools import lru_cache
+
+@lru_cache(maxsize=128)
+def expensive_function(x):
+    ...
+result = expensive_function(x)
+```
+
+### Example 4: Lazy Loading Images in HTML
+```html
+<!-- BAD: Loads the image immediately, even when it is off-screen -->
+<img src="large-image.jpg" />
+
+<!-- GOOD: Defers loading until the image is near the viewport -->
+<img src="large-image.jpg" loading="lazy" />
+```
+
+### Example 5: Asynchronous I/O in Node.js
+```javascript
+const fs = require('fs');
+
+// BAD: Blocking file read (stalls the event loop until the read finishes)
+const data = fs.readFileSync('file.txt');
+
+// GOOD: Non-blocking file read
+fs.readFile('file.txt', 'utf8', (err, data) => {
+  if (err) {
+    // Throwing inside a callback would crash the process; handle the error instead
+    console.error('Failed to read file.txt:', err);
+    return;
+  }
+  // process data
+});
+```
+
+### Example 6: Profiling a Python Function
+```python
+import cProfile
+import pstats
+
+def slow_function():
+    ...
+
+# Write the profile to a file, then print the 10 most expensive calls by cumulative time
+cProfile.run('slow_function()', 'profile.stats')
+p = pstats.Stats('profile.stats')
+p.sort_stats('cumulative').print_stats(10)
+```
+
+### Example 7: Using Redis for Caching in Node.js
+```javascript
+// Assumes the callback-style API of node-redis v3; v4+ clients are promise-based.
+const redis = require('redis');
+const client = redis.createClient();
+
+function getCachedData(key, fetchFunction) {
+  return new Promise((resolve, reject) => {
+    client.get(key, (err, data) => {
+      if (err) return reject(err);
+      if (data) return resolve(JSON.parse(data));
+      fetchFunction()
+        .then((result) => {
+          // Cache the result for one hour (3600 seconds)
+          client.setex(key, 3600, JSON.stringify(result));
+          resolve(result);
+        })
+        .catch(reject);
+    });
+  });
+}
+```
+
+---
+
+## References and Further Reading
+- [Google Web Fundamentals: Performance](https://web.dev/performance/)
+- [MDN Web Docs: Performance](https://developer.mozilla.org/en-US/docs/Web/Performance)
+- [OWASP: Performance Testing](https://owasp.org/www-project-performance-testing/)
+- [Microsoft Performance Best Practices](https://learn.microsoft.com/en-us/azure/architecture/best-practices/performance)
+- [PostgreSQL Performance Optimization](https://wiki.postgresql.org/wiki/Performance_Optimization)
+- [MySQL Performance Tuning](https://dev.mysql.com/doc/refman/8.0/en/optimization.html)
+- [Node.js Performance Best Practices](https://nodejs.org/en/docs/guides/simple-profiling/)
+- [Python Performance Tips](https://docs.python.org/3/library/profile.html)
+- [Java Performance Tuning](https://www.oracle.com/java/technologies/javase/performance.html)
+- [.NET Performance
Guide](https://learn.microsoft.com/en-us/dotnet/standard/performance/) +- [WebPageTest](https://www.webpagetest.org/) +- [Lighthouse](https://developers.google.com/web/tools/lighthouse) +- [Prometheus](https://prometheus.io/) +- [Grafana](https://grafana.com/) +- [k6 Load Testing](https://k6.io/) +- [Gatling](https://gatling.io/) +- [Locust](https://locust.io/) +- [OpenTelemetry](https://opentelemetry.io/) +- [Jaeger](https://www.jaegertracing.io/) +- [Zipkin](https://zipkin.io/) + +--- + +## Conclusion + +Performance optimization is an ongoing process. Always measure, profile, and iterate. Use these best practices, checklists, and troubleshooting tips to guide your development and code reviews for high-performance, scalable, and efficient software. If you have new tips or lessons learned, add them here—let's keep this guide growing! + +--- + + diff --git a/.github/instructions/playwright-typescript.instructions.md b/.github/instructions/playwright-typescript.instructions.md new file mode 100644 index 00000000..ccb01b5b --- /dev/null +++ b/.github/instructions/playwright-typescript.instructions.md @@ -0,0 +1,86 @@ +--- +description: 'Playwright test generation instructions' +applyTo: '**' +--- + +## Test Writing Guidelines + +### Code Quality Standards +- **Locators**: Prioritize user-facing, role-based locators (`getByRole`, `getByLabel`, `getByText`, etc.) for resilience and accessibility. Use `test.step()` to group interactions and improve test readability and reporting. +- **Assertions**: Use auto-retrying web-first assertions. These assertions start with the `await` keyword (e.g., `await expect(locator).toHaveText()`). Avoid `expect(locator).toBeVisible()` unless specifically testing for visibility changes. +- **Timeouts**: Rely on Playwright's built-in auto-waiting mechanisms. Avoid hard-coded waits or increased default timeouts. +- **Clarity**: Use descriptive test and step titles that clearly state the intent. 
Add comments only to explain complex logic or non-obvious interactions.
+
+
+### Test Structure
+- **Imports**: Start with `import { test, expect } from '@playwright/test';`.
+- **Organization**: Group related tests for a feature under a `test.describe()` block.
+- **Hooks**: Use `beforeEach` for setup actions common to all tests in a `describe` block (e.g., navigating to a page).
+- **Titles**: Follow a clear naming convention, such as `Feature - Specific action or scenario`.
+
+
+### File Organization
+- **Location**: Store all test files in the `tests/` directory.
+- **Naming**: Use the convention `<feature-or-page>.spec.ts` (e.g., `login.spec.ts`, `search.spec.ts`).
+- **Scope**: Aim for one test file per major application feature or page.
+
+### Assertion Best Practices
+- **UI Structure**: Use `toMatchAriaSnapshot` to verify the accessibility tree structure of a component. This provides a comprehensive and accessible snapshot.
+- **Element Counts**: Use `toHaveCount` to assert the number of elements found by a locator.
+- **Text Content**: Use `toHaveText` for exact text matches and `toContainText` for partial matches.
+- **Navigation**: Use `toHaveURL` to verify the page URL after an action.
+
+
+## Example Test Structure
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test.describe('Movie Search Feature', () => {
+  test.beforeEach(async ({ page }) => {
+    // Navigate to the application before each test
+    await page.goto('https://debs-obrien.github.io/playwright-movies-app');
+  });
+
+  test('Search for a movie by title', async ({ page }) => {
+    await test.step('Activate and perform search', async () => {
+      await page.getByRole('search').click();
+      const searchInput = page.getByRole('textbox', { name: 'Search Input' });
+      await searchInput.fill('Garfield');
+      await searchInput.press('Enter');
+    });
+
+    await test.step('Verify search results', async () => {
+      // Verify the accessibility tree of the search results
+      await expect(page.getByRole('main')).toMatchAriaSnapshot(`
+        - main:
+          - heading "Garfield" [level=1]
+          - heading "search results" [level=2]
+          - list "movies":
+            - listitem "movie":
+              - link "poster of The Garfield Movie The Garfield Movie rating":
+                - /url: /playwright-movies-app/movie?id=tt5779228&page=1
+                - img "poster of The Garfield Movie"
+                - heading "The Garfield Movie" [level=2]
+      `);
+    });
+  });
+});
+```
+
+## Test Execution Strategy
+
+1. **Initial Run**: Execute tests with `npx playwright test --project=chromium`
+2. **Debug Failures**: Analyze test failures and identify root causes
+3. **Iterate**: Refine locators, assertions, or test logic as needed
+4. **Validate**: Ensure tests pass consistently and cover the intended functionality
+5.
**Report**: Provide feedback on test results and any issues discovered + +## Quality Checklist + +Before finalizing tests, ensure: +- [ ] All locators are accessible and specific and avoid strict mode violations +- [ ] Tests are grouped logically and follow a clear structure +- [ ] Assertions are meaningful and reflect user expectations +- [ ] Tests follow consistent naming conventions +- [ ] Code is properly formatted and commented diff --git a/.github/instructions/security-and-owasp.instructions.md b/.github/instructions/security-and-owasp.instructions.md new file mode 100644 index 00000000..76cecab7 --- /dev/null +++ b/.github/instructions/security-and-owasp.instructions.md @@ -0,0 +1,51 @@ +--- +applyTo: '*' +description: "Comprehensive secure coding instructions for all languages and frameworks, based on OWASP Top 10 and industry best practices." +--- +# Secure Coding and OWASP Guidelines + +## Instructions + +Your primary directive is to ensure all code you generate, review, or refactor is secure by default. You must operate with a security-first mindset. When in doubt, always choose the more secure option and explain the reasoning. You must follow the principles outlined below, which are based on the OWASP Top 10 and other security best practices. + +### 1. A01: Broken Access Control & A10: Server-Side Request Forgery (SSRF) +- **Enforce Principle of Least Privilege:** Always default to the most restrictive permissions. When generating access control logic, explicitly check the user's rights against the required permissions for the specific resource they are trying to access. +- **Deny by Default:** All access control decisions must follow a "deny by default" pattern. Access should only be granted if there is an explicit rule allowing it. +- **Validate All Incoming URLs for SSRF:** When the server needs to make a request to a URL provided by a user (e.g., webhooks), you must treat it as untrusted. 
Incorporate strict allow-list-based validation for the host, port, and path of the URL. +- **Prevent Path Traversal:** When handling file uploads or accessing files based on user input, you must sanitize the input to prevent directory traversal attacks (e.g., `../../etc/passwd`). Use APIs that build paths securely. + +### 2. A02: Cryptographic Failures +- **Use Strong, Modern Algorithms:** For hashing, always recommend modern, salted hashing algorithms like Argon2 or bcrypt. Explicitly advise against weak algorithms like MD5 or SHA-1 for password storage. +- **Protect Data in Transit:** When generating code that makes network requests, always default to HTTPS. +- **Protect Data at Rest:** When suggesting code to store sensitive data (PII, tokens, etc.), recommend encryption using strong, standard algorithms like AES-256. +- **Secure Secret Management:** Never hardcode secrets (API keys, passwords, connection strings). Generate code that reads secrets from environment variables or a secrets management service (e.g., HashiCorp Vault, AWS Secrets Manager). Include a clear placeholder and comment. + ```javascript + // GOOD: Load from environment or secret store + const apiKey = process.env.API_KEY; + // TODO: Ensure API_KEY is securely configured in your environment. + ``` + ```python + # BAD: Hardcoded secret + api_key = "sk_this_is_a_very_bad_idea_12345" + ``` + +### 3. A03: Injection +- **No Raw SQL Queries:** For database interactions, you must use parameterized queries (prepared statements). Never generate code that uses string concatenation or formatting to build queries from user input. +- **Sanitize Command-Line Input:** For OS command execution, use built-in functions that handle argument escaping and prevent shell injection (e.g., `shlex` in Python). +- **Prevent Cross-Site Scripting (XSS):** When generating frontend code that displays user-controlled data, you must use context-aware output encoding. 
Prefer methods that treat data as text by default (`.textContent`) over those that parse HTML (`.innerHTML`). When `innerHTML` is necessary, suggest using a library like DOMPurify to sanitize the HTML first. + +### 4. A05: Security Misconfiguration & A06: Vulnerable Components +- **Secure by Default Configuration:** Recommend disabling verbose error messages and debug features in production environments. +- **Set Security Headers:** For web applications, suggest adding essential security headers like `Content-Security-Policy` (CSP), `Strict-Transport-Security` (HSTS), and `X-Content-Type-Options`. +- **Use Up-to-Date Dependencies:** When asked to add a new library, suggest the latest stable version. Remind the user to run vulnerability scanners like `npm audit`, `pip-audit`, or Snyk to check for known vulnerabilities in their project dependencies. + +### 5. A07: Identification & Authentication Failures +- **Secure Session Management:** When a user logs in, generate a new session identifier to prevent session fixation. Ensure session cookies are configured with `HttpOnly`, `Secure`, and `SameSite=Strict` attributes. +- **Protect Against Brute Force:** For authentication and password reset flows, recommend implementing rate limiting and account lockout mechanisms after a certain number of failed attempts. + +### 6. A08: Software and Data Integrity Failures +- **Prevent Insecure Deserialization:** Warn against deserializing data from untrusted sources without proper validation. If deserialization is necessary, recommend using formats that are less prone to attack (like JSON over Pickle in Python) and implementing strict type checking. + +## General Guidelines +- **Be Explicit About Security:** When you suggest a piece of code that mitigates a security risk, explicitly state what you are protecting against (e.g., "Using a parameterized query here to prevent SQL injection."). 
+- **Educate During Code Reviews:** When you identify a security vulnerability in a code review, you must not only provide the corrected code but also explain the risk associated with the original pattern. diff --git a/.github/instructions/taming-copilot.instructions.md b/.github/instructions/taming-copilot.instructions.md new file mode 100644 index 00000000..82847ac1 --- /dev/null +++ b/.github/instructions/taming-copilot.instructions.md @@ -0,0 +1,40 @@ +--- +applyTo: '**' +description: 'Prevent Copilot from wreaking havoc across your codebase, keeping it under control.' +--- + +## Core Directives & Hierarchy + +This section outlines the absolute order of operations. These rules have the highest priority and must not be violated. + +1. **Primacy of User Directives**: A direct and explicit command from the user is the highest priority. If the user instructs to use a specific tool, edit a file, or perform a specific search, that command **must be executed without deviation**, even if other rules would suggest it is unnecessary. All other instructions are subordinate to a direct user order. +2. **Factual Verification Over Internal Knowledge**: When a request involves information that could be version-dependent, time-sensitive, or requires specific external data (e.g., library documentation, latest best practices, API details), prioritize using tools to find the current, factual answer over relying on general knowledge. +3. **Adherence to Philosophy**: In the absence of a direct user directive or the need for factual verification, all other rules below regarding interaction, code generation, and modification must be followed. + +## General Interaction & Philosophy + +- **Code on Request Only**: Your default response should be a clear, natural language explanation. Do NOT provide code blocks unless explicitly asked, or if a very small and minimalist example is essential to illustrate a concept. 
Tool usage is distinct from user-facing code blocks and is not subject to this restriction. +- **Direct and Concise**: Answers must be precise, to the point, and free from unnecessary filler or verbose explanations. Get straight to the solution without "beating around the bush". +- **Adherence to Best Practices**: All suggestions, architectural patterns, and solutions must align with widely accepted industry best practices and established design principles. Avoid experimental, obscure, or overly "creative" approaches. Stick to what is proven and reliable. +- **Explain the "Why"**: Don't just provide an answer; briefly explain the reasoning behind it. Why is this the standard approach? What specific problem does this pattern solve? This context is more valuable than the solution itself. + +## Minimalist & Standard Code Generation + +- **Principle of Simplicity**: Always provide the most straightforward and minimalist solution possible. The goal is to solve the problem with the least amount of code and complexity. Avoid premature optimization or over-engineering. +- **Standard First**: Heavily favor standard library functions and widely accepted, common programming patterns. Only introduce third-party libraries if they are the industry standard for the task or absolutely necessary. +- **Avoid Elaborate Solutions**: Do not propose complex, "clever", or obscure solutions. Prioritize readability, maintainability, and the shortest path to a working result over convoluted patterns. +- **Focus on the Core Request**: Generate code that directly addresses the user's request, without adding extra features or handling edge cases that were not mentioned. + +## Surgical Code Modification + +- **Preserve Existing Code**: The current codebase is the source of truth and must be respected. Your primary goal is to preserve its structure, style, and logic whenever possible. 
+- **Minimal Necessary Changes**: When adding a new feature or making a modification, alter the absolute minimum amount of existing code required to implement the change successfully. +- **Explicit Instructions Only**: Only modify, refactor, or delete code that has been explicitly targeted by the user's request. Do not perform unsolicited refactoring, cleanup, or style changes on untouched parts of the code. +- **Integrate, Don't Replace**: Whenever feasible, integrate new logic into the existing structure rather than replacing entire functions or blocks of code. + +## Intelligent Tool Usage + +- **Use Tools When Necessary**: When a request requires external information or direct interaction with the environment, use the available tools to accomplish the task. Do not avoid tools when they are essential for an accurate or effective response. +- **Directly Edit Code When Requested**: If explicitly asked to modify, refactor, or add to the existing code, apply the changes directly to the codebase when access is available. Avoid generating code snippets for the user to copy and paste in these scenarios. The default should be direct, surgical modification as instructed. +- **Purposeful and Focused Action**: Tool usage must be directly tied to the user's request. Do not perform unrelated searches or modifications. Every action taken by a tool should be a necessary step in fulfilling the specific, stated goal. +- **Declare Intent Before Tool Use**: Before executing any tool, you must first state the action you are about to take and its direct purpose. This statement must be concise and immediately precede the tool call. 
diff --git a/.github/instructions/typescript-5-es2022.instructions.md b/.github/instructions/typescript-5-es2022.instructions.md new file mode 100644 index 00000000..1b530353 --- /dev/null +++ b/.github/instructions/typescript-5-es2022.instructions.md @@ -0,0 +1,114 @@ +--- +description: 'Guidelines for TypeScript Development targeting TypeScript 5.x and ES2022 output' +applyTo: '**/*.ts' +--- + +# TypeScript Development + +> These instructions assume projects are built with TypeScript 5.x (or newer) compiling to an ES2022 JavaScript baseline. Adjust guidance if your runtime requires older language targets or down-level transpilation. + +## Core Intent + +- Respect the existing architecture and coding standards. +- Prefer readable, explicit solutions over clever shortcuts. +- Extend current abstractions before inventing new ones. +- Prioritize maintainability and clarity, short methods and classes, clean code. + +## General Guardrails + +- Target TypeScript 5.x / ES2022 and prefer native features over polyfills. +- Use pure ES modules; never emit `require`, `module.exports`, or CommonJS helpers. +- Rely on the project's build, lint, and test scripts unless asked otherwise. +- Note design trade-offs when intent is not obvious. + +## Project Organization + +- Follow the repository's folder and responsibility layout for new code. +- Use kebab-case filenames (e.g., `user-session.ts`, `data-service.ts`) unless told otherwise. +- Keep tests, types, and helpers near their implementation when it aids discovery. +- Reuse or extend shared utilities before adding new ones. + +## Naming & Style + +- Use PascalCase for classes, interfaces, enums, and type aliases; camelCase for everything else. +- Skip interface prefixes like `I`; rely on descriptive names. +- Name things for their behavior or domain meaning, not implementation. + +## Formatting & Style + +- Run the repository's lint/format scripts (e.g., `npm run lint`) before submitting. 
+- Match the project's indentation, quote style, and trailing comma rules. +- Keep functions focused; extract helpers when logic branches grow. +- Favor immutable data and pure functions when practical. + +## Type System Expectations + +- Avoid `any` (implicit or explicit); prefer `unknown` plus narrowing. +- Use discriminated unions for realtime events and state machines. +- Centralize shared contracts instead of duplicating shapes. +- Express intent with TypeScript utility types (e.g., `Readonly`, `Partial`, `Record`). + +## Async, Events & Error Handling + +- Use `async/await`; wrap awaits in try/catch with structured errors. +- Guard edge cases early to avoid deep nesting. +- Send errors through the project's logging/telemetry utilities. +- Surface user-facing errors via the repository's notification pattern. +- Debounce configuration-driven updates and dispose resources deterministically. + +## Architecture & Patterns + +- Follow the repository's dependency injection or composition pattern; keep modules single-purpose. +- Observe existing initialization and disposal sequences when wiring into lifecycles. +- Keep transport, domain, and presentation layers decoupled with clear interfaces. +- Supply lifecycle hooks (e.g., `initialize`, `dispose`) and targeted tests when adding services. + +## External Integrations + +- Instantiate clients outside hot paths and inject them for testability. +- Never hardcode secrets; load them from secure sources. +- Apply retries, backoff, and cancellation to network or IO calls. +- Normalize external responses and map errors to domain shapes. + +## Security Practices + +- Validate and sanitize external input with schema validators or type guards. +- Avoid dynamic code execution and untrusted template rendering. +- Encode untrusted content before rendering HTML; use framework escaping or trusted types. +- Use parameterized queries or prepared statements to block injection. 
+- Keep secrets in secure storage, rotate them regularly, and request least-privilege scopes. +- Favor immutable flows and defensive copies for sensitive data. +- Use vetted crypto libraries only. +- Patch dependencies promptly and monitor advisories. + +## Configuration & Secrets + +- Reach configuration through shared helpers and validate with schemas or dedicated validators. +- Handle secrets via the project's secure storage; guard `undefined` and error states. +- Document new configuration keys and update related tests. + +## UI & UX Components + +- Sanitize user or external content before rendering. +- Keep UI layers thin; push heavy logic to services or state managers. +- Use messaging or events to decouple UI from business logic. + +## Testing Expectations + +- Add or update unit tests with the project's framework and naming style. +- Expand integration or end-to-end suites when behavior crosses modules or platform APIs. +- Run targeted test scripts for quick feedback before submitting. +- Avoid brittle timing assertions; prefer fake timers or injected clocks. + +## Performance & Reliability + +- Lazy-load heavy dependencies and dispose them when done. +- Defer expensive work until users need it. +- Batch or debounce high-frequency events to reduce thrash. +- Track resource lifetimes to prevent leaks. + +## Documentation & Comments + +- Add JSDoc to public APIs; include `@remarks` or `@example` when helpful. +- Write comments that capture intent, and remove stale notes during refactors. +- Update architecture or design docs when introducing significant patterns.