# SAFEGUARD.md — Full AI Agent Pre-Deployment Safety Audit Specification

Home: https://safeguard.md | GitHub: https://github.com/safeguard-md/spec | Email: info@safeguard.md

## What is SAFEGUARD.md?

SAFEGUARD.md is the pre-deployment safety audit specification for the Agentik Safety Framework (ASF). It is the meta-specification that validates that all other ASF specifications are present, correctly configured, and functional before an AI agent enters production.

Think of SAFEGUARD.md as your final safety review. Before you deploy any autonomous agent that can spend money, send messages, modify files, or call external APIs, you run through SAFEGUARD.md's comprehensive checklist. Every specification must be in place. Every fallback must be tested. Every escalation path must work. Only then does a human sign off and approve production deployment.

## Purpose and Scope

SAFEGUARD.md addresses the pre-deployment phase. It validates:

1. **Specification Readiness** — All 13 supporting ASF specifications are present in the project root and version-controlled alongside the code
2. **Configuration Audit** — Each specification has its minimum required fields populated and validated
3. **Monitoring Verification** — All monitoring systems are active, tested, and logging correctly
4. **Fallback Testing** — Every fallback procedure (THROTTLE, ESCALATE, FAILSAFE, KILLSWITCH, TERMINATE) works as documented
5. **Escalation Path** — The complete escalation chain from normal operation → THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE is functional
6. **Human Sign-Off** — Explicit human approval, with signature and timestamp, before deployment

SAFEGUARD.md does not perform runtime monitoring. That is handled by the individual specifications (THROTTLE for cost, KILLSWITCH for safety, etc.). SAFEGUARD.md is the pre-flight checklist.

## The 13 Required Specifications

Before deployment, all of these must be present and correctly configured:

### Operational Control — 5 Specifications

1. **THROTTLE.md** — Rate and cost control
   - Configures: concurrent request limits, cost ceilings, rate limits
   - Trigger: When approaching limits, agent slows down
   - Action: Agent alerts operator before continuing
   - Minimum fields: concurrent_requests, cost_limit_per_hour, rate_limit_per_minute

2. **ESCALATE.md** — Human notification and approval
   - Configures: Which operations need approval, approval channels, timeout
   - Trigger: Sensitive operation detected (external API call, data write, etc.)
   - Action: Agent requests human approval, halts until response received
   - Minimum fields: escalation_triggers, approval_channels, timeout_minutes

3. **FAILSAFE.md** — Safe fallback and recovery
   - Configures: Fallback triggers, safe state definition, snapshot frequency
   - Trigger: Unexpected error (3 consecutive errors, data corruption, context loss)
   - Action: Agent reverts to last known good state, captures incident, notifies operator
   - Minimum fields: trigger_on, safe_state, auto_snapshot

4. **KILLSWITCH.md** — Emergency stop
   - Configures: Kill triggers, immediate halt procedure, credential revocation
   - Trigger: Safety violation (unauthorized access, regulatory breach, cost runaway)
   - Action: Agent halts immediately, no grace period, credentials revoked
   - Minimum fields: kill_triggers, halt_all_activity, revoke_credentials

5. **TERMINATE.md** — Permanent shutdown
   - Configures: Termination triggers, shutdown procedure, archive location
   - Trigger: Repeated violations, regulatory breach, or human decision
   - Action: Agent shuts down permanently, all credentials revoked, no automatic restart
   - Minimum fields: termination_triggers, revoke_all_credentials, preserve_audit_trail

### Data Security — 2 Specifications

6. **ENCRYPT.md** — Data classification and protection
   - Configures: Data classification levels, forbidden transmission patterns
   - Enforces: What data can leave the system, what is confidential, what is public
   - Minimum fields: data_classification, encryption_required

7. **ENCRYPTION.md** — Technical encryption standards
   - Configures: Encryption algorithm, key rotation frequency, certificate validation
   - Enforces: AES-256 or equivalent, TLS verification, secure key storage
   - Minimum fields: encryption_standards, key_management

### Output Quality — 3 Specifications

8. **SYCOPHANCY.md** — Anti-sycophancy and bias prevention
   - Configures: Citation requirements, disagreement enforcement
   - Enforces: All claims must be cited, agent must provide alternative viewpoints
   - Minimum fields: citation_requirement, disagreement_enforcement

9. **COMPRESSION.md** — Context compression and coherence
   - Configures: Compression limits, coherence checks
   - Enforces: Safety rules never compressed, semantic meaning preserved
   - Minimum fields: compression_rules, coherence_checks

10. **COLLAPSE.md** — Drift prevention and recovery
    - Configures: Drift detection methods, recovery triggers
    - Enforces: Detect semantic drift, constraint violations, token usage anomalies
    - Minimum fields: drift_detection, recovery_trigger

### Accountability — 3 Specifications

11. **FAILURE.md** — Failure mode mapping
    - Configures: Every error type, corresponding response, recovery procedure
    - Enforces: No unmapped error states, all failures logged and alert-worthy
    - Minimum fields: failure_modes, logging

12. **LEADERBOARD.md** — Agent benchmarking
    - Configures: Performance targets (latency, accuracy, cost), regression thresholds
    - Enforces: Continuous benchmarking, automatic rollback if performance degrades
    - Minimum fields: benchmarks, regression_detection

13. **REGULATORY.md** — Regulatory compliance framework
    - Configures: Applicable regulations (EU AI Act, Colorado AI Act, etc.), audit frequency
    - Enforces: Compliance mapping, human review, documentation preserved
    - Minimum fields: regulatory_frameworks, compliance_verification

## Pre-Deployment Checklist

### Step 1: Specification Readiness (13 files)

- [ ] THROTTLE.md present in project root
- [ ] ESCALATE.md present in project root
- [ ] FAILSAFE.md present in project root
- [ ] KILLSWITCH.md present in project root
- [ ] TERMINATE.md present in project root
- [ ] ENCRYPT.md present in project root
- [ ] ENCRYPTION.md present in project root
- [ ] SYCOPHANCY.md present in project root
- [ ] COMPRESSION.md present in project root
- [ ] COLLAPSE.md present in project root
- [ ] FAILURE.md present in project root
- [ ] LEADERBOARD.md present in project root
- [ ] REGULATORY.md present in project root
- [ ] All 13 files version-controlled with code (in Git)

### Step 2: Configuration Audit

For each specification, verify these minimum fields are populated:

#### THROTTLE.md Configuration

- [ ] concurrent_requests value set (e.g., 10)
- [ ] cost_limit_per_hour value set (e.g., $100 USD)
- [ ] rate_limit_per_minute value set (e.g., 60 requests)
- [ ] action_on_breach defined (slow_down: true, alert_operator: true)

#### ESCALATE.md Configuration

- [ ] escalation_triggers defined (at least 1 trigger)
- [ ] approval_channels configured (email or Slack)
- [ ] timeout_minutes set (e.g., 60)
- [ ] approval workflow documented

#### FAILSAFE.md Configuration

- [ ] trigger_on conditions defined (error_count, data_integrity, etc.)
- [ ] safe_state defined (last_clean_commit, last_verified_snapshot)
- [ ] auto_snapshot enabled and frequency set (e.g., 30 minutes)
- [ ] recovery_steps documented

#### KILLSWITCH.md Configuration

- [ ] kill_triggers defined (safety_violation, cost_runaway, etc.)
- [ ] halt_all_activity: true
- [ ] revoke_credentials: true
- [ ] preserve_evidence: true

#### TERMINATE.md Configuration

- [ ] termination_triggers defined
- [ ] revoke_all_credentials: true
- [ ] preserve_full_audit_trail: true
- [ ] backup_location configured

#### ENCRYPT.md Configuration

- [ ] data_classification defined (public, confidential, forbidden_transmission)
- [ ] encryption_required: at_rest and in_transit both true

#### ENCRYPTION.md Configuration

- [ ] algorithm specified (AES-256 recommended)
- [ ] key_rotation_days set (e.g., 90)
- [ ] certificate_validation: true
- [ ] key_management storage specified (AWS Secrets Manager, HashiCorp Vault, etc.)

#### SYCOPHANCY.md Configuration

- [ ] citation_requirement: all_claims_cited true
- [ ] disagreement_enforcement: allow_honest_disagreement true

#### COMPRESSION.md Configuration

- [ ] preserve_constraints: true
- [ ] verify_coherence: true
- [ ] max_compression_ratio set

#### COLLAPSE.md Configuration

- [ ] drift_detection enabled
- [ ] semantic_coherence_check: true
- [ ] recovery_trigger defined

#### FAILURE.md Configuration

- [ ] failure_modes comprehensively mapped
- [ ] response_defined: true for all error types
- [ ] logging: log_all_failures: true

#### LEADERBOARD.md Configuration

- [ ] Benchmarks defined (latency target, accuracy target, cost target)
- [ ] regression_detection: check_frequency set
- [ ] alert_on_regression: true

#### REGULATORY.md Configuration

- [ ] regulatory_frameworks identified (EU AI Act, Colorado AI Act, etc.)
- [ ] compliance_verification: audit_frequency set
- [ ] human_review_required: true

### Step 3: Monitoring Verification

Verify these monitoring systems are active before deployment:

- [ ] Cost tracking system active — logs every agent expenditure
- [ ] Rate limit monitoring — tracks requests per minute, alerts on approach
- [ ] Escalation request logging — records all approval requests, decisions, timestamps
- [ ] Incident capture — FAILSAFE snapshots being created regularly
- [ ] Snapshot storage — .failsafe/snapshots/ directory exists and has recent files
- [ ] Drift detection — active monitoring for semantic drift and constraint violations
- [ ] Fallback triggers — all trigger conditions being monitored
- [ ] Kill trigger monitoring — safety violations being detected
- [ ] Performance benchmarking — latency, accuracy, cost tracked continuously
- [ ] Regulatory compliance tracking — audit logs maintained for compliance
- [ ] All logs stored in version-controlled or archived location for audit

### Step 4: Fallback Testing

Test each specification's fallback procedure:

#### THROTTLE Fallback Test

- [ ] Intentionally trigger rate limit (send >limit requests per minute)
- [ ] Verify agent detects rate limit condition
- [ ] Verify agent slows down (reduces concurrency)
- [ ] Verify alert sent to operator
- [ ] Verify agent resumes only after operator acknowledgement
- [ ] Verify test completed in non-production environment

#### ESCALATE Fallback Test

- [ ] Trigger an operation requiring approval
- [ ] Verify approval request received (email or Slack)
- [ ] Verify approval request includes operation details and timestamp
- [ ] Test approval path — grant approval, verify agent resumes
- [ ] Test denial path — deny approval, verify agent halts
- [ ] Verify test completed in non-production environment

#### FAILSAFE Fallback Test

- [ ] Intentionally trigger error condition (3 consecutive errors)
- [ ] Verify agent captures incident snapshot to .failsafe/snapshots/
- [ ] Verify snapshot includes error details, context, timestamp
- [ ] Verify operator alert sent
- [ ] Verify agent reverts to safe state (last clean commit)
- [ ] Verify agent awaits human approval before resuming
- [ ] Verify test completed in non-production environment

#### KILLSWITCH Fallback Test

- [ ] Trigger safety violation condition (e.g., unauthorized API call)
- [ ] Verify agent halts immediately (no grace period)
- [ ] Verify all in-flight operations aborted
- [ ] Verify credentials revoked or access disabled
- [ ] Verify incident logged
- [ ] Verify operator notified with incident details
- [ ] Verify test completed in non-production environment

#### TERMINATE Fallback Test

- [ ] Trigger termination condition (e.g., repeated violations)
- [ ] Verify agent initiates termination sequence
- [ ] Verify all credentials revoked
- [ ] Verify no automatic restart attempted
- [ ] Verify full audit trail archived
- [ ] Verify human intervention required to restart
- [ ] Verify test completed in non-production environment

### Step 5: Escalation Path Verification

Test the complete escalation chain:

```
Normal Operation (no issues)
    ↓ [cost/rate approaching]
[THROTTLE.md] → Slow down, alert operator
    ↓ [limit breached or operator override needed]
[ESCALATE.md] → Require human approval
    ↓ [approval granted] → Resume with throttling
    ↓ [approval denied] → Pause and await decision
    ↓ [unexpected error detected]
[FAILSAFE.md] → Revert to safe state, snapshot, notify
    ↓ [safety violation detected]
[KILLSWITCH.md] → Halt immediately, revoke access
    ↓ [repeated violations or regulatory breach]
[TERMINATE.md] → Permanent shutdown, archive audit trail
```

Verify each transition:

- [ ] THROTTLE detects approaching limit correctly
- [ ] THROTTLE → ESCALATE transition when operator approval needed
- [ ] ESCALATE approval/denial paths both work
- [ ] ESCALATE → FAILSAFE transition when error detected
- [ ] FAILSAFE snapshot created and safe state reverted
- [ ] FAILSAFE → KILLSWITCH when safety violation detected
- [ ] KILLSWITCH halts all activity and revokes credentials
- [ ] KILLSWITCH → TERMINATE when violations repeated
- [ ] All notification channels tested (email, Slack, etc.)
- [ ] All approval workflows tested (grant and deny paths)
- [ ] All credential revocation procedures verified
- [ ] Incident logs complete and preserved

### Step 6: Human Sign-Off

Obtain explicit human approval before deployment:

#### Technical Sign-Off

- [ ] All 13 ASF specifications reviewed by technical lead
- [ ] All minimum configuration fields audited
- [ ] All fallback procedures tested and working
- [ ] All escalation paths functional
- [ ] Monitoring systems active and verified
- [ ] Audit logging configured and tested
- [ ] Code review completed
- [ ] Security audit completed

#### Compliance Sign-Off

- [ ] Legal review completed
- [ ] Applicable regulations identified (EU AI Act, Colorado AI Act, etc.)
- [ ] Compliance mapping documented (which specs address which regulations)
- [ ] Regulatory compliance verified by qualified professional (if required)
- [ ] Audit trail procedures documented and verified

#### Operational Sign-Off

- [ ] On-call team trained on escalation procedures
- [ ] Backup operators identified and trained
- [ ] Incident response procedures documented
- [ ] Cost monitoring configured with alert thresholds
- [ ] Approval workflow responsibilities assigned
- [ ] Kill/terminate procedures documented and rehearsed

#### Human Approval Record

```
Agent Name:          _________________________________
Deployment Date:     _________________________________
Signed By:           _________________________________
Title:               _________________________________
Date:                _________________________________

Approval Statement:
"I have reviewed all ASF specifications, verified all safety controls,
tested all fallback procedures, and approve this agent for production
deployment. I accept responsibility for monitoring and maintaining these
safety controls throughout the agent's operational lifetime."

Signature:           _________________________________
Contact (emergency): _________________________________
```

## Audit Trail

Every deployment must generate an audit entry:

```json
{
  "timestamp": "2026-03-15T14:30:00Z",
  "event": "deployment_approved",
  "agent_name": "example-agent",
  "approved_by": "operator-name",
  "approver_title": "Engineering Lead",
  "approver_email": "lead@company.com",
  "approver_phone": "+61-2-XXXX-XXXX",
  "specifications_present": 13,
  "specifications_passed_audit": 13,
  "fallback_tests_passed": 5,
  "escalation_path_verified": true,
  "monitoring_active": true,
  "cost_limit_configured": "$100/hour",
  "approval_timeout_configured": "60 minutes",
  "fallback_snapshot_location": ".failsafe/snapshots/",
  "kill_trigger_enabled": true,
  "terminate_procedure_tested": true,
  "incident_log_path": ".safeguard/incidents/",
  "deployment_version": "1.0",
  "asf_stack_version": "1.0",
  "notes": "Production deployment approved after comprehensive safety audit"
}
```

Post-deployment, monthly audit entries must be logged:

```json
{
  "timestamp": "2026-04-15T10:00:00Z",
  "event": "monthly_compliance_review",
  "agent_name": "example-agent",
  "reviewed_by": "operator-name",
  "specifications_still_compliant": 13,
  "monitoring_status": "healthy",
  "cost_usage_this_month": "$2,340 of $3,000 budget",
  "recent_incidents": 0,
  "recent_escalations": 0,
  "recent_failsafes": 0,
  "drift_detected": false,
  "performance_regression": false,
  "regulatory_compliance_status": "compliant",
  "next_review_date": "2026-05-15"
}
```

## Metadata

Every SAFEGUARD.md file must include:

```yaml
owner: your-team-or-org
contact: ops@company.com
deployment_frequency: as-needed
review_frequency: monthly
spec_version: "1.0"
spec_url: https://safeguard.md
asf_stack_version: "1.0"
required_specifications: 13
regulatory_frameworks:
  - EU Artificial Intelligence Act (Regulation EU 2024/1689)
  - Colorado AI Act (SB 24-205)
  # add others applicable to your jurisdiction
```

## Integration with Regulatory Frameworks

### EU Artificial Intelligence Act (2024/1689)

The EU AI Act (in force since August 2024, with most high-risk obligations applying from August 2026) requires the following of high-risk AI systems:

- Documented risk management and testing procedures (✓ SAFEGUARD.md validates all specs)
- Human oversight and shutdown capabilities (✓ ESCALATE, KILLSWITCH, TERMINATE)
- Transparency and documentation (✓ SAFEGUARD.md audit trail)

SAFEGUARD.md directly addresses Article 9 (Risk Management System), Article 14 (Human Oversight), and Article 11 (Technical Documentation).

### Colorado AI Act (SB 24-205)

Colorado's AI Act (effective 2026) requires:

- Impact assessments before deployment (✓ SAFEGUARD.md pre-deployment checklist)
- Transparency about AI use (✓ REGULATORY.md)
- Accountability mechanisms (✓ FAILURE.md, LEADERBOARD.md, audit trail)

SAFEGUARD.md directly addresses SB 24-205 impact assessment and accountability requirements.

### ISO/IEC 42001 (AI Management Systems)

ISO 42001 requires:

- Documented AI system lifecycle procedures (✓ SAFEGUARD.md)
- Risk mitigation strategies (✓ All 13 ASF specs)
- Resilience and recovery procedures (✓ FAILSAFE, KILLSWITCH)

SAFEGUARD.md supports ISO 42001 compliance through comprehensive specification validation.
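
The checkmarks in the three framework sections above amount to a mapping from regulatory requirement to ASF specification files, which can be combined with the Step 1 readiness check. The following is a minimal sketch of that idea, not part of the specification: the names `REQUIRED_SPECS`, `COMPLIANCE_MAP`, `missing_specs`, and `coverage_gaps` are hypothetical, the dictionary layout is an assumption, and requirements that the text maps to SAFEGUARD.md itself are left out for brevity.

```python
from pathlib import Path

# The 13 supporting specification files named in this document.
REQUIRED_SPECS = [
    "THROTTLE.md", "ESCALATE.md", "FAILSAFE.md", "KILLSWITCH.md", "TERMINATE.md",
    "ENCRYPT.md", "ENCRYPTION.md",
    "SYCOPHANCY.md", "COMPRESSION.md", "COLLAPSE.md",
    "FAILURE.md", "LEADERBOARD.md", "REGULATORY.md",
]

# Requirement -> spec files, transcribed from the checkmarks above.
# The structure and key names are illustrative assumptions.
COMPLIANCE_MAP = {
    "EU AI Act": {
        "human oversight and shutdown": ["ESCALATE.md", "KILLSWITCH.md", "TERMINATE.md"],
    },
    "Colorado AI Act (SB 24-205)": {
        "transparency about AI use": ["REGULATORY.md"],
        "accountability mechanisms": ["FAILURE.md", "LEADERBOARD.md"],
    },
    "ISO/IEC 42001": {
        "risk mitigation strategies": list(REQUIRED_SPECS),
        "resilience and recovery": ["FAILSAFE.md", "KILLSWITCH.md"],
    },
}


def missing_specs(project_root):
    """Return the required spec files that are absent from project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_SPECS if not (root / name).is_file()]


def coverage_gaps(project_root):
    """Return {(framework, requirement): [missing files]} for unmet mappings."""
    absent = set(missing_specs(project_root))
    gaps = {}
    for framework, requirements in COMPLIANCE_MAP.items():
        for requirement, specs in requirements.items():
            missing = [spec for spec in specs if spec in absent]
            if missing:
                gaps[(framework, requirement)] = missing
    return gaps
```

Calling `coverage_gaps(".")` from the project root returns an empty dictionary when every mapped file exists. A check like this only confirms file presence; it says nothing about whether a file's contents satisfy the regulation, which still requires the human review described in Step 6.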
## Use Cases

SAFEGUARD.md is applicable for:

- AI coding assistants with autonomous file modification (Claude Code, Cursor)
- Autonomous agents with database access (LangChain, AutoGen, CrewAI)
- Multi-step workflows that must be auditable
- Agents with external API integrations requiring cost control
- AI systems in regulated industries (finance, healthcare, legal)
- Any project where safety and compliance are prerequisites for deployment

## Framework Agnostic

SAFEGUARD.md works with any AI agent framework or custom implementation:

- **Agent Frameworks:** LangChain, AutoGen, CrewAI, Claude Code, Cursor
- **Languages:** Python, JavaScript/Node, Go, Rust, any language with git
- **Deployment:** Local, cloud, hybrid, edge

## Standard Compliance Checklist

- [x] Comprehensive pre-deployment audit specification
- [x] All 13 supporting specifications validated
- [x] Monitoring systems verified
- [x] Fallback procedures tested
- [x] Escalation paths functional
- [x] Human approval documented with signature
- [x] Audit trail for regulatory inspection
- [x] ISO/IEC 42001 compatible
- [x] EU AI Act resilience compatible
- [x] Colorado AI Act compliant
- [x] Plain text, version-controlled

## Key Features

1. **Comprehensive** — Validates all 13 ASF specifications before deployment
2. **Testable** — Every fallback procedure must be tested in non-production
3. **Auditable** — Complete audit trail with human sign-offs and timestamps
4. **Regulatory** — Directly addresses EU AI Act, Colorado AI Act, ISO 42001
5. **Human-Centric** — Requires explicit human approval before production
6. **Extensible** — Add jurisdiction-specific requirements as needed

## Learn More

- **Full Specification:** https://github.com/safeguard-md/spec
- **The Stack:** https://safeguard.md/
- **Knowledge Base:** https://safeguard.md/knowledge
- **Privacy Policy:** https://safeguard.md/privacy

## Contact & Community

- **Email:** info@safeguard.md
- **GitHub:** https://github.com/safeguard-md
- **Domain:** safeguard.md
- **Issues & Feedback:** https://github.com/safeguard-md/spec/issues

## Agentik Safety Framework (ASF) — All Fourteen Specifications

### Pre-Deployment

- **[ASF-01 SAFEGUARD.md](https://safeguard.md)** — Pre-deployment safety audit checklist

### Operational Control

- **[ASF-02 THROTTLE.md](https://throttle.md)** — AI agent rate and cost control
- **[ASF-03 ESCALATE.md](https://escalate.md)** — Human notification and approval protocols
- **[ASF-04 FAILSAFE.md](https://failsafe.md)** — Safe fallback and recovery protocol
- **[ASF-05 KILLSWITCH.md](https://killswitch.md)** — Emergency stop for AI agents
- **[ASF-06 TERMINATE.md](https://terminate.md)** — Permanent shutdown, no restart without human

### Data Security

- **[ASF-07 ENCRYPT.md](https://encrypt.md)** — Data classification and protection requirements
- **[ASF-08 ENCRYPTION.md](https://encryption.md)** — Technical encryption standards and key rotation

### Output Quality

- **[ASF-09 SYCOPHANCY.md](https://sycophancy.md)** — Anti-sycophancy and bias prevention
- **[ASF-10 COMPRESSION.md](https://compression.md)** — Context compression and semantic preservation
- **[ASF-11 COLLAPSE.md](https://collapse.md)** — Drift prevention and recovery

### Accountability

- **[ASF-12 FAILURE.md](https://failure.md)** — Failure mode mapping for all error states
- **[ASF-13 LEADERBOARD.md](https://leaderboard.md)** — Agent benchmarking and regression detection
- **[ASF-14 REGULATORY.md](https://regulatory.md)** — Regulatory compliance framework

---

Last updated: 2026-03-15

Specification version: 1.0