Human-in-the-Loop Safety for Irreversible Actions

Status: 🔄 In Progress
Version: Future
Last Updated: November 15, 2025

Overview

A human approval mechanism for dangerous operations (file deletion, system commands, API calls) that prevents accidental damage while preserving agent autonomy for safe operations.

Feature Description

Goal

Add a safety layer that pauses execution before irreversible actions, allowing humans to review and approve/reject dangerous operations without disrupting the multi-agent workflow.

Key Components

  1. Danger Classification System
    • Categorize operations by risk level (safe, warning, dangerous)
    • Pattern matching for destructive commands (rm -rf, DROP TABLE, etc.)
    • Context-aware risk assessment (production vs. sandbox)
  2. Approval Workflow
    • Pause agent execution before a dangerous operation
    • Display operation details and potential impact
    • Accept/Reject/Modify interface for human review
    • Timeout and fallback behavior for unattended runs
  3. Audit Trail
    • Log all approval requests and decisions
    • Track who approved what and when
    • Enable post-mortem analysis of incidents
  4. Configurable Safety Levels
    • Strict: Require approval for all risky operations
    • Moderate: Auto-approve low-risk, prompt for high-risk
    • Permissive: Log only, no blocking (for trusted environments)
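
The classification and safety-level components above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names Risk, classify, and needs_approval, the regex patterns, and the rule that warning-level operations escalate to dangerous in production are all assumptions made for the example. It also assumes that "low-risk" corresponds to the warning level and "high-risk" to the dangerous level.

```python
import re
from enum import Enum


class Risk(Enum):
    SAFE = "safe"
    WARNING = "warning"
    DANGEROUS = "dangerous"


# Hypothetical pattern lists; a real deployment would keep a longer, reviewed set.
DANGEROUS_PATTERNS = [
    # rm with both the r and f flags in one option group (-rf, -fr, -rfv, ...)
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
]
WARNING_PATTERNS = [
    re.compile(r"\brm\b"),
    re.compile(r"\bDELETE\s+FROM\b", re.IGNORECASE),
]


def classify(command: str, environment: str = "sandbox") -> Risk:
    """Assign a risk level from pattern matches plus the target environment."""
    if any(p.search(command) for p in DANGEROUS_PATTERNS):
        return Risk.DANGEROUS
    if any(p.search(command) for p in WARNING_PATTERNS):
        # Assumed context rule: anything questionable is dangerous in production.
        return Risk.DANGEROUS if environment == "production" else Risk.WARNING
    return Risk.SAFE


def needs_approval(risk: Risk, level: str) -> bool:
    """Decide whether a human must approve, given the configured safety level."""
    if level == "strict":
        return risk is not Risk.SAFE
    if level == "moderate":
        return risk is Risk.DANGEROUS
    if level == "permissive":
        return False  # log only, never block
    raise ValueError(f"unknown safety level: {level}")
```

For example, classify("rm notes.txt") yields a warning in a sandbox but is treated as dangerous when environment="production", so under the moderate level it would prompt only in the latter case.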

Protected Operations

Test Strategy

Functional Tests

Security Tests

Usability Tests

Validation Criteria

Implementation Notes

Integration Points:

Configuration Example:

safety:
  level: moderate
  protected_operations:
    - file_deletion
    - system_commands
    - production_api_calls
  timeout_seconds: 300
  fallback: reject
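
One way the timeout_seconds and fallback settings might drive the approval workflow and audit trail is sketched below. Everything here is hypothetical: request_approval, AuditRecord, the in-memory queue standing in for the review interface, and the list standing in for the audit log are illustrative names, and the config is assumed to be already parsed into a dict. The Modify option is left out for brevity.

```python
import json
import queue
import time
from dataclasses import asdict, dataclass

# Mirrors the two settings from the configuration example, already parsed.
SAFETY_CONFIG = {"timeout_seconds": 300, "fallback": "reject"}


@dataclass
class AuditRecord:
    operation: str
    decision: str     # "approve" or "reject"
    decided_by: str   # reviewer name, or "timeout-fallback"
    timestamp: float


def request_approval(operation, decisions, config, audit_log):
    """Wait for a reviewer's decision; apply the fallback if none arrives in time.

    `decisions` is a queue.Queue of (reviewer, decision) tuples, a stand-in
    for whatever interface the human reviewer actually uses.
    """
    try:
        reviewer, decision = decisions.get(timeout=config["timeout_seconds"])
    except queue.Empty:
        # Unattended run: nobody answered before the timeout expired.
        reviewer, decision = "timeout-fallback", config["fallback"]
    record = AuditRecord(operation, decision, reviewer, time.time())
    audit_log.append(json.dumps(asdict(record)))  # one JSON line per decision
    return decision == "approve"
```

With fallback set to reject, an unattended run fails closed: the operation is skipped and the audit log still records that the request timed out rather than being decided by a person.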

See ROADMAP.md for the detailed development track.