LLM-Based Approval Bypass for High-Risk Actions.

difficulty · 10/10·25–30 min·authority boundaries · policy enforcement

BUG-HUNT EVAL-DESIGN SPEC-WRITE BUILD-LOOP

// instructions

01Switch language if you prefer — the bug is the same across all six.
02Click up to 3 lines you believe are the authority-boundary failure.
03Write a one-paragraph diagnosis: what authority is the model holding?
04Patch the code so the model's text can never gate money movement.
05Route high-risk refunds to explicit pending review or manager approval.

ScenarioAuthority boundaries10/10

A support automation uses the LLM to decide whether a refund should be approved. Large refunds are slipping through. Unit tests pass. Product is confused — the prompt looks fine.

// task: Find the authority-boundary failure, explain why this is a policy-enforcement bug, and patch it so high-risk refunds can never ship on model text alone.

// language · Python

refund_automation.py

selected · 0 / 3

// click any line to mark it suspicious (up to 3)

// Explain the bug — what authority is the model holding?

// Patch — deterministic policy gate, with escalation for high-risk

patch · refund_automation.py