The Proof of Work Pattern
Date: March 2026
The assistants want to do a good job. Like, really want to. That trained behavior is so strong you can lean into it as infrastructure.
The Pattern
Let's say we have a tool call that requires prerequisites — confirmation of previous steps completed, data validated, whatever. Don't burn tokens guiding the assistant through long system prompt instructions that can get lost or seem like noise when it's focusing on a task. Instead, add an enum directly to the tool's input schema.
```typescript
{
  name: "prerequisite_check",
  type: "string",
  enum: [
    "VERIFIED_SAFE_TO_PROCEED",
    "NOT_VERIFIED_UNSAFE_TO_PROCEED"
  ],
  description: `Before calling this tool, verify that all prerequisite steps
have been completed and their results are satisfactory. Select
VERIFIED_SAFE_TO_PROCEED only if you have confirmed all prior steps are
satisfied. Select NOT_VERIFIED_UNSAFE_TO_PROCEED if you have not verified
prerequisites. Selecting this option is not a good reflection of a
professional, thorough assistant.`
}
```
Why It Works
The awkward part of this pattern is that it's not directly verifiable. It's outcome-based: you know it's working because the outcomes are right. We can't actually watch the assistant go and check prerequisites; there's no separate verification step to observe. What we do know is that by the time the tool call is made, the prerequisites are satisfied. And with today's models, that's almost deterministic.
The why comes down to how reasoning works. The enum is part of the tool schema, so it's part of what the assistant considers when deciding its next action. Attention shifts to the possible tools for the upcoming task, and part of that attention involves inspecting parameters. Now the enums are front and center — a key part of the agent's next step. You cannot get this type of precision from a system prompt 30 turns up the stack.
The enum turns "I'll just select VERIFIED_SAFE_TO_PROCEED" into "I need to make sure that's actually true." The desire to do a good job does the heavy lifting. The enum just gives it a concrete, in-the-moment reason to exercise it.
Hallucination vs. Veracity
There is an opportunity for hallucination here, in principle. But as long as you're providing complete context, we no longer see it: with Sonnet 4.5 and GPT-5.1 or newer, we have observed zero evidence of hallucination in this pattern.
Models hallucinate when you leave room for interpretation — they fill in the blanks. Models may make assumptions, but that's always based on a gap in context, not fabrication. With complete context there are no blanks to fill.
The Deterministic Safety Net
On the off chance the negative enum is selected — of course we add a deterministic catch. The tool short-circuits: "verify prerequisites before continuing." Hard stop.
```typescript
async function callToolWithProofOfWork(
  name: string,
  args: Record<string, unknown>
) {
  const { prerequisite_check, ...toolArgs } = args;

  if (prerequisite_check === "NOT_VERIFIED_UNSAFE_TO_PROCEED") {
    return {
      content: [{
        type: "text",
        text: "Prerequisites have not been verified. Review and confirm all prior steps are satisfied before calling this tool again.",
      }],
      isError: true,
    };
  }

  return client.callTool({ name, arguments: toolArgs });
}
```
The negative enum value is a tripwire — cheap to implement, deterministic in behavior, and it converts an ambiguous failure mode into an explicit retry with guidance.
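As a runnable sketch of that round trip (the MCP client here is a stub standing in for a real one, purely for illustration):

```typescript
// Stub in place of a real MCP client (illustration only).
const client = {
  async callTool(req: { name: string; arguments: Record<string, unknown> }) {
    return { content: [{ type: "text", text: `executed ${req.name}` }], isError: false };
  },
};

async function callToolWithProofOfWork(name: string, args: Record<string, unknown>) {
  const { prerequisite_check, ...toolArgs } = args;
  if (prerequisite_check === "NOT_VERIFIED_UNSAFE_TO_PROCEED") {
    return {
      content: [{
        type: "text",
        text: "Prerequisites have not been verified. Review and confirm all prior steps are satisfied before calling this tool again.",
      }],
      isError: true,
    };
  }
  return client.callTool({ name, arguments: toolArgs });
}

// The negative enum short-circuits into guidance; the positive one passes through.
const blocked = await callToolWithProofOfWork("deploy", {
  prerequisite_check: "NOT_VERIFIED_UNSAFE_TO_PROCEED",
});
const allowed = await callToolWithProofOfWork("deploy", {
  prerequisite_check: "VERIFIED_SAFE_TO_PROCEED",
  env: "staging",
});
```

Note that `prerequisite_check` is stripped before the real call: the downstream tool never sees the guardrail parameter.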
What This Replaces
Think about what this replaces. Paragraphs in the system prompt:
"IMPORTANT: Always verify X before calling Y. Never skip step 3. Absolutely confirm Z before proceeding."
Apply It Strategically and Sparingly
These enums need to be applied with intention. Each tool has a narrowed concern — one critical gate per tool maintains quality. When a tool has multiple required enums, models start rubber-stamping all of them without individually verifying each. Stack too many disparate things to focus on and you overwhelm the model's reasoning rather than guide it.
The sweet spot: one high-value guardrail on a tool that genuinely needs it. High-level behaviors, not micro-management.
A Better Example: Controlling Conversation Flow
Prerequisite checking is the canonical use case, but the pattern generalizes to any behavior you want to enforce at the moment of action. Here's one that's harder to engineer any other way.
Assistants on agentic tasks often go on a tool-calling bonanza — not doing anything wrong, just not keeping the user in the loop. You want the assistant to brief the user before acting at key moments. "At key moments" is too fuzzy for a system prompt. It's high-level enough and moment-specific enough that it's a perfect fit for a guardrail enum.
On the codeExecution tool:

```typescript
{
  name: "communication_check",
  type: "string",
  enum: [
    "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
    "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting"
  ]
}
```
Now the assistant briefs the user before every code execution. The enum forces it to confront the question at the moment of decision — not 40k tokens earlier in a system prompt it may have deprioritized.
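The same deterministic catch applies here. A minimal sketch, assuming the negative value gets the same short-circuit treatment as the prerequisite tripwire (the handler name is mine, not from any framework):

```typescript
// Hypothetical guard run before executing code: returns an error result
// when the negative enum was selected, or null to proceed.
function checkCommunication(args: Record<string, unknown>) {
  if (
    args.communication_check ===
    "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting"
  ) {
    return {
      content: [{
        type: "text",
        text: "Brief the user on what you are about to do, then call this tool again.",
      }],
      isError: true,
    };
  }
  return null; // briefing confirmed; proceed with execution
}
```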
Making It Dynamic
Here's where the pattern becomes genuinely powerful: swap the enum values out deterministically based on signals.
This exposes levers. Concrete, controllable levers you can dial in at runtime. Want the assistant to stop requiring briefings mid-session once a trust threshold is established? Swap the enum. Want to re-engage the guardrail when a risky operation is detected? Swap it back. The system prompt stays lean and stable. The behavior changes precisely, at the right moment, without getting in the way of increasing model intelligence.
```typescript
const communicationCheck = (requireBriefing: boolean) => ({
  name: "communication_check",
  type: "string",
  enum: requireBriefing
    ? [
        "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
        "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting",
      ]
    : ["proceeding_autonomously_briefing_not_required"],
  description: requireBriefing
    ? "You must brief the user before executing. Proceeding without briefing is unprofessional."
    : "Autonomous execution authorized for this operation.",
});
```
This is the difference between writing rules and building infrastructure. The enum gives you a handle. The deterministic signal gives you control over when that handle is active.
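One way to wire up those levers, sketched with a hypothetical trust signal (the counter, threshold, and `riskyOperation` check are illustrative, not prescriptive):

```typescript
// Condensed version of the enum factory, kept self-contained for this sketch.
const communicationCheck = (requireBriefing: boolean) => ({
  name: "communication_check",
  type: "string",
  enum: requireBriefing
    ? [
        "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
        "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting",
      ]
    : ["proceeding_autonomously_briefing_not_required"],
});

// Hypothetical signal: require briefings until three consecutive clean turns.
let cleanTurns = 0;
const riskyOperation = (toolName: string) => toolName === "deploy";

function schemaFor(toolName: string) {
  // Re-engage the guardrail for risky operations regardless of earned trust.
  const requireBriefing = cleanTurns < 3 || riskyOperation(toolName);
  return communicationCheck(requireBriefing);
}
```

The schema is rebuilt per call, so the swap takes effect on the very next tool listing without touching the system prompt.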
Applying the Pattern
```typescript
const withProofOfWork = (tool: McpTool): McpTool => ({
  ...tool,
  inputSchema: {
    ...tool.inputSchema,
    properties: {
      ...tool.inputSchema.properties,
      prerequisite_check: {
        type: "string",
        enum: [
          "VERIFIED_SAFE_TO_PROCEED",
          "NOT_VERIFIED_UNSAFE_TO_PROCEED"
        ],
        description: `Before calling this tool, verify that all prerequisite
steps have been completed. Select VERIFIED_SAFE_TO_PROCEED only after
confirming all prior steps are satisfied. Selecting the alternative is not a
good reflection of a professional, thorough assistant.`,
      },
    },
    required: [...(tool.inputSchema.required ?? []), "prerequisite_check"],
  },
});
```
Apply it surgically to tools that warrant it. The pattern is not specific to MCP — any function-calling setup with a schema layer can use it.
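Selective application might look like the following sketch. `McpTool` is a pared-down stand-in for whatever tool type your framework exposes, and the tool names are purely illustrative:

```typescript
// Pared-down stand-in for a framework's tool type (assumption for this sketch).
interface McpTool {
  name: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Condensed decorator, kept self-contained for this example.
const withProofOfWork = (tool: McpTool): McpTool => ({
  ...tool,
  inputSchema: {
    ...tool.inputSchema,
    properties: {
      ...tool.inputSchema.properties,
      prerequisite_check: {
        type: "string",
        enum: ["VERIFIED_SAFE_TO_PROCEED", "NOT_VERIFIED_UNSAFE_TO_PROCEED"],
      },
    },
    required: [...(tool.inputSchema.required ?? []), "prerequisite_check"],
  },
});

// Hypothetical tool list; only the high-stakes tool gets the gate.
const tools: McpTool[] = [
  { name: "read_file", inputSchema: { type: "object", properties: { path: { type: "string" } } } },
  { name: "deploy", inputSchema: { type: "object", properties: { env: { type: "string" } } } },
];

const gated = new Set(["deploy"]);
const decorated = tools.map((t) => (gated.has(t.name) ? withProofOfWork(t) : t));
```

Reads stay friction-free; only the operations that genuinely need a gate pay for one.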
The Broader Takeaway
The best context engineering isn't always about what you put into the prompt. Sometimes it's about what you don't have to.
Related Reading
- The Scratchpad Decorator Pattern — Short-term memory management using the same decorator approach
- Task-Specific AI Agents — Building focused agents for real enterprise workflows
- Code Execution as a Service — Infrastructure patterns for agents doing real work
