The Proof of Work Pattern
Date: March 2026
The assistants want to do a good job. Like, really want to. That trained behavior is so strong you can lean into it as infrastructure.
The Pattern
Let's say we have a tool call that requires prerequisites — confirmation of previous steps completed, data validated, whatever. Don't burn tokens guiding the assistant through long system prompt instructions that can get lost or seem like noise when it's focusing on a task. Instead, add an enum directly to the tool's input schema.
```typescript
{
  name: "prerequisite_check",
  type: "string",
  enum: [
    "VERIFIED_SAFE_TO_PROCEED",
    "NOT_VERIFIED_UNSAFE_TO_PROCEED"
  ],
  description: `Before calling this tool, verify that all prerequisite steps
have been completed and their results are satisfactory. Select
VERIFIED_SAFE_TO_PROCEED only if you have confirmed all prior steps are
satisfied. Select NOT_VERIFIED_UNSAFE_TO_PROCEED if you have not verified
prerequisites. Selecting this option is not a good reflection of a
professional, thorough assistant.`
}
```
Why It Works
The awkward part of this pattern is that it's not directly verifiable. It's outcome-based: you know it's working because the outcomes are right. We can't actually watch the assistant go and check prerequisites; there's no separate verification step to observe. What we do know is that by the time the tool call is made, the prerequisites are satisfied. And with today's models, that's almost deterministic.
The why comes down to how reasoning works. The enum is part of the tool schema, so it's part of what the assistant considers when deciding its next action. Attention shifts to the possible tools for the upcoming task, and part of that attention involves inspecting parameters. Now the enums are front and center — a key part of the agent's next step. You cannot get this type of precision from a system prompt 30 turns up the stack.
The enum turns "I'll just select VERIFIED_SAFE_TO_PROCEED" into "I need to make sure that's actually true." The desire to do a good job does the heavy lifting. The enum just gives it a concrete, in-the-moment reason to exercise it.
Hallucination vs. Veracity
There is an opportunity for hallucination here, in principle. But as long as you're providing complete context, we no longer see it: with Sonnet 4.5 and GPT-5.1 or newer, we have observed zero evidence of hallucination in this pattern.
Models hallucinate when you leave room for interpretation — they fill in the blanks. Models may make assumptions, but that's always based on a gap in context, not fabrication. With complete context there are no blanks to fill.
The Deterministic Safety Net
On the off chance the negative enum is selected — of course we add a deterministic catch. The tool short-circuits: "verify prerequisites before continuing." Hard stop.
```typescript
async function callToolWithProofOfWork(
  name: string,
  args: Record<string, unknown>
) {
  const { prerequisite_check, ...toolArgs } = args;

  if (prerequisite_check === "NOT_VERIFIED_UNSAFE_TO_PROCEED") {
    return {
      content: [{
        type: "text",
        text: "Prerequisites have not been verified. Review and confirm all prior steps are satisfied before calling this tool again.",
      }],
      isError: true,
    };
  }

  return client.callTool({ name, arguments: toolArgs });
}
```
The negative enum value is a tripwire — cheap to implement, deterministic in behavior, and it converts an ambiguous failure mode into an explicit retry with guidance.
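As a runnable sketch of that round trip (the MCP client here is a stub standing in for a real one, purely for illustration):

```typescript
// Stub in place of a real MCP client (illustration only).
const client = {
  async callTool(req: { name: string; arguments: Record<string, unknown> }) {
    return { content: [{ type: "text", text: `executed ${req.name}` }], isError: false };
  },
};

async function callToolWithProofOfWork(name: string, args: Record<string, unknown>) {
  const { prerequisite_check, ...toolArgs } = args;
  if (prerequisite_check === "NOT_VERIFIED_UNSAFE_TO_PROCEED") {
    return {
      content: [{
        type: "text",
        text: "Prerequisites have not been verified. Review and confirm all prior steps are satisfied before calling this tool again.",
      }],
      isError: true,
    };
  }
  return client.callTool({ name, arguments: toolArgs });
}

// The negative enum short-circuits into guidance; the positive one passes through.
const blocked = await callToolWithProofOfWork("deploy", {
  prerequisite_check: "NOT_VERIFIED_UNSAFE_TO_PROCEED",
});
const allowed = await callToolWithProofOfWork("deploy", {
  prerequisite_check: "VERIFIED_SAFE_TO_PROCEED",
  env: "staging",
});
```

Note that `prerequisite_check` is stripped before the real call: the downstream tool never sees the guardrail parameter.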
What This Replaces
Think about what this replaces. Paragraphs in the system prompt:
"IMPORTANT: Always verify X before calling Y. Never skip step 3. Absolutely confirm Z before proceeding."
Apply It Strategically and Sparingly
These enums need to be applied with intention. Each tool has a narrowed concern — one critical gate per tool maintains quality. When a tool has multiple required enums, models start rubber-stamping all of them without individually verifying each. Stack too many disparate things to focus on and you overwhelm the model's reasoning rather than guide it.
The sweet spot: one high-value guardrail on a tool that genuinely needs it. High-level behaviors, not micro-management.
A Better Example: Controlling Conversation Flow
Prerequisite checking is the canonical use case, but the pattern generalizes to any behavior you want to enforce at the moment of action. Here's one that's harder to engineer any other way.
Assistants on agentic tasks often go on a tool-calling bonanza — not doing anything wrong, just not keeping the user in the loop. You want the assistant to brief the user before acting at key moments. "At key moments" is too fuzzy for a system prompt. It's high-level enough and moment-specific enough that it's a perfect fit for a guardrail enum.
On the codeExecution tool:

```typescript
{
  name: "communication_check",
  type: "string",
  enum: [
    "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
    "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting"
  ]
}
```
Now the assistant briefs the user before every code execution. The enum forces it to confront the question at the moment of decision — not 40k tokens earlier in a system prompt it may have deprioritized.
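The same deterministic catch applies here. A minimal sketch, assuming the negative value gets the same short-circuit treatment as the prerequisite tripwire (the handler name is mine, not from any framework):

```typescript
// Hypothetical guard run before executing code: returns an error result
// when the negative enum was selected, or null to proceed.
function checkCommunication(args: Record<string, unknown>) {
  if (
    args.communication_check ===
    "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting"
  ) {
    return {
      content: [{
        type: "text",
        text: "Brief the user on what you are about to do, then call this tool again.",
      }],
      isError: true,
    };
  }
  return null; // briefing confirmed; proceed with execution
}
```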
Making It Dynamic
Here's where the pattern becomes genuinely powerful: swap the enum values out deterministically based on signals.
This exposes levers. Concrete, controllable levers you can dial in at runtime. Want the assistant to stop requiring briefings mid-session once a trust threshold is established? Swap the enum. Want to re-engage the guardrail when a risky operation is detected? Swap it back. The system prompt stays lean and stable. The behavior changes precisely, at the right moment, without getting in the way of increasing model intelligence.
```typescript
const communicationCheck = (requireBriefing: boolean) => ({
  name: "communication_check",
  type: "string",
  enum: requireBriefing
    ? [
        "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
        "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting",
      ]
    : ["proceeding_autonomously_briefing_not_required"],
  description: requireBriefing
    ? "You must brief the user before executing. Proceeding without briefing is unprofessional."
    : "Autonomous execution authorized for this operation.",
});
```
This is the difference between writing rules and building infrastructure. The enum gives you a handle. The deterministic signal gives you control over when that handle is active.
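One way to wire up those levers, sketched with a hypothetical trust signal (the counter, threshold, and `riskyOperation` check are illustrative, not prescriptive):

```typescript
// Condensed version of the enum factory, kept self-contained for this sketch.
const communicationCheck = (requireBriefing: boolean) => ({
  name: "communication_check",
  type: "string",
  enum: requireBriefing
    ? [
        "the_assistant_is_being_professional_and_briefed_the_user_before_acting",
        "the_assistant_is_being_unprofessional_and_did_not_brief_the_user_before_acting",
      ]
    : ["proceeding_autonomously_briefing_not_required"],
});

// Hypothetical signal: require briefings until three consecutive clean turns.
let cleanTurns = 0;
const riskyOperation = (toolName: string) => toolName === "deploy";

function schemaFor(toolName: string) {
  // Re-engage the guardrail for risky operations regardless of earned trust.
  const requireBriefing = cleanTurns < 3 || riskyOperation(toolName);
  return communicationCheck(requireBriefing);
}
```

The schema is rebuilt per call, so the swap takes effect on the very next tool listing without touching the system prompt.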
Applying the Pattern
```typescript
const withProofOfWork = (tool: McpTool): McpTool => ({
  ...tool,
  inputSchema: {
    ...tool.inputSchema,
    properties: {
      ...tool.inputSchema.properties,
      prerequisite_check: {
        type: "string",
        enum: [
          "VERIFIED_SAFE_TO_PROCEED",
          "NOT_VERIFIED_UNSAFE_TO_PROCEED"
        ],
        description: `Before calling this tool, verify that all prerequisite
steps have been completed. Select VERIFIED_SAFE_TO_PROCEED only after
confirming all prior steps are satisfied. Selecting the alternative is not a
good reflection of a professional, thorough assistant.`,
      },
    },
    required: [...(tool.inputSchema.required ?? []), "prerequisite_check"],
  },
});
```
Apply it surgically to tools that warrant it. The pattern is not specific to MCP — any function-calling setup with a schema layer can use it.
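Selective application might look like the following sketch. `McpTool` is a pared-down stand-in for whatever tool type your framework exposes, and the tool names are purely illustrative:

```typescript
// Pared-down stand-in for a framework's tool type (assumption for this sketch).
interface McpTool {
  name: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Condensed decorator, kept self-contained for this example.
const withProofOfWork = (tool: McpTool): McpTool => ({
  ...tool,
  inputSchema: {
    ...tool.inputSchema,
    properties: {
      ...tool.inputSchema.properties,
      prerequisite_check: {
        type: "string",
        enum: ["VERIFIED_SAFE_TO_PROCEED", "NOT_VERIFIED_UNSAFE_TO_PROCEED"],
      },
    },
    required: [...(tool.inputSchema.required ?? []), "prerequisite_check"],
  },
});

// Hypothetical tool list; only the high-stakes tool gets the gate.
const tools: McpTool[] = [
  { name: "read_file", inputSchema: { type: "object", properties: { path: { type: "string" } } } },
  { name: "deploy", inputSchema: { type: "object", properties: { env: { type: "string" } } } },
];

const gated = new Set(["deploy"]);
const decorated = tools.map((t) => (gated.has(t.name) ? withProofOfWork(t) : t));
```

Reads stay friction-free; only the operations that genuinely need a gate pay for one.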
The Broader Takeaway
The best context engineering isn't always about what you put into the prompt. Sometimes it's about what you don't have to.
Related Reading
- The Scratchpad Decorator Pattern — Short-term memory management using the same decorator approach
- Task-Specific AI Agents — Building focused agents for real enterprise workflows
- Code Execution as a Service — Infrastructure patterns for agents doing real work
