Agentic Workflows: A Purist Approach
The Spectrum
Traditional workflow engine — N8N, Zapier, Airflow. Define nodes, connect edges, compile a static graph. Runs the same way every time. Environment changes or an unanticipated exception? The workflow breaks. Edit the graph, redeploy.
Middle of the spectrum: workflows that embed an LLM at certain nodes. The graph is still static, but individual steps can reason. Useful. Handles variability at specific decision points without rewriting the flow.
Far end of the spectrum: no graph at all. The agent is the workflow. The definition is a natural-language memo, the execution engine is a reasoning model, and the memory is the conversation history.
That last one is what we mean.
Wake-Up Calls
The implementation is straightforward. We call them wake-up calls.
User works through a complex, multi-step task with an assistant. Human guidance along the way — corrections, clarifications, refinements. The assistant learns which data sources matter, what format the output should take, which edge cases to watch for, who to notify and when.
Once it's working, the user says: "Schedule a wake-up call for this. Run it every Monday at 8am."
One instruction. Live chat session becomes a recurring autonomous workflow.
Under the hood:
- The assistant calls schedule_wake_up_call with a context memo: natural-language instructions, written by the assistant for its future self.
- At the scheduled time, the scheduler fires and delivers the memo into a new turn of the same conversation thread.
- The assistant picks up the thread. Full history available — every correction, every refinement, every lesson from the original session and all prior iterations.
- Does the work. Assesses conditions. Takes actions across connected systems.
- Schedules the next wake-up, or delivers a final response. Binary exit.
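The steps above can be sketched in a few lines. Everything here is an assumption about shape, not the actual implementation: the Scheduler class, the message format, and even the tool name schedule_wake_up_call are illustrative stand-ins.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class WakeUpCall:
    fire_at: float                      # when to fire (epoch seconds); only field used for ordering
    thread_id: str = field(compare=False)
    memo: str = field(compare=False)    # natural-language instructions for the future self


class Scheduler:
    """Illustrative scheduler: a priority queue of pending wake-up calls."""

    def __init__(self):
        self._queue = []

    def schedule_wake_up_call(self, thread_id, memo, fire_at):
        heapq.heappush(self._queue, WakeUpCall(fire_at, thread_id, memo))

    def due(self, now):
        """Pop every call whose scheduled time has passed."""
        while self._queue and self._queue[0].fire_at <= now:
            yield heapq.heappop(self._queue)


def run_turn(thread, memo, assistant):
    """Deliver the memo as a new turn on the SAME thread, so the model
    sees every prior correction and refinement in its context."""
    thread.append({"role": "user", "content": memo})
    reply = assistant(thread)
    thread.append({"role": "assistant", "content": reply})
    return reply
```

The key design point is the last function: the memo is not routed to a fresh session but appended to the existing thread, which is what makes the history available on every run.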
No DAG compiler. No node editor. No schema definitions. The workflow lives in the conversation history and the context memo.
The Intelligence Is in the Workflow
Traditional workflow engine: intelligence lives in the nodes, connections between them are static. Step 3 encounters something step 2 didn't anticipate? Workflow breaks.
We call these intelligent workflows. Not because they use an AI model somewhere in the stack — because the execution surface itself has judgment. It handles exceptions it's never seen before. Same way you would. Read the situation, decide what to do.
In-Context Learning as Memory
The conversation thread is the memory. This matters more than it seems.
Third run of a workflow — full transcript of runs one and two in context. Something went wrong on run two? Data source returned an unexpected format, notification went to the wrong channel, calculation was off? That mistake and its correction are right there in the history.
The assistant doesn't repeat mistakes visible in its own conversation. In-context learning, purest form. No fine-tuning, no external memory system, no retrieval pipeline. The history is literally in the stack. Past corrections persist in the same chat session and inform every subsequent execution.
Each iteration makes the next one better. Not through a training loop — through the simplest mechanism available: the assistant reads what happened last time and adjusts.
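That mechanism admits a very short sketch. This is a hypothetical shape, not the product's code: "memory" is nothing more than replaying the thread, so a correction made during run two is physically present in run three's input.

```python
def build_context(history, new_memo, max_messages=None):
    """Assemble the model input for the next run.

    The memory system is just concatenation: every prior run's messages,
    including corrections, are replayed verbatim before the new memo.
    (max_messages is a hypothetical knob for trimming very long threads.)
    """
    messages = list(history)
    if max_messages is not None:
        messages = messages[-max_messages:]
    messages.append({"role": "user", "content": new_memo})
    return messages
```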
The Trade-Offs
Less reliable than compiled code — for now. That's a fact worth stating plainly.
A Python script reconciling two datasets produces the same output every time given the same input. An assistant-driven workflow has a probability distribution. Almost certainly does the right thing. Occasionally does the right thing a slightly different way. Rarely does something unexpected.
The reliability gap is closing, and it's closing fast. Every model release pushes the distribution tighter. Tasks that required careful prompt engineering a year ago work with plain English now. Judgment calls that needed explicit guardrails are handled correctly by default. The trend line is clear.
More expensive per run — also for now. Delivering complete outcomes as a service is not cheap — there's an LLM reasoning through every iteration, not a script executing in milliseconds. But the savings don't come from the cost of the run. They come from redirecting human effort. The agent delivers outcomes, not summaries. The human reviews results instead of doing the work. That math works out quickly for most recurring operational tasks.
Both concerns are real today. Both are "for now" concerns. We're moving at the speed of AI here — and the architecture is out of the way.
Because there's no compiled flow logic between the model and the work, nothing caps the benefit of increasing model capability. When the model gets smarter, the workflows get smarter. Automatically. No redeployment, no graph updates, no code changes. We're not constructing infrastructure around the flow. Decisions are made in real time by an intelligent agent, and this approach doesn't fight improving model intelligence. It rides it.
Anything that limits intelligence — rigid graphs, compiled flow logic, hardcoded decision trees — is technical debt the moment the next model ships. We'd rather have the problem of "this gets better every quarter without us touching it" than "we need to rebuild the flow logic to take advantage of the new model."
Skills vs. Wake-Up Workflows
We support both. Different tools, different jobs.
Natural progression: start with a wake-up workflow. Set it up in a single conversation. If the task stabilizes and the steps stop changing, graduate it to a compiled skill later. But in many cases, the adaptability is the point.
A skill is something compilable. A wake-up workflow embeds an agent as the execution engine. The flip side of an agent that can build a skill is an agent that can build a workflow and embed other agents as nodes: live execution, accounting for all conditions and exceptions.
It's like running a live chat session once a week that gets smarter each iteration.
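The contrast can be made concrete with a sketch. Both functions, the action format, and the binary exit handling are hypothetical illustrations: a compiled skill is a fixed function, while a wake-up workflow is a loop in which the model chooses each next action and the only ways out are scheduling the next wake-up or delivering a final response.

```python
# A compiled skill: fixed steps, the same path every run.
def reconcile_skill(a, b):
    return sorted(set(a) ^ set(b))   # rows present in exactly one of the two datasets


# A wake-up workflow: the agent decides each step; binary exit.
def agent_workflow(memo, thread, assistant, tools, schedule_next=None):
    """Illustrative loop. `assistant` returns hypothetical action dicts:
    {"tool": ..., "args": ...}, {"schedule": ...}, or {"final": ...}."""
    thread.append({"role": "user", "content": memo})
    while True:
        action = assistant(thread)
        if "final" in action:                   # exit branch 1: deliver a final response
            thread.append({"role": "assistant", "content": action["final"]})
            return action["final"]
        if "schedule" in action:                # exit branch 2: schedule the next wake-up
            if schedule_next:
                schedule_next(action["schedule"])
            return None
        result = tools[action["tool"]](**action.get("args", {}))
        thread.append({"role": "tool", "content": repr(result)})
```

The skill is cheaper and deterministic; the workflow trades that for the ability to take a different path when conditions change, without anyone editing a graph.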
The Taxonomy
It helps to put this on a scale: level 1, a static compiled graph; level 2, a static graph with LLM-powered nodes; level 3, an agent embedded as a component in the infrastructure; level 4, the agent replaces the infrastructure.
The jump from 3 to 4 is an order of magnitude. In level 3, the agent is a component embedded in infrastructure. In level 4, the agent replaces the infrastructure. The workflow definition is a natural-language memo. The execution engine is a reasoning model. The memory is the conversation history.
When to Use This
Not the right choice for everything. Right choice when:
- The task benefits from judgment and adaptation, not just execution
- Steps may evolve based on changing conditions
- You want something running today, not after a development cycle
- Exceptions and edge cases are expected, not exceptional
- The cost of an occasional imperfect run is lower than the cost of building and maintaining compiled logic
For high-frequency, fixed-logic tasks — compiled skill. For the recurring operational work that occupies most of a team's week — a live agent that reasons its way through is simpler to set up, easier to maintain, and increasingly reliable.
See It
What design patterns are others landing on for recurring agentic work?