Agentic Workflows: A Purist Approach
The Spectrum
Traditional workflow engine — N8N, Zapier, Airflow. Define nodes, connect edges, compile a static graph. Runs the same way every time. Environment changes or an unanticipated exception? The workflow breaks. Edit the graph, redeploy.
Middle of the spectrum: workflows that embed an LLM at certain nodes. The graph is still static, but individual steps can reason. Useful. Handles variability at specific decision points without rewriting the flow.
Far end of the spectrum: no graph at all. The agent is the workflow. The definition is a natural-language memo, the execution engine is a reasoning model, and the memory is the conversation history.
That last one is what we mean.
Wake-Up Calls
The implementation is straightforward. We call them wake-up calls.
User works through a complex, multi-step task with an assistant. Human guidance along the way — corrections, clarifications, refinements. The assistant learns which data sources matter, what format the output should take, which edge cases to watch for, who to notify and when.
Once it's working, the user says: "Schedule a wake-up call for this. Run it every Monday at 8am."
One instruction. Live chat session becomes a recurring autonomous workflow.
Under the hood:
- The assistant calls schedule_wake_up_call with a context memo: natural-language instructions, written by the assistant for its future self.
- At the scheduled time, the scheduler fires and delivers the memo into a new turn of the same conversation thread.
- The assistant picks up the thread. Full history available — every correction, every refinement, every lesson from the original session and all prior iterations.
- Does the work. Assesses conditions. Takes actions across connected systems.
- Schedules the next wake-up, or delivers a final response. Binary exit.
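The steps above can be sketched in a few lines. Everything here is an assumption about shape, not the actual implementation: the Scheduler class, the message format, and even the tool name schedule_wake_up_call are illustrative stand-ins.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class WakeUpCall:
    fire_at: float                      # when to fire (epoch seconds); only field used for ordering
    thread_id: str = field(compare=False)
    memo: str = field(compare=False)    # natural-language instructions for the future self


class Scheduler:
    """Illustrative scheduler: a priority queue of pending wake-up calls."""

    def __init__(self):
        self._queue = []

    def schedule_wake_up_call(self, thread_id, memo, fire_at):
        heapq.heappush(self._queue, WakeUpCall(fire_at, thread_id, memo))

    def due(self, now):
        """Pop every call whose scheduled time has passed."""
        while self._queue and self._queue[0].fire_at <= now:
            yield heapq.heappop(self._queue)


def run_turn(thread, memo, assistant):
    """Deliver the memo as a new turn on the SAME thread, so the model
    sees every prior correction and refinement in its context."""
    thread.append({"role": "user", "content": memo})
    reply = assistant(thread)
    thread.append({"role": "assistant", "content": reply})
    return reply
```

The key design point is the last function: the memo is not routed to a fresh session but appended to the existing thread, which is what makes the history available on every run.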
No DAG compiler. No node editor. No schema definitions. The workflow lives in the conversation history and the context memo.
The Intelligence Is in the Workflow
Traditional workflow engine: intelligence lives in the nodes, connections between them are static. Step 3 encounters something step 2 didn't anticipate? Workflow breaks.
We call these intelligent workflows. Not because they use an AI model somewhere in the stack — because the execution surface itself has judgment. It handles exceptions it's never seen before. Same way you would. Read the situation, decide what to do.
In-Context Learning as Memory
The conversation thread is the memory. This matters more than it seems.
Third run of a workflow — full transcript of runs one and two in context. Something went wrong on run two? Data source returned an unexpected format, notification went to the wrong channel, calculation was off? That mistake and its correction are right there in the history.
The assistant doesn't repeat mistakes visible in its own conversation. In-context learning, purest form. No fine-tuning, no external memory system, no retrieval pipeline. The history is literally in the stack. Past corrections persist in the same chat session and inform every subsequent execution.
Each iteration makes the next one better. Not through a training loop — through the simplest mechanism available: the assistant reads what happened last time and adjusts.
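That mechanism admits a very short sketch. This is a hypothetical shape, not the product's code: "memory" is nothing more than replaying the thread, so a correction made during run two is physically present in run three's input.

```python
def build_context(history, new_memo, max_messages=None):
    """Assemble the model input for the next run.

    The memory system is just concatenation: every prior run's messages,
    including corrections, are replayed verbatim before the new memo.
    (max_messages is a hypothetical knob for trimming very long threads.)
    """
    messages = list(history)
    if max_messages is not None:
        messages = messages[-max_messages:]
    messages.append({"role": "user", "content": new_memo})
    return messages
```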
The Trade-Offs
Less reliable than compiled code — for now. That's a fact worth stating plainly.
A Python script reconciling two datasets produces the same output every time given the same input. An assistant-driven workflow has a probability distribution. Almost certainly does the right thing. Occasionally does the right thing a slightly different way. Rarely does something unexpected.
The reliability gap is closing, and it's closing fast. Every model release pushes the distribution tighter. Tasks that required careful prompt engineering a year ago work with plain English now. Judgment calls that needed explicit guardrails are handled correctly by default. The trend line is clear.
More expensive per run — also for now. Delivering complete outcomes as a service is not cheap — there's an LLM reasoning through every iteration, not a script executing in milliseconds. But the savings don't come from the cost of the run. They come from redirecting human effort. The agent delivers outcomes, not summaries. The human reviews results instead of doing the work. That math works out quickly for most recurring operational tasks.
Both concerns are real today. Both are "for now" concerns. We're moving at the speed of AI here — and the architecture is out of the way.
Because there's no compiled flow logic between the model and the work, nothing caps the benefit of increasing model capability. When the model gets smarter, the workflows get smarter. Automatically. No redeployment, no graph updates, no code changes. We're not constructing infrastructure around the flow. Decisions are made in real time by an intelligent agent, and this approach doesn't fight improving model intelligence. It rides it.
Anything that limits intelligence — rigid graphs, compiled flow logic, hardcoded decision trees — is technical debt the moment the next model ships. We'd rather have the problem of "this gets better every quarter without us touching it" than "we need to rebuild the flow logic to take advantage of the new model."
Skills vs. Wake-Up Workflows
We support both. Different tools, different jobs.
Natural progression: start with a wake-up workflow. Set it up in a single conversation. If the task stabilizes and the steps stop changing, graduate it to a compiled skill later. But in many cases, the adaptability is the point.
A skill is something compilable. A wake-up workflow embeds an agent as the execution engine. The flip side of an agent that can build a skill is an agent that can build a workflow and embed other agents as nodes: live execution, accounting for all conditions and exceptions.
It's like running a live chat session once a week that gets smarter each iteration.
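The contrast can be made concrete with a sketch. Both functions, the action format, and the binary exit handling are hypothetical illustrations: a compiled skill is a fixed function, while a wake-up workflow is a loop in which the model chooses each next action and the only ways out are scheduling the next wake-up or delivering a final response.

```python
# A compiled skill: fixed steps, the same path every run.
def reconcile_skill(a, b):
    return sorted(set(a) ^ set(b))   # rows present in exactly one of the two datasets


# A wake-up workflow: the agent decides each step; binary exit.
def agent_workflow(memo, thread, assistant, tools, schedule_next=None):
    """Illustrative loop. `assistant` returns hypothetical action dicts:
    {"tool": ..., "args": ...}, {"schedule": ...}, or {"final": ...}."""
    thread.append({"role": "user", "content": memo})
    while True:
        action = assistant(thread)
        if "final" in action:                   # exit branch 1: deliver a final response
            thread.append({"role": "assistant", "content": action["final"]})
            return action["final"]
        if "schedule" in action:                # exit branch 2: schedule the next wake-up
            if schedule_next:
                schedule_next(action["schedule"])
            return None
        result = tools[action["tool"]](**action.get("args", {}))
        thread.append({"role": "tool", "content": repr(result)})
```

The skill is cheaper and deterministic; the workflow trades that for the ability to take a different path when conditions change, without anyone editing a graph.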
The Taxonomy
It helps to put this on a scale: level 1, a static compiled graph; level 2, a static graph with LLM-powered nodes; level 3, an agent embedded as a component in the infrastructure; level 4, the agent replaces the infrastructure.
The jump from 3 to 4 is an order of magnitude. In level 3, the agent is a component embedded in infrastructure. In level 4, the agent replaces the infrastructure. The workflow definition is a natural-language memo. The execution engine is a reasoning model. The memory is the conversation history.
When to Use This
Not the right choice for everything. Right choice when:
- The task benefits from judgment and adaptation, not just execution
- Steps may evolve based on changing conditions
- You want something running today, not after a development cycle
- Exceptions and edge cases are expected, not exceptional
- The cost of an occasional imperfect run is lower than the cost of building and maintaining compiled logic
For high-frequency, fixed-logic tasks — compiled skill. For the recurring operational work that occupies most of a team's week — a live agent that reasons its way through is simpler to set up, easier to maintain, and increasingly reliable.
See It
What design patterns are others landing on for recurring agentic work?