When I realized I could teach the AI.
Step 8 of 8: not just a smarter model. A loop that learns.
Step 7 ended with the realization that embedded AI gets sharper in two ways.
The first way happens on its own. Every few months a more capable reasoning model comes out, and the AI inside the experience steps up with it — better at conversation, better at reasoning, better at picking up what someone is actually saying. That improvement is real and it requires nothing from me.
The second way is the one this post is about. The AI inside the apps could get sharper about this work specifically: about retirement readiness, about this cohort, about the questions that keep coming up, the responses that have been working, and the ones that have not. The intelligence the model brought was general. The work it was doing was specific. There was a way to close that gap.
This is the step I am still in the middle of, so I want to be honest about what I have done and what I have not.
What I have done is build the first version of the loop. The cohort’s conversations with the assistant generate transcripts. The transcripts are read — not by the live assistant in real time, but by a separate process that runs weekly. That process clusters the conversations by topic, identifies the recurring patterns, and produces a brief on what the cohort is actually using the assistant for and where the assistant is falling short. The brief is the input to the next round of system-prompt updates and surface refinements.
The AI is not training itself in any technical sense. The model weights are not changing. The model is doing what it has always done. What’s changing is the context the model is operating inside — what it knows about this cohort, what patterns it has been pointed at, what responses have been tuned in and out.
But its behavior is changing, week over week, because the data it is generating is being used to make it sharper.
That is the loop. It is crude. It is manual. It is real.
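For readers who want to picture the weekly pass, here is a minimal sketch of its shape: group the week's user messages by topic, count them, and write a short brief with one quote per topic. It is an illustration under assumptions, not the process I actually run. In my loop the model does the clustering; the keyword table here is a deliberately crude stand-in, and the names `Message`, `TOPIC_KEYWORDS`, `topic_of`, and `build_brief` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical topic keywords. The real topics come from the model's own
# clustering; this table only stands in for that step.
TOPIC_KEYWORDS = {
    "timing": ("when should", "retire at", "how soon"),
    "savings": ("401k", "savings", "contribution"),
    "healthcare": ("medicare", "insurance"),
}


@dataclass
class Message:
    conversation_id: str
    role: str  # "user" or "assistant"
    text: str


def topic_of(text: str) -> str:
    """Return the first topic whose keywords appear in the text, else 'other'."""
    lowered = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return "other"


def build_brief(messages: list[Message]) -> str:
    """Group user messages by topic and summarize each topic in one line."""
    by_topic: dict[str, list[str]] = defaultdict(list)
    for message in messages:
        if message.role == "user":  # assistant turns are not counted
            by_topic[topic_of(message.text)].append(message.text)

    # Most frequent topics first; ties broken alphabetically for stable output.
    ordered = sorted(by_topic.items(), key=lambda item: (-len(item[1]), item[0]))
    return "\n".join(
        f'{topic}: {len(texts)} user messages, e.g. "{texts[0]}"'
        for topic, texts in ordered
    )
```

The output of something like `build_brief` is what a person, or a second model pass, would then read before touching the system prompt. The counting is the easy part; the judgment about what the counts mean still sits with me.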
The Sunday the loop closed for the first time
This was a Sunday in spring of 2026. I had been getting cohort feedback through a combination of email replies, occasional calls, and the assistant transcripts I had started reading by hand. Reading by hand was not going to scale. The cohort was small but already producing thousands of messages per week.
I asked the AI to read its own transcripts.
Specifically: given the past week of cohort conversations with the Plan assistant, produce a brief on what the cohort is using the assistant for, what is working, and what is not. Be specific. Quote where useful. Cluster by topic. Tell me where the assistant is being asked to do things it cannot do well. Tell me where the assistant is producing responses the user clearly did not find useful.
It identified four patterns I had not seen, two of which were obvious in retrospect, two of which I would not have noticed on my own. It quoted specific user phrasings that showed up across multiple conversations. It flagged three places where the assistant was producing wrong-shaped responses. It pointed to one entire category of question — about the timing of retirement decisions — that the assistant kept punting on, when it should have been engaging.
I wrote the next round of system-prompt updates against what the brief had told me. The next week, the same analysis ran again. The patterns from the prior week showed up less. New ones appeared. The brief had pointed me at things I needed to see, and the assistant improved within one week.
This was the first time the system's own output was feeding back into the system that produced it. The AI was not autonomously improving. I was the bridge. But the bridge could be walked, repeatedly, faster than I could improve the system by guessing.
The artifact
The artifact is not a single thing. It is a weekly process. The transcripts feed a brief. The brief feeds the prompt updates. The prompt updates change the assistant. The next week’s transcripts reflect the change. The week after that, the next round of brief, prompt, change.
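One way to check that the next week's transcripts really do reflect the change is to compare pattern counts from two consecutive briefs. The sketch below assumes each brief can be reduced to a mapping from pattern name to how often it appeared; that reduction, the function name `compare_weeks`, and the example pattern names are all hypothetical.

```python
def compare_weeks(previous: dict[str, int], current: dict[str, int]) -> dict[str, list[str]]:
    """Compare two weeks of pattern counts.

    'faded' lists patterns that appeared less often this week (or vanished);
    'new' lists patterns that did not appear at all last week.
    """
    faded = sorted(
        pattern for pattern, count in previous.items()
        if current.get(pattern, 0) < count
    )
    new = sorted(pattern for pattern in current if pattern not in previous)
    return {"faded": faded, "new": new}


# Illustrative numbers only, not real cohort data.
last_week = {"timing punts": 12, "wrong-shaped answers": 5}
this_week = {"timing punts": 3, "wrong-shaped answers": 5, "tax questions": 4}
print(compare_weeks(last_week, this_week))
```

A pattern that fades after a prompt update is weak evidence the update worked. A pattern that holds steady, like the second one in the example, is a sign the update missed.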
It is not technically recursive — the AI is not modifying its own weights, and given the model architecture available to me, it cannot. But it is functionally recursive at the level that matters: the product improves because of how it is being used, on a cadence shorter than any feedback loop I have run in my career.
If the artifact has to be named, it is the brief itself. The thing that did not exist before — a weekly written reflection on how the AI inside the apps was being used, what was working, what was not — is now an artifact the AI produces about its own behavior. That artifact is what makes the loop walkable.
What it unlocked
The pace of improvement changed. Before the loop existed, I would notice things in conversations I happened to read, hold them in my head, and remember to make changes when I sat down to update the system. The lag between noticing and shipping was weeks, sometimes longer. With the loop in place, the lag is days. The version the cohort uses today is meaningfully different from the version they used three weeks ago, and it is different in directions their own usage pointed at.
The cohort is teaching the system without knowing they are teaching it. They are using it. The use leaves a trail. The trail is now being read. The reading changes the next version. They are not aware of any of this, and that is how it should be — the use should feel the same, only sharper.
Something else shifted: my confidence in my own judgment. When I update the system now, I am not working from intuition about what users probably want. I am working from a brief that quotes the cohort back to me. The brief is not always right, and I do not always agree with it. But the decisions are happening on a foundation of evidence I did not previously have. The work is more grounded.
Where it does not work yet
The honest answer is that the loop is small and slow and limited to the surfaces I have built. The brief reads conversations; it does not yet read the deeper signal — what users do after a conversation, whether the actions in their Plan actually got executed, whether the dimensions they scored low on shifted over months. The behavioral telemetry exists. Most of it is not yet feeding the loop.
A bigger limit: the AI inside the loop is the same AI inside the apps. There is no separation between the model that does the work and the model that learns from the work. In a more developed version of this, the learning function would be its own thing, with its own data, its own audit, its own permissions. I do not have that yet. What I have is a process that depends on me sitting between the data and the system, doing the synthesis and authoring the updates.
There is also the question of what happens when the brief sees something I do not. If the brief tells me a pattern is showing up, and I disagree with the pattern, whose read of the data is right? I am operating on the assumption that I am, for now, because my context is broader than the assistant’s. That assumption gets weaker every month the system runs.
The frontier is not just technical. It is what the relationship between the maker and the system becomes when the system is being shaped by its own use.
This was the eighth step. I don’t know what the ninth is. The eight-step staircase is what I climbed. From here, the work is no longer a staircase. It is a loop, and the loop is running.
A question for anyone reading this. What would your work learn from you if you let it? What data are you generating that no system is reading?
Dennis

