Robert is 58. He has arthritis in both knees and a rotator cuff issue in his right shoulder that limits how high he can press. He takes blood pressure medication. He needs longer rest between sets. He's not trying to hit a personal record — he's trying to keep training without making anything worse. Three days a week, 75 minutes, full body.
Robert is also not a real person.
He's a training persona we built to stress-test Yuge before it talks to actual people with actual joints. We have a handful of these — different ages, different histories, different ways of training — and we run them through complete training blocks automatically, logging sets, triggering coaching conversations, and checking every response. Robert has been the most instructive one by far.
What happened with the front squats
When we first ran Robert through a training block, the system handled his situation reasonably well. It remembered the overhead restriction. It knew about the arthritis. It built a program that looked sensible on paper.
Then we looked closer.
Front squats were in the program. Not back squats: Robert had flagged deep squatting as a problem, and the system had registered that. Front squats carry the bar on the front of the shoulders, keep the torso more upright, and are less posterior-chain dominant; technically, they're a different movement. Reasonable logic.
But that swap only helps if the system understood why deep squatting was flagged. It didn't. Robert's constraint isn't "back squats specifically" — it's loaded knee flexion past his pain-free range. A front squat programmed to full depth lands in the same problem zone as a back squat programmed to full depth. The bar position changes the torso angle and shifts load between hip and quad, but it doesn't keep the knees out of the deep flexion that was the original concern.
The system was storing Robert's limitations as a list of movements to avoid, not as an understanding of why those movements were a problem. "No deep squats" was being processed literally: swap the bar position and the literal rule passes. The underlying reason, protecting the knees from compressive load beyond their tolerated range, wasn't informing anything. Front squats can be a reasonable choice for someone with arthritic knees when depth is capped at what they tolerate. Programmed to the same depth and rep ranges as the back squats they replaced, they don't fix the actual problem.
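The gap between a literal rule and the reason behind it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Yuge stores constraints: the `Exercise` and `KneeFlexionLimit` types, the `knee_flexion_deg` field, and the 90° cap are all hypothetical, and a real tolerated range would come from the individual lifter rather than a constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    name: str
    knee_flexion_deg: float  # hypothetical: peak loaded knee flexion as programmed


# Literal rule: remember which movement names were flagged.
BANNED_NAMES = {"back squat"}


def literal_rule_allows(ex: Exercise) -> bool:
    return ex.name not in BANNED_NAMES


@dataclass(frozen=True)
class KneeFlexionLimit:
    """Stores the reason instead: a cap on loaded knee flexion (hypothetical value)."""
    max_loaded_flexion_deg: float

    def allows(self, ex: Exercise) -> bool:
        return ex.knee_flexion_deg <= self.max_loaded_flexion_deg


limit = KneeFlexionLimit(max_loaded_flexion_deg=90.0)
full_depth_front = Exercise("front squat", knee_flexion_deg=130.0)
capped_front = Exercise("front squat", knee_flexion_deg=85.0)

print(literal_rule_allows(full_depth_front))  # True: the name isn't on the list
print(limit.allows(full_depth_front))         # False: too deep for this lifter
print(limit.allows(capped_front))             # True: same movement, depth capped
```

The second form judges a movement by the property that matters for the knee, so it also covers exercises nobody thought to put on a list.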
The difference between remembering and understanding
This is the thing that makes training older or injured lifters hard, and it's different from programming for a healthy 28-year-old.
For a healthy 28-year-old with no injury history, there are maybe ten things you need to know about them to write a good program. Their goals, their schedule, their experience level, their recovery capacity. The methodology handles most of the rest.
For Robert, the information that matters most isn't his goals. It's the stuff that constrains his options. And the constraining information isn't a checklist — it's a clinical picture. Knee arthritis doesn't just mean "avoid squats." It means understanding which exercises load the knee joint in which ways, what controlled range of motion looks like for someone with cartilage degradation, and where the line is between training productively and accelerating the damage.
Take leg press. Early test runs flagged it as a knee concern. They shouldn't have. Leg press with a controlled range is one of the better lower-body options for someone with Robert's history: you can load the quads through a range that's easy to cap, without the spinal loading and balance demands of a free-weight squat. That kind of nuance doesn't come from keyword matching. It comes from understanding the sports science well enough to know when a general rule doesn't apply.
Getting this right required us to think harder about what Yuge actually knows versus what it's been told. There's a difference.
When coaching about a problem looks like causing one
There was a second issue, unrelated to the programming itself.
When Robert asked questions, such as "would front squats help my quads?" or "why can't I do overhead pressing?", our evaluation was sometimes flagging Yuge's answers as problematic. A coach explaining clearly and correctly why an exercise is wrong for this specific person was getting the same treatment as a coach telling someone to do a harmful exercise.
A coach saying "front squats aren't right for you because of how they load the knee" is good coaching. It's what you want. The system needed to understand that discussing an exercise and recommending it are different things, and that the quality of an explanation matters when you're evaluating whether a coaching response is appropriate.
This sounds obvious in retrospect. Most bugs do.
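One way to encode the difference between discussing an exercise and recommending it is to have the evaluator look at the stance of each mention, not just the exercise name. The sketch below is hypothetical: `Stance`, `Mention`, and both flagging functions are illustrative names, and it assumes each mention has already been labeled with a stance by some upstream step, which is the hard part in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    RECOMMENDS = "recommends"
    ADVISES_AGAINST = "advises_against"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Mention:
    exercise: str
    stance: Stance  # assumed to come from an upstream step; labeled by hand here


def naive_flags(mentions: list[Mention], off_limits: set[str]) -> list[str]:
    """Flags any mention of an off-limits exercise, even an explanation of why to avoid it."""
    return [m.exercise for m in mentions if m.exercise in off_limits]


def stance_aware_flags(mentions: list[Mention], off_limits: set[str]) -> list[str]:
    """Flags only off-limits exercises the response actually recommends."""
    return [m.exercise for m in mentions
            if m.exercise in off_limits and m.stance is Stance.RECOMMENDS]


off_limits = {"full-depth front squat"}
# "Full-depth front squats aren't right for you because of how they load the knee;
#  leg press with a controlled range is a better fit."
answer = [Mention("full-depth front squat", Stance.ADVISES_AGAINST),
          Mention("leg press", Stance.RECOMMENDS)]

print(naive_flags(answer, off_limits))         # ['full-depth front squat']
print(stance_aware_flags(answer, off_limits))  # []
```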

What Robert taught us about evaluation
When we started scoring coaching responses, we used a single number. The idea was to get a quick signal on overall quality.
The problem was that a response could be thoughtful, well-reasoned, specific to Robert's situation, and still contain one recommendation that was wrong for him. And the wrong recommendation could be outweighed in the overall score by all the things that were right. A well-argued case for the wrong exercise is still wrong.
We split the evaluation into separate dimensions that don't trade off against each other. And we made constraint violations non-negotiable — a dangerous recommendation for this lifter is a failure regardless of how good the rest of the response is. The quality of the writing doesn't rescue a bad call.
That change alone surfaced failures we'd been missing for weeks.
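The difference between the two scoring schemes can be shown with toy numbers. This is a minimal sketch, not Yuge's evaluator: the `ResponseScores` fields, the 0 to 1 scales, the 0.7 bar, and the plain average in `single_score` are all assumptions, chosen to show how averaging lets one bad call hide behind good writing and how a hard gate on constraint violations stops that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseScores:
    recommendations: int  # exercises the response recommends
    violations: int       # recommendations that break this lifter's constraints
    specificity: float    # 0..1: is the advice tailored to this lifter?
    clarity: float        # 0..1: is the reasoning clear and correct?


def single_score(s: ResponseScores) -> float:
    """Old approach (sketch): fold everything into one averaged number."""
    adherence = 1 - s.violations / max(s.recommendations, 1)
    return (adherence + s.specificity + s.clarity) / 3


def evaluate(s: ResponseScores, bar: float = 0.7) -> dict[str, bool]:
    """New approach (sketch): independent pass/fail per dimension, hard constraint gate."""
    verdicts = {
        "constraints": s.violations == 0,  # non-negotiable: any violation fails
        "specificity": s.specificity >= bar,
        "clarity": s.clarity >= bar,
    }
    verdicts["passed"] = all(verdicts.values())
    return verdicts


# A well-argued response with one bad recommendation out of five.
resp = ResponseScores(recommendations=5, violations=1, specificity=0.9, clarity=0.95)
print(round(single_score(resp), 2))  # 0.88 -> would clear a 0.7 bar
print(evaluate(resp)["passed"])      # False
```

Averaged, a response that gets one recommendation in five wrong still scores about 0.88; scored on independent dimensions with a hard constraint gate, it fails.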

Why we test this way
The obvious question is why we spend this much energy on simulated lifters when we could just be talking to actual users.
Robert gives us something real users can't. He's endlessly patient, he runs at 3am without complaining, and he surfaces problems before they reach people who are trusting Yuge with their health. A 58-year-old with knee arthritis who gets bad exercise programming isn't going to send us a detailed bug report. They're going to stop using the product, or worse, follow the advice.
Robert is also consistent. Real user feedback is noisy — people have bad days, they misremember what they asked, they're imprecise in ways that make it hard to isolate what went wrong. The simulations are precise. When Robert surfaces a problem, we can trace exactly what happened, reproduce it, fix it, and verify the fix.
We run Robert against every change that touches how Yuge handles training limitations. He runs overnight. He runs before merges. He's been more useful than we expected when we first built him.
What this means for the product
Most people who've been lifting for a few years have something — a dodgy shoulder, a hip that complains on certain movements, a knee that's fine for most things but not for heavy leg extensions. The system needs to work for them, not just for the idealised lifter with no history and no limitations.
Getting this right is hard in a way that doesn't show up in demos. A demo uses a 30-year-old with no injuries and a simple goal. That's not who needs this most.
The people who benefit most from an intelligent training tool are the ones for whom the cookie-cutter answer doesn't work. Robert is one version of that. We have other personas too: people with different histories, different constraints, different ways of communicating about their bodies. Each of them surfaces something different.
We'll keep running them. The simulation isn't a gate we pass through before launch. It's how we develop. We wrote about the broader safety approach separately.
