Mulu does not stop at generated code. It opens the app, runs the flow, checks the result, records the session, and shows you evidence before you ship. The same pipeline now backs build mode and debug mode.
After Mulu builds a feature, it can open the app and run the actual flow. Click the button. Type into the form. Submit the modal. Navigate the route. Scroll the page. Then it records what happened so you can review the proof instead of hoping the model was right.
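The run described above can be pictured as a scripted flow plus a recorder. Here is a minimal sketch in Python; the step names, selectors, and the `perform` stub are all hypothetical stand-ins for whatever driver actually controls the browser, not Mulu's real API.

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    action: str
    target: str
    ok: bool

@dataclass
class SessionRecording:
    """Accumulates one StepResult per executed step for later review."""
    steps: list[StepResult] = field(default_factory=list)

    def record(self, action: str, target: str, ok: bool) -> None:
        self.steps.append(StepResult(action, target, ok))

# Hypothetical flow: each tuple is an (action, target) the agent performs.
FLOW = [
    ("click", "#submit-button"),
    ("type", "form input[name=email]"),
    ("submit", ".modal form"),
    ("navigate", "/dashboard"),
    ("scroll", "body"),
]

def perform(action: str, target: str) -> bool:
    # Placeholder: a real run would drive a live browser session here.
    return True

def run_flow(flow, recording: SessionRecording) -> bool:
    """Execute each step, record its outcome, and stop at the first failure."""
    for action, target in flow:
        ok = perform(action, target)
        recording.record(action, target, ok)
        if not ok:
            return False
    return True
```

The point of the recorder is that every step leaves a trace, so a failed run shows exactly which step broke rather than a bare pass/fail.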
The pipeline is built for runnable product work, not just static code checks. Mulu inspects the result, watches for errors, and closes the run with evidence. If the flow is broken, the run shows you where it broke.
That is the moat. Other app builders generate code and leave validation to you. Mulu ships a feature only after the verification pipeline has actually exercised it.
A verification pipeline is only useful if it knows what matters. Mulu builds a map of the codebase first, ranks the relevant files, and understands what is connected before it edits or verifies anything.
That lets the agent choose the right flow to exercise and the right fallout to watch for. The browser run is guided by the codebase map, not by blind replay.
Context management is the second moat, right behind verification. Most tools stop at grep. Mulu knows the shape of the app it is about to test.
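One simple way to picture "ranks the relevant files": walk the import graph outward from the changed file, so closer files rank higher. This is a toy sketch under that assumption; the graph, file names, and ranking rule are illustrative, not Mulu's actual algorithm.

```python
from collections import deque

def rank_relevant_files(import_graph: dict[str, list[str]], changed: str) -> list[str]:
    """Rank files by breadth-first distance from the changed file.
    import_graph maps each file to the files that depend on it."""
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for neighbor in import_graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    # Closest files first; unreachable files are excluded entirely.
    return sorted(distance, key=lambda f: distance[f])

# Hypothetical toy graph: checkout.ts feeds cart.ts, which feeds app.ts.
graph = {
    "checkout.ts": ["cart.ts"],
    "cart.ts": ["app.ts"],
    "app.ts": [],
    "auth.ts": ["app.ts"],
}
```

With this graph, a change to `checkout.ts` ranks `cart.ts` ahead of `app.ts` and leaves `auth.ts` out, which is the kind of signal that lets an agent pick the right flow to exercise.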
When Mulu debugs a broken feature, it now reruns the same verification pipeline after the fix lands. The standard is the same as build mode: make the change, rerun the flow, inspect the result, and record the outcome.
That means bug fixing no longer ends on "I think it should be fixed." If the issue is in a runnable path, debug mode has to earn the same proof a new feature does.
Manual confirmation stays available when automation cannot prove enough, but it is the fallback. The default path is tool-based verification first.
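The debug loop described above, in sketch form: apply a candidate fix, rerun the same verification flow, and only stop on passing evidence. Everything here is hypothetical scaffolding to show the shape of the loop, not Mulu's implementation.

```python
def debug_fix(apply_fix, verify_flow, max_attempts: int = 3) -> dict:
    """After each candidate fix, rerun the same verification flow.
    Return passing evidence, or escalate when automation runs out."""
    for attempt in range(1, max_attempts + 1):
        apply_fix(attempt)
        evidence = verify_flow()
        if evidence["passed"]:
            return evidence  # proof, not a guess
    return {"passed": False, "note": "escalate to manual confirmation"}

# Toy harness: pretend the second attempt lands the right patch.
state = {"fixed": False}

def apply_fix(attempt: int) -> None:
    if attempt >= 2:
        state["fixed"] = True

def verify_flow() -> dict:
    return {"passed": state["fixed"]}
```

Running `debug_fix(apply_fix, verify_flow)` loops until the rerun actually passes, which is the difference between "I think it should be fixed" and recorded proof.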
A passing verification run can include the browser recording, the executed steps, screenshots, and the console output from the session. When something fails, you get enough context to see what happened instead of rerunning the whole thing just to find the state.
That makes review faster for you and clearer for a team. The evidence is attached to the work itself, so the answer to "did we actually test this?" is visible.
It also changes how you ship. You are no longer deciding whether to trust an AI claim. You are deciding whether the proof is sufficient.
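The evidence bundle attached to a run can be thought of as a small record. This sketch uses illustrative field names, not Mulu's actual schema, assuming the artifacts listed above: the recording, the executed steps, screenshots, and console output.

```python
from dataclasses import dataclass, field

@dataclass
class VerificationEvidence:
    """Artifacts a run attaches to the work item (field names are hypothetical)."""
    recording_path: str
    executed_steps: list[str]
    screenshots: list[str] = field(default_factory=list)
    console_output: list[str] = field(default_factory=list)
    passed: bool = False

    def summary(self) -> str:
        """One-line answer to 'did we actually test this?'"""
        status = "PASS" if self.passed else "FAIL"
        return (f"{status}: {len(self.executed_steps)} steps, "
                f"{len(self.screenshots)} screenshots, "
                f"{len(self.console_output)} console lines")
```

Because the evidence is a single record, a reviewer can read the summary and drill into the exact artifact they need instead of rerunning the session to reconstruct its state.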
After a build, Mulu opens the runnable app, executes the relevant flow, inspects the result, and records the session. The goal is evidence that the feature works, not just a successful code generation step.
Verification is not limited to the web. The pipeline is designed for runnable product work, including browser flows and desktop app flows. The key idea is the same: exercise the real interface and capture proof of the result.
Debug mode gets the same treatment. After a fix lands, it reruns the same verification workflow instead of stopping at a guess. Manual confirmation is only needed when the tools cannot prove enough on their own.
There is no separate setup, either. The verification pipeline is part of the product workflow, so you do not need to spin up a separate automation stack just to get recorded browser verification and proof artifacts.
Mulu falls back to the strongest available proof. If a path cannot be fully automated, it can ask for manual confirmation, but only after tool-based verification has gone as far as it can.
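That "strongest available proof" ladder can be sketched as a simple ordered fallback. The verifier names below are hypothetical examples, assuming each tool either produces evidence or gives up with `None`.

```python
def strongest_proof(verifiers, flow):
    """Try tool-based verifiers from strongest to weakest; each returns
    evidence or None. Ask for manual confirmation only when every tool
    has gone as far as it can."""
    for verify in verifiers:
        evidence = verify(flow)
        if evidence is not None:
            return evidence
    return "manual-confirmation-requested"  # the explicit last resort

# Hypothetical verifiers, ordered by strength.
def browser_run(flow):
    # Full recorded browser session; pretend this path cannot be automated.
    return None

def api_check(flow):
    # Weaker fallback: exercise the endpoint directly.
    return "api-evidence"
```

For a flow the browser cannot reach, `strongest_proof([browser_run, api_check], "checkout")` settles for the API-level evidence, and only a flow no tool can prove falls through to a manual confirmation request.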
Mulu turns verification into a product feature, not extra work you have to remember to do after the build finishes.