Mulu does not stop at generated code. It opens the app, runs the flow, checks the result, records the session, and shows you evidence before you ship. The same pipeline now backs build mode and debug mode.
After Mulu builds a feature, it can open the app and run the actual flow. Click the button. Type into the form. Submit the modal. Navigate the route. Scroll the page. Then it records what happened so you can review the proof instead of hoping the model was right.
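The run described above can be pictured as a scripted flow plus a recorder. Here is a minimal sketch in Python; the step names, selectors, and the `perform` stub are all hypothetical stand-ins for whatever driver actually controls the browser, not Mulu's real API.

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    action: str
    target: str
    ok: bool

@dataclass
class SessionRecording:
    """Accumulates one StepResult per executed step for later review."""
    steps: list[StepResult] = field(default_factory=list)

    def record(self, action: str, target: str, ok: bool) -> None:
        self.steps.append(StepResult(action, target, ok))

# Hypothetical flow: each tuple is an (action, target) the agent performs.
FLOW = [
    ("click", "#submit-button"),
    ("type", "form input[name=email]"),
    ("submit", ".modal form"),
    ("navigate", "/dashboard"),
    ("scroll", "body"),
]

def perform(action: str, target: str) -> bool:
    # Placeholder: a real run would drive a live browser session here.
    return True

def run_flow(flow, recording: SessionRecording) -> bool:
    """Execute each step, record its outcome, and stop at the first failure."""
    for action, target in flow:
        ok = perform(action, target)
        recording.record(action, target, ok)
        if not ok:
            return False
    return True
```

The point of the recorder is that every step leaves a trace, so a failed run shows exactly which step broke rather than a bare pass/fail.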
The pipeline is built for runnable product work, not just static code checks. Mulu inspects the result, watches for errors, and closes the run with evidence. If the flow is broken, the run shows you where it broke.
That is the moat. Other app builders generate code and leave validation to you. Mulu ships a feature only after the verification pipeline has actually exercised it.
A verification pipeline is only useful if it knows what matters. Mulu builds a map of the codebase first, ranks the relevant files, and understands what is connected before it edits or verifies anything.
That lets the agent choose the right flow to exercise and the right fallout to watch for. The browser run is guided by the codebase map, not by blind replay.
Context management is the second moat, right behind verification. Most tools stop at grep. Mulu knows the shape of the app it is about to test.
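One simple way to picture "ranks the relevant files": walk the import graph outward from the changed file, so closer files rank higher. This is a toy sketch under that assumption; the graph, file names, and ranking rule are illustrative, not Mulu's actual algorithm.

```python
from collections import deque

def rank_relevant_files(import_graph: dict[str, list[str]], changed: str) -> list[str]:
    """Rank files by breadth-first distance from the changed file.
    import_graph maps each file to the files that depend on it."""
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for neighbor in import_graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    # Closest files first; unreachable files are excluded entirely.
    return sorted(distance, key=lambda f: distance[f])

# Hypothetical toy graph: checkout.ts feeds cart.ts, which feeds app.ts.
graph = {
    "checkout.ts": ["cart.ts"],
    "cart.ts": ["app.ts"],
    "app.ts": [],
    "auth.ts": ["app.ts"],
}
```

With this graph, a change to `checkout.ts` ranks `cart.ts` ahead of `app.ts` and leaves `auth.ts` out, which is the kind of signal that lets an agent pick the right flow to exercise.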
When Mulu debugs a broken feature, it now reruns the same verification pipeline after the fix lands. The standard is the same as build mode: make the change, rerun the flow, inspect the result, and record the outcome.
That means bug fixing no longer ends on "I think it should be fixed." If the issue is in a runnable path, debug mode has to earn the same proof a new feature does.
Manual confirmation stays available when automation cannot prove enough, but it is the fallback. The default path is tool-based verification first.
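The debug loop described above, in sketch form: apply a candidate fix, rerun the same verification flow, and only stop on passing evidence. Everything here is hypothetical scaffolding to show the shape of the loop, not Mulu's implementation.

```python
def debug_fix(apply_fix, verify_flow, max_attempts: int = 3) -> dict:
    """After each candidate fix, rerun the same verification flow.
    Return passing evidence, or escalate when automation runs out."""
    for attempt in range(1, max_attempts + 1):
        apply_fix(attempt)
        evidence = verify_flow()
        if evidence["passed"]:
            return evidence  # proof, not a guess
    return {"passed": False, "note": "escalate to manual confirmation"}

# Toy harness: pretend the second attempt lands the right patch.
state = {"fixed": False}

def apply_fix(attempt: int) -> None:
    if attempt >= 2:
        state["fixed"] = True

def verify_flow() -> dict:
    return {"passed": state["fixed"]}
```

Running `debug_fix(apply_fix, verify_flow)` loops until the rerun actually passes, which is the difference between "I think it should be fixed" and recorded proof.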
A passing verification run can include the browser recording, the executed steps, screenshots, and the console output from the session. When something fails, you get enough context to see what happened instead of rerunning the whole thing just to find the state.
That makes review faster for you and clearer for a team. The evidence is attached to the work itself, so the answer to "did we actually test this?" is visible.
It also changes how you ship. You are no longer deciding whether to trust an AI claim. You are deciding whether the proof is sufficient.
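The evidence bundle attached to a run can be thought of as a small record. This sketch uses illustrative field names, not Mulu's actual schema, assuming the artifacts listed above: the recording, the executed steps, screenshots, and console output.

```python
from dataclasses import dataclass, field

@dataclass
class VerificationEvidence:
    """Artifacts a run attaches to the work item (field names are hypothetical)."""
    recording_path: str
    executed_steps: list[str]
    screenshots: list[str] = field(default_factory=list)
    console_output: list[str] = field(default_factory=list)
    passed: bool = False

    def summary(self) -> str:
        """One-line answer to 'did we actually test this?'"""
        status = "PASS" if self.passed else "FAIL"
        return (f"{status}: {len(self.executed_steps)} steps, "
                f"{len(self.screenshots)} screenshots, "
                f"{len(self.console_output)} console lines")
```

Because the evidence is a single record, a reviewer can read the summary and drill into the exact artifact they need instead of rerunning the session to reconstruct its state.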
After a build, Mulu opens the runnable app, executes the relevant flow, inspects the result, and records the session. The goal is evidence that the feature works, not just a successful code generation step.
Verification is not limited to the web. The pipeline is designed for runnable product work, including browser flows and desktop app flows. The key idea is the same: exercise the real interface and capture proof of the result.
Debug mode gets the same treatment. After a fix lands, it reruns the same verification workflow instead of stopping at a guess. Manual confirmation is only needed when the tools cannot prove enough on their own.
There is no separate setup, either. The verification pipeline is part of the product workflow, so you do not need to spin up a separate automation stack just to get recorded browser verification and proof artifacts.
Mulu falls back to the strongest available proof. If a path cannot be fully automated, it can ask for manual confirmation, but only after tool-based verification has gone as far as it can.
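That "strongest available proof" ladder can be sketched as a simple ordered fallback. The verifier names below are hypothetical examples, assuming each tool either produces evidence or gives up with `None`.

```python
def strongest_proof(verifiers, flow):
    """Try tool-based verifiers from strongest to weakest; each returns
    evidence or None. Ask for manual confirmation only when every tool
    has gone as far as it can."""
    for verify in verifiers:
        evidence = verify(flow)
        if evidence is not None:
            return evidence
    return "manual-confirmation-requested"  # the explicit last resort

# Hypothetical verifiers, ordered by strength.
def browser_run(flow):
    # Full recorded browser session; pretend this path cannot be automated.
    return None

def api_check(flow):
    # Weaker fallback: exercise the endpoint directly.
    return "api-evidence"
```

For a flow the browser cannot reach, `strongest_proof([browser_run, api_check], "checkout")` settles for the API-level evidence, and only a flow no tool can prove falls through to a manual confirmation request.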
Mulu turns verification into a product feature, not extra work you have to remember to do after the build finishes.