The previous article described the top-level boundary: Codex and Claude are entry points, while SketchUp Agent Harness lives in the harness layer between the agent and SketchUp.

This article goes one layer deeper. After natural language enters an agent CLI, why should it not become a hand-written SketchUp Ruby script immediately? Why do we need an MCP server, an execution trace, and a Ruby bridge? And why does bridge feedback need to return to design_model.json?

The chain can be simplified as:

natural language
-> agent CLI
-> MCP tool call
-> project truth
-> bridge execution trace
-> JSON-RPC request
-> Ruby bridge execution
-> structured response
-> design_model.json feedback

This is more complex than saying “let AI control SketchUp,” but the complexity is doing real work. It turns an open-ended natural-language request into boundaries that can be validated, executed, failed, and repaired.

The MCP server is not a forwarding layer

If the MCP server merely forwarded what the user said to SketchUp, it would carry no architectural weight.

In SAH, the MCP server sits closer to the product core. It should:

  • read the active project workspace;
  • understand design_model.json, design_rules.json, component metadata, and import state;
  • expose tools the agent can call;
  • turn user intent into project-state changes;
  • validate whether the project truth is complete enough;
  • derive a bridge execution trace from structured truth;
  • plan headlessly when live SketchUp is not running;
  • send the trace to the Ruby bridge when the bridge is available;
  • write execution results back into design_model.json.

In other words, the MCP server is both the tool layer between the agent and the product core, and the compiler layer between project truth and host-application execution.

That layer should not be tied to a specific agent CLI. Claude, Codex, or another future entry point should call the same MCP server instead of each implementing separate SketchUp behavior.
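As a rough sketch of that dual role, here is a hypothetical tool handler (function and field names are illustrative, not SAH's actual API) that reads project truth, validates it, and derives a bridge operation instead of forwarding prose:

```python
def handle_place_component(workspace: dict, component_id: str, space_id: str) -> dict:
    """Hypothetical MCP tool handler: read project truth, validate it,
    and derive a structured operation instead of forwarding natural language."""
    model = workspace["design_model"]
    known_spaces = {s["id"] for s in model.get("spaces", [])}
    if space_id not in known_spaces:
        # Incomplete project truth is reported as a planning failure,
        # before any live SketchUp execution is attempted.
        return {"status": "blocked", "blockers": [f"unknown space: {space_id}"]}
    operation = {
        "op_id": f"place-{component_id}",
        "type": "place_component",
        "payload": {"component": component_id, "space": space_id},
        "rollback": "delete_entities",
    }
    return {"status": "planned", "trace": [operation]}
```

Because the handler only touches workspace data and returns structured results, any agent CLI can call it through the same MCP contract.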

Why generate an execution trace first

The problem with executing live SketchUp operations immediately is that failures become hard to locate.

Did the model misunderstand the user? Is the project truth incomplete? Is component metadata missing? Are the coordinates invalid? Is the bridge unavailable? Did Ruby-side execution fail?

The value of a bridge trace is that it turns “change the model” into explicit operations:

  • operation id
  • operation type
  • payload
  • rollback behavior

For example, component placement should not be represented only as “place a toilet.” In the trace, it should become an explicit operation: which component to place, which instance id it has, what its dimensions are, where it goes, and whether procedural fallback is allowed when the real .skp asset is missing.
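A single trace operation for that placement might look like the following; the field names are illustrative assumptions, not SAH's exact schema:

```python
# Hypothetical shape of one trace operation for a component placement.
operation = {
    "op_id": "op-0042",
    "type": "place_component",
    "payload": {
        "component": "toilet_wall_hung",
        "instance_id": "fixture-03",
        "dimensions_mm": {"w": 380, "d": 540, "h": 400},
        "position_mm": {"x": 1200, "y": 250, "z": 0},
        # Explicitly recorded so a missing .skp asset is a decision,
        # not a silent substitution.
        "allow_procedural_fallback": True,
    },
    "rollback": "delete_entities",
}
```

Everything the bridge needs is in the payload; nothing is left for the executor to infer from prose.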

Once the trace exists, the system can reason before live SketchUp execution:

  • can the current design_model.json produce a complete trace?
  • are there spaces, components, or lighting instances that cannot become operations?
  • should partial execution be refused?
  • is clean replay required?
  • are there stale import overlays or old generated objects?
  • can this path be smoke-tested headlessly?

This is why SAH separates planning from execution. A planning failure and a bridge execution failure are different failures. Mixing them makes debugging expensive.

JSON-RPC is the stable protocol boundary

After the MCP server generates a trace, the agent should not improvise Ruby code and send it to SketchUp. A more stable boundary is a protocol.

SAH uses a JSON-RPC-shaped bridge boundary: a request has a method, params, and id; a successful response has a result; a failed response has an error, and the error data should identify the failed operation, rollback status, and model revision.
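A minimal sketch of that exchange, assuming an illustrative method name and error shape rather than SAH's exact wire format:

```python
import json

# Hypothetical JSON-RPC request from the MCP server to the Ruby bridge.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "execute_trace",
    "params": {"operations": [{"op_id": "op-0042",
                               "type": "place_component",
                               "payload": {"component": "toilet_wall_hung"}}]},
}

# A failed response identifies the failed operation, rollback status,
# and the model revision the bridge was left at.
error_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "error": {
        "code": -32000,
        "message": "operation failed",
        "data": {"failed_op_id": "op-0042",
                 "rolled_back": True,
                 "model_revision": 418},
    },
}

# Both sides speak plain JSON, so Python and Ruby can evolve independently.
wire = json.dumps(request)
```

Because every request carries an id and every error names an op_id, failures attribute to specific operations rather than to "the bridge".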

This boundary has several benefits:

  • Python and Ruby can evolve independently;
  • every operation has a traceable id;
  • failures can be attributed to specific operations;
  • rollback behavior can be represented at the protocol level;
  • bridge responses can be recorded, tested, and replayed;
  • later agent calls do not need to guess what happened inside SketchUp.

For developers, this means the Ruby bridge is not a temporary script. It is a host-application adapter and should be designed like a product interface, not like a one-off automation snippet.

The Ruby bridge should be narrow and strict

The Ruby bridge runs inside SketchUp. It should not own the entire product strategy, and it should not know the differences between Claude and Codex.

Its responsibilities are narrower:

  • receive structured operations;
  • validate whether the payload can be executed in SketchUp;
  • create, modify, or query SketchUp entities;
  • return structured errors on failure;
  • trigger rollback when needed;
  • return execution metadata such as entity ids, spatial delta, model revision, and elapsed time.

The key is that the Ruby bridge should not do product-level reasoning. Product reasoning belongs in the MCP server and project truth layer. The bridge should focus on host-application execution.

This also matters in practice. A live SketchUp environment can fail for many reasons: the model window is not ready, the plugin is not loaded, the host application is busy, or an operation is invalid. The bridge should convert those situations into structured blockers or errors instead of leaving the agent with “nothing happened.”
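The bridge's failure reporting can be sketched as follows; this is written in Python for readability even though the real bridge is Ruby, and the blocker codes are illustrative assumptions:

```python
def report_bridge_state(plugin_loaded: bool, model_ready: bool) -> dict:
    """Hypothetical readiness check: each host-application failure mode
    becomes a structured blocker instead of a silent no-op."""
    blockers = []
    if not plugin_loaded:
        blockers.append({"code": "plugin_not_loaded",
                         "hint": "install or enable the harness plugin"})
    if not model_ready:
        blockers.append({"code": "model_not_ready",
                         "hint": "open a model window before replaying"})
    return {"ok": not blockers, "blockers": blockers}
```

The agent then receives "blocked: plugin_not_loaded" rather than inferring failure from an unchanged scene.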

Execution feedback must return to project truth

Many automation systems stop at “the operation succeeded.” For an agent harness, that is not enough.

If the bridge creates walls, doors, windows, components, or lights, later agent calls need to know which SketchUp entities correspond to those project objects. Otherwise, the next edit can only guess.

SAH therefore writes project-backed execution feedback back into the same design_model.json:

  • bridge operations from the current replay;
  • entity ids returned by successful operations;
  • execution metadata for generated space walls;
  • execution feedback for explicit walls and hosted openings;
  • entity ids for component and lighting instances;
  • the operation id that last created or updated each instance.

Old metadata also needs to be replaced, not accumulated forever. After replay, obsolete bridge operations should be removed, and old execution data for targeted walls and openings should be cleared. Otherwise the agent reads stale state.
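A write-back step along these lines would replace, not accumulate, bridge metadata; the data shapes here are illustrative assumptions about design_model.json, not its actual schema:

```python
def apply_execution_feedback(model: dict, replay: list) -> dict:
    """Hypothetical feedback merge: record entity ids from the current
    replay and clear stale execution data from earlier ones."""
    # Replace, don't accumulate: obsolete operations are dropped wholesale.
    model["bridge_operations"] = [
        {"op_id": op["op_id"], "status": op["status"]} for op in replay
    ]
    for op in replay:
        inst = model["instances"].get(op["instance_id"])
        if inst is None:
            continue
        if op["status"] == "ok":
            # Later agent calls read these ids instead of guessing
            # which SketchUp entities back this project object.
            inst["entity_ids"] = op["entity_ids"]
            inst["last_op_id"] = op["op_id"]
        else:
            # Clear stale execution data so failures are visible as absence.
            inst.pop("entity_ids", None)
            inst.pop("last_op_id", None)
    return model
```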

This is the practical meaning of “SketchUp is the execution view, while design_model.json is the project truth.”

Partial execution should be refused by default

Suppose a design project contains spaces, components, and lighting, but some of them cannot be converted into bridge operations. The system has two choices:

  • execute whatever it can;
  • refuse by default and require the agent to make the omitted objects explicit to the designer.

SAH takes the second direction by default.

That is not just conservatism. It is a product-quality boundary. Silently skipping objects is dangerous in design software: the SketchUp scene may look successful while missing a wall, fixture, component, or light. Visual review, screenshots, saves, and demos would then be based on a false scene.

Partial execution can still exist, but it should be explicit. The agent or user should know which instances are omitted before continuing.
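A refuse-by-default gate can be sketched in a few lines; the flag and field names are illustrative assumptions:

```python
def plan_or_refuse(instances: list, allow_partial: bool = False) -> dict:
    """Hypothetical gate: if any instance cannot become a bridge operation,
    refuse the whole replay unless partial execution was explicitly
    requested, and always name the omissions."""
    plannable = [i for i in instances if i.get("plannable")]
    omitted = [i["id"] for i in instances if not i.get("plannable")]
    if omitted and not allow_partial:
        return {"status": "refused", "omitted": omitted}
    return {"status": "planned", "operations": plannable, "omitted": omitted}
```

Either way, the omitted list is surfaced, so partial execution is a visible decision rather than a silent skip.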

Clean replay prevents scene contamination

Repeated execution creates another problem.

If every replay adds new walls, doors, and components without removing old generated geometry, the SketchUp scene quickly becomes contaminated. Visually it may look more detailed, but only because stale objects were never removed.

SAH therefore supports clean replay: before executing current truth, it can clean the managed layers or objects, then replay the current design_model.json.

Import replay needs even stricter handling. Source overlays, template entities, old imported walls, and placeholder objects can all distort visual judgment. The system needs clear rules for:

  • preserving manual geometry;
  • removing harness-managed objects;
  • doing full-scene clean when required;
  • checking for scene contamination after execution.
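Over a toy scene representation, those rules reduce to a simple pass; the `harness_managed` marker is an illustrative assumption about how SAH tags its own objects:

```python
def clean_replay(scene: list, trace: list) -> list:
    """Hypothetical clean-replay pass: preserve manual geometry, remove
    harness-managed objects, then regenerate from the current trace."""
    preserved = [obj for obj in scene if not obj.get("harness_managed")]
    regenerated = [{"id": op["op_id"], "harness_managed": True} for op in trace]
    return preserved + regenerated
```

The key property is that harness-managed geometry is always derived from the current design_model.json, never inherited from a previous replay.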

This is not a UI detail. It is a consistency problem. A system cannot claim that design_model.json is truth while letting stale SketchUp objects drive interpretation.

Coordinates and units must be stable at the protocol layer

Users may say “2 meters by 1.8 meters” in natural language. A geometry protocol cannot depend on that wording.

SAH uses millimeters internally and follows a Z-up coordinate system. User-facing units such as meters, feet, or inches should be converted by the Python / MCP layer before the bridge sees them. The Ruby bridge should receive stable geometry data, not natural-language dimensions.
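The normalization itself is small; the conversion factors below are the standard ones (1 ft = 304.8 mm, 1 in = 25.4 mm), while the helper name is illustrative:

```python
# Millimeters per user-facing unit.
_MM_PER_UNIT = {"mm": 1.0, "m": 1000.0, "ft": 304.8, "in": 25.4}

def to_mm(value: float, unit: str) -> float:
    """Convert a user-facing dimension to millimeters before the bridge sees it."""
    try:
        return value * _MM_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit}") from None

# "2 meters by 1.8 meters" becomes stable geometry data:
width_mm, depth_mm = to_mm(2, "m"), to_mm(1.8, "m")  # 2000.0, 1800.0
```

An unsupported unit fails loudly at the tool layer instead of producing silently wrong geometry downstream.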

This basic rule supports everything downstream:

  • whether face vertices are coplanar;
  • whether box dimensions are positive;
  • whether wall alignment is explicit;
  • whether bounding boxes can be compared;
  • whether placement can be reproduced;
  • whether spatial delta is meaningful.

If units and coordinates are unstable, validation, trace generation, bridge response, and visual review all become unreliable.

Why this matters beyond SketchUp

SAH uses SketchUp as the host application, but the pattern generalizes:

LLM intent
-> tool contract
-> structured project truth
-> deterministic execution trace
-> protocol boundary
-> host bridge
-> structured feedback
-> updated project truth

The core idea is not MCP as a brand name, and it is not Ruby as an implementation language. The core idea is the boundary:

  • the LLM does not own host-application state directly;
  • the tool layer is not a natural-language forwarding proxy;
  • the execution trace gives the system an inspectable intermediate artifact before live execution;
  • the bridge focuses on execution and feedback;
  • the project truth receives feedback and supports the next agent decision.

If you want a maintainable agent product, these boundaries matter more than a polished live demo.

Source trace