Tag: agent-workflow | Marlin Bian

May 27, 2026

Evaluating Agent Workflows: Traces, Cost, Recovery

Connects trajectory evaluation, goal-level cost, semantic recovery, and confidence calibration into a practical evaluation frame for agent workflows.

May 27, 2026

AI Agent Operational Governance

Defines AI agent operational governance: the layer of authority, evidence, tools, traces, costs, and human gates that makes agents usable in real work.

May 27, 2026

Commitment Tracking and Agent Drift

Explains residual drift in long-running agent sessions and why serious workflows need commitment tracking, not just memory or contradiction checks.

May 27, 2026

Evidence and Source Authority

Shows how evidence-carrying actions and source authority rules prevent fluent text from overriding structured data, tool output, or source evidence.

May 27, 2026

Runtime Authority and Autonomy Gates

Explains why agent actions need runtime authority checks and autonomy gates, instead of relying only on plan-time approval or a confident task plan.

May 27, 2026

Tool Registries Are Control Surfaces

Treats MCP servers, plugins, tool descriptions, and dependencies as part of the agent control surface that needs governance and supply-chain review.

May 12, 2026

How Codex Can Drive Verifiable SketchUp Modeling

Shows how an agent CLI, MCP server, SketchUp Ruby bridge, runtime skills, and a structured design model turn design intent into verifiable project state.

May 12, 2026

Why Codex Should Be a Worker, Not the Scheduler

Explains why Codex and similar AI coding tools should remain bounded workers, not long-running schedulers or release controllers.

May 12, 2026

Give AI the Context It Should See, Not the Whole Repository

Many AI task failures do not happen because the model cannot modify code. They happen because the model reads the wrong context.

May 12, 2026

Designers Should Think, Not Draft

A product and workflow essay arguing that AI should reduce low-level drafting work so designers can focus on intent, judgment, constraints, and tradeoffs.

May 12, 2026

Evidence Contract: AI Delivery Must Come With Proof

Explains why AI delivery must include verifiable proof: tests, logs, screenshots, risk notes, and a review path, not only a claim that work is done.

May 12, 2026

From AI Failure to Project Memory

Shows how repeated AI mistakes should become project memory, updated rules, and regression checks so the delivery pipeline improves after each failure.

May 12, 2026

Use Labels, Branches, and Comments as an AI Engineering State Machine

Shows how GitHub labels, branches, and comments can become an AI engineering state machine that keeps issue repair work observable and controllable.

May 12, 2026

Put Humans at Risk Boundaries, Not Only at the Final Approval

Shows where humans should stand in an AI delivery pipeline: requirements, risk boundaries, release decisions, rollback choices, and final acceptance.

May 12, 2026

Stage-Gated AI Workers Need Isolated Workspaces

Explains why real projects should put AI work in isolated branches or workspaces, then move changes through explicit gates before they reach the main codebase.

May 12, 2026

An Issue Is Not a Todo: It Should Be an Executable Contract for AI

Argues that an issue should be an executable contract for AI work, with scope, context, gates, evidence, and release boundaries instead of a loose todo.

May 12, 2026

Open-Sourcing an Issue-Driven Agent Workbench

Explains how to safely extract public methodology, templates, and toy examples from private production AI workflow experience.

May 12, 2026

PR Merged Is Not Done: The Release Boundary in AI Engineering

Explains why a merged PR is not the end of AI engineering work; release verification, rollout checks, and post-merge evidence still define done.

May 12, 2026

What a Project-Specific AI Delivery Pipeline Means

Defines a project-specific AI delivery pipeline: AI acts as a worker while the project owns task intake, context, gates, evidence, and release boundaries.

May 12, 2026

From Chat Request to Task Contract: Route the Work Before AI Executes

The most common risk in AI-assisted development is not that the model cannot write code. It is that the model starts writing code too early.

May 12, 2026

Why AI Design Tools Need a Source of Truth

Uses the SketchUp Agent Harness to explain why AI design tools need an editable, verifiable, repairable source of truth instead of only generating finished-looking output.

active

SketchUp Agent Harness

An open-source project connecting agent CLIs, an MCP server, a SketchUp Ruby bridge, structured design models, and runtime skills into a verifiable design workflow.