Executive Summary
Delivery Twin should be packaged as a controlled enterprise distribution on top of OpenClaw, not as a one-off local setup. The goal is to give consultants a repeatable workstation capability: install once, authenticate against approved systems, attach a client repository and board, ingest backlog and code context, and start assisting delivery work within minutes.
A consultant can connect a client engagement and generate a first delivery brief, technical map, risk radar, and PR review workflow without hand-assembling tools.
Client information remains on the company laptop except for calls to the company-approved enterprise LLM endpoint.
Use pinned upstream OpenClaw, a corporate configuration profile, a Delivery Twin skill/plugin pack, a wrapper CLI, and an installer.
Recommended starting point: a dedicated WSL2 environment on Windows 11 for fast adoption, with a Hyper-V option for stricter isolation requirements.
Design Principles
- Separate generic capability from client data. The Delivery Twin repository contains reusable instructions, skills, templates, policies, and connectors. Client repositories and extracted client context live in separate workspaces.
- Make installation repeatable. The setup must behave like an internal product, not tribal knowledge. The installer performs preflight checks, prepares the environment, and validates connectivity.
- Use one approved LLM route. Personal LLM providers must not be configured as fallback. The enterprise LLM endpoint is the only allowed destination for prompts containing client context.
- Send the smallest useful context. Local indexing, graph extraction, repo maps, diff filtering, and retrieval should run before any LLM call.
- Prefer draft mode before write mode. The first versions should generate local reports or draft comments. Writing to Slack, Azure DevOps comments, or backlog items should require explicit confirmation.
- Audit usage without hoarding sensitive prompts. Track model, cost, repository, PR, command, and result summary. Avoid storing full prompts or raw client data unless policy explicitly allows it.
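The audit principle above could be sketched as a record that keeps usage metadata plus a digest of the prompt, never the prompt text. This is a minimal illustration, not an OpenClaw feature: `AuditRecord`, `make_audit_record`, `to_log_line`, and all field names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Usage metadata only; the prompt itself is never stored."""
    model: str
    cost_usd: float
    repository: str
    pr_id: str
    command: str
    result_summary: str
    prompt_sha256: str  # lets two runs be correlated without keeping the text


def make_audit_record(prompt: str, **metadata) -> AuditRecord:
    # Hash the prompt so the record can reference it without containing it.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return AuditRecord(prompt_sha256=digest, **metadata)


def to_log_line(record: AuditRecord) -> str:
    # One JSON object per line is easy to append and to ship to a log store.
    return json.dumps(asdict(record), sort_keys=True)
```

The result summary still needs review under the same policy, since a summary can leak client detail even when the prompt is not stored.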
What “Custom OpenClaw” Means
A custom OpenClaw for Delivery Twin should be a corporate distribution built on top of upstream OpenClaw. It should not start as a fork. Forking should be reserved for cases where core OpenClaw cannot enforce a mandatory security, identity, runtime, or UI requirement.
Customization levels
| Level | Meaning | Value | Recommendation |
|---|---|---|---|
| Custom configuration | A corporate profile: approved provider, model defaults, workspace paths, tool allow/deny lists, agent definitions, and logging policy. | Prevents accidental use of personal providers and establishes a consistent operating boundary. | Always |
| Skill/plugin pack | Delivery Twin capabilities for Azure DevOps, Slack, backlog ingestion, PR review, discovery, reporting, and delivery rituals. | Turns generic OpenClaw into a delivery assistant that understands the company workflow. | Core value |
| Wrapper CLI | A product-facing command such as `delivery-twin` that installs, validates, connects clients, and runs workflows. | Gives consultants a simple contract instead of exposing internal setup details. | Recommended |
| Installer | A signed PowerShell script, MSIX package, internal winget package, or prebuilt WSL image. | Makes rollout repeatable across corporate laptops. | Needed for rollout |
| Fork | A modified copy of OpenClaw core. | Allows deep changes to Gateway, runtime, UI, or enforcement when configuration and plugins are insufficient. | Last resort |
Corporate configuration
This layer should be declarative and easy to inspect. It establishes the operating rules but does not contain client-specific information.
```json
{
  "deliveryTwin": {
    "mode": "enterprise",
    "allowedProviders": ["company-llm"],
    "defaultModel": "enterprise-mini",
    "escalationModel": "enterprise-frontier",
    "workspaceRoot": "~/.delivery-twin",
    "clientsRoot": "~/.delivery-twin/clients",
    "requireHumanConfirmationForSlack": true,
    "disablePersonalProviderFallbacks": true
  }
}
```
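Because the profile is declarative, a check such as the one `delivery-twin doctor` might run can be a pure function over the parsed JSON. The sketch below assumes the key names shown in the profile above; `profile_violations` and `APPROVED_PROVIDERS` are hypothetical names, and the specific rules are illustrative rather than a complete policy.

```python
from pathlib import PurePosixPath

APPROVED_PROVIDERS = {"company-llm"}  # assumption: mirrors the profile above


def profile_violations(profile: dict) -> list[str]:
    """Return human-readable policy violations; an empty list means compliant."""
    cfg = profile.get("deliveryTwin", {})
    problems = []

    providers = cfg.get("allowedProviders", [])
    if not providers:
        problems.append("no provider configured")
    extra = set(providers) - APPROVED_PROVIDERS
    if extra:
        problems.append(f"unapproved providers: {sorted(extra)}")

    if cfg.get("disablePersonalProviderFallbacks") is not True:
        problems.append("personal provider fallbacks are not disabled")
    if cfg.get("requireHumanConfirmationForSlack") is not True:
        problems.append("Slack posting does not require confirmation")

    # Client workspaces should live under the Delivery Twin home directory.
    root = PurePosixPath(cfg.get("workspaceRoot", ""))
    clients = PurePosixPath(cfg.get("clientsRoot", ""))
    try:
        clients.relative_to(root)
    except ValueError:
        problems.append("clientsRoot is outside workspaceRoot")
    return problems
```

A caller would fail closed by refusing to start any workflow while the returned list is non-empty.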
Wrapper CLI
The wrapper is the user experience. It can call OpenClaw internally, write configuration, install skills, validate policies, invoke Azure DevOps APIs, and orchestrate local analysis tools.
```shell
delivery-twin install
delivery-twin doctor
delivery-twin login devops
delivery-twin login slack
delivery-twin login llm
delivery-twin attach --client CLIENT --repo URL --board URL
delivery-twin ingest backlog --client CLIENT
delivery-twin pr review --client CLIENT --id 1234 --mode draft
```
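The command surface above maps naturally onto nested subcommands. The following is a sketch of the argument parsing only, assuming a Python wrapper built on `argparse`; the real wrapper may use another language or framework, and the `publish` value for `--mode` is an assumption, since the document only shows `draft`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="delivery-twin")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("install")
    sub.add_parser("doctor")

    login = sub.add_parser("login")
    login.add_argument("target", choices=["devops", "slack", "llm"])

    attach = sub.add_parser("attach")
    attach.add_argument("--client", required=True)
    attach.add_argument("--repo", required=True)
    attach.add_argument("--board", required=True)

    ingest = sub.add_parser("ingest")
    ingest.add_argument("source", choices=["backlog"])
    ingest.add_argument("--client", required=True)

    # "pr review" is a nested subcommand; draft is the safe default mode.
    pr = sub.add_parser("pr")
    pr_sub = pr.add_subparsers(dest="pr_command", required=True)
    review = pr_sub.add_parser("review")
    review.add_argument("--client", required=True)
    review.add_argument("--id", type=int, required=True)
    review.add_argument("--mode", choices=["draft", "publish"], default="draft")
    return parser
```

Defaulting `--mode` to `draft` keeps the CLI aligned with the draft-before-write principle even when the flag is omitted.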
When a fork is justified
- Security enforcement must happen inside the Gateway and cannot be expressed through configuration.
- Corporate identity integration requires core runtime changes.
- The control plane or UI must become a branded internal product and cannot be layered externally.
- Tool surfaces must be removed or constrained at a level plugins cannot control.
- Legal or compliance requirements demand a fully controlled build of the runtime.
Forking has a real carrying cost: rebasing on upstream, validating every security update, maintaining release pipelines, and owning bugs that upstream may already have fixed. The safer first move is a distribution, not a fork.
Target Architecture
The architecture has three layers: the corporate distribution, the client workspace, and approved external systems. The company laptop is the operational perimeter. The approved enterprise LLM is the only external AI destination for client-sensitive context.
Agent Configuration
The recommended Delivery Twin setup uses two agents with different responsibilities: a lightweight Slack-facing router and a stronger delivery worker. The router keeps Slack cheap, controlled, and concise. The worker performs deeper delivery reasoning, backlog analysis, repository inspection, planning, and PR review.
Roles
| Agent | Model tier | Primary job | Allowed behavior |
|---|---|---|---|
| `slack-router` | Mini | Slack triage, short answers, clarification, routing, and response relay. | Answer trivial questions, ask one concise clarification, escalate delivery/project/software questions, and keep Slack messages compact. |
| `delivery-twin` | Strong | Delivery reasoning, planning, backlog ingestion, repo analysis, PR review, and risk assessment. | Work from the desired outcome, reduce scope to verifiable slices, track risks and blockers, and return concise Slack-ready answers when invoked by the router. |
Why split the agents
- Cost control: Slack receives many low-value messages. A mini router avoids spending the strong model on greetings, vague prompts, or simple status checks.
- Noise control: Slack replies should be short. The router can compress worker output into a format that fits the channel.
- Blast-radius control: The Slack-facing agent does not need broad repository inspection by default.
- Clear escalation: The worker receives a precise task: the original wording, relevant Slack context, requested output, and any constraints.
- Future governance: The pattern allows stricter tool and credential boundaries per agent as OpenClaw configuration matures.
Agent list
The corporate distribution should generate an agent configuration like this, using enterprise model names and company workspace paths.
```json
{
  "agents": {
    "list": [
      {
        "id": "delivery-twin",
        "name": "delivery-twin",
        "workspace": "~/.delivery-twin/workspace-delivery-twin",
        "agentDir": "~/.delivery-twin/agents/delivery-twin",
        "model": "company/enterprise-frontier",
        "identity": {
          "name": "Delivery Twin",
          "emoji": "🚚"
        }
      },
      {
        "id": "slack-router",
        "name": "slack-router",
        "workspace": "~/.delivery-twin/workspace-slack-router",
        "agentDir": "~/.delivery-twin/agents/slack-router",
        "model": "company/enterprise-mini",
        "identity": {
          "name": "Slack Router",
          "emoji": "🚦"
        }
      }
    ]
  }
}
```
Slack account binding
The Slack app should bind to the router, not directly to the worker. Tokens must be stored as secrets, never committed to the Delivery Twin repository.
```json
{
  "channels": {
    "slack": {
      "enabled": true,
      "mode": "socket",
      "accounts": {
        "delivery-twin": {
          "name": "Delivery Twin",
          "enabled": true,
          "botToken": "${SECRET:SLACK_BOT_TOKEN}",
          "appToken": "${SECRET:SLACK_APP_TOKEN}",
          "groupPolicy": "allowlist",
          "allowFrom": ["${USER_OR_GROUP_ID}"],
          "channels": {
            "${CLIENT_CHANNEL_ID}": {
              "enabled": true,
              "requireMention": false
            }
          },
          "slashCommand": {
            "enabled": true,
            "name": "delivery-twin",
            "ephemeral": false
          }
        }
      }
    }
  },
  "channelBindings": [
    {
      "channel": "slack",
      "accountId": "delivery-twin",
      "agentId": "slack-router"
    }
  ]
}
```
Router mission
The router should be deliberately narrow. Its instructions should prevent it from becoming a second full delivery agent.
```text
You are Slack Router, a lightweight triage layer for Slack.

Default behavior:
- Answer greetings, status checks, and simple clarifications yourself.
- Ask one concise clarification if the request is too vague to route.
- Escalate delivery, project, software, repository, debugging, planning, or tradeoff questions to `delivery-twin`.
- Keep Slack messages compact.
- Do not load large files, run broad searches, or inspect repositories unless needed for routing.

Escalation contract:
- Preserve the user's original wording and Slack context.
- Tell `delivery-twin` what decision or output is needed.
- Ask for a concise answer suitable for Slack.
- Relay the answer without adding a long wrapper.
```
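The escalation contract can be made concrete as a small function that assembles the hand-off message. This is a sketch under assumptions: `build_escalation` and the message layout are hypothetical, and how the router actually invokes the worker depends on OpenClaw, which is not shown here.

```python
def build_escalation(original_text: str, slack_context: list[str], needed: str) -> str:
    """Assemble the hand-off from slack-router to delivery-twin.

    The user's wording is passed through verbatim so the worker does not
    reason over the router's paraphrase.
    """
    lines = [
        "To: delivery-twin",
        f"Needed: {needed}",
        "Answer format: concise, suitable for Slack.",
        "Original message (verbatim):",
        original_text,
    ]
    if slack_context:
        lines.append("Slack context:")
        lines.extend(f"- {item}" for item in slack_context)
    return "\n".join(lines)
```

Keeping the hand-off as plain structured text makes it easy to log the shape of an escalation (not its content) for audit.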
Worker mission
The worker should optimize for shippable delivery outcomes, not broad commentary.
```text
You are Delivery Twin, a software delivery specialist.

When helping with a project:
- Clarify the delivery goal and current constraint.
- Identify the smallest useful next increment.
- Make acceptance criteria explicit.
- Track risks, blockers, owners, and dependencies.
- Verify with tests, builds, screenshots, logs, or deployed checks when possible.
- Summarize status in plain language: done, next, blocked, risk.
```
Operational guardrails
- No scheduled jobs by default: client-facing agents should have zero cron jobs unless a specific engagement requires scheduled reports.
- Router before worker: Slack events go to `slack-router`; the worker is invoked only when deeper analysis is needed.
- Confirm before posting: PR findings, backlog changes, or summaries with sensitive context should be reviewed before posting to Slack or Azure DevOps.
- Separate credentials: enterprise rollout should avoid inherited personal auth profiles. Each agent should use only the company-approved provider and client-approved connectors.
- POC caveat: if the current OpenClaw version inherits or merges main auth profiles into non-main agents, do not treat this as a strong confidentiality boundary. Use it as a functional POC, then harden before client-sensitive rollout.
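The "confirm before posting" guardrail could be implemented as a gate that every write path must pass through. In this sketch `post_with_confirmation`, `confirm`, and `send` are hypothetical names; `confirm` stands in for whatever prompt or review UI the wrapper provides, and `send` for the Slack or Azure DevOps client call.

```python
from typing import Callable


def post_with_confirmation(
    draft: str,
    destination: str,
    confirm: Callable[[str, str], bool],
    send: Callable[[str, str], None],
) -> bool:
    """Send `draft` to `destination` only after a human approves it.

    Returns True if the draft was sent, False if it stayed local.
    """
    if not confirm(destination, draft):
        return False  # nothing leaves the machine without approval
    send(destination, draft)
    return True
```

Passing `confirm` and `send` as arguments keeps the gate testable and makes it hard to call the sender without going through the check.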
Security Model
Delivery Twin should assume that client code, backlog, comments, logs, and derived summaries are sensitive. The approved enterprise LLM can receive context when the company policy allows it, but every other external route should be blocked or explicitly justified.
Allowed destinations:
- Approved Azure DevOps organizations and projects.
- Approved Slack workspace and allowlisted channels.
- Approved enterprise LLM endpoint.
- Package registries only when required and approved.

Blocked:
- Personal OpenAI, Anthropic, Gemini, or other API keys.
- Paste, upload, telemetry, and analytics services not approved for client data.
- Fallback models outside the enterprise account.
- Broad synchronization of client workspaces to personal cloud storage.
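The approved destinations above could be enforced in the wrapper with an exact-match host allowlist. This is a minimal sketch: `llm.company.example` is a placeholder for the enterprise LLM host, `slack.com` is a simplification (a real Slack app talks to several hostnames), and `is_allowed` is a hypothetical helper. It complements, and does not replace, network-level egress controls.

```python
from urllib.parse import urlsplit

# Assumption: placeholder hosts; a real deployment lists the actual endpoints.
ALLOWED_HOSTS = {"dev.azure.com", "slack.com", "llm.company.example"}


def is_allowed(url: str) -> bool:
    """Allow only HTTPS requests whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in ALLOWED_HOSTS
```

Exact matching is deliberate: suffix or substring matching would accept look-alike hosts such as `dev.azure.com.evil.example`.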
Isolation options on Windows 11
- Dedicated WSL2 distribution: fast, scriptable, and convenient for Node, OpenClaw, Docker, and developer tooling. Disable Windows drive automount if the workspace should not see `C:\Users`.
- Hyper-V virtual machine: heavier, but a clearer isolation boundary for engagements with stricter requirements.
- Docker-only packaging: useful for services, but not enough as the main security boundary if host folders or the Docker socket are exposed.
Important: Docker Desktop should not be treated as a confidentiality boundary by itself. It is a packaging tool. The real controls are workspace separation, endpoint allowlists, provider restrictions, and credential isolation.
Enterprise Installer
The installer should behave as a bootstrapper. It prepares the environment, installs or updates the required components, checks policy compliance, and leaves the machine ready to run Delivery Twin workflows.
Installer responsibilities
- Check Windows 11, virtualization, WSL2, Docker Desktop if needed, Node runtime, Git, and network reachability.
- Create a dedicated Delivery Twin home directory and client workspace root.
- Install pinned upstream OpenClaw and the Delivery Twin skill/plugin pack.
- Write corporate configuration without embedding client secrets.
- Guide login for Azure DevOps, Slack, and the enterprise LLM.
- Run `delivery-twin doctor` and fail closed if personal providers or unsafe paths are detected.
```shell
powershell -ExecutionPolicy Bypass -File install-delivery-twin.ps1
delivery-twin doctor
delivery-twin login devops
delivery-twin login slack
delivery-twin login llm
delivery-twin attach --client canfordlaw --repo https://dev.azure.com/org/project/_git/repo --board https://dev.azure.com/org/project/_boards/board
```
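The tool-presence part of the preflight checks can be sketched with the standard library. The tool list is an assumption drawn from the responsibilities above (Git, Node, WSL); `missing_tools` is a hypothetical helper, and checks for Windows version, virtualization, and network reachability are out of scope here.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumption: executable names as they would appear on PATH.
REQUIRED_TOOLS = ("git", "node", "wsl")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An installer would stop before changing anything if the returned list is non-empty, and report exactly which tools to install.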
Integrations
Azure DevOps
Azure DevOps is the primary delivery system. The integration should start read-only, then progressively enable write actions behind confirmation gates.
| Capability | Minimum permission | Use | Initial stance |
|---|---|---|---|
| Clone repositories | Code Read | Build local indexes and analyze code context. | Enabled |
| Read pull requests | Code Read | Review diffs, summarize risk, and detect sensitive changes. | Enabled |
| Comment on pull requests | Code Read & Write | Publish review findings. | Draft mode first |
| Read Boards | Work Items Read | Ingest backlog, states, ownership, and dependencies. | Enabled |
| Update backlog | Work Items Read & Write | Create or refine work items. | Later, with confirmation |
Slack
Slack should be used for coordination and visibility, not as a raw dump of client code or backlog. Messages should be concise and link back to approved systems.
- Allowlist channels per client or engagement.
- Require confirmation before posting AI-generated content.
- Prefer summaries and links over copied code or raw backlog exports.
- Separate a lightweight Slack-facing agent from heavier worker agents when possible.
Enterprise LLM
- Use the mini model for routing, classification, first-pass summaries, and low-risk PR checks.
- Escalate to the stronger model only for security, authorization, data migrations, architecture, concurrency, and broad multi-module changes.
- Use stable prompts and policy blocks to benefit from prompt caching if the provider supports it.
- Never fall back to a non-corporate provider automatically.
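The model policy above reduces to a small decision function. In this sketch the category labels are hypothetical tags that a first-pass classifier might emit, and the model names are the ones used in the corporate profile earlier in this document.

```python
# Assumption: labels a first-pass classifier might attach to a change.
ESCALATION_CATEGORIES = {
    "security",
    "authorization",
    "data-migration",
    "architecture",
    "concurrency",
    "multi-module",
}


def pick_model(
    categories: set[str],
    default: str = "enterprise-mini",
    escalation: str = "enterprise-frontier",
) -> str:
    """Use the stronger model only when a high-risk category is present."""
    return escalation if categories & ESCALATION_CATEGORIES else default
```

There is deliberately no third branch: if neither enterprise model is available the call should fail rather than route elsewhere.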
Backlog Ingestion
The first valuable workflow should connect a repository and board, then generate a usable view of the client delivery system: what is being built, how the backlog is structured, where the risks are, and how the code maps to the work.
Captured data
- Epics, features, user stories, bugs, tasks, states, tags, and ownership.
- Parent-child relationships, dependencies, blockers, and stale items.
- Open and recent PRs, linked commits, active branches, and unlinked work.
- Functional areas inferred from folders, naming, tags, and work item language.
- Risks such as orphaned work, missing acceptance criteria, repeated bugs, and changes in critical modules.
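Several of the risks listed above can be detected locally, before any LLM call. The sketch below flags orphaned work, missing acceptance criteria, and stale items on a simplified work item shape. `WorkItem`, the state and type names, and the 30-day staleness threshold are assumptions; real Azure Boards fields depend on the process template in use.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

CLOSED_STATES = {"Closed", "Done", "Removed"}    # assumption: template-dependent
NEEDS_PARENT = {"Feature", "User Story", "Task"}  # epics legitimately have no parent


@dataclass
class WorkItem:
    id: int
    type: str
    state: str
    parent_id: Optional[int]
    acceptance_criteria: str
    changed: date


def backlog_risks(items, today: date, stale_after_days: int = 30):
    """Return (work item id, risk label) pairs for open items."""
    cutoff = today - timedelta(days=stale_after_days)
    risks = []
    for item in items:
        if item.state in CLOSED_STATES:
            continue  # finished work is not a delivery risk
        if item.type in NEEDS_PARENT and item.parent_id is None:
            risks.append((item.id, "orphaned"))
        if item.type == "User Story" and not item.acceptance_criteria.strip():
            risks.append((item.id, "missing acceptance criteria"))
        if item.changed < cutoff:
            risks.append((item.id, "stale"))
    return risks
```

Rule-based flags like these give the risk radar concrete evidence to cite and keep the model's job to prioritization and wording.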
Generated artifacts
- Product purpose, backlog structure, key modules, active risks, and the first areas worth investigating.
- How work flows from idea to release, including states, bottlenecks, implicit rules, and process gaps.
- Repositories, entry points, major services, tests, dependencies, and hot spots.
- Prioritized delivery and technical risks with evidence and suggested next actions.
PR Review Automation
PR review should use the mini model as the default first pass. The goal is not to replace human review, but to reduce repetitive work, highlight risks, and provide useful context.
- Receive a PR event or run review on demand.
- Read metadata, linked work item, changed files, and diff.
- Classify the change type with the mini model.
- Retrieve the minimum relevant local context through graph, repo map, and search.
- Run focused reviewers: security, data, authorization, error handling, tests, and architecture.
- Generate findings with severity, evidence, file/line reference, and recommendation.
- Publish only after confirmation, or keep the output as a local/draft report.
```markdown
## AI PR Review

Summary:
- Change type: backend validation
- Estimated risk: medium
- Linked work item: DT-1423

Findings:
1. [High] Missing tenant validation in `src/api/orders.ts`
   Evidence: the handler uses `orderId` but does not validate ownership.
   Recommendation: validate tenant access before retrieving or mutating the order.
2. [Medium] No negative permission test
   Recommendation: add a test for a user without access to the tenant.

Local checks:
- tests: not run
- lint: not run
- secrets scan: no obvious secret detected
```
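A structured finding type makes the report format above reproducible and keeps severity ordering consistent. This is a sketch only: `Finding`, `render_findings`, and the three severity levels are assumptions modeled on the sample report, not an existing Delivery Twin API.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}  # assumption: three levels


@dataclass
class Finding:
    severity: str
    title: str
    recommendation: str
    evidence: str = ""


def render_findings(findings: list[Finding]) -> str:
    """Render findings as a numbered list, most severe first."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    lines = ["Findings:"]
    for number, finding in enumerate(ordered, start=1):
        lines.append(f"{number}. [{finding.severity}] {finding.title}")
        if finding.evidence:
            lines.append(f"   Evidence: {finding.evidence}")
        lines.append(f"   Recommendation: {finding.recommendation}")
    return "\n".join(lines)
```

Because the renderer is deterministic, the same findings always produce the same draft, which helps when calibrating noise during draft mode.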
Operating Model
- First engagement days: ingest backlog, map the codebase, identify owners, generate glossary, and create an initial risk radar.
- Recurring work: PR review, daily brief, blocker detection, change summaries, and refinement preparation.
- Diagnostics, recommendations, improvement plans, and internal material for knowledge-sharing sessions.
- Usage audit, model cost, connected clients, permissions, endpoints, and privacy checks.
Implementation Roadmap
| Phase | Goal | Deliverables | Success criteria |
|---|---|---|---|
| 0. Spike | Validate installation and connectivity on Windows 11. | Dedicated WSL2 environment, OpenClaw installed, enterprise LLM configured. | A simple task runs without personal providers. |
| 1. Bootstrap | Install the generic Delivery Twin capability. | Repo cloned, local config, doctor command, DevOps/Slack/LLM login. | `delivery-twin doctor` passes critical checks. |
| 2. Attach client | Connect a repository and board. | `attach` command, clone, client registry, initial cache. | Backlog and repository are visible locally. |
| 3. Ingest | Generate the first useful client view. | Backlog ingestion, repo index, technical map, risk radar. | A consultant understands the engagement in under 30 minutes. |
| 4. PR review | Automate first-pass PR review. | Review CLI, local/draft output, escalation policy. | Useful findings without excessive cost or noise. |
| 5. Rollout | Turn the capability into a maintained internal product. | Signed installer, update process, onboarding guide, audit trail. | Another consultant can install it without direct help. |
Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Wrong LLM provider used by accident | Client data leaves the approved route. | Ship without personal providers, validate config in `doctor`, and fail closed. |
| Generic repo mixed with client context | Leaks, divergence, and hard maintenance. | Enforce separate `core/` and `clients/CLIENT/` workspace roots. |
| Noisy PR review | Teams stop reading the output. | Use strict severity, concrete evidence, and draft mode while calibrating. |
| LLM cost grows unchecked | Internal rejection due to cost or latency. | Mini model by default, context budgets, caching, deduplication, and metrics. |
| Slack posts sensitive content | Client information appears in the wrong channel. | Allowlisted channels, human confirmation, summaries instead of raw code. |
Implementation Checklist
Technical
- Choose the internal package name and CLI name.
- Create the generic repository structure: `installer/`, `skills/`, `plugins/`, `prompts/`, `policies/`, `docs/`.
- Implement the installer and preflight checks.
- Install pinned upstream OpenClaw and register the Delivery Twin pack.
- Configure the enterprise LLM as the only provider.
- Implement `delivery-twin doctor`.
- Implement `delivery-twin attach` for repository and board connection.
- Implement Azure Boards and repository ingestion.
- Generate the first local client brief.
- Implement PR review in draft mode.
Organizational
- Confirm the policy for sending client context to the enterprise LLM.
- Define ownership of the generic Delivery Twin repository.
- Define Azure DevOps scopes for read-only and write-enabled modes.
- Define Slack channel allowlists and posting rules.
- Prepare an internal AI-Day where teams can share what they are building and learning.
- Create a 30-minute consultant onboarding guide.
Target story: install the distribution, connect a client, understand backlog and code structure, review a PR, and share a confirmed summary in under one hour.