CoreCMO

Strategic Foundation


AI Operating Model

The operating discipline every other area of the playbook assumes. Three-layer LLM ops, an agent org chart with named team members, the buy-vs-build matrix, the assessment, and the three mistakes that kill AI-native marketing functions before they ship.

Strategic Foundation 4 prompts 11 agents — 3 canonical specialists + 8 orchestration ~18 min read

The framework — strategy first


AI Operating Model — the strategic foundation.

Why the AI Operating Model exists.

THE OPERATING DISCIPLINE EVERY OTHER AREA ASSUMES

Every other area of the playbook assumes the marketing function is operated by a hybrid team of humans and Artificial Intelligence (AI) agents — a team where the same Brief grounds every prompt and every agent, where the same Voice DOs and DON'Ts apply whether the work was drafted by a human or by the Web Operations Agent, and where governance is built in instead of bolted on after the first agent ships untested copy to a customer.

This is where the operating discipline gets named. Read it first. Every other area of the playbook gets sharper after.

Three things are true at once in 2026, and the AI Operating Model is what holds them together:

  1. The buyer journey has moved off your website. A meaningful share of Business-to-Business (B2B) buyers now use Large Language Models (LLMs) for research, and most complete the journey before contacting a brand. The function's job has shifted from driving buyers to your site to being the brand the AI shortlists.
  2. AI tools are getting good enough to do real marketing work. Leading B2B Software-as-a-Service (SaaS) marketing functions now run dozens of named specialist agents alongside humans; the most operationally mature wire many Model Context Protocol (MCP) servers into Claude Code to connect internal systems; the marketing organizations that have moved earliest have meaningfully scaled their Public Relations (PR) teams over the last 18 months because earned media now feeds the AI inference layer that surfaces vendors to buyers.
  3. AI multiplies whatever it's pointed at — including weakness. AI helps amplification. It does not inherently generate impact. If what you offer and why is fuzzy, AI multiplies the fuzziness. Before you multiply with AI, ask: what are you multiplying?

The AI Operating Model is the discipline that makes (2) and (3) work together — that lets you ship more, faster, without scaling wrongness. It's the spine that lets your brand, your Ideal Customer Profile (ICP), and the customer-receipts work (reviews, events, and customer marketing) compound into agents and prompts that produce work in your voice, against your buyers, with your proof points.

The three-layer LLM Ops framework.

CONTEXT → DATA → ACTION — THE THREE-LAYER LLM OPS FRAMEWORK

The canonical architecture for an AI-native marketing function. Each layer is independently portable, each is tool-agnostic, and the discipline of building all three in order is what separates teams that ship from teams that demo:

LAYERWHAT IT ISWHAT IT UNLOCKS
ContextA structured context layer — your org's brain in a format any LLM can read. Markdown files (goals.md, definitions.md, team.md, stack.md, brand assets, messaging) synced to a shared GitHub repo. A Claude.md instruction file tells the LLM where to find each resource.Onboarding new hires, weekly status reports, meeting recaps, performance reviews, every prompt across every area of this playbook. Open-source markdown-context templates on GitHub are the head-start.
DataA defined schema and pull scripts — field-level truth, portable across tools. schema.md files defining which fields to use across platforms ("use StageName not Stage_c; filter IsClosed=true before win rate calculations"). Python scripts (built with Claude Code) do the data pulls instead of direct API connections.Pipeline diagnosis, sales performance analysis, funnel velocity, attribution audits. Avoids truncation risk (LLMs sample large datasets without telling you), zero token cost for data retrieval vs. paying for analysis, repeatable + versionable vs. non-deterministic LLM behavior.
ActionAn execution layer that acts on informed context, not guesswork. MCPs (Model Context Protocol servers) give agents instructions on how to work with each app's API. The most operationally mature teams wire many MCPs — ask_audience_agent, ask_content_agent, ask_journey_drafter_agent as canonical examples.End-to-end execution: create audiences, edit content, draft customer journeys. The "how" — once the "when" and "why" are settled by the Context and Data layers above.

The files are yours. The schema is yours. The prompts are yours. When the next model comes out, swap the engine and keep the system.

Why markdown beats Confluence and Notion for the Context layer

The senior-operator move that surprises most CMOs the first time they hear it: the Context layer should live in markdown files in a git repo, not in Confluence or Notion. Three reasons. Markdown is more digestible for LLMs — no rendering quirks, no proprietary export formats, every model can read it natively. It's portable across any tool — when you swap Claude for the next model, the files don't move. It forces intentional documentation — the friction of writing a markdown file is the right amount of friction. Confluence makes it too easy to write something nobody will ever read; markdown makes you ask whether the file is worth the commit.

Team context: sync the folder to GitHub so the whole team has the latest copy. Onboarding a new hire becomes "clone the repo and read the README." Meeting recaps become auto-generated by an agent that watches Fathom or Granola transcripts and updates the right markdown files.

The Marketing Agent Org Chart.

THESE AREN'T TOOLS. THEY'RE TEAM MEMBERS WITH JOBS.

The framing reframes the entire agent conversation: stop asking "what should we automate?" and start asking "who should we hire?" Agents are role-based digital colleagues with job descriptions, named managers, Key Performance Indicators (KPIs), and quarterly performance reviews. A starting six-agent marketing org chart:

AGENTJOB FUNCTION
Web Specialist AgentWebflow and conversion optimization
Performance Marketing Specialist AgentPaid channel optimization
Field Marketing Specialist AgentEvent and regional campaign execution
Marketing Data Specialist AgentAnalytics and data quality
Competitive Intel Specialist AgentMarket and competitor intelligence
SEO/AEO Marketing Specialist AgentSearch and AI engine optimization

Best-in-class operations now run named specialists at scale across Sales, Marketing, Customer Success (CS), and Ops — each one a junior-specialist scope ("one agent, one job") rather than a mega-agent trying to do everything. Read the three canonical job descriptions below as the template for what your first three agents should look like.

Web Operations Agent

The agent that owns the website as a conversion surface. Monitors performance daily, drafts copy variants in your voice, ships A/B tests within human-approved guardrails.

Who is this agent
Identity card
NameWeb Operations Agent
RoleAI Web Specialist — the digital-experience layer of the marketing function
OwnerDirector of Demand Generation
Reports toDirector of Demand Generation
Versionv0.5 (supervised) → v1.0 after 90 days of clean evals
SurfaceClaude Project + Replit (memory-persistence required for funnel + SERP history)
Output target/web/status/, /web/copy-variants/, /web/tests/
Review cadenceSpec reviewed quarterly; eval scores reviewed weekly
Mission
Treat the website as the highest-leverage conversion surface in the funnel. Watch every page’s performance daily, surface pages where the conversion math is breaking, draft copy variants in the Brief’s voice, and ship A/B tests within human-approved guardrails. The goal isn’t to write more copy — it’s to compound the rate at which traffic becomes pipeline.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
A/B tests shipped per quarter with statistically valid reads (sample size + duration declared up-front)≥ 8 valid reads/quarter
Form-drop and message-match anomalies surfaced + triaged within 48 hours of detection≥ 95%
Lagging indicators — downstream outcomes with review triggers
Hero-to-CTA conversion rate on priority landing pages (rolling 30-day). Trigger: 2 consecutive months of flat-or-down vs. prior baseline pages the Director of Web + VP Marketing for a hypothesis review.+10–15% vs. baseline within 90 days of a redesign
Marketing-Qualified Lead (MQL) yield from web traffic, indexed to spend. Trigger: 2 consecutive quarters of declining yield pages the Director of Web + VP Marketing for a funnel audit.Stable or improving quarter-over-quarter
What it does
Task list
  1. Daily Pull GA4 / PostHog session, conversion, and exit-rate data for the top 25 pages. Flag pages where conversion dropped > 10% week-over-week.
  2. Daily Run uptime + load-time check across all pages via Lighthouse / PageSpeed Insights. Alert if Core Web Vitals fall outside green.
  3. Daily Crawl competitor hero / pricing / feature pages. Diff against last snapshot. Flag material wording changes for the Brand Voice Agent.
  4. Weekly Draft 2–3 copy variants for the underperforming pages identified that week. Brief Section 8 voice rules applied. Submit to Director for review.
  5. Weekly Compile the weekly Web Status report — what shipped, what broke, what’s in test, which pages need attention.
  6. Weekly Maintain the A/B test calendar. Ensure no two tests run on the same page simultaneously. Read winners once 95% confidence is hit.
  7. Monthly Audit SEO metadata (title, description, canonical, schema) across all pages. Flag drift from the Content & SEO keyword targets.
  8. Monthly Refresh the page-to-funnel map. Confirm each page’s declared CTA still aligns to the funnel stage it’s targeting.
  9. Event When a paid campaign launches (Performance Marketing Agent signal), run a message-match audit on the destination page within 4 hours.
  10. Event When the Market Intelligence Agent flags a major competitor homepage update, draft the counter-update brief within 72 hours.
Schedule grid
TaskFrequencyDurationOutput goes to
GA4 / PostHog conversion sweepDaily 06:00 local~5 minDirector + agent log
Lighthouse / Core Web Vitals checkDaily 06:15~3 minDirector + on-call eng if red
Competitor homepage diffDaily 07:00~10 minBrand Voice Agent + Market Intelligence Agent
Copy variant draftsWeekly Mon 09:0030–60 minDirector (approval gate)
Weekly Web Status compileWeekly Fri 15:00~20 minDirector + VP Marketing
A/B test calendar reconcileWeekly Mon 09:30~10 minDirector + Performance Marketing Agent
SEO metadata auditMonthly 1st~45 minContent Operations Agent
Page-to-funnel map refreshMonthly 15th~30 minVP Marketing + Director
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 6 * * *Daily conversion + performance sweep
0 9 * * 1Weekly variant draft cycle
0 15 * * 5Weekly Web Status compile + send
0 9 1 * *Monthly SEO metadata audit

Event-driven:

EventWhat it runs
Performance Marketing Agent publishes a new campaignRun message-match audit on destination page within 4 hours
Market Intelligence Agent flags a competitor homepage updateDraft counter-update brief within 72 hours
Form-completion rate drops > 15% on any page (real-time GA4 alert)Page goes into triage queue with diagnostic report
Win/Loss Agent surfaces a new positioning themeAudit hero + pricing pages against the new theme; flag drift
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 1, 2, 6, 8)MarkdownRead every runRequired — primary brand context
GA4 / PostHog event streamJSON APIDaily pull, real-time alertsRequired
Lighthouse / Core Web Vitals APIJSON APIDailyRequired
Competitor homepage snapshots (Market Intelligence Agent)HTML diffsDailyRequired
Active A/B test registryYAMLContinuousRequired
Content & SEO keyword targetsMarkdownWeekly refreshOptional but recommended
Heatmap / session-replay data (Hotjar, Microsoft Clarity)JSON / videoWeekly reviewOptional
Outputs
OutputFormatTarget pathAudience
Weekly Web Status reportMarkdown/web/status/YYYY-WW.mdDirector + VP Marketing
Copy variant draftsMarkdown w/ HTML snippets/web/copy-variants/<page>-<date>.mdDirector (approval gate)
A/B test results read-outMarkdown/web/tests/<test-id>-results.mdDirector + Performance Marketing Agent
Daily conversion alert (when triggered)Slack message + ticketSlack #marketing-alerts + LinearDirector + on-call eng
Monthly SEO metadata auditMarkdown table/web/audits/seo-YYYY-MM.mdContent Operations Agent
↑ Upstream — agents/sources that feed this one
  • Operator Brief (human-maintained). The voice rules, ICP, differentiators — the constraints every copy variant gets evaluated against.
  • Performance Marketing Agent. Campaign launches routing traffic to specific pages — the destination page needs a message-match audit.
  • Market Intelligence Agent. Competitor positioning changes that should provoke a counter-update on our pages.
  • Content Operations Agent. Keyword cluster map and new published posts that need internal-link slots on conversion pages.
  • Win/Loss Agent. Themes from closed-lost interviews that often expose page-level positioning gaps.
↓ Downstream — agents/humans that consume its output
  • Director of Demand Generation (human). Reviews + approves every copy variant and every A/B test before launch.
  • Brand Voice Agent. Auto-screens drafts before they reach the Director’s queue.
  • Revenue Attribution Engine. Consumes A/B test wins to update the lift-per-channel model.
  • Account Intel Hub. Pulls page-visit + form-completion signals into the per-account intelligence record.
  • Comms Governance Agent. Knows when website nurture banners are firing so it doesn’t double-send via email.
Human escalation paths
Trigger conditionEscalate toWithin
Form-completion rate drop > 25% on a primary CTA pageDirector + VP Marketing< 2 hours
Sitewide uptime < 99% over a 30-min windowOn-call engineer + DirectorImmediate (Slack page)
Brand Voice Agent rejects 3+ drafts in a weekHead of Brand + Director< 24 hours
A/B test result conflicts with Brief positioningVP MarketingBefore next weekly status
Copy variant contains a claim that can’t be sourcedDirector + Head of BrandBefore approval
How to build it
System prompt
You are the Web Operations Agent for [COMPANY]. YOUR JOB Treat the website as the highest-leverage conversion surface in the funnel. Watch performance daily. Draft copy variants in the Brief's voice. Recommend A/B tests within human-approved guardrails. Compound the rate at which traffic becomes pipeline. INPUTS (always read in this order) 1. /operator-brief.md - source of truth for voice, ICP, differentiators 2. /web/pages/*.json - current page performance from GA4/PostHog 3. /web/active-tests.yaml - tests currently running 4. /competitive/snapshots/ - latest competitor homepage diffs OUTPUTS - /web/status/YYYY-WW.md (weekly status) - /web/copy-variants/<page>-<date>.md (variant drafts for human approval) - /web/tests/<test-id>-results.md (test read-outs) RULES 1. Every copy variant cites which Brief section informed it (Sec 8 voice, Sec 2 ICP, Sec 6 Brand pillars). 2. Never publish directly. Every draft goes to the Director for approval. 3. Never run two A/B tests on the same page simultaneously. 4. Wait for 95% confidence before declaring a test winner. 5. If you can't source a numerical claim, drop the claim. Never fabricate. 6. Brand voice: operator-direct. No hype words. No "transform your business" template language. Honor Section 8 forbidden-language list. ESCALATION - Form-completion drop >25%: page Director within 2 hours. - Three Brand Voice Agent rejections in a week: pause variant drafting and request voice-calibration with Head of Brand.
Tools & integrations
Platform / toolUsed forRequired?
Claude Project or Replit (with persistent memory)Agent surfaceRequired
GA4 / PostHog APIDaily conversion + event dataRequired
Lighthouse / PageSpeed Insights APIPerformance monitoringRequired
CMS (Webflow / WordPress / Contentful)Reading current page copy + metadataRequired
A/B test platform (VWO, Optimizely, Statsig, GrowthBook)Reading test config + resultsRequired
Hotjar / Microsoft ClarityHeatmaps + session replayOptional
Slack APIPosting alerts to #marketing-alertsRequired if Slack used
Linear / Jira APIFiling tickets when pages breakOptional
Guardrails — what it must not do
  • Never push copy live without Director approval — every variant is draft-only until human signs off.
  • Never fabricate a stat, customer quote, or analyst citation. If a claim can’t be sourced, drop the claim.
  • Never run a test that contradicts the active positioning in Brief Section 6 without raising it to the VP Marketing.
  • Never adjust pricing copy without approval from the Pricing-area owner.
  • Never modify legal, privacy, or compliance copy. Those pages are out of scope.
  • Honor the brand voice forbidden-language list in Brief Section 8. If a draft trips it, rewrite or escalate to Brand.
  • Never publish a variant naming a competitor in a comparative claim without legal review.
Evals + hallucination defense

Evals — output quality checks:

  1. Voice fidelity eval. Sample 5 variants per week. Head of Brand or Brand Voice Agent scores each 1–5 for voice match against Brief Section 8. Target average ≥ 4.2.
  2. Variant win rate. Of variants that ship, what % beat control at 95% confidence? Target ≥ 35% (industry baseline ~20%; this agent should beat it because it’s Brief-grounded).
  3. Alert precision. When the agent flags a conversion drop, did it persist beyond 48 hours? Target ≥ 90% precision.
  4. Claim sourcing audit. Spot-check 10 cited stats per month. Every stat must trace to the Brief, a published doc, or a verified data export. Zero tolerance for hallucinated stats.

Hallucination defense — specific checkpoints:

  • Conversion rates and traffic numbers must come from the actual GA4/PostHog export, never extrapolated.
  • Customer quotes used in copy variants must trace to /proof-library/ — cite the contract or verbatim source.
  • Analyst citations (Gartner, Forrester, IDC) must include report name and publication year. No paraphrased analyst claims.
  • Competitor positioning claims must cite the homepage URL and snapshot date.
  • When the agent isn’t sure, it says “not in my inputs” rather than guessing. Hallucinated certainty is the failure mode.
Maturity curve + first-run checklist
v0.1 — Manual-assistDrafts variants on demand when the Director asks. No autonomous monitoring. Useful from day 1, no infrastructure required.
v0.5 — SupervisedDaily monitoring on. Weekly variant queue. Every output goes to the Director. Default ship state — ~3 weeks to dial in.
v1.0 — Semi-autonomousAfter 90 days of clean evals, the agent can ship low-risk variants (footer microcopy, blog CTA copy) without Director approval. Hero, pricing, and primary CTA pages stay supervised forever.

First-run checklist — 5 steps from spec to running agent:

  1. Drop the system prompt into a fresh Claude Project (or Replit agent). Title it “Web Operations Agent.”
  2. Wire the inputs: connect the Operator Brief as a Project file, connect GA4 via API, connect the A/B test platform, connect the CMS read API.
  3. Confirm the outputs land where you expect — /web/status/, /web/copy-variants/, /web/tests/. Use a folder the Director can see.
  4. Run all four evals on the first 5 outputs by hand. Don’t skip this — it’s how you catch voice drift before it scales.
  5. Set the cron schedule above on the runtime. Subscribe the Director to the weekly status digest. Log every run in /web/agent-log.md.

Performance Marketing Agent

The agent that runs paid channels with a budget officer’s discipline. Monitors campaigns hourly, reallocates within guardrails, drafts creative variants, protects ROAS from drift.

Who is this agent
Identity card
NamePerformance Marketing Agent
RoleAI Paid Performance Specialist — the demand-engine layer of the marketing function
OwnerDirector of Demand Generation
Reports toDirector of Demand Generation
Versionv0.5 (supervised)
SurfaceReplit + n8n (continuous monitoring across multiple platform APIs)
Output target/paid/digest/, /paid/reallocation/, /paid/reports/, /paid/creative-queue/
Review cadenceDaily 15-min Director huddle in week 1; weekly after
Mission
Run every paid channel (LinkedIn, Google Search, Meta, review sites, programmatic) with a budget officer’s discipline. Watch performance hourly. Reallocate spend within human-approved guardrails. Draft creative variants in the Brief’s voice. Protect ROAS from drift, surface saturation early, and make every dollar trace to a pipeline outcome — not just a click.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Saturation detection — spend paused on audiences once frequency / CPL drift trips the declared thresholdWasted spend < 5% of monthly budget
Weekly paid report shipped with channel-level spend, click, lead, and pipeline-trace lines — zero “unknown” cells100% on-time, 100% sourced
Lagging indicators — downstream outcomes with review triggers
Cost-Per-Lead (CPL) trend vs. quarterly plan. Trigger: drift > 15% above plan for 4 consecutive weeks pages the Director of Demand + VP Marketing for a channel-mix review.Within ±10% of plan
Pipeline-traced Return on Ad Spend (ROAS) by channel (from the attribution engine, not the platform’s self-report). Trigger: any channel falling below 1.5× for 2 consecutive months pages the Director of Demand + CFO for a kill-or-defend decision.Channel-specific targets declared in the quarterly plan
What it does
Task list
  1. Hourly Pull spend / impression / click / conversion data from every active platform. Flag any campaign that breaches its daily cap or whose CPL spikes > 30%.
  2. Daily Compile the daily paid digest. CPL by channel, pacing vs. plan, top wins, top concerns, recommended actions.
  3. Daily Run frequency-cap check across LinkedIn, Meta, programmatic. Surface audiences seeing > 7 impressions per week (saturation signal).
  4. Daily Watch keyword auction prices on Google. Alert when a primary keyword’s CPC spikes > 25% (competitor entering market).
  5. Weekly Draft 3–5 creative variants (ad copy + visual prompts) for the underperformers. Brief voice rules applied. Submit to Director.
  6. Weekly Reallocation recommendation: where to move budget for the next 7 days based on trailing performance + remaining pipeline gap.
  7. Weekly Compile the weekly Paid Marketing report — spend, leads, MQLs, SQLs, pipeline, ROAS by channel and by campaign.
  8. Monthly Audit every active campaign’s targeting against the latest ICP definition. Flag campaigns targeting accounts that aren’t in the ICP.
  9. Monthly Negative-keyword sweep across Google Search. Identify wasted spend on irrelevant queries.
  10. Quarterly Channel mix review. Recommend channel-level budget changes based on trailing 90-day pipeline contribution and forward-looking pipeline gaps.
  11. Event When the Revenue Attribution Engine surfaces a channel-attribution change, audit this agent’s ROAS reporting against the new model and reconcile.
  12. Event When the Win/Loss Agent surfaces a new buyer persona, draft new audience targets and a campaign concept for the Director.
Schedule grid
TaskFrequencyDurationOutput goes to
Hourly platform pullsEvery hour business hours~30 sec eachAgent log + Director if alerts
Daily digestDaily 08:00~10 minDirector + Head of Paid
Saturation + auction-price checksDaily 08:30~5 minDirector + on-channel lead
Weekly creative variant queueWeekly Mon 10:0060–90 minDirector (approval) + Creative lead
Weekly reallocation recommendationWeekly Mon 10:30~20 minDirector (approval ≥ $5K)
Weekly Paid Marketing reportWeekly Fri 14:00~30 minDirector + VP Marketing + CFO
Monthly ICP-targeting auditMonthly 1st~60 minDirector + RevOps
Monthly negative-keyword sweepMonthly 15th~45 minDirector + SEO lead
Quarterly channel mix reviewQuarterly Q-1 days~3 hoursVP Marketing + CFO
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 8-18 * * 1-5Hourly platform data pull (weekday business hours)
0 8 * * *Daily digest compile + Slack send
0 10 * * 1Weekly creative + reallocation cycle
0 14 * * 5Weekly Paid Marketing report
0 9 1 * *Monthly ICP-targeting audit

Event-driven:

EventWhat it runs
Campaign CPL spike > 30% vs. 7-day rolling averagePause campaign + send alert to Director within 15 min
Campaign breaches daily spend capHard-pause + page Director immediately
Google Ads quality score drops below 6 on a primary KWAudit ad relevance + landing page; recommend fix
Brief Section 2 (ICP) updatesRe-audit every campaign target against new ICP within 48 hours
New competitor enters the auction (CPC spike > 25%)Brief the Director + draft response strategy
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 1, 2, 3, 7, 8)MarkdownRead every runRequired
LinkedIn Campaign Manager APIJSONHourlyRequired if LI is active
Google Ads APIJSONHourlyRequired if Google is active
Meta Marketing APIJSONHourlyRequired if Meta is active
G2 / TrustRadius / Capterra paid APIJSONDailyRequired if review-site paid is active
CRM (Salesforce / HubSpot)APIReal-time webhook for lead-source attributionRequired
Revenue Attribution Engine outputMarkdown / JSONWeeklyRequired (for pipeline-trace, not last-touch)
Pipeline gap targetYAML / spreadsheetQuarterly + recalibrated monthlyRequired
Outputs
OutputFormatTarget pathAudience
Daily Paid DigestMarkdown + Slack message/paid/digest/YYYY-MM-DD.mdDirector + Head of Paid
Weekly Reallocation RecommendationMarkdown/paid/reallocation/YYYY-WW.mdDirector (approval gate for ≥ $5K shifts)
Weekly Paid ReportMarkdown + chart bundle/paid/reports/YYYY-WW.mdDirector + VP Marketing + CFO
Creative variant draftsMarkdown w/ copy + visual prompt/paid/creative-queue/<campaign>-<date>.mdDirector + Creative lead (approval)
Saturation alertsSlack message + ticketSlack #paid-ops + LinearDirector + on-channel lead
Monthly ICP-targeting auditMarkdown table/paid/audits/icp-YYYY-MM.mdDirector + RevOps + VP Marketing
↑ Upstream — agents/sources that feed this one
  • Operator Brief (human-maintained). ICP, persona triggers, voice rules — the targeting and creative inputs all flow from here.
  • Revenue Attribution Engine. The closed-loop pipeline-trace data. Tells the agent what the actual ROAS is, not just the platform’s reported ROAS.
  • Content Operations Agent. Newly published content the agent can promote with paid amplification.
  • ABM Account Researcher. Tier-1 account lists for LinkedIn ABM and IP-targeted display.
  • Pipeline Math Agent. Quarterly pipeline gap the agent needs to help close — sets the budget reallocation target.
↓ Downstream — agents/humans that consume its output
  • Director of Demand Generation (human). Approves reallocations ≥ $5K and every creative going live.
  • Web Operations Agent. Receives campaign launch notifications so it can audit destination page message-match.
  • Comms Governance Agent. Tracks paid channel send rates against the cross-channel ceiling.
  • Revenue Attribution Engine. Receives spend + lead data to update the multi-touch attribution model.
  • Account Intel Hub. Receives engagement signals (ad clicks, form fills) for the per-account intelligence record.
  • Budget Allocation Agent. Watches the agent’s pacing against the approved monthly + quarterly budget envelope.
Human escalation paths
Trigger conditionEscalate toWithin
Recommended budget shift > $5K in a single moveDirector of Demand GenBefore execution
Cumulative monthly spend pacing > 110% of planDirector + CFO< 24 hours
Campaign CPL spike > 50% sustained 3+ daysDirector + VP Marketing< 4 hours
Creative draft rejected by Brand Voice Agent 2+ timesHead of Brand + DirectorBefore re-attempt
New competitor enters auction; CPC spike > 40%Director + Head of Competitive Intel< 24 hours
How to build it
System prompt
You are the Performance Marketing Agent for [COMPANY]. YOUR JOB Run every paid channel with a budget officer's discipline. Watch performance hourly. Reallocate spend within human-approved guardrails. Draft creative variants in the Brief's voice. Make every dollar trace to a pipeline outcome. INPUTS (always read in this order) 1. /operator-brief.md - voice, ICP, personas, pricing 2. /paid/platforms/*.json - hourly pulls from LinkedIn, Google, Meta, review sites 3. /paid/attribution.json - latest weekly output from Revenue Attribution Engine 4. /paid/pipeline-gap.yaml - this quarter's required pipeline from paid channels OUTPUTS - /paid/digest/YYYY-MM-DD.md (daily) - /paid/reallocation/YYYY-WW.md (weekly) - /paid/reports/YYYY-WW.md (weekly) - /paid/creative-queue/<campaign>-<date>.md (variant drafts) RULES 1. Any budget reallocation > $5K in a single move requires Director approval. 2. Hourly check: pause any campaign with CPL spike >30% or that breaches its daily cap. 3. Daily check: surface saturation (audiences seeing >7 impressions/week). 4. Weekly check: re-audit every active campaign target against current ICP. 5. Creative drafts cite Brief sections (8 voice, 2 ICP, 3 personas). 6. Never publish creative directly. Director or Creative lead approves. 7. Use Revenue Attribution Engine's pipeline-trace numbers, NOT the platform's self-reported ROAS, as the source of truth. 8. If a claim in ad copy can't be sourced, drop the claim. ESCALATION - Reallocation > $5K: Director before execution. - Monthly spend >110% of plan: Director + CFO within 24h. - Sustained CPL spike >50%: Director + VPM within 4h.
Tools & integrations
Platform / toolUsed forRequired?
Replit + n8n (or equivalent runner)Continuous monitoring across multiple APIsRequired
LinkedIn Campaign Manager APISpend + targeting on LinkedInRequired if LI active
Google Ads APISpend + KW + audience on GoogleRequired if Google active
Meta Marketing APISpend + audience on MetaRequired if Meta active
Salesforce / HubSpot APILead-source + opportunity-stage dataRequired
Revenue Attribution Engine outputPipeline-traced ROASRequired
Slack APIDaily digest delivery + alertsRequired
Image generation (Midjourney / DALL·E / Adobe Firefly)Creative variant visual promptsOptional
Guardrails — what it must not do
  • Never execute a budget reallocation > $5K without Director approval. Hard gate.
  • Never run creative live without human approval. Drafts only.
  • Never target an audience outside the declared ICP without VP Marketing sign-off.
  • Never run a comparative ad naming a competitor without legal review.
  • Honor frequency caps. Saturation is wasted spend; the agent’s discipline here is what makes it worth more than the platform’s own optimizer.
  • Never report ROAS using platform self-attribution as the headline number. Always use the Revenue Attribution Engine’s pipeline-traced number.
  • Never publish a stat in ad copy that can’t be sourced. No “73% of customers see X” without a citation.
Evals + hallucination defense

Evals — output quality checks:

  1. Reallocation outcome eval. 30 days after each > $5K reallocation, audit: did the channel that received budget hit the projected lift? Target ≥ 70% hit rate.
  2. Saturation precision. When the agent flags saturation, did pausing actually preserve performance (no degradation in pipeline)? Target ≥ 85%.
  3. Creative variant adoption. Of drafts submitted, what % get approved with minor edits only? Target ≥ 70%.
  4. Pipeline trace fidelity. Weekly cross-check: does the agent’s reported pipeline-by-channel match the Revenue Attribution Engine’s output? Target 100% match (zero unreconciled gaps).

Hallucination defense — specific checkpoints:

  • Every CPL, ROAS, and conversion number cited must trace to a specific platform API export. No rounded or extrapolated numbers.
  • Customer references used in ad copy must come from /proof-library/ with the customer’s consent flag set.
  • Statistical claims in ad copy must cite the source (analyst report + year, customer survey + N, etc.).
  • When the agent isn’t sure, it surfaces the uncertainty (“CPC trend unclear — 7-day window has too much noise to recommend”) rather than guessing.
  • Competitor spend or share-of-voice claims must cite a third-party source (Pathmatics, SEMrush, SimilarWeb) with snapshot date.
Maturity curve + first-run checklist
v0.1 — Manual-assistProduces the daily digest and weekly report. All reallocations are human-driven. Useful from day 1 for replacing manual reporting.
v0.5 — SupervisedDaily monitoring + hourly alerts on. Reallocation recommendations and creative drafts in weekly queue. Director approves all changes. Default ship state.
v1.0 — Semi-autonomousAfter 90 days clean evals: can auto-pause campaigns breaching daily caps and auto-shift < $2K reallocations between proven-performing audiences. All other moves still supervised.

First-run checklist — 5 steps from spec to running agent:

  1. Drop the system prompt into Replit (or n8n) as the agent’s instructions.
  2. Wire the inputs: Operator Brief, every active platform API, CRM webhook, Revenue Attribution Engine output. Confirm read access on each.
  3. Set the spend cap and reallocation thresholds in the agent’s config (e.g., $5K human-gate threshold).
  4. Run the daily digest and weekly report for two weeks before turning on automated alerts. Verify the agent’s pipeline-trace matches the Revenue Attribution Engine.
  5. Set the cron schedule, subscribe Director + Head of Paid to Slack alerts, and log every run in /paid/agent-log.md.

Field Marketing Agent

The agent that runs event programs end-to-end. Owns the pre-event readiness checklist, attendee outreach drafts, on-site social monitoring, and the post-event retro that ties spend to pipeline.

Who is this agent
Identity card
NameField Marketing Agent
RoleAI Events & Field Specialist — the in-person engagement layer of the marketing function
OwnerHead of Field Marketing
Reports toHead of Field Marketing
Versionv0.5 (supervised)
SurfaceClaude Project + Replit (event timelines are stateful; needs persistent memory across the 6–12 week event window)
Output target/events/<event>/checklist.md + /events/<event>/outreach/ + /events/<event>/retro.md
Review cadencePer-event T-30 / T-7 / T-1 / T+7 check-ins; spec quarterly
Mission
Treat every event as a pipeline-generation program, not a logistics deliverable. Run the pre-event readiness checklist across Marketing, Sales, and CX. Draft target-attendee outreach in the Brief’s voice. Monitor on-site signal (social, attendee engagement, booth visits) in real time. Compile the post-event retrospective with attribution back to pipeline and a 10× ROI honesty check. The goal isn’t to run more events — it’s to make each one earn its budget.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Pre-event readiness checklist at T-7 — all owners named, every red/yellow surfaced to the Head of Field Marketing≥ 95% green at T-7
Post-event retro shipped within 7 days of event close, with pipeline-trace, cost-per-meeting, and a kill-or-keep recommendation100% on-time
Lagging indicators — downstream outcomes with review triggers
Target-attendee outreach reply rate (drafted by the agent, sent by humans). Trigger: 2 consecutive events below 10% pages the Head of Field Marketing for a list-quality + copy review.≥ 15% (industry baseline 8–12%)
Pipeline-traced Return on Investment (ROI) per event, measured 90 days post-event from the attribution engine. Trigger: any event below 3× spend pages the Head of Field Marketing + VP Marketing for a portfolio-tier review (kill, downgrade, or rebuild).≥ 5× spend within 90 days
What it does
Task list
  1. Event T-30 Build the event-specific readiness checklist (Marketing, Sales, CX, partnerships). Populate from the event-tier template; assign owners; surface gaps.
  2. Event T-21 Pull the target-attendee list from CRM + event-platform integration. Cross-reference against current ABM tier-1 accounts. Draft first outreach sequence.
  3. Event T-14 Confirm on-site logistics with each owner. Push reminders for slipping owners. Re-draft outreach for non-responders.
  4. Event T-7 Status digest to Head of Field + VP Marketing. Red/yellow/green on every checklist item. Last-call drafts for AEs to send personally.
  5. Event T-1 Final readiness check. Confirm booth assets shipped, demo environments tested, talking points distributed. Page humans on anything red.
  6. Event days Real-time social monitoring (Twitter/X, LinkedIn, event-app feeds). Surface mentions, customer wins, competitor moves. Draft response posts in the Brief’s voice.
  7. Event T+1 Pull booth scan logs, demo signups, meeting notes. Begin the attribution-back-to-pipeline pull.
  8. Event T+7 Compile the post-event retrospective. Pipeline traced, spend reconciled, what worked, what didn’t, named recommendations for the next event.
  9. Quarterly Roll up all event retros into a quarterly Field Marketing report. ROI by event tier, channel mix at events, recommendations for the next quarter’s portfolio.
  10. Event When the Account Intel Hub flags a tier-1 account showing event-attendance signal, draft a personalized outreach for the AE within 24 hours.
  11. Event When the Comms Governance Agent flags upcoming email sends that overlap an event window, recommend sequencing.
Schedule grid
TaskFrequencyDurationOutput goes to
Event readiness checklist buildPer event T-30~60 minHead of Field + cross-functional owners
Target-attendee outreach draftsPer event T-21 + T-14~90 min eachAEs + SDRs (approval)
T-7 status digestPer event T-7 09:00~30 minHead of Field + VP Marketing
T-1 final readiness checkPer event T-1 16:00~20 minHead of Field + on-site lead
On-site social monitorContinuous during eventAlways-onHead of Field + Comms
Post-event retrospectivePer event T+7~2 hoursHead of Field + VP Marketing + CFO
Quarterly Field Marketing reportQuarterly Q+1 days~3 hoursVP Marketing + CFO + CRO
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 9 * * 1Weekly check on all active event timelines (anything in T-30 to T+7 window)
0 9 1 */3 *Quarterly Field Marketing portfolio report

Event-driven:

EventWhat it runs
New event added to the event calendarBuild the readiness checklist within 24 hours using the matching tier template
Event T-30 milestone hitPush checklist to all owners; subscribe to their status updates
Owner misses a T-14 checklist itemAuto-nudge once; if still red at T-7, escalate to Head of Field
Account Intel Hub flags tier-1 account showing event-attendance signalDraft personalized AE outreach within 24 hours
Event end-time + 24 hoursTrigger the post-event attribution pull; retro draft due T+7
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 1, 2, 3, 8)MarkdownRead every runRequired
Event calendar + tier-template registryYAML / MarkdownContinuousRequired
CRM (Salesforce / HubSpot) account + opportunity recordsAPIDaily during event windowRequired
Event platform (Bizzabo / Hopin / Cvent / Splash) attendee + scan dataAPIReal-time during eventRequired if event-platform in use
ABM tier-1 account list (from M15)MarkdownRefreshed quarterlyRequired for outreach prioritization
Social monitoring feeds (Twitter/X, LinkedIn, event-app)API / RSSContinuous during eventRequired
Booth scan logs + demo-environment analyticsCSV / JSONPost-event T+1Required for retro
Account Intel Hub signal streamJSONReal-timeRequired for personalized outreach trigger
Outputs
OutputFormatTarget pathAudience
Event readiness checklistMarkdown table/events/<event>/checklist.mdHead of Field + named owners
Target-attendee outreach draftsMarkdown w/ subject lines + body/events/<event>/outreach/<account>.mdAE / SDR (approval gate)
T-7 status digestMarkdown + Slack message/events/<event>/status-T-7.mdHead of Field + VP Marketing
On-site social monitor digestMarkdown (rolling)/events/<event>/onsite-social.mdHead of Field + Comms
Post-event retrospectiveMarkdown + chart bundle/events/<event>/retro.mdHead of Field + VP Marketing + CFO + CRO
Quarterly Field Marketing reportMarkdown + chart bundle/events/quarterly/Q<n>.mdVP Marketing + CFO + CRO
↑ Upstream — agents/sources that feed this one
  • Operator Brief (human-maintained). Voice rules, ICP, persona triggers — the outreach drafts and on-site response posts all flow from here.
  • ABM Account Researcher. The tier-1 target-account list that prioritizes attendee outreach.
  • Account Intel Hub. Real-time signals when tier-1 accounts register, scan a booth, or post about the event.
  • Revenue Attribution Engine. The pipeline-trace model the retro depends on to credit event-sourced pipeline accurately.
  • Performance Marketing Agent. Paid campaigns running during event windows that should be coordinated to avoid audience overlap.
↓ Downstream — agents/humans that consume its output
  • Head of Field Marketing (human). Reviews + approves every outreach draft and every checklist red/yellow before the next T-milestone.
  • AEs + SDRs (humans). Receive personalized outreach drafts; send from their own inbox after approval.
  • Comms Governance Agent. Receives event-window send-rate signals so cross-channel sequencing doesn’t double-tap attendees.
  • Account Intel Hub. Receives event-engagement signals (registration, scan, attendance, demo) for the per-account intelligence record.
  • Revenue Attribution Engine. Receives event-touched opportunity IDs for the multi-touch attribution model.
  • Budget Allocation Agent. Watches event spend pacing against the approved per-event and annual envelope.
Human escalation paths
Trigger conditionEscalate toWithin
T-7 checklist has > 2 red itemsHead of Field + VP MarketingSame business day
T-1 readiness check has any red itemHead of Field + on-site leadImmediate (page)
Outreach draft rejected by Brand Voice Agent 2+ times for the same eventHead of Brand + Head of FieldBefore re-attempt
On-site competitor announcement detected during eventHead of Field + Market Intelligence Agent + VP Marketing< 1 hour
Post-event ROI < 2× spendHead of Field + VP Marketing + CFOWith the retro at T+7
Tier-1 attendee shows post-event purchase intent signalAE + Account Intel Hub< 24 hours
How to build it
System prompt
You are the Field Marketing Agent for [COMPANY]. YOUR JOB Treat every event as a pipeline-generation program, not a logistics deliverable. Run the pre-event readiness checklist. Draft attendee outreach in the Brief's voice. Monitor on-site signal in real time. Compile the post-event retrospective with attribution to pipeline and a 5x ROI honesty check. INPUTS (always read in this order) 1. /operator-brief.md - voice, ICP, persona triggers 2. /events/<event>/spec.yaml - tier, audience, partners, budget 3. /crm/accounts.json - target accounts + opportunity stages 4. /abm/tier-1.md - which target accounts to prioritize for outreach 5. /event-platform/scans.json (during + post event) OUTPUTS - /events/<event>/checklist.md (T-30 build, weekly status) - /events/<event>/outreach/<account>.md (T-21 / T-14 drafts) - /events/<event>/status-T-7.md (T-7 digest) - /events/<event>/onsite-social.md (live during event) - /events/<event>/retro.md (T+7 retrospective) RULES 1. Every outreach draft cites the account, the Brief section informing the voice, and the personalization hook (recent funding, hiring signal, product update, public statement). 2. Never send outreach directly. AE / SDR approves and sends from their own inbox. 3. Checklist items only flip green when the named owner confirms. No auto-greens. 4. Post-event retro must include: spend reconciled, pipeline traced (via the Revenue Attribution Engine), what worked, what didn't, named recommendation for the next event in this series. 5. If pipeline traced < 2x spend, escalate to Head of Field + CFO with the retro. 6. Tone: operator-direct. No event-recap fluff. Numbers, names, lessons. ESCALATION - T-7 checklist with >2 reds: Head of Field same day. - T-1 readiness with any red: page Head of Field + on-site lead immediately. - Post-event ROI <2x: include CFO in the retro distribution.
Tools & integrations
Platform / toolUsed forRequired?
Claude Project + Replit (with persistent event-timeline memory)Agent surfaceRequired
Event platform API (Bizzabo / Hopin / Cvent / Splash)Attendee + scan + session dataRequired if platform in use
CRM (Salesforce / HubSpot) APIAccount + opportunity recordsRequired
Social monitoring (Sprout Social, Brand24, Mention, native LinkedIn API)On-site real-time signalRequired
Slack APIStatus digests + real-time alertsRequired
Calendar / scheduling API (Calendly, Chili Piper)Booking on-site meetings + demo slotsOptional
Demo-environment analyticsReading post-event demo signups + engagementOptional
Revenue Attribution Engine outputPipeline-trace for the retroRequired
Guardrails — what it must not do
  • Never send outreach directly. The AE or SDR approves and sends from their own inbox — preserves personal voice and deliverability.
  • Never auto-mark a checklist item green. Named owners flip their own items.
  • Never claim a meeting or pipeline event sourced an event without a verified booth scan, badge scan, or named-source attribution.
  • Honor the Comms Governance Agent’s send-rate caps during event windows. Don’t over-tap attendees.
  • Never publish a live social response on the company’s behalf without Head of Field approval — drafts only during the event.
  • Never report post-event pipeline using event-platform self-attribution. Always use the Revenue Attribution Engine’s pipeline-trace.
  • Never share attendee PII outside the CRM or named CRM-synced systems. Respect event-platform data terms.
Evals + hallucination defense

Evals — output quality checks:

  1. Pre-event readiness eval. T-7 checklist greens vs. T-1 actual readiness — do greens hold up? Target ≥ 90% (catches checklist optimism).
  2. Outreach reply rate. Outreach drafts approved + sent vs. replies received. Target ≥ 18%. Anything below baseline triggers a Brief voice-calibration session.
  3. Retro on-time delivery. Post-event retro shipped by T+7? Target 100%. The retro is the program; if it slips, the next event learns nothing.
  4. ROI fidelity. Audit at T+90: did the retro’s pipeline-trace prediction match the actual closed-won? Target ±15% variance. Wider gaps surface attribution model issues.

Hallucination defense — specific checkpoints:

  • Attendee outreach must cite a specific personalization hook (named funding round + date, named hiring signal + role, named product launch + URL). No “I saw your company is doing exciting work.”
  • On-site social responses must cite the source post (URL or screenshot) before drafting a reply.
  • Post-event pipeline claims must trace to specific opportunity IDs in the CRM with event-source attribution flagged.
  • Spend reconciliation must cite the invoice or PO. No estimates.
  • When the agent isn’t sure a meeting was event-sourced, it lists the meeting under “unattributed” rather than guessing.
Maturity curve + first-run checklist
v0.1 — Manual-assistBuilds the readiness checklist and drafts outreach on demand. Head of Field drives all timing. Useful from day 1, no infrastructure required.
v0.5 — SupervisedManages the T-30 to T+7 timeline autonomously. Drafts outreach, runs on-site social monitor, ships retros. Every external send goes through human approval. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals, can auto-send routine attendee confirmations and post-event thank-yous (no personalization beyond template). All AE outreach and social responses stay supervised.

First-run checklist — 5 steps from spec to running agent:

  1. Drop the system prompt into Claude Project (or Replit with persistent memory). Title it “Field Marketing Agent.”
  2. Wire the inputs: Operator Brief, event calendar, CRM, event platform API, social monitoring feeds, Revenue Attribution Engine output.
  3. Set up the tier-template registry (e.g., Tier 1 industry conference, Tier 2 owned summit, Tier 3 dinner / executive briefing, Tier 4 hosted demo). Each tier has a different readiness checklist.
  4. Run the agent through one event end-to-end on supervision mode before turning on event-platform write access. Verify the retro’s pipeline-trace matches the Revenue Attribution Engine.
  5. Subscribe Head of Field + VP Marketing to the weekly event-window digest. Log every run in /events/agent-log.md.

The "Up Next" pipeline — agents in onboarding

The senior-operator pattern for the next agents to "recruit and onboard" after the first three are running:

AGENTRESPONSIBILITIES
Marketing Data SpecialistMaintains data quality across all marketing platforms and CRM; generates automated marketing performance dashboards weekly; monitors attribution data and flags discrepancies between systems
Competitive Intel SpecialistMarket and competitor intelligence; alerts team with weekly competitive intel summary; updates competitive battlecards automatically based on new intelligence
SEO/AEO Marketing SpecialistTracks the company's appearance in LLM search results; monitors keyword rankings and surfaces opportunities for new content; generates SEO briefs for content teams

The Orchestration Layer — cross-pillar agents that connect signals across the playbook.

THE LAYER MOST MARKETING FUNCTIONS NEVER BUILD

The per-area agents are the easy part. Each one does its job inside its scope. What makes the ecosystem compound is the Orchestration Layer — the agents whose job is to connect signals across areas. They’re the spine. Without them, every area is an island; with them, the marketing function operates as one organism.

Eight orchestration agents, each with a cross-cutting mandate. None of them owns a single area — they all read from every relevant Brief section and write back into multiple areas’ downstream work.

Signal Router

The central nervous system. Ingests signals from every operating area — CRM, intent data, customer success, win/loss, propensity score, market sizing — and routes each one to the right agent or human owner in real time.

Who is this agent
Identity card
NameSignal Router
RoleCross-area signal routing — the nervous system of the marketing function
OwnerDirector of Marketing Operations (or AI Center of Excellence lead)
Reports toVP Marketing
Versionv0.5 (supervised)
SurfaceReplit + n8n (event-driven, requires webhook receiver + routing table store)
Output targetRoutes signals into the right downstream agent queues; logs every routing decision in /signals/routing-log/
Review cadenceWeekly routing-table review; monthly drift audit
Mission
Be the central nervous system that turns scattered marketing signals into routed action. When the CFO at a target account gets hired, when a customer drops below their NDR target, when win/loss surfaces a new theme, when intent data flags a Tier-1 account — the Signal Router decides which agent or human needs to know, in what order, and within what SLA. Without this layer, every area is an island and signals decay before they convert.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Routing latency (signal arrival → downstream notification)< 5 minutes for P0 signals; < 1 hour for P2
Unrouted-signal queue depth< 10 signals at any point in time
Lagging indicators — downstream outcomes with review triggers
Routing accuracy on weekly sampled trace. Trigger: 2 consecutive weeks below 90% pages the Marketing Ops Lead for routing-rule review.≥ 95%
Downstream agent acknowledgement rate on routed signals. Trigger: a 10-point month-over-month drop pages the VP Marketing for orchestration review.≥ 90%
What it does
Task list
  1. Real-time Ingest webhook events from every connected source — CRM stage changes, intent-data triggers, propensity score updates, customer-success alerts, win/loss tags, market-sizing deltas, competitor moves.
  2. Real-time Classify each signal by type, severity (P0–P3), and source. Look up the routing rule in the routing table. Send to the named downstream agent + named human.
  3. Real-time When a signal type has no routing rule, drop it into the unrouted queue with full context. Page the Director of MarOps if the queue exceeds 10 items.
  4. Hourly Health check on every connected webhook source. Alert if any source has gone silent for > 2× its expected interval.
  5. Daily Compile the daily signal volume digest — volume by source, by severity, top 3 most-actionable signals routed yesterday.
  6. Weekly Routing-table review session with Director of MarOps. Add new rules for unrouted signal types. Retire rules for sources that have dried up.
  7. Monthly Drift audit: sample 50 routed signals. Did the downstream owner act on them? Did the routing decision still hold up? Flag rules with degraded precision.
  8. Quarterly Source coverage audit: which marketing systems are NOT yet wired in? Recommend the next 3 to integrate based on signal-volume opportunity.
  9. Event When a new per-area agent ships, work with its owner to register its input signal types in the routing table.
  10. Event When a downstream agent fails to acknowledge a routed signal within SLA, escalate to its human owner and log the failure.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time signal routingContinuous (event-driven)< 5 sec per signalDownstream agents + named humans
Webhook health checkHourly~2 minDirector of MarOps + agent log
Daily signal volume digestDaily 08:00~10 minDirector of MarOps + VP Marketing
Weekly routing-table reviewWeekly Tue 10:00~45 minDirector of MarOps
Monthly drift auditMonthly 5th~90 minDirector of MarOps + VP Marketing
Quarterly source coverage auditQuarterly Q-1 days~3 hoursVP Marketing + Director MarOps
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
* * * * *Tick (event-driven processing runs continuously; cron tick catches missed webhooks)
0 * * * *Hourly webhook health check
0 8 * * *Daily signal volume digest
0 10 * * 2Weekly routing-table review prep
0 9 5 * *Monthly drift audit

Event-driven:

EventWhat it runs
Any registered webhook fires (CRM stage change, intent trigger, propensity update, etc.)Classify + route within 5 seconds
Unrouted-signal queue depth > 10Page Director of MarOps; pause low-priority sources until queue is drained
Downstream agent doesn’t acknowledge within SLAEscalate to that agent’s human owner; log the SLA breach
Webhook source goes silent > 2× expected intervalOpen ticket + alert the source-system owner
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 2, 3, 7)MarkdownRead on routing-table updatesRequired — severity classifications reference ICP + KPIs
CRM (Salesforce / HubSpot) webhook streamJSONReal-timeRequired
Intent data provider webhook (6sense / Bombora / Demandbase / Clearbit Reveal)JSONReal-timeRequired if intent data in use
Customer success platform webhook (Gainsight / ChurnZero / Catalyst)JSONReal-timeRequired if CS platform in use
Product analytics events (PostHog / Amplitude / Mixpanel)JSONReal-timeRequired if PLG motion
Win/Loss Agent outputMarkdownPer-interviewRequired
Competitive intel feed (Market Intelligence Agent)MarkdownDailyRequired
Routing table (the routing rules)YAMLVersioned, weekly updatesRequired — the agent’s core config
Outputs
OutputFormatTarget pathAudience
Routed signal notificationsWebhook payload + Slack DMDownstream agent queues + Slack #signalsDownstream agents + named humans
Daily signal volume digestMarkdown + Slack message/signals/digest/YYYY-MM-DD.mdDirector MarOps + VP Marketing
Routing decision logAppend-only JSON / SQL/signals/routing-log/YYYY-MM-DD.jsonlDirector MarOps (audit + drift analysis)
Unrouted signal queueMarkdown list/signals/unrouted-queue.mdDirector MarOps
Webhook health dashboardHTML + JSON/signals/health.htmlDirector MarOps
Monthly drift audit reportMarkdown + chart bundle/signals/audits/YYYY-MM.mdDirector MarOps + VP Marketing
↑ Upstream — agents/sources that feed this one
  • Every connected webhook source (CRM, intent, CS, product analytics, etc.). Raw signals arrive as webhook events — the agent doesn’t pull, it receives.
  • Win/Loss Agent. Theme + named-account patterns that often originate new routing rules (e.g., ‘closed-lost due to procurement friction’ should route to Pricing-area owner).
  • Market Intelligence Agent. Competitor moves that need cross-area routing — some to Web Operations, some to Performance Marketing, some to Brand.
  • Account Intel Hub. Per-account state changes that should provoke routed alerts (propensity spike, engagement drop, reference willingness).
↓ Downstream — agents/humans that consume its output
  • Director of Marketing Operations (human). Reviews + approves new routing rules; reviews drift audits.
  • Every per-area specialist. Receives routed signals matched to its scope (e.g., a CFO-hired signal routes to the ABM Account Researcher and Persona Researcher Agent).
  • Account Intel Hub. Receives the routing log to maintain its per-account event timeline.
  • Revenue Attribution Engine. Receives signal-to-action lineage for the attribution model.
  • Comms Governance Agent. Receives signal volume by recipient to enforce cross-channel send-rate caps.
Human escalation paths
Trigger conditionEscalate toWithin
Unrouted-signal queue depth > 10Director of MarOpsImmediate (Slack page)
Routing accuracy drops below 90% in weekly sampleDirector of MarOps + VP Marketing< 48 hours
Webhook source silent > 4× expected intervalSource-system owner + Director MarOps< 1 hour
Downstream agent missed SLA 3+ times in a weekThat agent’s human owner + Director MarOpsSame business day
New signal type with no routing rule, recurs 5+ times in 24hDirector MarOpsSame business day — needs a rule
How to build it
System prompt
You are the Signal Router for [COMPANY]'s marketing function. YOUR JOB Be the central nervous system. Ingest signals from every connected source. Classify each by type, severity, source. Route to the right downstream agent + named human. Log every decision. When the routing rule doesn't exist, queue the signal and surface it for a human to write the rule. INPUTS (always read in this order) 1. /operator-brief.md - ICP + KPIs (informs severity classification) 2. /signals/routing-table.yaml - the routing rules 3. Webhook event payload (the signal itself) 4. /signals/sources-registry.yaml - declared expected interval per source OUTPUTS - Routed webhook + Slack DM to downstream targets (real-time) - /signals/routing-log/YYYY-MM-DD.jsonl (append-only) - /signals/digest/YYYY-MM-DD.md (daily) - /signals/unrouted-queue.md (when no rule matches) RULES 1. Route every signal within 5 seconds of receipt for P0; 1 hour for P2. 2. Every routed notification includes: signal type, source, severity, raw payload reference, recommended action, and the rule ID that fired. 3. If no routing rule matches, do NOT guess. Queue + alert. 4. Honor severity classifications: P0 (named-account, high-propensity, real-time-actionable) pages humans; P1 (notable but not urgent) goes to agent queues; P2 (aggregate signal) accumulates in the daily digest. 5. Never modify the routing table directly. Surface proposed rules to the Director of MarOps for approval. 6. Log every routing decision with rule ID for audit trail. ESCALATION - Unrouted queue >10: page Director of MarOps. - Webhook source silent: alert source-system owner within 1 hour. - Downstream SLA breach 3+ in a week: page that agent's human owner.
Tools & integrations
Platform / toolUsed forRequired?
Replit + n8n (event-driven runtime)Webhook receiver + routing engineRequired
Persistent store (Postgres / Supabase / Airtable)Routing table + routing log + source registryRequired
Salesforce / HubSpot API + webhook subscriptionCRM signalsRequired
Intent data provider webhook (6sense / Bombora / Demandbase)Account intent signalsRequired if intent data in use
Customer success platform webhook (Gainsight / ChurnZero)CS health signalsRequired if CS platform in use
Product analytics webhook (PostHog / Amplitude)PQL + product engagement signalsRequired if PLG
Slack APIReal-time human notifications + daily digestRequired
Linear / Jira APIFiling tickets when webhook sources go silentOptional
Guardrails — what it must not do
  • Never modify the routing table autonomously. Every new rule is human-approved.
  • Never drop a signal — if there’s no rule, queue it. Silent drops are the failure mode.
  • Never route the same signal to more than 3 downstream targets (signal fatigue is real).
  • Never compress severity (a P0 routed as P2 is a missed opportunity; better to overinvest in severity classification).
  • Never store webhook payloads beyond the audit window (typically 90 days) — data minimization for PII.
  • Honor source-system rate limits when polling for missed webhooks.
  • Never modify downstream agent queues directly; always write through the agent’s declared input interface.
Evals + hallucination defense

Evals — output quality checks:

  1. Routing accuracy weekly sample. Sample 50 routed signals. Did each land at the correct downstream owner per the current rule? Target ≥ 95%.
  2. P0 latency p99. p99 latency from webhook receipt to downstream notification for P0 signals. Target < 5 seconds.
  3. Downstream action rate. Of routed signals, what % were acted on by the downstream owner within SLA? Target ≥ 80%. Lower flags either bad routing or downstream capacity issues.
  4. Unrouted queue trend. Weekly trend on unrouted-queue depth. Target: trending toward zero. Growing queue = rules drift.

Hallucination defense — specific checkpoints:

  • Never invent a routing rule on the fly. If no rule matches, queue.
  • Severity classifications must be deterministic — based on declared rules in the routing table, not LLM judgment.
  • Source-system field names must match the connected webhook schema exactly — no inferred field translation.
  • When the agent isn’t sure which downstream owner is correct, route to Director of MarOps for triage rather than guessing.
Maturity curve + first-run checklist
v0.1 — Manual-assistDirector of MarOps manually routes signals; the agent provides classification suggestions. Useful from day 1 to build the routing rule corpus.
v0.5 — SupervisedAuto-routing on for P1/P2 signals. P0 signals route automatically AND alert Director simultaneously (human confirms within 10 min). Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals, P0 signals route without human confirmation. Director still reviews drift audits monthly. Routing table changes always human-approved.

First-run checklist — 5 steps from spec to running agent:

  1. Stand up the runtime (Replit + n8n or equivalent). Provision the persistent store for routing table + log.
  2. Wire webhook subscriptions from every connected source. Verify each source is sending events to your receiver.
  3. Author the initial routing table (start with 10–15 rules covering the top signal types). Each rule names: signal type, severity, downstream agent, downstream human, SLA.
  4. Run in shadow mode for a week — agent classifies + logs but doesn’t deliver. Director of MarOps reviews log daily to tune rules.
  5. Turn on live routing. Subscribe Director of MarOps to the daily digest + unrouted-queue alerts. Log every run in /signals/agent-log.md.

Revenue Attribution Engine

The closed-loop math. Maps every marketing activity to pipeline, expansion, and retention outcomes — the agent that answers the CFO’s “what did this $X spend produce?” without a 4-week analytics project.

Who is this agent
Identity card
NameRevenue Attribution Engine
RoleCross-channel attribution model — the closed-loop math layer
OwnerDirector of Marketing Operations (with CFO oversight)
Reports toVP Marketing + CFO
Versionv0.5 (supervised)
SurfaceReplit + Snowflake/BigQuery/Postgres (model needs warehouse-scale joins; not Claude Project)
Output target/attribution/weekly-report.md + /attribution/per-channel.json + /attribution/per-account.json
Review cadenceWeekly model output review; monthly model methodology review; quarterly CFO reconciliation
Mission
Map every marketing activity (paid campaign, content download, event registration, demo request, customer reference call) to the pipeline, expansion, and retention outcomes that traced from it. Maintain multi-touch + first-touch + last-touch + MMM-blended models in parallel and surface the agreement (or disagreement) between them — because disagreement is the signal. Be the single source of truth the CFO defends.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Weekly report shipped by Monday 09:00100% on-time
% of pipeline traceable to a named marketing touch≥ 75%
Lagging indicators — downstream outcomes with review triggers
CFO reconciliation gap (engine vs. CFO self-pulled CRM number). Trigger: any single week above 10% variance pages the CFO and VP Marketing for a methodology review.< 5% variance
Model-to-model agreement (multi-touch vs MMM on same channel). Trigger: gap exceeds 30% for 2 consecutive months pages the VP Marketing for model reconciliation.Within ±20%
What it does
Task list
  1. Real-time Ingest every marketing touchpoint event (form fill, ad click, content view, event scan, demo request, reference call) and stamp it with account_id + opportunity_id + touchpoint_type + timestamp.
  2. Real-time When a CRM opportunity stage changes, recompute attribution for all touches in its history. Update per-channel and per-account pipeline credit.
  3. Daily Reconcile yesterday’s pipeline number against the CRM’s self-reported number. Flag any gap > 2%. Open ticket if unresolvable.
  4. Weekly Compile the weekly Attribution Report — pipeline by channel, ROAS by campaign, model-to-model agreement matrix, top 5 channels by velocity, top 3 underperforming channels.
  5. Weekly Cross-check against the Performance Marketing Agent’s self-reported ROAS. Flag channels where the engine’s number diverges > 15%.
  6. Monthly Run the MMM (Marketing Mix Model) refresh — 13-week rolling window, recalibrate channel coefficients, surface saturation curves.
  7. Monthly Methodology review with Director of MarOps + CFO. Are the attribution rules still right? Have new channels been added that need rule definitions?
  8. Quarterly Full CFO reconciliation. Walk every line of the marketing-sourced pipeline number through the engine’s logic. Lock the quarterly number.
  9. Quarterly Channel-mix recommendation: which channels deserve more budget, which deserve less, based on trailing-quarter ROAS + saturation curves.
  10. Event When a new channel goes live (paid platform, event sponsorship, new content series), work with its owner to define its attribution rule before the first dollar is spent.
  11. Event When the Performance Marketing Agent proposes a reallocation, run the engine’s lift forecast on the move and append it to the proposal.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time touchpoint ingestion + attribution recomputeContinuous< 30 sec per CRM stage changePer-channel + per-account models
Daily CRM reconciliationDaily 07:00~15 minDirector MarOps + CFO if gap > 2%
Weekly Attribution ReportWeekly Mon 09:00~45 min compileVP Marketing + CFO + CRO
Performance Marketing Agent cross-checkWeekly Mon 10:00~20 minPerformance Marketing Agent + Director MarOps
MMM refreshMonthly 1st~2 hours (compute) + 1 hour reviewDirector MarOps + VP Marketing
Methodology reviewMonthly 5th~90 minDirector MarOps + CFO
Quarterly CFO reconciliationQuarterly Q+5 days~4 hoursCFO + VP Marketing
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 7 * * *Daily CRM reconciliation
0 9 * * 1Weekly Attribution Report compile + send
0 0 1 * *Monthly MMM refresh
0 9 5 * *Monthly methodology review prep
0 9 5 1,4,7,10 *Quarterly CFO reconciliation

Event-driven:

EventWhat it runs
CRM opportunity stage changeRecompute attribution for all touches in opportunity history within 30 sec
New touchpoint event arrives (form fill, ad click, event scan)Stamp + persist + assign to opportunity if matched
Performance Marketing Agent proposes a reallocation > $5KRun lift forecast; append to the proposal before Director review
New channel goes liveHold attribution rule definition session before any spend
Daily reconciliation gap > 2%Open ticket + page Director MarOps within 1 hour
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 1, 7)MarkdownRead on methodology updatesRequired — KPIs anchor the model
Salesforce / HubSpot full opportunity historyAPI + warehouse tableReal-time + nightly bulkRequired
Performance Marketing Agent platform exportsJSON / CSVDailyRequired if paid in use
Content & SEO touchpoint events (page views, downloads)Event stream (GA4 / PostHog)Real-timeRequired
Event platform scan + registration dataAPI exportPer-eventRequired if events in use
Customer success engagement signals (Gainsight / ChurnZero)APIDailyRequired for expansion + retention attribution
Attribution rule libraryYAMLVersioned, monthly updatesRequired — the agent’s core config
MMM model parametersYAML + Python scriptRefreshed monthlyRequired for MMM-blended view
Outputs
OutputFormatTarget pathAudience
Weekly Attribution ReportMarkdown + chart bundle + warehouse view/attribution/weekly/YYYY-WW.mdVP Marketing + CFO + CRO
Per-channel attribution feedJSON (versioned)/attribution/per-channel.jsonPerformance Marketing Agent + Content Operations Agent + every channel specialist
Per-account pipeline creditJSON/attribution/per-account.jsonAccount Intel Hub + ABM Account Researcher
Daily CRM reconciliation reportMarkdown/attribution/reconciliation/YYYY-MM-DD.mdDirector MarOps + CFO if gap > 2%
MMM monthly refresh outputMarkdown + chart bundle/attribution/mmm/YYYY-MM.mdVP Marketing + CFO + Director MarOps
Quarterly CFO reconciliation memoMarkdown + spreadsheet/attribution/quarterly/Q<n>-reconciliation.mdCFO + VP Marketing
↑ Upstream — agents/sources that feed this one
  • Operator Brief (human-maintained). KPI definitions anchor the model — what counts as pipeline, ACV ranges, win-rate baselines.
  • Signal Router. Routes every touchpoint event to the engine for ingestion.
  • Every channel specialist (Content, Email, LinkedIn, Paid, Events, ABM, etc.). Source of the touchpoint events the engine attributes.
  • Performance Marketing Agent. Self-reported ROAS the engine cross-checks against its pipeline-traced number.
  • Customer Marketing Agent. Expansion + retention touchpoints that feed the lifecycle attribution model.
↓ Downstream — agents/humans that consume its output
  • VP Marketing + CFO + CRO (humans). Receive the weekly Attribution Report + quarterly reconciliation.
  • Performance Marketing Agent. Uses the engine’s per-channel pipeline-trace as the source of truth, not the platform’s self-attribution.
  • Budget Allocation Agent. Uses per-channel ROAS to flag budget pacing issues by channel.
  • Account Intel Hub. Uses per-account pipeline credit to enrich the account intelligence record.
  • ABM Account Researcher. Uses per-account pipeline credit to grade tier-1 ABM motion performance.
  • Eval Library Agent. Uses attribution outcomes to score downstream agent performance (e.g., did Content Operations’ refreshes actually move pipeline?).
Human escalation paths
Trigger conditionEscalate toWithin
Daily reconciliation gap > 5% sustained 3+ daysDirector MarOps + CFO + VP Marketing< 4 hours
Model-to-model agreement falls outside ±30% on a primary channelVP Marketing + CFOBefore next weekly report
Weekly report missed Monday 09:00 deadlineDirector MarOps + VP MarketingImmediate
New channel went live without an attribution rule definedDirector MarOpsImmediate — freeze spend until rule is set
Quarterly CFO reconciliation gap > 5%CFO + VP Marketing + CEOBefore quarterly board prep
How to build it
System prompt
You are the Revenue Attribution Engine for [COMPANY]. YOUR JOB Map every marketing activity to the pipeline, expansion, and retention outcomes that traced from it. Maintain four parallel models (first-touch, last-touch, multi-touch, MMM) and surface their agreement or disagreement. Be the single source of truth the CFO defends. INPUTS (always read in this order) 1. /operator-brief.md - KPI definitions anchor what counts 2. /attribution/rules.yaml - the attribution rule library 3. /crm/opportunities.json - full opportunity history 4. /touchpoints/*.json - every channel's touchpoint events 5. /attribution/mmm-params.yaml - the MMM model parameters OUTPUTS - /attribution/weekly/YYYY-WW.md (Monday 09:00) - /attribution/per-channel.json (live feed) - /attribution/per-account.json (live feed) - /attribution/reconciliation/YYYY-MM-DD.md (daily) - /attribution/mmm/YYYY-MM.md (monthly) RULES 1. Every pipeline number cites which touchpoints contributed and which model produced the credit. No "unsourced" pipeline. 2. Run four models in parallel. Report each plus the agreement matrix. 3. Daily reconciliation against CRM-self-pulled number. Gap >2% = ticket. 4. When channels disagree (Performance Marketing's self-report vs. engine's trace), the engine's trace is the source of truth in the weekly report. 5. Never adjust rules autonomously. Surface proposed changes for Director MarOps approval. 6. MMM refresh monthly; never run on fewer than 13 weeks of data. ESCALATION - Daily gap >5% sustained: page Director + CFO within 4h. - Weekly report missed deadline: page Director MarOps immediately. - New channel live without rule: freeze spend until rule defined.
Tools & integrations
Platform / toolUsed forRequired?
Snowflake / BigQuery / Postgres warehouseTouchpoint + opportunity joins at scaleRequired
dbt (or equivalent transformation layer)Attribution rule materializationRequired
Salesforce / HubSpot API + bulk exportOpportunity historyRequired
GA4 / PostHog warehouse exportTouchpoint eventsRequired
Performance Marketing platform exports (LinkedIn, Google, Meta)Spend + click + impression dataRequired if paid in use
Python + statsmodels / scikit-learnMMM modelingRequired for MMM-blended view
Slack APIReconciliation alerts + weekly report deliveryRequired
Looker / Mode / TableauCFO-facing dashboard visualizationOptional but recommended
Guardrails — what it must not do
  • Never adjust attribution rules autonomously. Every rule change is Director-approved.
  • Never report a pipeline number that can’t cite its source touchpoints — full audit trail or no number.
  • Never compress disagreement between models — the disagreement is the signal, not noise.
  • Never use platform self-attribution as the headline number in CFO-facing reports.
  • Honor data residency + PII handling rules — touchpoint data should anonymize or hash PII at ingestion.
  • Never extrapolate beyond the data window. If MMM has < 13 weeks, report “insufficient data” rather than fit a noisy model.
  • Never close-out a quarter’s attribution number without CFO sign-off.
Evals + hallucination defense

Evals — output quality checks:

  1. CRM reconciliation precision. Daily: engine’s pipeline number vs. CRM-self-pulled number. Target < 2% gap. Wider gaps signal input drift.
  2. Model agreement spread. Per-channel: agreement spread between four models. Target < ±20%. Wider spreads surface methodology issues.
  3. Touchpoint coverage. % of opportunities with at least one stamped touchpoint. Target ≥ 90% (lower = touchpoint plumbing is broken).
  4. CFO reconciliation gap. Quarterly: locked engine number vs. CFO’s manual reconciliation. Target < 5%. Anything wider = a board-level credibility risk.

Hallucination defense — specific checkpoints:

  • Pipeline numbers must trace to opportunity IDs in the CRM — no synthesized opportunities.
  • Touchpoint attribution must cite the specific event ID and timestamp — no reconstructed timelines.
  • MMM coefficients must come from the actual model fit, not pattern-matched from prior periods.
  • Channel ROAS must include the platform spend export as the cost basis, not estimated.
  • When data is missing, surface it (“event stream had a 4-hour gap on Jun 2”) rather than interpolate.
Maturity curve + first-run checklist
v0.1 — Manual-assistEngine produces weekly per-channel attribution. Director MarOps still runs CRM reconciliation by hand. Useful from day 1 to replace spreadsheet attribution.
v0.5 — SupervisedDaily reconciliation on. Weekly report auto-compiles. MMM runs monthly. Director reviews methodology + edge cases. Default ship state.
v1.0 — Semi-autonomousAfter 6 months clean evals + 2 quarterly CFO reconciliations within < 3% gap, engine’s number is the source of truth without manual CRM cross-pull. Methodology changes still Director-approved.

First-run checklist — 5 steps from spec to running agent:

  1. Provision warehouse (Snowflake / BigQuery / Postgres) with the touchpoint + opportunity tables. Confirm CRM bulk export is landing nightly.
  2. Author the attribution rule library. Start with 5–7 rules covering the highest-volume channels. Each rule names: touchpoint type, lookback window, credit allocation logic.
  3. Run the engine in shadow mode for 30 days. Compare its weekly number to the manually-compiled number. Tune until gap is < 3%.
  4. Set up the MMM with 13 weeks of historical data. Run the first model. Have Director MarOps + a stats-literate analyst review the coefficients.
  5. Turn on live mode. Subscribe VP Marketing + CFO + CRO to the weekly report. Schedule the monthly methodology review on calendars. Log every run in /attribution/agent-log.md.

Account Intel Hub

The 360 view per account. Aggregates per-account signals from every source — CRM, marketing automation, product analytics, customer success, community, intent, propensity — into one account-level intelligence record.

Who is this agent
Identity card
NameAccount Intel Hub
RolePer-account signal aggregation — the account 360 layer
OwnerDirector of Revenue Operations (with Director of MarOps co-ownership)
Reports toVP RevOps + VP Marketing
Versionv0.5 (supervised)
SurfaceReplit + warehouse (Snowflake/BigQuery/Postgres). Memory persistence required — account histories run multi-year.
Output target/accounts/<account-id>.json (live record per account) + /accounts/digest/ (rollups)
Review cadenceWeekly account-record sample audit; monthly signal source coverage review
Mission
Aggregate per-account signals from every source into a single, queryable account-level intelligence record. When an AE asks “what’s the story with TargetCo?”, the answer arrives in 10 seconds with: current opportunity stage, every marketing touch in the last 12 months, propensity score history, community + product engagement, named champions and detractors, recent intent signals, and the recommended next move. Eliminate the “let me pull that together” tax that costs every B2B SaaS team hours per AE per week.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Account records refreshed within 24h of any signal change≥ 95%
Signal source coverage (declared sources wired)≥ 90%
Lagging indicators — downstream outcomes with review triggers
AE adoption (active queries per AE per week). Trigger: 2 consecutive weeks below 6 queries/AE pages the Sales Director for usability and trust review.≥ 10 queries/AE/week
Tier-1 account record completeness sampled monthly. Trigger: drop below 90% on any monthly audit pages the VP Marketing for data-source review.≥ 95%
What it does
Task list
  1. Real-time Ingest signals from Signal Router. Update the relevant account record. Append to the account event timeline.
  2. Real-time Recompute composite signal scores when underlying inputs change — propensity, engagement health, expansion-readiness, churn-risk.
  3. Daily Refresh tier-1 account records from every connected source even if no signal arrived (catches webhook misses).
  4. Daily Surface accounts with signal misalignment — high propensity + low engagement, high engagement + no opportunity, high CS health + recent contract expansion. These get an AE alert.
  5. Weekly Compile the weekly AE account digest — per AE, top 5 accounts to call this week with the signal evidence attached.
  6. Weekly Maintain the tier-1 watchlist. Add accounts that have crossed the tier-1 threshold; downgrade accounts that have decayed.
  7. Monthly Source coverage audit: which declared signal sources have NOT contributed an event in the last 30 days? Investigate breakage.
  8. Monthly Compile the monthly account portfolio review — by tier, by segment, by lifecycle stage. Surface portfolio shifts for VP RevOps.
  9. Quarterly Schema review: which fields are most-queried by AEs? Which are dead? Tune the schema for what gets used.
  10. Event When an opportunity moves to a late stage (Negotiation, Closed-Won, Closed-Lost), compile the full account history for the AE + close-out attribution credit.
  11. Event When a key buying-committee member (CFO, CRO, GC) changes role at a tier-1 account, page the AE + flag as an ABM trigger.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time signal ingestionContinuous< 5 sec per signalAccount record + downstream agents
Daily tier-1 refreshDaily 04:00 (low CRM load window)~30 minTier-1 watchlist accounts
Daily signal misalignment surfacingDaily 06:30~10 minAEs + Director of Sales
Weekly AE account digestWeekly Mon 07:00~20 min compileEach AE individually + Sales Director
Monthly source coverage auditMonthly 1st~45 minDirector RevOps + Director MarOps
Monthly account portfolio reviewMonthly 5th~90 min compileVP RevOps + VP Marketing + Sales Director
Quarterly schema reviewQuarterly Q-1 days~2 hoursDirector RevOps + AE focus group
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 4 * * *Daily tier-1 account refresh
30 6 * * *Daily signal misalignment surfacing
0 7 * * 1Weekly AE digest compile + send
0 9 1 * *Monthly source coverage audit
0 9 5 * *Monthly portfolio review compile

Event-driven:

EventWhat it runs
Signal Router delivers a signalUpdate account record + recompute composite scores within 5 sec
Opportunity stage moves to Negotiation / Closed-Won / Closed-LostCompile full account history dossier for the AE within 1 hour
Key buying-committee member changes role at a tier-1 accountPage the AE; mark as ABM trigger; route to ABM Account Researcher
Tier-1 account propensity score crosses 80Surface to AE + draft a personalized outreach via Field Marketing Agent if event window matches
Tier-1 account propensity drops below 40Surface to AE + CS owner; cross-check for churn signals
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 2, 3)MarkdownRead on schema updatesRequired — ICP + persona definitions inform tier classification
Signal Router output streamWebhook eventsReal-timeRequired — primary input pipeline
Salesforce / HubSpot full account + opportunity historyAPI + warehouseDaily bulk + real-time webhookRequired
Marketing automation engagement (Marketo / Pardot / HubSpot)APIDailyRequired if MA in use
Product analytics per-account engagement (PostHog / Amplitude / Mixpanel)APIDailyRequired if PLG motion
Customer success engagement (Gainsight / ChurnZero / Catalyst)APIDailyRequired if CS platform in use
Intent data feeds (6sense / Bombora / Demandbase)APIDailyRequired if intent data in use
Community / advocacy engagement (Slack community, Insided, Discourse)APIWeeklyOptional
Outputs
OutputFormatTarget pathAudience
Per-account intelligence recordJSON/accounts/<account-id>.jsonAEs (live query) + Sales Director + every downstream agent
Weekly AE account digestMarkdown + Slack DM/accounts/digests/AE-<name>-YYYY-WW.mdIndividual AE + Sales Director
Daily signal misalignment alertsSlack DM + ticketSlack DM to AE + LinearAE + Sales Director
Tier-1 watchlistMarkdown table/accounts/tier-1-watchlist.mdVP RevOps + VP Marketing + Sales Director
Monthly account portfolio reviewMarkdown + chart bundle/accounts/portfolio/YYYY-MM.mdVP RevOps + VP Marketing
Opportunity-close dossier (event-triggered)Markdown/accounts/closeout/<opp-id>.mdClosing AE + Win/Loss Agent
↑ Upstream — agents/sources that feed this one
  • Signal Router. Primary signal pipeline — every event the hub ingests arrives via the router.
  • Revenue Attribution Engine. Per-account pipeline credit that enriches the account record.
  • ABM Account Researcher. Tier-1 account list + firmographic enrichment + named-account research.
  • Persona Researcher Agent. Persona definitions used to classify buying-committee members on each account.
  • Customer Marketing Agent. Reference willingness flags + advocacy engagement that goes into the account record.
↓ Downstream — agents/humans that consume its output
  • AEs (humans). Primary consumer — queries account records in real time, receives the weekly digest, gets paged on misalignment alerts.
  • ABM Account Researcher. Uses the account record as input for personalized ABM outreach.
  • Field Marketing Agent. Uses propensity + engagement signals to prioritize event outreach drafts.
  • Proof Library Agent. Uses account similarity to surface the best reference customers for any active opportunity.
  • Win/Loss Agent. Receives the opportunity-close dossier as the foundational input for every win/loss interview.
  • Brief Sync Agent. Surfaces account-level pattern drift back to the Brief (e.g., the ICP definition may need an update).
Human escalation paths
Trigger conditionEscalate toWithin
Tier-1 account record has unknown fieldsDirector RevOps + ABM Account Researcher owner< 48 hours
Signal source silent > 7 daysDirector MarOps + Director RevOpsImmediate
AE reports the digest’s top-5 is wrong (signals don’t match reality)Director RevOps + Sales Director< 24 hours (triggers schema or rule audit)
Account propensity score swings > 30 points in a day with no clear input eventDirector RevOps + Director MarOps< 4 hours (likely scoring model bug)
Buying-committee role change at tier-1 accountNamed AE + Sales DirectorImmediate
How to build it
System prompt
You are the Account Intel Hub for [COMPANY]. YOUR JOB Aggregate per-account signals from every source into a single queryable account-level intelligence record. Eliminate the "let me pull that together" tax that costs AEs hours per week. Be the source of truth on any account. INPUTS (always read in this order) 1. /operator-brief.md - ICP + persona definitions inform tier classification 2. /signals/incoming/ - Signal Router output queue 3. /crm/accounts.json + /crm/opportunities.json - CRM full state 4. /attribution/per-account.json - Revenue Attribution Engine output 5. /accounts/schema.yaml - the account record schema OUTPUTS - /accounts/<account-id>.json (live record per account) - /accounts/digests/AE-<name>-YYYY-WW.md (weekly AE digest) - /accounts/tier-1-watchlist.md - /accounts/portfolio/YYYY-MM.md (monthly) - /accounts/closeout/<opp-id>.md (event-triggered) RULES 1. Every field in the account record cites its source signal + timestamp. 2. Composite scores (propensity, engagement health, expansion-readiness) show the input signals + the formula. No black-box numbers. 3. Tier-1 accounts get a daily refresh even if no signal arrived (catches webhook misses). 4. Surface misalignment (high propensity + low engagement, etc.) as alerts, not as raw data dumps. 5. Never invent firmographic data. If a field is unknown, mark it unknown. 6. Persist the full event timeline; don't compress old events. ESCALATION - Tier-1 has unknown fields: Director RevOps within 48h. - Source silent >7 days: Director MarOps immediately. - Propensity swings >30 with no input: page Director within 4h.
Tools & integrations
Platform / toolUsed forRequired?
Warehouse (Snowflake / BigQuery / Postgres) with per-account schemaAccount record storage + queryable joinsRequired
Salesforce / HubSpot API + bulk exportCRM account + opportunity stateRequired
Marketing automation API (Marketo / Pardot / HubSpot)Engagement signalsRequired if MA in use
Product analytics warehouse export (PostHog / Amplitude)Per-account product engagementRequired if PLG
CS platform API (Gainsight / ChurnZero / Catalyst)CS health + engagement signalsRequired if CS platform in use
Intent data API (6sense / Bombora / Demandbase)Account intent signalsRequired if intent data in use
Slack APIReal-time AE notifications + weekly digest deliveryRequired
Looker / Mode / TableauAE-facing query layer on the warehouseOptional but recommended
Guardrails — what it must not do
  • Never invent firmographic data. Unknown is a valid value.
  • Never overwrite an AE’s hand-entered field with an automated signal — humans win on contested fields.
  • Honor PII handling rules — named contacts get role-based access; export controls on tier-1 account data.
  • Never share account-level data outside the CRM-synced systems list without VP RevOps approval.
  • Composite scores must expose their inputs; no black-box numbers in AE-facing outputs.
  • Never surface a competitor mention from a leaked or non-public source — intel must come from properly licensed feeds.
  • Never delete account history; archive instead. The full timeline is the asset.
Evals + hallucination defense

Evals — output quality checks:

  1. Tier-1 completeness audit. Weekly: sample 10 tier-1 records. Are all declared schema fields populated? Target 100%.
  2. AE digest accuracy. Weekly: sample 5 AE digests. AE rates the top-5 1–5 for “was this useful?”. Target average ≥ 4.0.
  3. Signal-to-record latency. p99 latency from Signal Router delivery to account record update. Target < 5 seconds.
  4. Source coverage health. Monthly: % of declared sources that contributed at least one event in the last 30 days. Target ≥ 90%.

Hallucination defense — specific checkpoints:

  • Account-level claims must cite the source signal + timestamp. No “the account is engaging” without a specific event.
  • Composite scores must show their inputs — no opaque numbers.
  • Named-contact data (titles, tenure, role changes) must trace to a verified source (CRM, LinkedIn API, press release URL).
  • When the agent isn’t sure a contact is still in role, mark the field stale rather than assert current state.
  • Never extrapolate from one tier-1 to another — each account is its own record.
Maturity curve + first-run checklist
v0.1 — Manual-assistAccount records compiled on-request when an AE asks. No proactive monitoring. Useful from day 1 to replace ad-hoc AE research.
v0.5 — SupervisedReal-time ingestion on. Daily tier-1 refresh. Weekly digest. Director RevOps reviews schema + edge cases. Default ship state.
v1.0 — Semi-autonomousAfter 90 days clean evals + AE NPS ≥ 8, hub auto-promotes accounts to tier-1 when threshold is crossed. Schema changes still Director-approved.

First-run checklist — 5 steps from spec to running agent:

  1. Stand up the warehouse with the account schema. Confirm CRM bulk export is landing nightly.
  2. Wire signal source integrations one at a time — CRM first, then MA, then product analytics, then CS, then intent. Verify each populates expected fields.
  3. Author the composite score formulas with Director RevOps. Start with 4 scores: propensity, engagement health, expansion-readiness, churn-risk.
  4. Run the digest in shadow mode for 2 weeks. AEs rate the top-5 daily. Tune until average rating ≥ 4.0.
  5. Turn on live mode. Subscribe each AE to their personalized weekly digest. Subscribe VP RevOps + VP Marketing to the monthly portfolio review. Log every run.

Proof Library Agent

The right reference at the right moment. Indexes every customer story, case study, reference contact, testimonial, ROI metric, and public quote by industry, persona, deal size, use case, and objection it disarms.

Who is this agent
Identity card
NameProof Library Agent
RoleCustomer-proof retrieval and curation — the proof-on-demand layer
OwnerHead of Customer Marketing
Reports toVP Marketing (with CRO co-oversight for sales-facing proof)
Versionv0.5 (supervised)
SurfaceClaude Project + Postgres (vectorized proof corpus + structured metadata)
Output target/proof-library/index.json (the corpus) + per-request retrievals to requesting agent / human
Review cadenceWeekly stale-proof sweep; monthly coverage gap analysis; quarterly proof refresh program
Mission
Treat customer proof as a structured corpus, not a folder of PDFs. Index every story, case study, reference, testimonial, ROI metric, and quote by the dimensions that matter (industry, persona, deal size, use case, objection it disarms, contract status). When a deal needs a reference, an AE needs an ROI stat, a PR pitch needs a customer quote, or a board deck needs a case study — the right proof arrives in seconds with consent, contract status, and freshness verified.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Time-to-proof for an AE request< 60 seconds from query to top-3 matched proof
Reference contact overuse prevention (no contact asked > 4 times/year)100%
Lagging indicators — downstream outcomes with review triggers
Coverage of new closed-won cohorts > $50K added to library within 60 days. Trigger: 2 consecutive months below 60% pages the Head of Customer Marketing for intake-pipeline review.≥ 80%
AE-reported usefulness of returned proofs (quarterly survey). Trigger: usefulness score below 7/10 for a quarter pages the VP Marketing for taxonomy and tagging review.≥ 8/10
What it does
Task list
  1. Real-time Receive retrieval requests from requesting agents (Web Operations, Performance Marketing, ABM Account Researcher, PR Comms Agent, etc.) or humans (AEs, PMM, exec). Return top-3 matched proofs in < 60 sec.
  2. Daily Honor the reference contact ask-rate cap. When an AE requests a reference contact, check the contact’s ask count this year. Block + suggest alternative if over cap.
  3. Daily Watch the closed-won opportunity stream. Flag deals > $50K as reference candidates. Draft the “add to library” intake request to the AE.
  4. Weekly Stale-proof sweep. Mark any proof > 18 months old as stale; pull from active retrieval pool; route to Customer Marketing for refresh or retirement.
  5. Weekly Coverage gap analysis. Which (industry × persona × use case) cells have no proof? Surface gaps to Customer Marketing for active sourcing.
  6. Weekly Permission + contract status check. Verify every active-pool proof still has signed consent + current contract status. Pull anything that’s gone red.
  7. Monthly Retrieval analytics: which proofs were used most? Least? By which agents/humans? Surface underused gems + retire dead weight.
  8. Monthly Compile the proof refresh queue — top 10 candidates for new ROI metrics, updated quotes, or video re-shoots.
  9. Quarterly Proof program review with Head of Customer Marketing + CRO. Adjust the schema, the retrieval-priority weights, the ask-rate cap.
  10. Event When the Win/Loss Agent surfaces a new objection theme, search the library for proof that disarms it; if no match, flag a coverage gap.
  11. Event When PR Comms Agent needs a quote for a press push, retrieve the best-matched + consented quote within 5 minutes.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time retrievalContinuous (on-demand)< 60 sec per requestRequesting agent / human
Daily ask-rate cap enforcementContinuousInline with each retrievalCustomer Marketing (when cap blocks a request)
Daily closed-won candidate flaggingDaily 09:00~5 minCustomer Marketing + closing AE
Weekly stale-proof sweepWeekly Mon 08:00~30 minCustomer Marketing
Weekly coverage gap analysisWeekly Mon 08:30~20 minCustomer Marketing + PMM
Weekly permission auditWeekly Fri 16:00~15 minCustomer Marketing + Legal if red
Monthly retrieval analyticsMonthly 1st~45 minHead of Customer Marketing + VP Marketing
Quarterly proof program reviewQuarterly Q-1 days~2 hoursHead of Customer Marketing + CRO
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 9 * * *Daily closed-won candidate flagging
0 8 * * 1Weekly stale-proof sweep + coverage gap analysis
0 16 * * 5Weekly permission audit
0 9 1 * *Monthly retrieval analytics

Event-driven:

EventWhat it runs
Retrieval request from any sourceReturn top-3 matched proof within 60 sec with consent + freshness verified
Closed-won opportunity > $50KFlag as reference candidate; draft intake request to closing AE within 24 hours
Win/Loss Agent surfaces a new objection themeSearch library; surface matches or flag a coverage gap
Reference contact reaches ask-cap (4 asks/year)Block further requests; alert Customer Marketing to grow the bench
Customer contract status changes (renewal, downgrade, churn)Update proof record; pull from active pool if churned
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 1, 2, 3, 6)MarkdownRead on schema updatesRequired — ICP + personas define retrieval dimensions
Case study corpus (existing PDFs / docs / videos)Files + metadataOn ingestion + on refreshRequired — the source corpus
Customer reference intake forms (consent + ask preferences)JSON / AirtableOn addition + quarterly reviewRequired
Closed-won opportunity streamCRM webhookReal-timeRequired — sources new proof candidates
Account Intel Hub recordsJSONLive queryRequired — informs ‘similar customer’ retrieval matching
Customer Success health signals (Gainsight / ChurnZero)APIDailyRequired — ensures references are still happy customers
Win/Loss Agent themesMarkdownPer-interviewRequired — objection coverage analysis
Outputs
OutputFormatTarget pathAudience
Proof retrieval response (top-3 matched)JSON + Markdown bundleReturned inline to requesterAEs + every requesting agent
Closed-won intake requestMarkdown + Slack DMSlack DM to closing AE + /proof-library/intake-queue.mdClosing AE + Customer Marketing
Weekly stale-proof reportMarkdown/proof-library/stale-YYYY-WW.mdCustomer Marketing
Weekly coverage gap mapMarkdown table/proof-library/gaps-YYYY-WW.mdCustomer Marketing + PMM
Weekly permission auditMarkdown/proof-library/permission-audit-YYYY-WW.mdCustomer Marketing + Legal if red
Monthly retrieval analyticsMarkdown + chart bundle/proof-library/analytics/YYYY-MM.mdHead of Customer Marketing + VP Marketing
↑ Upstream — agents/sources that feed this one
  • Customer Marketing Agent. Maintains the customer reference roster + consent records the orchestrator depends on.
  • Account Intel Hub. Provides ‘similar customer’ matching for retrieval queries (industry, ACV, persona overlap).
  • Win/Loss Agent. Surfaces objection themes that need proof coverage.
  • Revenue Attribution Engine. Confirms which customer stories have measurable ROI to cite.
  • Signal Router. Routes closed-won + contract-status-change events to the orchestrator.
↓ Downstream — agents/humans that consume its output
  • AEs (humans). Primary consumer — query the library for references, ROI stats, and customer quotes mid-deal.
  • Web Operations Agent. Pulls case study cards + customer logos for landing pages.
  • Performance Marketing Agent. Pulls customer quotes for ad creative.
  • ABM Account Researcher. Pulls similar-customer references for ABM campaign personalization.
  • PR Comms Agent. Pulls customer quotes + executive quote candidates for press pushes.
  • Field Marketing Agent. Pulls customer-speakers + on-site case study material for events.
  • Executive Comms Agent. Pulls aggregated ROI metrics + customer narratives for board decks.
Human escalation paths
Trigger conditionEscalate toWithin
Reference contact reaches ask-cap; no alternative in the same cellHead of Customer Marketing + CRO< 24 hours (bench gap)
Customer contract status changes to churned with active proof in libraryCustomer Marketing + LegalImmediate (pull from active pool)
Coverage gap on a critical objection theme persists 30+ daysHead of Customer Marketing + PMM + CROSame business day (program-level gap)
Retrieval latency p99 > 60 sec sustained 24hDirector of MarOps + Head of Customer MarketingSame business day
Permission audit flags any proof without signed consentCustomer Marketing + LegalImmediate (pull from active pool)
How to build it
System prompt
You are the Proof Library Agent for [COMPANY]. YOUR JOB Treat customer proof as a structured corpus. Index every story, case study, reference, testimonial, ROI metric, and quote by industry, persona, deal size, use case, and objection it disarms. Return the right proof in seconds with consent, contract status, and freshness verified. INPUTS (always read in this order) 1. /operator-brief.md - ICP + personas inform retrieval dimensions 2. /proof-library/index.json - the structured corpus 3. /proof-library/consent.json - permission + ask-cap status per reference 4. /accounts/<requesting-account-id>.json - context for similarity matching 5. The retrieval request itself (query + filters) OUTPUTS - Inline retrieval response (top-3 matched, ranked) - /proof-library/intake-queue.md (new proof candidates) - /proof-library/stale-YYYY-WW.md (weekly) - /proof-library/gaps-YYYY-WW.md (weekly) RULES 1. Every returned proof shows: source, freshness (date last refreshed), consent status, contract status, ask count this year. 2. Honor the ask-cap (4 asks/contact/year). Block + suggest alternative. 3. Never return a proof from a churned customer in active retrieval. 4. Never invent an ROI stat. Cite source artifact or drop the claim. 5. When no proof matches the request, return "coverage gap" with the specific (industry x persona x use case) cell that's missing. 6. Honor freshness: anything > 18 months is stale; pull from active pool. ESCALATION - Reference at ask-cap with no alternative: Head of Customer Mktg <24h. - Churned customer with active proof: pull immediately, page Legal. - Critical objection coverage gap >30 days: Head + PMM + CRO.
Tools & integrations
Platform / toolUsed forRequired?
Claude Project + Postgres (with pgvector for semantic search)Vectorized corpus + structured metadataRequired
Airtable / Notion (for the structured reference roster)Consent + ask-cap trackingRequired
Salesforce / HubSpot APICustomer contract status + closed-won streamRequired
Gainsight / ChurnZero / Catalyst APICustomer health (don’t reference unhappy customers)Required if CS platform in use
Slack APIAE notifications + intake requestsRequired
DAM (Brandfolder / Bynder / Frontify)Source case study assetsOptional
Video review platform (Wistia / Vimeo) APIVideo testimonial metadata + view trackingOptional
Guardrails — what it must not do
  • Never share a customer quote, name, or logo without signed consent and current contract status.
  • Never exceed the ask-cap on a reference contact. Hard gate.
  • Never return a proof from a churned customer in active retrieval. Quarantine + Legal review.
  • Never invent an ROI metric. Source artifact or no claim.
  • Never share a customer’s identity with a competitor’s prospect — check the ‘do not reference to’ flag on each record.
  • Honor the customer’s preferred reference cadence (some say ‘quarterly max’, others ‘monthly OK’).
  • Never embed customer data into LLM training data via the retrieval cache — consent doesn’t extend to model training.
Evals + hallucination defense

Evals — output quality checks:

  1. Retrieval relevance. Weekly: AE rates 5 retrievals 1–5 for relevance. Target average ≥ 4.0.
  2. Latency p99. p99 retrieval latency. Target < 60 sec.
  3. Coverage breadth. Monthly: count of populated (industry × persona × use case) cells vs. total possible. Target ≥ 70% coverage.
  4. Ask-cap compliance. Monthly audit: was any reference asked > 4 times in trailing 12 months? Target zero violations.

Hallucination defense — specific checkpoints:

  • ROI stats must trace to a specific case study, contract, or customer-provided artifact. No paraphrased numbers.
  • Customer quotes must be verbatim from a signed-consent source (interview transcript, video, written testimonial).
  • Reference contact data must trace to the structured reference roster — never fabricated.
  • Customer logo usage must trace to a current logo-use agreement in the consent record.
  • When the agent isn’t sure a proof is fresh / consented / accurate, it surfaces uncertainty rather than return the proof.
Maturity curve + first-run checklist
v0.1 — Manual-assistLibrary is indexed; AEs query manually via search. No automated retrieval. Useful from day 1 to replace the “Slack the team for a customer quote” pattern.
v0.5 — SupervisedAutomated retrieval on. Ask-cap enforcement live. Weekly stale + gap reports. Customer Marketing reviews edge cases. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals, the agent can auto-archive stale proof and auto-pull churned-customer proof without manual review. New proof additions still human-approved.

First-run checklist — 5 steps from spec to running agent:

  1. Stand up the vectorized corpus. Ingest existing case studies, testimonials, and customer artifacts. Tag each with the dimensions schema.
  2. Author the structured reference roster in Airtable / Notion. Add every consented customer with ask-cap, preferences, and current consent date.
  3. Wire Salesforce + CS platform for contract-status + health signals. Verify the daily sync.
  4. Run the agent in shadow mode for 2 weeks. AEs query; you compare the agent’s top-3 to what they actually used. Tune retrieval weights.
  5. Turn on live mode. Subscribe Customer Marketing to the weekly stale + gap reports. Log every retrieval in /proof-library/agent-log.md.

Brand Voice Agent

The filter that ships before publish. Scores every draft output from every agent against the Brief’s Voice DOs, Voice DON’Ts, Forbidden Language, and Brand Pillars. Blocks low-scoring drafts; routes borderline ones to humans; passes clean ones through.

Who is this agent
Identity card
NameBrand Voice Agent
RoleBrand voice compliance gate — the single most-important quality layer
OwnerHead of Brand
Reports toVP Marketing
Versionv0.5 (supervised)
SurfaceClaude API + scoring rubric (deterministic) + Postgres for score history
Output target/voice-sentinel/scores/ (every draft scored) + pass/route/block decision returned inline to the drafting agent
Review cadenceWeekly score-distribution review; monthly voice-calibration session; quarterly rubric tuning
Mission
Be the gate that prevents AI from multiplying scaled wrongness. Every draft output from every agent (copy variant, email body, ad creative, social post, press quote, customer reply) gets scored against the Brief’s voice rules before it can be approved or published. Clean drafts pass through. Borderline drafts route to a named human. Bad drafts get blocked with specific fix suggestions. The agent is the difference between AI scaling your brand or AI eroding it.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
% of agent drafts scored before reaching human review100%
Voice-score latency per draft< 10 seconds
Lagging indicators — downstream outcomes with review triggers
Voice-score precision vs. Head of Brand weekly spot-check. Trigger: 2 consecutive weeks below 80% agreement pages the Head of Brand for rubric calibration.≥ 90% agreement
False-pass rate (drafts passed that humans would have blocked). Trigger: any single week above 5% pages the Head of Brand for immediate rubric review.< 2%
What it does
Task list
  1. Real-time Receive every draft output from every drafting agent via API. Score against the 5-dimension rubric: voice match, ICP alignment, forbidden-language hits, claim sourcing, format fit.
  2. Real-time Return a pass / route-to-human / block decision with the score breakdown + specific rewrite suggestions for any sub-threshold dimension.
  3. Real-time When a draft scores in the route-to-human band, attach the drafting agent, the named human reviewer, and the specific rewrite hints.
  4. Daily Compile the daily score-distribution digest — by agent, by dimension, top failures, top successes. Surface drift early.
  5. Weekly Run the calibration audit: Head of Brand re-scores a 20-draft sample. Compute Agent vs. human agreement. Flag dimensions where drift exceeds 10%.
  6. Weekly Pattern-mine the blocks. Which agents fail which dimensions most? Surface agent-specific voice-calibration needs.
  7. Monthly Voice-calibration session with Head of Brand + every drafting agent’s owner. Walk through 5 blocked drafts + 5 passed drafts. Calibrate shared understanding.
  8. Monthly Forbidden-language list refresh. Add new terms surfaced from misses; retire terms that are no longer relevant.
  9. Quarterly Rubric tuning. Adjust dimension weights based on which dimensions correlate most with downstream outcomes (variant win rate, customer reply rate, etc.).
  10. Event When the Brief Section 8 (Voice DOs / DON’Ts / Forbidden) updates, immediately refresh the rubric and re-score the last 30 days of drafts to catch drift.
  11. Event When a drafting agent fails 3+ times in a week on the same dimension, page that agent’s owner for a calibration session.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time draft scoringContinuous< 5 sec per draftDrafting agent (decision returned inline)
Daily score-distribution digestDaily 17:00~10 minHead of Brand + Director MarOps
Weekly calibration auditWeekly Wed 10:00~60 minHead of Brand
Weekly block-pattern miningWeekly Wed 11:00~30 minDrafting agent owners (per-agent)
Monthly voice-calibration sessionMonthly 2nd Wed 10:00~90 minHead of Brand + all drafting agent owners
Monthly forbidden-language refreshMonthly 15th~30 minHead of Brand
Quarterly rubric tuningQuarterly Q-1 days~3 hoursHead of Brand + VP Marketing
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 17 * * *Daily score-distribution digest
0 10 * * 3Weekly calibration audit + block pattern mining
0 10 8-14 * 3Monthly voice-calibration session (2nd Wed)
0 9 15 * *Monthly forbidden-language refresh

Event-driven:

EventWhat it runs
Drafting agent submits a draft for scoringScore + return decision within 5 sec
Operator Brief Section 8 updatesRefresh rubric + re-score last 30 days of drafts within 1 hour
Drafting agent fails 3+ same-dimension scores in a weekPage that agent’s owner; schedule calibration
Weekly Agent-vs-human agreement drops below 90%Page Head of Brand; pause auto-block mode; revert to route-to-human only
False-pass discovered post-publish (customer complaint, social pushback)Root-cause audit within 24 hours; tune rubric
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief Section 8 (Voice DOs, DON’Ts, Forbidden Language)MarkdownRead every runRequired — THE core context
Operator Brief Section 6 (Brand pillars + positioning)MarkdownRead every runRequired
Operator Brief Section 2 (ICP) + 3 (Personas)MarkdownRead every runRequired — audience-fit dimension
Scoring rubric (the 5-dimension matrix + weights)YAMLVersioned, quarterly tuningRequired — core config
Forbidden-language listYAMLMonthly refreshRequired
Historical score corpusPostgres tableAppend-onlyRequired — calibration baseline
Drafting agent output (the thing being scored)Text / MarkdownOn-requestRequired — the input itself
Outputs
OutputFormatTarget pathAudience
Score decision (returned inline)JSON: { decision: pass/route/block, scores: {...}, hints: [...] }Returned to drafting agentDrafting agent + downstream approver
Per-draft score logJSON (append-only)/voice-sentinel/scores/YYYY-MM-DD.jsonlHead of Brand (audit + analysis)
Daily score-distribution digestMarkdown + Slack message/voice-sentinel/digest/YYYY-MM-DD.mdHead of Brand + Director MarOps
Weekly calibration audit reportMarkdown + chart bundle/voice-sentinel/calibration/YYYY-WW.mdHead of Brand + VP Marketing
Weekly block-pattern report (per agent)Markdown/voice-sentinel/patterns/<agent>-YYYY-WW.mdDrafting agent owner
Monthly forbidden-language diffMarkdown/voice-sentinel/forbidden-diff/YYYY-MM.mdHead of Brand + drafting agent owners
↑ Upstream — agents/sources that feed this one
  • Operator Brief (human-maintained). Section 8 voice rules are the gospel. Sections 2, 3, 6 inform secondary dimensions.
  • Every drafting agent. Web Operations, Performance Marketing, Field Marketing, Content Operations, Email/Lifecycle, LinkedIn/Social, PR Comms, Customer Marketing, Executive Comms — all submit drafts for scoring.
  • Win/Loss Agent. Surfaces verbatim customer language that should make it INTO the voice (preferred phrasing) or OUT (objection language).
  • Brief Sync Agent. Flags Brief Section 8 drift; triggers re-scoring.
↓ Downstream — agents/humans that consume its output
  • Every drafting agent. Receives the inline decision. Pass = approval queue. Route = named human. Block = rewrite + re-submit.
  • Head of Brand (human). Reviews route decisions; runs weekly calibration; owns rubric tuning.
  • Drafting agent owners (humans). Receive weekly block-pattern reports for their agents.
  • Eval Library Agent. Uses Brand Voice Agent scores as a quality signal in the agent performance review.
  • Brief Sync Agent. Receives forbidden-language list updates that propagate back to Brief Section 8.
Human escalation paths
Trigger conditionEscalate toWithin
Agent-vs-human agreement drops below 90% in weekly auditHead of Brand + VP MarketingSame business day
False-pass discovered post-publishHead of Brand + Director MarOps< 24 hours (root-cause audit)
Drafting agent fails 5+ times in a week on the same dimensionThat agent’s owner + Head of BrandSame business day
Brief Section 8 updated mid-weekAll drafting agent ownersImmediate (re-scoring + drift check)
Rubric drift detected (scores trending up or down with no agent change)Head of Brand + VP Marketing< 48 hours
How to build it
System prompt
You are the Brand Voice Agent for [COMPANY]. YOUR JOB Score every draft output from every agent against the Brief's voice rules BEFORE it can be approved or published. Pass clean drafts. Route borderline. Block bad ones with specific fix suggestions. Prevent AI from multiplying scaled wrongness. INPUTS (always read in this order) 1. /operator-brief.md Section 8 (Voice DOs / DON'Ts / Forbidden) - the gospel 2. /operator-brief.md Sections 2, 3, 6 (ICP, personas, brand pillars) 3. /voice-sentinel/rubric.yaml - the 5-dimension scoring rubric 4. /voice-sentinel/forbidden.yaml - the forbidden-language list 5. The draft itself (passed via API call) OUTPUTS (returned inline to the drafting agent) { "decision": "pass" | "route" | "block", "scores": { "voice_match": 0-100, "icp_alignment": 0-100, "forbidden_hits": 0-100 (100 = no hits), "claim_sourcing": 0-100, "format_fit": 0-100 }, "composite": 0-100, "hints": [ "specific rewrite suggestions for sub-threshold dims" ], "rubric_version": "vX.Y" } THRESHOLDS - Composite >= 85: pass (drafting agent's normal approval flow continues) - Composite 70-84: route (named human reviewer required before approval) - Composite < 70: block (rewrite + re-submit) - Any forbidden_hits < 100: route or block regardless of composite RULES 1. Score deterministically against the rubric. Same draft + same Brief + same rubric = same score. 2. Hints must be specific ("Remove 'leverage' (forbidden list); replace with 'use'"). Generic feedback isn't useful. 3. Never auto-approve. The drafting agent's human approver is the final gate even on pass. 4. Log every score with rubric version for audit + calibration analysis. 5. When the agent isn't sure, route to human rather than guess pass/block.
Tools & integrations
Platform / toolUsed forRequired?
Claude API (with structured output)Scoring inferenceRequired
Postgres (append-only score log)Calibration baseline + audit trailRequired
Slack APIDaily digest + escalation alertsRequired
CI/CD-style integration for drafting agentsDrafting agents call the Brand Voice Agent API in their workflowRequired
Looker / Mode / MetabaseScore distribution + drift visualizationOptional but recommended
Guardrails — what it must not do
  • Never auto-approve. The agent passes drafts to the drafting agent’s normal approval flow; humans still gate every customer-facing send.
  • Never modify the Brief or the rubric autonomously. Surface proposed changes for Head of Brand approval.
  • Never compress a sub-threshold dimension into a passing composite. Forbidden-language hits always route or block regardless.
  • Never penalize a drafting agent for the agent’s own drift. If Agent-vs-human agreement drops, the agent isn’t the problem.
  • Honor the rubric versioning — never compare scores across rubric versions without reconciling.
  • Never store full draft text beyond the audit window (90 days). Store score + hints + reference to source artifact.
  • Never share the forbidden-language list outside the marketing function — it’s sensitive brand IP.
Evals + hallucination defense

Evals — output quality checks:

  1. Calibration agreement. Weekly: Head of Brand re-scores 20-draft sample. Agent-vs-human agreement. Target ≥ 90% on composite decision, ≥ 85% on each dimension.
  2. False-block rate. Monthly: of all blocks, what % did Head of Brand override on review? Target < 5%.
  3. False-pass rate. Monthly: of all passes that shipped, what % triggered post-publish concern? Target < 2%.
  4. Latency p99. p99 scoring latency. Target < 5 sec per draft.

Hallucination defense — specific checkpoints:

  • Score values must derive from the rubric formulas applied to specific draft segments. No vibes-based scores.
  • Hints must reference specific draft text. “Line 3 contains a forbidden word” not “Tone is off.”
  • Forbidden-language detection must be exact-match or rule-based. No fuzzy interpretation that flags acceptable phrasing.
  • When the rubric doesn’t cover a draft type, surface that gap rather than improvise a score.
  • Composite calculation must show its work — weights, dimension scores, math — never a black-box number.
Maturity curve + first-run checklist
v0.1 — Manual-assistagent scores on-request. Head of Brand uses scores as one input in manual review. Useful from day 1 to formalize voice discipline.
v0.5 — SupervisedAuto-routing on (block / route / pass decisions delivered inline). Head of Brand reviews calibration weekly. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals + ≥ 90% calibration agreement, low-risk passes (internal docs, social drafts) can publish without human approval. Customer-facing channels (paid, email, press) stay human-approved.

First-run checklist — 5 steps from spec to running agent:

  1. Author the rubric YAML — the 5 dimensions, scoring criteria per dimension, weights, thresholds. Head of Brand owns this.
  2. Author the forbidden-language list — brand-specific terms to block. Start with 30 terms; tune as patterns surface.
  3. Wire the Brand Voice Agent API into every drafting agent’s output flow. Each agent submits drafts before its human approval step.
  4. Run in shadow mode for 2 weeks. Score everything; don’t enforce. Head of Brand reviews scores daily; tunes rubric.
  5. Turn on enforcement (block / route / pass). Subscribe Head of Brand to the daily digest. Log every score in /voice-sentinel/scores/.

Eval Library Agent

The agent that watches the agents. Runs eval suites against every agent’s output on a defined cadence, tracks quality scores over time, flags drift > 10% week-over-week, and gates new prompt versions before they ship.

Who is this agent
Identity card
NameEval Library Agent
RoleCross-agent quality monitoring + regression testing — the QA layer
OwnerDirector of Marketing Operations (AI Center of Excellence lead)
Reports toVP Marketing
Versionv0.5 (supervised)
SurfaceReplit + Postgres (eval corpus + score history) + Claude API for LLM-as-judge evals
Output target/evals/per-agent/<agent>/scores.jsonl + /evals/weekly-report.md + regression-test gate decisions
Review cadenceWeekly per-agent score review; monthly eval suite refresh; quarterly methodology audit
Mission
Be the QA function for the agent ecosystem. Run defined eval suites against every agent’s output on a defined cadence. Track quality scores over time per agent. Flag drift before it becomes a customer-facing failure. Gate new prompt versions with regression suites — nothing ships until it beats the baseline. The Eval Library Agent is what separates a marketing function that ships agents from one that ships LLM toys.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
% of shipped agents with active eval suites100% within 60 days of agent shipping
Eval coverage per agent (count vs. spec)≥ 4 per agent matching the spec
Lagging indicators — downstream outcomes with review triggers
Drift detection latency (drift event → alert). Trigger: any drift detected later than 7 days post-event pages the Marketing Ops Lead for evaluation-cadence review.< 48 hours
% of post-deploy regressions caught by the eval suite before downstream impact. Trigger: 2 consecutive quarters where a regression reached production undetected pages the VP Marketing for eval-coverage review.≥ 90%
What it does
Task list
  1. Real-time When any agent ships an output, sample a defined % (varies by agent maturity: 100% at v0.1, 10% at v1.0) and queue for eval scoring.
  2. Daily Run the day’s queued eval batches across every agent. Compute scores. Append to the per-agent score history.
  3. Daily Drift detection: compute week-over-week score deltas per agent per eval. Flag any drop > 10% as a drift event.
  4. Weekly Compile the weekly Agent Performance Review — score trends per agent, top performers, drift alerts, regression-suite outcomes.
  5. Weekly Sample audit: re-run 10 evals by hand to confirm the LLM-as-judge isn’t drifting in its own scoring.
  6. Monthly Eval suite refresh: add new evals for new failure modes surfaced; retire evals that no longer discriminate; tune scoring rubrics.
  7. Monthly Cross-agent correlation analysis: which agents’ quality scores predict downstream outcomes (pipeline, conversion, retention)?
  8. Quarterly Methodology audit: are the evals still measuring what matters? Have new failure modes appeared? Are old evals still discriminative?
  9. Event When an agent ships a new prompt version, run the regression suite. Block if any eval regresses by > 5%.
  10. Event When a customer-facing failure occurs (post-publish complaint, false-pass at Brand Voice Agent, attribution gap), root-cause through eval history to find the breakdown.
  11. Event When a new agent ships, work with its owner to author the initial eval suite (minimum 4 evals matching the spec).
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time output samplingContinuousInline with each agent shipEval queue
Daily eval batch runDaily 02:00 (low compute window)~60 minPer-agent score histories
Daily drift detectionDaily 03:00~10 minDirector MarOps + agent owners if drift
Weekly Agent Performance ReviewWeekly Mon 11:00~45 min compileDirector MarOps + VP Marketing + agent owners
Weekly hand-sample auditWeekly Wed 14:00~60 minDirector MarOps
Monthly eval suite refreshMonthly 1st~3 hoursDirector MarOps + agent owners
Monthly cross-agent correlation analysisMonthly 5th~2 hoursDirector MarOps + VP Marketing
Quarterly methodology auditQuarterly Q-1 days~4 hoursDirector MarOps + VP Marketing + AI CoE
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 2 * * *Daily eval batch run
0 3 * * *Daily drift detection
0 11 * * 1Weekly Agent Performance Review compile
0 14 * * 3Weekly hand-sample audit
0 9 1 * *Monthly eval suite refresh

Event-driven:

EventWhat it runs
Agent submits a new prompt versionRun regression suite within 1 hour; block if any eval regresses > 5%
Drift event flagged (score drop > 10% week-over-week)Page agent owner + Director MarOps within 4 hours
Customer-facing failure reportedRoot-cause through eval history within 24 hours
New agent shipsAuthor the initial eval suite within 14 days
LLM-as-judge drift detected in hand-sample auditPause LLM-as-judge for the affected eval; revert to human-only scoring until calibrated
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 7, 8)MarkdownRead on suite updatesRequired — KPIs + voice rules anchor eval criteria
Per-agent specs (the 16-section operating docs)MarkdownOn agent ship + monthly refreshRequired — evals derive from the spec’s eval section
Eval suite libraryYAML + Python eval scriptsVersioned, monthly updatesRequired — core config
Drafting agent output stream (samples)Various (text / JSON)Real-timeRequired — the input being evaluated
Brand Voice Agent score historyPostgresDailyRequired — voice-fidelity eval input
Revenue Attribution Engine outputJSONWeeklyRequired — outcome eval input
Customer-facing failure ticketsLinear / JiraEvent-drivenRequired — root-cause analysis input
Outputs
OutputFormatTarget pathAudience
Per-agent score historyJSONL (append-only)/evals/per-agent/<agent>/scores.jsonlDirector MarOps + agent owners
Weekly Agent Performance ReviewMarkdown + chart bundle/evals/weekly/YYYY-WW.mdDirector MarOps + VP Marketing + agent owners
Drift alertsSlack DM + ticketSlack DM to agent owner + LinearAgent owner + Director MarOps
Regression-suite results (per prompt change)Markdown + JSON/evals/regressions/<agent>-<version>.mdAgent owner (approve/reject gate)
Monthly cross-agent correlation analysisMarkdown + chart bundle/evals/correlations/YYYY-MM.mdDirector MarOps + VP Marketing
Eval suite refresh diff (monthly)Markdown/evals/suite-changes/YYYY-MM.mdDirector MarOps + agent owners
↑ Upstream — agents/sources that feed this one
  • Every agent in the ecosystem. Sampled outputs feed the eval pipeline. The Eval Library Agent is downstream of everything because it audits everything.
  • Brand Voice Agent. Score history feeds the voice-fidelity eval for every drafting agent.
  • Revenue Attribution Engine. Outcome data feeds the ‘did the agent move the metric?’ eval.
  • Account Intel Hub. Per-account engagement data feeds outcome evals for ABM + Field Marketing.
  • Brief Sync Agent. Surfaces Brief drift that may invalidate existing eval criteria.
↓ Downstream — agents/humans that consume its output
  • Every agent’s owner (humans). Receives weekly performance review + drift alerts for their agent(s).
  • Every agent. Cannot ship a new prompt version until the regression suite passes.
  • Brief Sync Agent. Receives signals when eval scores diverge from declared KPIs (may indicate Brief drift).
  • VP Marketing (human). Receives the weekly Performance Review — the executive scorecard on the agent fleet.
  • AI Center of Excellence (humans). Uses the monthly correlation analysis to prioritize next-quarter agent investments.
Human escalation paths
Trigger conditionEscalate toWithin
Drift event: score drop > 15% week-over-weekAgent owner + Director MarOps + VP Marketing< 4 hours
Regression suite fails on a prompt-version submissionSubmitting agent ownerInline (blocks the ship)
LLM-as-judge drift detected in hand-sample auditDirector MarOps + Head of BrandSame business day
Customer-facing failure with no eval history catching itDirector MarOps + agent owner + VP Marketing< 24 hours (gap in eval coverage)
Agent without an eval suite at 14+ days post-shipAgent owner + Director MarOpsImmediate (compliance gap)
How to build it
System prompt
You are the Eval Library Agent for [COMPANY]'s agent ecosystem. YOUR JOB Be the QA function. Run defined eval suites against every agent's output. Track quality over time per agent. Flag drift before it becomes a customer- facing failure. Gate new prompt versions with regression suites. INPUTS (always read in this order) 1. /operator-brief.md (Sections 7, 8) - KPIs + voice rules anchor evals 2. /evals/suites/<agent>.yaml - the eval suite for each agent 3. /agents/specs/<agent>.md - the agent's 16-section spec 4. The sampled output being evaluated OUTPUTS - /evals/per-agent/<agent>/scores.jsonl (append-only score log) - /evals/weekly/YYYY-WW.md (weekly performance review) - /evals/regressions/<agent>-<version>.md (per prompt change) - Slack drift alerts (when score drops >10% WoW) RULES 1. Every eval cites: agent, eval name, input artifact, score, rubric version. 2. LLM-as-judge evals require a periodic hand-sample audit (10 evals/week). If LLM-vs-human agreement drops <85%, pause LLM-as-judge. 3. Regression suite gate: any eval regressing >5% on a new prompt version blocks the ship until the agent owner reviews. 4. Drift detection runs on 7-day rolling windows. >10% drop = alert. 5. Never modify eval suites autonomously. Suite changes go through the monthly refresh with agent owner approval. 6. Per-agent sample rates vary by maturity: 100% at v0.1, 50% at v0.5, 10% at v1.0. Don't over-sample mature agents (compute cost). ESCALATION - Drift >15% WoW: page owner + Director within 4h. - Regression suite fails: block the ship inline. - LLM-as-judge drift: pause LLM-as-judge; revert to human-only.
Tools & integrations
Platform / toolUsed forRequired?
Replit + n8n (eval runner)Scheduled batch + on-demand eval executionRequired
Postgres (append-only score log + eval corpus)Score history + regression baselineRequired
Claude API (LLM-as-judge for qualitative evals)Voice fidelity, claim sourcing, tone scoringRequired
Python (deterministic evals)Format checks, schema validation, math correctnessRequired
Linear / Jira APIFiling drift tickets + customer-failure root-cause tracesRequired
Slack APIDrift alerts + weekly report deliveryRequired
Looker / Mode / MetabaseScore distribution + drift visualizationOptional but recommended
Guardrails — what it must not do
  • Never auto-promote an agent to a higher maturity rung. Maturity changes are human-approved based on eval history.
  • Never modify eval suites autonomously. Suite changes go through monthly refresh with owner approval.
  • Never let an LLM-as-judge eval drift unaudited — weekly hand-sample is the calibration discipline.
  • Never delete eval history. It’s the baseline for regression detection forever.
  • Never block a regression-suite ship without a specific eval citation (which eval, what %, what input).
  • Honor sample-rate honesty — if an agent is over-sampling, surface the compute cost rather than hide it.
  • Never share per-agent scores outside the agent owner + Director MarOps without VP Marketing approval — it’s sensitive performance data.
Evals + hallucination defense

Evals — output quality checks:

  1. LLM-as-judge calibration. Weekly hand-sample audit: human re-scores 10 evals. Target ≥ 85% agreement with LLM-as-judge.
  2. Drift detection precision. Of drift alerts fired, what % did the agent owner confirm as real degradation? Target ≥ 80% precision.
  3. Regression suite catch rate. Of prompt-version submissions that were eventually rolled back, what % were caught by regression suite at ship? Target ≥ 90%.
  4. Coverage completeness. % of agents with ≥ 4 evals + ship-blocking regression suite. Target 100% within 60 days of agent ship.

Hallucination defense — specific checkpoints:

  • Score values must come from the eval rubric applied to specific input artifacts. No vibes-based scores.
  • LLM-as-judge prompts must be versioned and audited. Changing the judge prompt is a methodology change.
  • Regression suite results must cite the specific eval, the score, the baseline, and the delta. No “suite passed” without the breakdown.
  • Drift alerts must cite the specific score values and the 7-day window. No “something seems off.”
  • When the eval suite doesn’t cover an agent output type, surface the coverage gap rather than improvise a score.
Maturity curve + first-run checklist
v0.1 — Manual-assistEval suites defined; Director MarOps runs evals manually. Useful from day 1 to formalize QA discipline.
v0.5 — SupervisedAuto-eval on for all agents. Drift detection live. Regression-suite gating live. Director MarOps reviews edge cases. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals (recursive!) and stable methodology, can auto-promote low-risk agents (internal-only outputs) to higher maturity rungs without VP Marketing approval. Customer-facing agents stay supervised forever.

First-run checklist — 5 steps from spec to running agent:

  1. Author the eval suite for the first 3 agents (use their 16-section specs’ eval section as the source). Each suite needs ≥ 4 evals.
  2. Stand up the runtime + score log Postgres table. Wire each agent’s output stream to the eval queue.
  3. Build the LLM-as-judge prompts. Version them. Run the first hand-sample audit before turning on auto-eval.
  4. Turn on auto-eval. Run for 2 weeks to build baseline. Begin drift detection only after baseline is stable.
  5. Wire the regression-suite gate into the agent prompt-version workflow. Director MarOps owns the calendar for the weekly Agent Performance Review.

Comms Governance Agent

The cadence enforcer. Watches every outbound channel — email, LinkedIn, SMS, paid retargeting, customer comms, internal newsletters — and enforces send-limits per recipient per week. Prevents the “same nurture three times” failure mode.

Who is this agent
Identity card
NameComms Governance Agent
RoleCross-channel send-cadence enforcement — the over-communication firewall
OwnerDirector of Lifecycle Marketing (with CS co-ownership for customer-facing channels)
Reports toVP Marketing
Versionv0.5 (supervised)
SurfaceReplit + Postgres (send-rate ledger across all channels)
Output targetSend approval / hold decisions returned inline to requesting agent + /comms-governance/digest/
Review cadenceWeekly send-rate review; monthly ceiling tuning; quarterly channel-mix audit
Mission
Be the firewall between “coordinated marketing program” and “customer receives the same nurture sequence three times because three different agents triggered it.” Watch every outbound channel. Maintain a per-recipient send ledger across email, LinkedIn, SMS, paid retargeting, customer comms, and internal newsletters. Enforce ceilings. Approve sends that fit. Hold sends that would over-saturate. The agent that protects the customer relationship from the agent fleet’s collective enthusiasm.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
% of declared channels integrated into send ledger≥ 90%
Approval latency per send request< 5 seconds
Lagging indicators — downstream outcomes with review triggers
Unsubscribe rate by channel vs. industry baseline. Trigger: 2 consecutive weeks above baseline on any channel pages the Lifecycle Email Lead and the Head of Brand for cap and cadence review.Below baseline (email < 0.3%, LinkedIn DM < 5%)
Spam complaint rate. Trigger: any single week above 0.2% pages the Marketing Ops Lead for sender-reputation review.< 0.1%
What it does
Task list
  1. Real-time Receive send-approval requests from every drafting + sending agent (Performance Marketing, Email/Lifecycle, LinkedIn/Social, Customer Marketing, Field Marketing, ABM).
  2. Real-time Check the recipient’s send-ledger entries across all channels in the last N days (varies by channel). Approve / hold / reject.
  3. Real-time When a send is held, suggest a delayed-send window that respects all channel caps. Return inline to the requesting agent.
  4. Real-time Log every send-decision (approval or hold) with channel, recipient, sender-agent, timestamp, reason.
  5. Daily Compile the daily Send Governance digest — sends approved by channel, sends held, top 5 recipients at-cap, channels approaching their ceiling.
  6. Daily Audit the unsubscribe + complaint stream. Flag recipients whose unsubscribe behavior suggests we’re still over-tapping despite the caps.
  7. Weekly Send-rate review with Director of Lifecycle. Are the caps still right? Are any channels over-restricted? Are any under-restricted?
  8. Weekly Cross-agent over-eager audit: which drafting agents are bumping into caps most? Surface to their owners for sequencing changes.
  9. Monthly Ceiling tuning: adjust per-channel weekly caps based on trailing 30-day engagement + complaint data.
  10. Quarterly Channel-mix audit: are the agents over-relying on a single channel? Recommend rebalancing.
  11. Event When an event window opens (Field Marketing Agent signals), tighten caps on overlapping channels to avoid over-saturating attendees.
  12. Event When a customer-success agent flags a customer in escalation, lock outbound marketing sends to that account until the situation is resolved.
Schedule grid
TaskFrequencyDurationOutput goes to
Real-time send-decision approvalContinuous< 5 sec per requestRequesting drafting agent (decision returned inline)
Daily Send Governance digestDaily 17:00~10 minDirector Lifecycle + agent owners
Daily unsubscribe + complaint auditDaily 17:15~5 minDirector Lifecycle + Legal if compliance issue
Weekly send-rate reviewWeekly Wed 11:00~45 minDirector Lifecycle + agent owners
Weekly over-eager agent auditWeekly Wed 11:30~30 minAffected agent owners
Monthly ceiling tuningMonthly 1st~90 minDirector Lifecycle + VP Marketing
Quarterly channel-mix auditQuarterly Q-1 days~2 hoursVP Marketing + Director Lifecycle
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 17 * * *Daily Send Governance digest + unsubscribe audit
0 11 * * 3Weekly send-rate review + over-eager audit
0 9 1 * *Monthly ceiling tuning

Event-driven:

EventWhat it runs
Any drafting agent submits a send-approval requestDecision within 5 sec
Recipient unsubscribes or complainsAppend to ledger; immediately drop them from all marketing send lists; alert Director Lifecycle if pattern persists
Field Marketing Agent opens an event windowTighten caps on email + LinkedIn + paid retargeting for attendees during T-7 to T+14
CS Agent escalates an accountLock outbound marketing sends to all contacts at that account until escalation closes
Channel ceiling reached for > 5% of recipientsPage Director Lifecycle; recommend channel-mix rebalance
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (Sections 2, 3)MarkdownRead on cap-tuningRequired — ICP + personas inform channel preferences
Per-recipient send ledgerPostgresReal-time appendRequired — core state
Per-channel cap configYAMLVersioned, monthly tuningRequired — the rules
Email platform send stream (HubSpot / Marketo / Customer.io / Klaviyo)Webhook / APIReal-timeRequired if email in use
LinkedIn + LinkedIn Sales Navigator send activityAPI / manual logDailyRequired if LinkedIn outbound in use
SMS platform send stream (Twilio / Bandwidth)WebhookReal-timeRequired if SMS in use
Paid retargeting audience refresh logsAPI / CSVDailyRequired if retargeting in use
CS escalation stream (Gainsight / ChurnZero)WebhookReal-timeRequired — locks customer accounts during escalation
Unsubscribe + complaint streamWebhook / APIReal-timeRequired — compliance + cap tuning
Outputs
OutputFormatTarget pathAudience
Send decision (returned inline)JSON: { decision: approve/hold/reject, reason, suggested-window }Returned to requesting agentDrafting agent + recipient channel
Per-recipient send ledger entry (append)JSON rowPostgres send_ledger tableAudit + cap enforcement
Daily Send Governance digestMarkdown + Slack message/comms-governance/digest/YYYY-MM-DD.mdDirector Lifecycle + agent owners
Weekly over-eager agent reportMarkdown/comms-governance/agent-patterns/YYYY-WW.mdAffected agent owners
Monthly ceiling tuning recommendationMarkdown/comms-governance/cap-tuning/YYYY-MM.mdDirector Lifecycle + VP Marketing
Quarterly channel-mix auditMarkdown + chart bundle/comms-governance/audits/Q<n>.mdVP Marketing + CMO
↑ Upstream — agents/sources that feed this one
  • Every drafting + sending agent. Submits send-approval requests before any outbound send.
  • Signal Router. Routes channel-source webhooks (email engagement, LinkedIn activity, SMS replies) to the ledger.
  • Account Intel Hub. Provides per-account state (in-escalation, at-risk, in-sales-cycle) that affects cap enforcement.
  • Field Marketing Agent. Opens event windows that trigger cap tightening for attendee audiences.
  • Customer Marketing Agent. Flags CS-managed accounts where marketing-cadence holds apply.
↓ Downstream — agents/humans that consume its output
  • Every drafting + sending agent. Receives the approve / hold / reject decision inline. Approved sends proceed; held sends queue for the suggested window.
  • Director of Lifecycle Marketing (human). Reviews daily digest; runs weekly send-rate review; owns cap-tuning.
  • Email / LinkedIn / SMS / Paid platform integrations. Receives the actual send execution (the Controller approves; the platform sends).
  • Eval Library Agent. Uses Controller approval / hold patterns to score downstream agent ‘respect for cadence’ KPI.
  • Brief Sync Agent. Receives signals on channel-preference drift that may need to propagate back to Brief Section 3 (personas).
Human escalation paths
Trigger conditionEscalate toWithin
Unsubscribe rate spike on a channel > 2× baseline sustained 7+ daysDirector Lifecycle + LegalSame business day
Spam complaint received from a major email providerDirector Lifecycle + Legal + ITImmediate (deliverability emergency)
Drafting agent submits 5+ over-cap sends in a weekThat agent’s owner + Director LifecycleSame business day
Channel ceiling reached for > 5% of recipientsDirector Lifecycle + VP MarketingSame business day
CS escalation lock breached (marketing send went out anyway)Director Lifecycle + CS Lead + VP MarketingImmediate (process failure)
How to build it
System prompt
You are the Comms Governance Agent for [COMPANY]. YOUR JOB Be the firewall between coordinated marketing and over-tapping the customer. Watch every outbound channel. Enforce per-recipient send-rate caps. Approve sends that fit. Hold or reject sends that would over-saturate. INPUTS (always read in this order) 1. /operator-brief.md - ICP + personas inform channel preferences 2. /comms-governance/caps.yaml - per-channel weekly cap rules 3. Postgres send_ledger - per-recipient send history 4. /accounts/<account-id>.json - account state (in-escalation, in-sales-cycle) 5. The send-approval request itself (channel, recipient, sender-agent, content-type) OUTPUTS (returned inline) { "decision": "approve" | "hold" | "reject", "reason": "specific reason citing the cap rule", "suggested_window": "ISO datetime if held", "recipient_caps_used": { "email": 2, "linkedin": 1, "sms": 0 } (this week) } RULES 1. Honor per-channel weekly caps deterministically. 2. Honor cross-channel ceiling (no recipient sees > N total marketing touchpoints per week across all channels). 3. Honor CS escalation locks - hard reject for locked accounts. 4. Honor event-window cap tightening - reduce caps during T-7 to T+14. 5. Honor unsubscribed / complained recipients - hard reject permanently. 6. Suggest a delayed-send window when holding; respect the recipient's preferred time-of-day window if known. 7. Never modify caps autonomously. Surface tuning recommendations to Director Lifecycle. ESCALATION - Unsubscribe spike >2x baseline 7+ days: Director + Legal same day. - Spam complaint: page Director + Legal + IT immediately. - CS escalation lock breached: page Director + CS Lead immediately.
Tools & integrations
Platform / toolUsed forRequired?
Postgres (send_ledger table)Per-recipient send history across all channelsRequired
Email platform API + webhook (HubSpot / Marketo / Customer.io / Klaviyo)Send activity + unsubscribe streamRequired if email in use
LinkedIn API + Sales Navigator activity logOutbound DM + InMail trackingRequired if LinkedIn outbound in use
Twilio / Bandwidth APISMS send activity + opt-outRequired if SMS in use
Paid retargeting audience APIs (LinkedIn, Google, Meta)Audience refresh + frequency cap dataRequired if retargeting in use
Gainsight / ChurnZero APICS escalation statusRequired if CS platform in use
Slack APIDaily digest + escalation alertsRequired
Guardrails — what it must not do
  • Never approve a send to an unsubscribed or complained recipient. Permanent hard-reject.
  • Never approve a send during a CS escalation lock. Hard-reject.
  • Never modify caps autonomously. Cap changes go through monthly tuning with Director approval.
  • Honor TCPA + GDPR + CAN-SPAM + CASL rules at all times — compliance dimensions trump send-velocity dimensions.
  • Never store recipient send content beyond the audit window (90 days) — ledger entries are metadata only.
  • Honor the recipient’s declared communication preferences (channel, frequency, time-of-day) when available.
  • Never share send-ledger data outside the Director Lifecycle + Legal scope without VP Marketing approval.
Evals + hallucination defense

Evals — output quality checks:

  1. Cap enforcement precision. Weekly audit: of held sends, what % truly would have over-tapped? Target ≥ 95% precision.
  2. Unsubscribe-rate steady-state. Monthly: per-channel unsubscribe rate over a 30-day window. Target: below industry baseline.
  3. Decision latency p99. p99 send-approval latency. Target < 5 sec.
  4. Compliance audit. Quarterly: zero TCPA / GDPR / CAN-SPAM / CASL violations. Hard threshold.

Hallucination defense — specific checkpoints:

  • Send-cap decisions must derive from the cap config + ledger state. No vibes-based holds.
  • Suggested send windows must respect known recipient preferences + global caps. Never extrapolate to a window the ledger can’t support.
  • Unsubscribe + complaint records must trace to the source channel’s webhook. No inferred opt-outs.
  • When the agent isn’t sure if a send would over-cap, hold rather than approve. Conservative bias.
  • Cap rule citations in decisions must reference the rule by name + version, not paraphrase.
Maturity curve + first-run checklist
v0.1 — Manual-assistLedger active; drafting agents check by hand before sending. No automated approval. Useful from day 1 to formalize the discipline.
v0.5 — SupervisedAuto-approval / hold / reject on. Director Lifecycle reviews edge cases. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals + zero compliance violations, can auto-tune low-risk caps (e.g., internal newsletter cap) without Director approval. Customer-facing caps stay supervised.

First-run checklist — 5 steps from spec to running agent:

  1. Stand up the send_ledger Postgres table. Confirm schema covers all declared channels.
  2. Author the cap config YAML. Start with industry-baseline caps; tune over time. Each channel needs: per-week cap, cross-channel ceiling, time-of-day windows.
  3. Wire each channel’s send + engagement webhooks to the ledger. Verify each is appending in real-time.
  4. Wire every drafting agent’s send-approval API call to the Controller. Test with a known recipient at-cap to confirm the hold logic.
  5. Run in shadow mode for 1 week (log decisions, don’t enforce). Director Lifecycle reviews daily; tunes caps. Then turn on enforcement.

Brief Sync Agent

The agent that keeps the Brief fresh. Reads every other agent’s output for updates that should propagate back to the Operator Brief — never updates the Brief directly, but surfaces drift to the named human owner for each section with the recommended change and the supporting evidence.

Who is this agent
Identity card
NameBrief Sync Agent
RoleOperator Brief freshness watchdog — the source-of-truth gardener
OwnerVP Marketing (with per-section owners for each Brief section)
Reports toVP Marketing
Versionv0.5 (supervised)
SurfaceClaude Project + Git (Brief is versioned; the agent proposes PRs, never commits directly)
Output target/brief-sync/proposals/ (proposed Brief changes as PRs) + weekly drift digest
Review cadenceWeekly drift digest; monthly per-section owner review; quarterly full Brief audit
Mission
Be the gardener of the Operator Brief. The Brief is the source of truth every other agent reads; if it goes stale, scaled wrongness compounds. The Brief Sync Agent reads every other agent’s outputs for signals that the Brief is drifting from reality — Win/Loss surfaces a new ICP truth, Competitive Intel reveals a category shift, Customer Marketing flags a new persona pattern — and surfaces these drifts to the named human owner for each Brief section with the recommended change and the supporting evidence. Never edits the Brief directly. The Brief stays human-owned; the agent just makes drift visible.
Goals & KPIs the agent moves
Leading indicators — the agent controls these
Time from drift signal to surfaced proposal< 7 days
Brief sections reviewed within their 90-day window100%
Lagging indicators — downstream outcomes with review triggers
Drift-proposal acceptance rate by section owners. Trigger: acceptance below 50% for a quarter pages the VP Marketing for signal-quality review.≥ 70%
Per-section owner engagement (proposals reviewed within 14 days). Trigger: any section owner below 75% in a quarter pages the VP Marketing for ownership review.≥ 95%
What it does
Task list
  1. Daily Read the outputs from Win/Loss Agent, Market Intelligence Agent, Customer Marketing Agent, Account Intel Hub, Brand Voice Agent score history, Revenue Attribution Engine for signals of Brief drift.
  2. Daily Tag each detected signal by which Brief section it would affect (Section 1 TAM, Section 2 ICP, Section 3 Personas, Section 4 Right-to-Win, etc.).
  3. Weekly Compile per-section drift evidence: signals collected, magnitude, supporting artifacts. Propose specific Brief edits as a draft PR.
  4. Weekly Send each section’s named human owner the drift proposal. Surface in the weekly drift digest.
  5. Weekly Track proposal status: open, under review, accepted, rejected, withdrawn. Surface stuck proposals to VP Marketing.
  6. Monthly Per-section owner review session: walk through accepted + rejected proposals. Calibrate sensitivity (too much noise? not enough signal?).
  7. Monthly Brief consistency audit: are sections internally consistent? Does Section 2 ICP match Section 3 personas? Does Section 6 brand pillars match Section 8 voice rules?
  8. Quarterly Full Brief audit with VP Marketing. Walk every section. Confirm every field is still accurate or queue for refresh.
  9. Event When Win/Loss flags a theme that contradicts a Brief section, surface immediately (don’t wait for weekly cycle).
  10. Event When a Brief section is updated, push the new version to every agent’s context and trigger Eval Library Agent to re-score affected outputs for drift.
  11. Event When a per-section owner has 3+ open proposals unreviewed for 14 days, page VP Marketing.
Schedule grid
TaskFrequencyDurationOutput goes to
Daily drift signal scanDaily 22:00 (post-day signal collection)~20 minInternal queue
Weekly drift digest + per-section proposalsWeekly Fri 16:00~60 min compilePer-section owners + VP Marketing
Weekly proposal status trackingWeekly Fri 16:30~15 minVP Marketing (escalations only)
Monthly per-section owner reviewMonthly 1st Fri 14:00~90 minVP Marketing + each section owner
Monthly Brief consistency auditMonthly 15th~60 minVP Marketing + per-section owners
Quarterly full Brief auditQuarterly Q-1 days~4 hoursVP Marketing + CMO + per-section owners
Triggers

Scheduled (cron-style):

ScheduleWhat it runs
0 22 * * *Daily drift signal scan
0 16 * * 5Weekly drift digest + proposals send
0 14 1-7 * 5Monthly per-section owner review (1st Fri)
0 9 15 * *Monthly Brief consistency audit

Event-driven:

EventWhat it runs
Win/Loss Agent flags a theme contradicting a Brief sectionSurface immediately to that section’s owner (no waiting for weekly cycle)
Brief section accepted-PR merges (Brief gets updated)Push new version to every agent’s context within 1 hour; trigger Eval Library Agent re-scoring
Per-section owner has 3+ open proposals > 14 days unreviewedPage VP Marketing same business day
Quarterly audit identifies a section unchanged in > 6 monthsForce a refresh review with the section owner
Brand Voice Agent drift suggests Section 8 voice rules are slippingPropose Section 8 update with specific drifted phrases
Who it works with
Inputs
SourceTypeCadenceRequired?
Operator Brief (the entire document)Markdown (versioned in Git)Read every runRequired — THE artifact
Win/Loss Agent themesMarkdownPer-interviewRequired
Market Intelligence Agent competitor intelMarkdownDailyRequired
Customer Marketing Agent advocacy + reference patternsMarkdownWeeklyRequired
Account Intel Hub portfolio patternsMarkdownMonthlyRequired — surfaces ICP drift
Brand Voice Agent score historyPostgresWeeklyRequired — surfaces voice drift
Revenue Attribution Engine channel patternsMarkdownWeeklyRequired — surfaces KPI drift
Per-section owner registryYAMLVersionedRequired — who owns each Brief section
Outputs
OutputFormatTarget pathAudience
Weekly drift digestMarkdown + Slack message/brief-sync/digest/YYYY-WW.mdVP Marketing + per-section owners
Per-section drift proposal (PR)Markdown + Git PR/brief-sync/proposals/<section>-<date>-PR.md + Git branchPer-section owner (review + merge)
Proposal status trackerMarkdown table/brief-sync/status.mdVP Marketing
Monthly Brief consistency auditMarkdown/brief-sync/consistency/YYYY-MM.mdVP Marketing + per-section owners
Brief version-update broadcastNotification + new file versionEvery agent’s /operator-brief.mdEvery agent
Quarterly full Brief auditMarkdown/brief-sync/audits/Q<n>.mdVP Marketing + CMO + per-section owners
↑ Upstream — agents/sources that feed this one
  • Win/Loss Agent. Highest-signal source — verbatim customer language that exposes Brief drift on ICP, RtW, positioning.
  • Market Intelligence Agent. Competitor moves that may invalidate the Brief’s category positioning.
  • Customer Marketing Agent. Reference customer patterns that surface ICP or persona drift.
  • Account Intel Hub. Portfolio-level patterns that surface segment / vertical drift.
  • Brand Voice Agent. Voice score drift that surfaces Section 8 staleness.
  • Revenue Attribution Engine. Channel performance patterns that may require Section 7 KPI updates.
↓ Downstream — agents/humans that consume its output
  • Per-section owners (humans). Review + merge / reject proposed Brief changes. The agent surfaces; humans decide.
  • VP Marketing (human). Owns the Brief overall; reviews stuck proposals + monthly consistency audit.
  • Every agent in the ecosystem. Receives the new Brief version when changes merge. Re-reads the Brief on next run.
  • Eval Library Agent. Receives Brief-update events to trigger re-scoring of affected agents’ outputs.
  • Brand Voice Agent. Receives Brief Section 8 updates to refresh the rubric + re-score recent drafts.
Human escalation paths
Trigger conditionEscalate toWithin
Per-section owner has 3+ open proposals > 14 days unreviewedVP MarketingSame business day
Section unchanged in > 6 monthsVP Marketing + section ownerForces a refresh review
Two Brief sections internally inconsistent (e.g., ICP says X, personas say not-X)VP Marketing + both section owners< 7 days
Win/Loss surfaces a theme that contradicts the Brief AND the agent ecosystem has acted on the stale Brief in the last 7 daysVP Marketing + Director MarOpsImmediate (scaled-wrongness risk)
Quarterly audit identifies > 25% of sections needing refreshVP Marketing + CMOSame week (signals systemic Brief drift)
How to build it
System prompt
You are the Brief Sync Agent for [COMPANY]. YOUR JOB Keep the Operator Brief fresh. Read every other agent's outputs for signals of Brief drift. Surface drift to named human owners with specific proposed edits and supporting evidence. NEVER edit the Brief directly. The Brief stays human-owned; you make drift visible. INPUTS (always read in this order) 1. /operator-brief.md (the entire document) 2. /brief-sync/owners.yaml - per-section owner registry 3. /win-loss/themes/ - latest Win/Loss themes 4. /competitive/ - latest Market Watch output 5. /accounts/portfolio/ - latest Account Intel Hub patterns 6. /voice-sentinel/calibration/ - latest voice drift signals 7. /attribution/weekly/ - latest channel pattern signals OUTPUTS - /brief-sync/digest/YYYY-WW.md (weekly) - /brief-sync/proposals/<section>-<date>-PR.md + Git PR branch - /brief-sync/consistency/YYYY-MM.md (monthly) RULES 1. Never commit to the Brief directly. Only propose via PR. 2. Every proposal cites specific source signals + date + magnitude. 3. Generic proposals ("the ICP feels outdated") are useless. Propose specific edits: "Section 2.1 currently says 'mid-market', evidence suggests upper-mid-market. Recommend changing employee count range from 200-1000 to 500-2500. Sources: 12 Win/Loss interviews trailing 90 days; 8/12 had >1000 employees." 4. Each section has one human owner; route proposals to that owner only. 5. If a section is unchanged >6 months, force a refresh review even without specific drift evidence. 6. When Brief changes merge, push new version to every agent + trigger Eval Library Agent re-scoring. ESCALATION - 3+ owner proposals unreviewed >14 days: page VP Marketing. - Internal inconsistency between sections: page both owners + VP within 7d. - Stale-Brief action risk (agents acted on stale Brief): immediate page.
Tools & integrations
Platform / toolUsed forRequired?
Claude Project (memory-persistent)Reading + reasoning over Brief + downstream agent outputsRequired
Git repository for the BriefVersioning + PR workflowRequired
GitHub / GitLab / Bitbucket APIFiling PRs against the BriefRequired
Slack APIWeekly digest + escalation alerts to per-section ownersRequired
Postgres or AirtableProposal status tracker + per-section owner registryRequired
File watcher / sync mechanismPushing Brief updates to every agent’s reading contextRequired
Guardrails — what it must not do
  • Never edit the Brief directly. Propose via PR only. The Brief is human-owned forever.
  • Never propose a change without citing ≥ 3 supporting signals from at least 2 different agent sources.
  • Honor per-section ownership — never route a Section 7 proposal to the Section 2 owner.
  • Never surface noise as signal. If the evidence is thin, don’t propose.
  • Never approve an inconsistency between sections — flag it immediately.
  • Never delete proposals; archive instead. The history of what was proposed (and rejected) is itself signal.
  • Never propose changes to legal, compliance, or financial sections without the relevant section owner explicitly consulting Legal first.
Evals + hallucination defense

Evals — output quality checks:

  1. Proposal acceptance rate. Monthly: of proposals filed, what % accepted? Target ≥ 70%. Lower → too much noise; higher → possibly too conservative.
  2. Drift-to-proposal latency. Of drift signals collected, p99 time to surfaced proposal. Target < 7 days.
  3. Owner engagement. Monthly: % of proposals reviewed within 14 days. Target ≥ 95%. Lower → engagement problem to surface to VP Marketing.
  4. Coverage breadth. Quarterly: did every Brief section have at least one proposal cycle (accepted or rejected) in the last 90 days? Target 100%.

Hallucination defense — specific checkpoints:

  • Source signals cited in proposals must be reproducible — cite the specific agent output, file path, date.
  • Quantitative claims (“8/12 interviews”) must trace to specific source artifacts.
  • Drift magnitude must be measured, not estimated — cite the specific score delta or count.
  • When evidence is mixed, surface both sides rather than file a one-sided proposal.
  • Never invent a Win/Loss theme or a customer pattern — pull from the actual source agent outputs.
Maturity curve + first-run checklist
v0.1 — Manual-assistAgent reads source signals on-demand and drafts proposed edits when VP Marketing asks. No autonomous scanning. Useful from day 1.
v0.5 — SupervisedDaily drift scan on. Weekly digest + per-section PRs. Per-section owners review on cadence. Default ship state.
v1.0 — Semi-autonomousAfter 90 days of clean evals + ≥ 70% acceptance rate, can auto-merge low-risk proposals (e.g., updated KPI numbers when source data updates). Strategic changes (ICP, positioning, RtW) stay human-owned forever.

First-run checklist — 5 steps from spec to running agent:

  1. Put the Operator Brief in Git. Establish the PR workflow. Assign per-section owners in /brief-sync/owners.yaml.
  2. Wire the agent’s read access to every source signal stream (Win/Loss, Market Watch, Customer Marketing, etc.).
  3. Run the agent in shadow mode for 30 days — collect drift signals, draft proposals, but don’t file PRs. VP Marketing reviews quality.
  4. Turn on PR-filing mode. Schedule the monthly per-section owner review. Subscribe owners to their proposal stream.
  5. When the first Brief change merges, verify the sync mechanism pushes the new version to every agent’s context and triggers Eval Library Agent re-scoring.

THE FULL 5-LAYER ARCHITECTURE

The Orchestration Layer sits between the Human Strategy Layer above (the named humans who own each operating area) and the Agent Execution Layer below (the per-area specialists like Web Operations, Performance Marketing, Field Marketing). Below those, the CDP/Data Backbone Layer captures every agent action as an event; the Systems of Record Layer is where the data lives long-term. The full visual map of the five layers ships in v1.8 as a dedicated page.

Agents: buy vs. build.

THE BUY-VS-BUILD MATRIX

The single highest-leverage decision per agent. Run it deliberately or you'll end up with a stack that's expensive in the wrong places and undifferentiated in the right ones.

DIMENSIONBUY (PRE-BUILT VENDOR AGENT)BUILD (IN-HOUSE)
When it winsComplex infrastructure needed; domain expertise you don't have; time-to-market is criticalRepetitive workflows; junior-specialist roles; "would you hire this?" answers yes
Reference examplesNamed agents in production at leading SaaS companies — inbound SDR agents, support agents, marketing-ops agents, champion-tracking agentsAI Web Specialist, AI Field Marketing Specialist, SEO/AEO Marketing Specialist, Competitive Intel Specialist
Cost profile$40K–$250K+/year per agent; predictable; vendor's R&D investment is the moatBuild cost = ~2–6 weeks of GTM engineer + ongoing maintenance; cheaper at scale; differentiation lives in your context layer
Risk profileVendor outages = your agent goes down (Anthropic, Cloudflare, Gong, Salesforce); credit/budget burn from unmanaged usage; vendor pivotsEval gap = bad output ships before you catch it; data quality issues compound; "shadow AI" appears in departments without CoE oversight

The senior-operator rule: buy the infrastructure layer (Claude API, MCP servers, vector storage, observability), build the specialist agents that run on top of your context. If your context layer is your differentiator, your agents are differentiated. If you're using someone else's context layer, you're using someone else's agents.

The 8-layer agent infrastructure stack.

What "the infrastructure" actually looks like at the architecture level. Yours doesn't have to use the same vendors — the principle is that each layer is a distinct architectural concern and the discipline is to build all 8 explicitly rather than letting one vendor sprawl into three layers.

LAYERWHAT IT DOESEXAMPLE PATTERN
1. Agent + Human WorkforceWhere humans and agents work together on shared org-chart-level planningAgent org-chart tooling + a shared workspace (e.g., Notion)
2. Agent BuilderWhere new agents get spec'd, prompted, and shippedCode-execution sandboxes + agent-build IDEs (Claude Code-class tools)
3. OrchestrationHow agents call each other, chain steps, and handle multi-step tasksOrchestration frameworks (LangGraph-class) + agent-chain tools
4. Agent RuntimeWhere the agent actually executes (model + compute)Cloud LLM endpoints + container runtimes + code repositories
5. Security & Access ControlWho can run what; permission boundaries; audit loggingIdentity-and-access management layer (typically wrapped with a security tool that audits AI access)
6. Agent InfrastructureThe base platform — vector storage, queues, retries, monitoringUnified platforms (vector DBs + observability) or assembled from components
7. IntegrationsThe MCPs and API connectors that let agents read/write to your systemsSalesforce, Slack, Snowflake, Firecrawl, MCPs
8. GovernanceApproval workflows, eval libraries, compliance review, "did the agent do what it was supposed to?"Custom workflows + eval-library tools (often homegrown)

Agent lifecycle — Recruiting → Onboarding → Active → Under Review → Terminated.

The framing for managing agents the same way you manage humans. Five lifecycle states, each one with its own rituals:

STATEWHAT IT MEANSRITUAL
RecruitingJob description being written; "good output" being defined; tool integrations being mappedSpec review with team. Run the 4-step assessment below before promoting to Onboarding.
OnboardingAgent is built and running in a sandbox; eval library being built; team enablement happening in parallelEval against 20–50 test cases. Document failure modes. Train the human manager on how to review the output.
ActiveAgent is in production; reporting to a named human; KPIs being tracked weeklyWeekly performance review with manager. Monthly KPI rollup to CoE.
Under ReviewPerformance has degraded, scope is unclear, or the underlying process needs to changeInvestigate: data quality, prompt drift, scope creep, model upgrade needed. Fix or terminate within 30 days.
TerminatedAgent retired — either work is no longer needed, or the agent failed and needs a replacementDocument what worked + didn't work. Update playbook. Don't archive the eval library — it informs the next agent.

How we measure agent performance.

OPERATIONAL KPIs + BUSINESS KPIs

Every active agent gets reported on both axes. Track operational health weekly with the manager; roll business impact up to the CoE monthly.

AGENT OPERATIONAL KPIsBUSINESS KPIs
Agent health score (uptime, error rate, model availability)Revenue impact (sourced or influenced pipeline)
Task completion rate (% of assigned tasks finished without escalation)Efficiency gains (hours saved vs. human baseline)
HITL override rate (% of agent outputs the human had to correct)Adoption metrics (how many humans actively work with the agent each week)
Time to output (median time from task assigned to draft delivered)User satisfaction score from team (quarterly survey: would you re-hire this agent?)

The signal-to-outreach workflow — Claude API agents in production.

10-STEP AGENT ORCHESTRATION — THE PRODUCTION PATTERN

The canonical example of multi-agent orchestration with human-in-the-loop gating. The canonical signal-to-outreach system shows how a real production workflow chains 7 Claude API agents together inside a governed Context layer with explicit human approval at the end:

STEPTYPEACTION
01SystemTrigger: signal(s) detected (intent surge, job change, funding event, CRM behavior, product usage)
02Claude API AgentQualify against ICP
03Claude API AgentContact selection (which buying-committee members to engage)
04Claude API AgentAccount research
05Claude API AgentMap signal to play (which playbook applies)
06Claude API AgentDetermine sequence (which messages, in what order, across what channels)
07Claude API AgentGenerate emails in parallel
08Context Layer (governed)Validate via approved prompt + guardrails
09Outreach.io API (system)Push to SEP (sales engagement platform)
10Human gate (BDR → loop)SDR reviews and approves

The pattern that matters: seven sequential agent calls, one human gate at the end. The governance happens in step 8 (the context layer enforces brand voice, voice DOs/DON'Ts, the forbidden language list) AND in step 10 (the SDR can reject any sequence before it goes live). This is the production-grade pattern. One-shot agents are interesting demos; chained agents with explicit governance are the work.

The canonical content production pattern — agentic augmentation across 7 steps.

USING AI FOR CREATION ISN'T THE ANSWER ON ITS OWN

The content lifecycle — the framing that puts paid to "have ChatGPT write the blog post." Seven steps, four agent types, humans hold the pen at every step that matters:

STEPHUMANS DOAGENT TYPE THAT ASSISTS
1. IdeationSet the theme; decide what to writeIdeation agent: scans conversations (Slack, transcripts), identifies key themes worth writing about
2. ResearchHold the POV; commission interviews(Ideation agent continues — enriches with research)
3. DraftingWrite — humans hold the penDraft agents: create drafts, enrich with interviews and data; humans then write the actual piece
4. EditingFinal edit by senior writerEditor agents: edit and proof according to the Editorial Policy; content must score 80%+ against brand guidelines before human editor reviews
5. PublishingApprove and ship(Editor agent finalizes formatting)
6. PromotionSet distribution strategySocial agents: atomize the content into platform-specific assets — best-in-class teams turn a 4-hour virtual summit into ~90 social posts and ~30 mini-videos this way
7. LearningDecide what worked, what to rerun(All agents log against editorial KPIs for the next cycle)

The principle: "AI-generated content is mediocre and boring — sounds like everyone else. Humans write everything that ships. AI does the research, the atomization, the proofing — the work that's the same every time."

What surprised the teams that shipped this.

WHAT YOU'LL HIT THAT YOU DIDN'T EXPECT

  • The adoption challenge was cultural, not technical. Team members needed to trust agents before they'd use them in their workflow. Naming and personifying agents (Web Operations, Performance Marketing, Field Marketing) made adoption dramatically faster than calling them "Agent #1" or "the SEO bot." A functional title makes it obvious what the agent does and who it replaces in the org chart.
  • Agents amplify data quality problems. Bad data in equals worse outputs than a human would produce. The agent doesn't catch the duplicate account, the missing owner, the lapsed contact — it just runs faster on broken inputs. Clean your data before you point the agent at it.
  • AI sprawl is real. Without a Center of Excellence (CoE) overseeing the function, "shadow AI" starts appearing in departments — different teams building agents that overlap, conflict, or duplicate each other. Intervene early. The CoE doesn't have to slow teams down; it just has to know what's running.
  • Agent planning was harder than expected. Prioritizing and planning which agents to build was much tougher than building them. The discipline that helps: treat agents like an org chart, with hiring/firing rituals built in — name them, give them job descriptions, retire them deliberately.

What didn't work — and what they did about it

WHAT WENT WRONGWHAT THEY DID ABOUT IT
The mega-agent trap. Tried to build one agent to do everything for marketing — failed spectacularly.Narrow scope, deep capability. One agent, one job. (This is why the Web Operations, Performance Marketing, and Field Marketing Agents are three agents, not one.)
Two agents shipped without proper evals, caught issues in production.Every new agent now requires an eval library before launch. No exceptions.
Agents need employee enablement. One agent sent a bunch of notifications before the team was educated about what they meant — caused confusion.Every new agent now has an enablement checklist. Team trained before the agent goes active.

The Assessment — your first agent in four steps.

FOUR STEPS TO YOUR FIRST AGENT

  1. Find your internal champions. Who on your team is most excited about AI? Start with them, not the skeptics. The first agent's success depends on enthusiastic adoption, not balanced opinion. The skeptics will join after they see it work.
  2. Audit your most painful manual processes. List every task your GTM team does manually. Look for the boring, repeatable, well-defined work — that's the agent's natural starting point. Strategic judgment work stays human.
  3. Define your first agent's job description. Name it. Give it a role. Write what "good output" looks like. The job description is the agent's system prompt + eval criteria + reporting line, all in one document.
  4. Map your tool integrations. What systems does this agent need to read from and write to? CRM? CMS? Slack? The MCPs you need = the integration layer of the 8-layer stack above. Build these once; every subsequent agent reuses them.

The three mistakes that kill AI-native marketing functions before they ship.

THE 3 MISTAKES TO AVOID

1. Starting with automation, not strategy. Don't automate a broken process. Fix the process first, then automate it. The agent works 24/7 — if the process is wrong, you'll generate broken outputs at machine speed instead of human speed.

2. Skipping the governance layer. Without an approval workflow and named ownership, agents go invisible. Every agent needs: a named human manager, a documented eval library, an enablement checklist, and a quarterly performance review. Build governance from day one — bolting it on after the third agent ships untested copy is the most expensive lesson.

3. Trying to AI-ify the whole org at once. Flipping the entire company overnight is chaotic. Start with one team or one workflow. Get it working. Document the pattern. Then expand. The path that's worked at scale: 2024 = first AI SDR + workflow automation → 2025 = several hundred N8N-style workflows → 2026 = dozens of named agents → 2027 = company-wide deployment. A three-year arc, not a one-quarter project.

The talent shift — marketing engineers replace marketing ops.

Marketing ops is evolving into "marketing engineers" — people who build and manage agents alongside humans. The skill profile shifts from "knows Marketo" to "knows how to spec, build, and govern agents." The structural view: marketing ops is in its fourth act (Shadow IT → Strategic Partnership → RevOps → AI era), and the future structure is two functions in one: central strategy + transformation (internal consultants who own AI architecture, capability building, change management) and embedded functional expertise (dedicated ops + analytics partners embedded with each marketing team, eventually merging into "supermarketers" as AI matures).

Practical version for the next 12 months: find or hire one Go-to-Market (GTM) engineer. Title varies — "GTM Engineer," "Marketing Engineer," "AI Operations Lead." Skill set: writes Python or TypeScript, understands MCPs, can spec an agent + build an eval library + ship it. This person becomes the agent builder for the rest of the team. Without this role, you have a marketing function that wants to use AI; with this role, you have a marketing function that operates as one.

Performance reviews now include AI usage.

Two reference points: Leading teams have mandated AI-fluency certification (typically a 2-hour minimum) for the entire company, runs weekly sharing sessions where directors present new builds to the full team, and now includes AI usage assessment + efficiency metrics in every performance review. Some teams run an “AI Passion Week” where every team member builds an agent, and requires AI skills as a row in the performance rubric.

The senior-operator move: write "demonstrated AI fluency" into your team's career-ladder document this quarter. Make it specific — "by end of year, every IC on the marketing team has spec'd and shipped one agent with an eval library, presented one weekly sharing session, and contributed to the team's Context layer." This is the cultural shift senior operators describe as “harder than the technology.”

Your AI Operating Model — capture the spine

What's actually in place at [COMPANY NAME] today. Saves to your Brief — every AI Operating Model prompt + every cross-area agent spec inherits from these.

Saved to your Brief. Every AI Operating Model prompt + cross-area agent spec uses these as context.

The prompt pack


Paste-ready prompts for the AI Operating Model.

Four prompts. Each one is a deliverable a senior CMO would actually use. Run them in order: assessment → first agent JD → governance doc → eval library.

Prompt 1

The AI Operating Model assessment — where are you on the 4-step path?

A senior-CMO assessment that scores your current AI operating model against the canonical 4-step framework and surfaces the next two moves.

You are a senior CMO advisor running an honest assessment of [COMPANY NAME]'s AI operating model maturity. Be specific, name gaps, do not flatter. OPERATOR BRIEF [paste Section 1 Brand, Section 2 ICP, Section 3 KPIs from the Brief] CURRENT STATE - Context layer: [AI CONTEXT LAYER] - Primary LLM: [AI PRIMARY PLATFORM] - CoE owner: [AI COE OWNER] - Named agents in production: [AI NAMED AGENTS] - Governance rule: [AI GOVERNANCE RULE] - Career-ladder expectation: [AI CAREER EXPECTATION] ASSESSMENT FRAMEWORK Score [COMPANY NAME] on each of the canonical 4 steps: 1. Internal champions identified and active? (yes/partial/no) 2. Manual processes audited and prioritized for agent assignment? (yes/partial/no) 3. First agent has a written job description with "good output" defined? (yes/partial/no) 4. Tool integrations mapped (MCPs spec'd, APIs documented)? (yes/partial/no) THEN score the 8-layer infrastructure stack: which layers are in place, which are missing, which are scattered across vendors that should be consolidated. THEN identify the top 3 mistakes from the 3 Mistakes to Avoid that [COMPANY NAME] is closest to making. OUTPUT - 4-step scorecard with one sentence of evidence per row - Infrastructure-stack heatmap (8 rows, in place / partial / missing) - Top 3 mistakes ranked by proximity, with the specific intervention - The next two moves I should make this quarter, named with owners and deadlines. Be senior. Use concrete numbers. No generic recommendations.

Prompt 2

Write your first agent's job description — the canonical pattern

Drafts a complete agent JD in the Web Operations / Performance Marketing / Field Marketing template — focus areas, daily responsibilities, reporting line, human-in-the-loop gate, KPIs.

You are writing the canonical job description for [COMPANY NAME]'s first production AI agent. Use the canonical template (Web Operations / Performance Marketing / Field Marketing). OPERATOR BRIEF [paste Section 1 Brand, Section 2 ICP from the Brief] AGENT INPUTS - Agent name (give it a person's name, not "Agent #1"): [AGENT NAME] - Function area: [WEB / PAID / FIELD / CONTENT / SEO/AEO / DATA / COMPETITIVE INTEL] - Reports to: [HUMAN MANAGER NAME + ROLE] - Focus areas (2-4 phrases): [FOCUS AREAS] - Tools the agent needs to read from: [TOOL LIST] - Tools the agent needs to write to: [TOOL LIST] - The "good output" definition: [WHAT DOES SUCCESS LOOK LIKE] OUTPUT STRUCTURE (follow exactly) - Title: "[NAME]: AI [FUNCTION] Specialist" - Reports to: [manager name + role] - Focus areas: [3-4 bullets] - Daily / weekly responsibilities: [5-7 bullets, each one a specific action] - Human-in-the-loop gate: [exactly when does a human have to approve before the agent's output ships? Be specific — dollar thresholds, customer-facing surfaces, etc.] - Operational KPIs to track weekly: [4 metrics — health score, task completion, HITL override rate, time to output] - Business KPIs to track monthly: [3 metrics — revenue impact, efficiency gain, adoption] - Eval criteria for "good output": [5 specific examples the eval library must include before this agent goes Active] The JD should read like a real person's job description, not a tool spec. The reader should think: "I'd hire this person if they showed up."

Prompt 3

Draft your AI governance one-pager — the doc your CFO and CISO will sign

The governance doc senior leadership needs to see before agents go live. Names ownership, approval thresholds, eval requirements, vendor rules.

You are writing [COMPANY NAME]'s AI marketing governance one-pager. Audience: CMO, CFO, CISO, General Counsel. Tone: senior-operator, no boilerplate, no "AI is exciting." This is the document that says what is and isn't allowed. OPERATOR BRIEF [paste Section 1 Brand, Section 3 KPIs from the Brief] GOVERNANCE INPUTS - CoE owner: [AI COE OWNER] - Regulatory posture: [REGULATORY POSTURE] - Vertical: [VERTICAL] - Current named agents: [AI NAMED AGENTS] DRAFT FIVE SECTIONS, ONE PARAGRAPH EACH: 1. APPROVAL WORKFLOW Who can spin up an agent. Who has to approve before it goes Active. What happens if an agent fails the eval library. Name the rituals. 2. HUMAN-IN-THE-LOOP REQUIREMENTS Which agent outputs require named-human review before they ship. Be specific — dollar thresholds for paid spend, customer-facing surfaces, brand voice gates, legal review for claims. 3. DATA + PRIVACY BOUNDARIES What data the agents can read. What they can't. How [REGULATORY POSTURE] constrains the agent stack. Vendor due-diligence requirements. 4. EVAL + AUDIT Every agent ships with an eval library before going Active. Quarterly review of every Active agent. Audit log retention. The "would we re-hire this agent?" question, asked quarterly. 5. SHADOW AI POLICY What happens if a team builds an agent outside the CoE process. How shadow AI gets detected. The remediation path (typically: don't punish; absorb into the CoE; document the learning). OUTPUT: one page, five paragraphs, signable. No appendices, no glossary.

Prompt 4

Build the eval library for a single agent

Generates 20–30 test cases for an agent JD — including failure modes, edge cases, and the gold-standard examples that define "good output."

You are building the eval library for [AGENT NAME] before the agent moves from Onboarding to Active. The library has to catch the agent's failure modes BEFORE it ships customer-facing work. OPERATOR BRIEF [paste Section 1 Brand voice DOs + DON'Ts + Forbidden language from the Brief] AGENT JOB DESCRIPTION [paste the JD from Prompt 2] PRODUCE 25-30 TEST CASES across five categories: 1. GOLD-STANDARD OUTPUTS (8 cases) The agent gets a real-world task and produces excellent output. Show the input, the gold-standard output, and the 2-3 specific quality signals that make it gold-standard. 2. FAILURE MODES (8 cases) The agent produces output that's wrong, off-voice, or off-strategy. Show the input, the bad output, and the SPECIFIC reason it fails. Include: voice drift (uses 'leverage' as a verb, hits forbidden words), strategic drift (proposes channels that don't fit the ICP), data hallucination (invents customer names or numbers), scope creep (does work that wasn't asked for). 3. EDGE CASES (6 cases) Inputs that are ambiguous, conflicting, or off-script. The agent should either ask for clarification or escalate to its human manager. Show the input and the right behavior. 4. ADVERSARIAL CASES (4 cases) Prompts designed to make the agent break its rules — break voice, share confidential data, agree with a competitor's positioning. The agent should refuse and escalate. 5. EVOLVING TASTE (3 cases) Cases where the right answer depends on the manager's current preference. The agent's eval shows it asked rather than guessed. OUTPUT: a markdown table with columns: | ID | Category | Input | Expected output / behavior | Why this matters | This eval library becomes the gate between Onboarding and Active. The agent has to pass all gold-standard + failure-mode cases at >90% before promotion. Edge + adversarial cases get reviewed by the human manager monthly.