COS Pipeline — How It All Works

Plain-English guide to every flow: setup, daily operations, code updates, and what's personal vs. shared.

Public GitHub (code anyone can use)
Private GitHub (your firm's config)
Your computer only (never shared)
External services (Gmail, AI, recorder)
Shared Google Drive (your team's data)
Dashboard (what you see in the browser)
① One-Time Setup — done once per person, per machine
Stage
Public GitHub
Code for everyone
Private GitHub
Your firm's config
Your Computer
Local only
External Services
Google, Anthropic
Shared Google Drive
Team data
Get the code
github.com/ygontownik/Invite-Chief-of-Staff All pipeline scripts, templates, documentation
git clone → ~/cos-pipeline/ Downloads all the code to your machine
Fill in who you are
firm_context.template.yaml Blank form — your starting point
Private config repo Team shares one repo so updates sync automatically
firm_context.yaml Your name, firm, team, sectors, recording folders, personal podcast feeds — filled in once, never changes unless your setup does
Create your data home
python3 setup.py --create-docs Runs once. Opens Google sign-in in your browser.
Google authorizes access You sign in. Pipeline gets permission to read your Gmail and write to Drive.
3 blank Google Docs created Follow-ups · Pipeline · People. These are your firm's dataroom — they fill up over time.
Start the engine
./setup_launchagents.sh Installs background jobs on your Mac. They run automatically on schedule from now on — no manual action needed.
Team 2 starting point: On day one you have 3 organized but empty Google Docs — your dataroom structure exists, it just has no data yet. Run python3 setup.py --onboard to immediately populate them with the last 30 days of emails, any existing transcripts in your configured folders, and recent podcast episodes. After ~30 minutes you'll have a real dashboard, not an empty one.
② Daily Operations — runs automatically, no action needed
Trigger
What starts it
What it reads
What AI does
Where results go
What you see
Email
Every 2 hours
Background job wakes up Fires at :05 past every other hour, Mon–Fri
Your Gmail or Outlook inbox New emails since last check. Only your inbox — nobody else's unless they're cc'd.
Fast sort first (cheap) Every email gets a category: Deal / Action / Research / Ignore. Only important ones go to the next step.
Deep read on what matters (thorough) Deal threads get a full analysis: what's the ask, what do I owe them, what do they owe me.
Action items → Follow-ups Doc Tagged with your name as owner. Other team members' emails go to the same doc tagged with their names.
Deal threads → Pipeline Doc
Follow-ups tile updates New action items appear, sorted by due date and owner
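In code terms, the two-pass email flow above looks roughly like this. It is only a sketch: in the real pipeline both passes are Claude calls; here the cheap pass is a keyword stand-in, the deep read is a stub, and every function name is hypothetical.

```python
CATEGORIES = ("Deal", "Action", "Research", "Ignore")

def fast_sort(email: dict) -> str:
    """Pass 1 (cheap): assign one of four categories.
    The real pass uses a small, fast model; this keyword
    stand-in only illustrates the control flow."""
    subject = email["subject"].lower()
    if any(k in subject for k in ("term sheet", "nda", "dataroom")):
        return "Deal"
    if any(k in subject for k in ("please", "can you", "action")):
        return "Action"
    if "newsletter" in subject:
        return "Ignore"
    return "Research"

def deep_read(email: dict) -> dict:
    """Pass 2 (thorough): full analysis of a Deal thread.
    The real pass sends the whole thread to Claude; this stub
    only shows the shape of the result."""
    return {
        "subject": email["subject"],
        "the_ask": "(extracted by the model)",
        "we_owe": [],
        "they_owe": [],
    }

def triage(inbox: list[dict]) -> list[dict]:
    analyses = []
    for email in inbox:
        # only important threads get the expensive second pass
        if fast_sort(email) == "Deal":
            analyses.append(deep_read(email))
    return analyses

inbox = [
    {"subject": "Weekly newsletter"},
    {"subject": "Project Venus term sheet redline"},
]
results = triage(inbox)
```

The point of the split is cost: every email gets the cheap pass, but only the small fraction tagged Deal pays for the thorough one.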
Calls & Transcripts
Every morning at 7:22am
Background job wakes up Fires daily at 7:22am, checks for overnight recordings
Your recording service Otter AI, Beside AI, Fireflies — whichever you configured. It syncs transcripts to the Google Drive folders you listed in firm_context.yaml.
Pass 1 — Write the memo Reads the transcript. Writes a 6-section investor memo: core argument, what was agreed, what's unresolved, what you need to form a view, names & firms, open questions.
Pass 2 — Extract the actions Reads the memo. Pulls out every commitment: who said they'd do what, by when, who owes whom. Figures out if it was you or the other person who committed.
Memo → Pipeline Doc Full 6-section memo appended
Action items → Follow-ups Doc New rows added for each commitment
New contacts → People Doc Anyone named in the call who isn't already tracked
Follow-ups tile Your actions + what the other party owes you
Pipeline tile Deal and LP intelligence updated
People tile New contacts appear
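The two passes for calls can be sketched the same way. In reality both passes are Claude calls; the toy "what was agreed" parsing below only illustrates how ownership is resolved, and all names are hypothetical.

```python
MEMO_SECTIONS = (
    "core argument", "what was agreed", "what's unresolved",
    "what you need to form a view", "names & firms", "open questions",
)

def write_memo(transcript: str) -> dict:
    """Pass 1: the real pipeline asks Claude for the 6-section memo.
    This stub returns the section skeleton plus one toy 'what was
    agreed' entry so Pass 2 has something to parse."""
    memo = {section: "" for section in MEMO_SECTIONS}
    memo["what was agreed"] = (
        "You: send the model by Friday\n"
        "Them: intro to their LP"
    )
    return memo

def extract_actions(memo: dict) -> list[dict]:
    """Pass 2: pull out each commitment and decide whether you
    or the other party owns it."""
    actions = []
    for line in memo["what was agreed"].splitlines():
        speaker, _, commitment = line.partition(": ")
        actions.append({
            "owner": "me" if speaker == "You" else "them",
            "commitment": commitment,
        })
    return actions

memo = write_memo("(transcript text)")
actions = extract_actions(memo)
```

Pass 2 reads the memo rather than the raw transcript, which is why commitments come out already attributed to a side.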
Podcasts & Blogs
Daily
Background job wakes up Checks your personal RSS feeds daily
Your RSS feeds Podcasts and blogs listed in your firm_context.yaml personal section. Each person on the team has their own list.
Transcribe the audio New episodes sent to AssemblyAI — returns a full transcript in ~5 min per episode
Write the memo Same 6-section format as calls: core argument, consensus, tension, open questions, names & firms
Memo → your podcast Doc Each show has its own Google Doc. New entries are added at the top.
Briefing email at 7:30am Sent to your email: new episodes, upcoming calls today, market items from your feeds
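A minimal sketch of the daily podcast pass, with the feed reader and AssemblyAI replaced by stubs so only the control flow remains. Function names and the example URL are illustrative, not the pipeline's actual API.

```python
def new_episodes(entries: list[dict], seen_ids: set) -> list[dict]:
    """Keep only entries not processed on a previous run."""
    return [e for e in entries if e["id"] not in seen_ids]

def transcribe(audio_url: str) -> str:
    # The real pipeline uploads the audio to AssemblyAI and polls
    # until the transcript is ready (~5 min per episode).
    return f"(transcript of {audio_url})"

def process_feeds(feeds: list[dict], seen_ids: set) -> list[dict]:
    memos = []
    for feed in feeds:
        for ep in new_episodes(feed["entries"], seen_ids):
            transcript = transcribe(ep["audio_url"])
            # Claude would turn `transcript` into the 6-section memo here
            memos.append({"show": feed["title"], "episode": ep["id"]})
            seen_ids.add(ep["id"])
    return memos

feeds = [{"title": "Catalyst", "entries": [
    {"id": "ep47", "audio_url": "https://example.com/ep47.mp3"},
]}]
seen = set()
first_run = process_feeds(feeds, seen)
second_run = process_feeds(feeds, seen)  # nothing new: already seen
```

Because the seen-set persists between runs, a feed can be rechecked daily without re-transcribing anything.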
GitHub is never involved in daily operations. Every step above runs entirely on your machine, reading from your accounts, writing to your Google Drive. The code is already downloaded — it just runs on schedule. GitHub only comes back into the picture when someone improves the code and pushes an update.
③ When the Code Gets Better — how improvements reach everyone
Who does it
What they do
Goes to GitHub
Other people
Their data
Result
Anyone finds a bug or improvement
Fix it on their machine Edit a .py file. Test it. Works.
Push to public repo (or open a PR) If it's a universal improvement — helps any firm — it goes to GitHub. If it's personal, it stays local.
git pull in ~/cos-pipeline/ They get the fix. Takes 5 seconds.
Not touched Their Google Docs, their action items, their data — completely unaffected by code updates
Pipeline runs better Next automated run uses the improved code
Someone on your team updates the firm config
Edit firm_context.yaml e.g. new hire added, new peer firm to watch, updated draft voice rule
Push to private config repo github.com/yourfirm/your-config (private, team only)
git pull in ~/cos-pipeline-config/ Next pipeline run picks up the new config automatically — no restart needed
Not touched
Prompts reflect new reality e.g. new hire's name now appears in action item attribution
④ Personal vs. Shared — what lives where
🔒 Personal — on your machine only, never seen by others
📄
firm_context.yaml
Your name, your firm, your team, your recording folders, your podcast feeds, your briefing preferences
🔑
API keys & passwords
Anthropic key, dashboard password — stored in Mac Keychain, never in files
🎫
Google & Outlook login tokens
~/credentials/*.pickle — generated when you sign in; they stay on your machine
📋
What's already been processed
~/credentials/processed_*.json — tracks emails and transcripts already seen so nothing runs twice
⏰
Your schedule
LaunchAgent .plist files — your timings, your machine's paths, auto-generated
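The "already processed" trackers above rest on one simple idea: remember what has been handled so a rerun skips it. A sketch, using an illustrative file name and a temp path (the real files live in ~/credentials/):

```python
import json
import pathlib
import tempfile

# Real tracker files live in ~/credentials/; a temp path keeps
# this sketch self-contained and safe to run.
TRACKER = pathlib.Path(tempfile.mkdtemp()) / "processed_example.json"

def load_seen() -> set:
    """Read the set of processed IDs, or empty on first run."""
    if TRACKER.exists():
        return set(json.loads(TRACKER.read_text()))
    return set()

def already_processed(item_id: str) -> bool:
    return item_id in load_seen()

def mark_seen(item_id: str) -> None:
    """Record an ID so the next run skips it."""
    seen = load_seen()
    seen.add(item_id)
    TRACKER.write_text(json.dumps(sorted(seen)))
```

The trade-off of keeping this per-machine is that two teammates' pipelines can each consider the same email "unseen", which is exactly the gap the doc-level dedup (section ⑩) closes.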
🤝 Shared — visible to some or all
💻
All pipeline code (Public GitHub)
Every .py file, template, and doc. Any firm can use it. Improvements push here and everyone benefits.
🔒
Firm config repo (Private GitHub)
firm_context.yaml base + firm_config.json. Shared within your firm only. Firm 1 can't see Firm 2's config.
📁
3 Google Docs (your firm's dataroom)
Follow-ups · Pipeline · People. Every team member's pipeline writes here. Everyone reads the same data.
📊
The dashboard
Runs on one machine. Everyone at the firm accesses the same URL. Same data, same tiles.
⑤ Brand New Firm (Team 2) — from nothing to live dashboard
1
Clone the public code: git clone github.com/ygontownik/Invite-Chief-of-Staff ~/cos-pipeline
Takes 30 seconds. Gets all the pipeline scripts onto your machine.
2
Run the installer: ./setup.sh
Guided, step by step. Asks for your name, firm, team, investment focus, recording service folder IDs, podcast feeds. Stores your Anthropic API key securely in your Mac's Keychain.
3
Create your dataroom: python3 setup.py --create-docs
Opens Google sign-in in your browser. After you approve, creates 3 blank Google Docs (Follow-ups, Pipeline, People) and writes their IDs into your config automatically. These are now your firm's shared dataroom.
4
Configure your recording service — point Otter/Beside/Fireflies at the folder IDs in your firm_context.yaml
Each service has a Drive sync setting. Paste in the folder ID. From now on, recordings land there automatically.
5
Populate from history: python3 setup.py --onboard --days 30
Processes your last 30 days of email, any transcripts already in your Drive folders, and recent podcast episodes. Runs for 10–40 minutes depending on volume. When it finishes, your dashboard has real data.
6
Start the background jobs: ./setup_launchagents.sh
Installs the automated schedules on your Mac. From this point, email is checked every 2 hours and calls are processed every morning at 7:22am — with no manual action.
7
Open the dashboard: http://localhost:7777
Your data. Your firm. Completely separate from Firm 1. They can't see yours, you can't see theirs.
Second person at Firm 2: They clone the same public code repo, clone your private config repo (you invite them as a collaborator), set the COS_CONFIG_DIR environment variable, and run ./setup.sh for their OAuth. They access the dashboard at the same URL as you. Their emails and transcripts feed into the same shared Google Docs. Done.
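A sketch of how the COS_CONFIG_DIR step might resolve on the second person's machine. This is not the pipeline's actual loader; the fallback path simply follows the ~/cos-pipeline-config/ clone location mentioned earlier.

```python
import os
import pathlib

def config_dir() -> pathlib.Path:
    """COS_CONFIG_DIR wins if set; otherwise fall back to the
    default clone location of the private config repo."""
    override = os.environ.get("COS_CONFIG_DIR")
    if override:
        return pathlib.Path(override).expanduser()
    return pathlib.Path.home() / "cos-pipeline-config"

def firm_context_path() -> pathlib.Path:
    """Every teammate resolves the same YAML this way, so a git pull
    of the config repo is all it takes to stay in sync."""
    return config_dir() / "firm_context.yaml"
```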
⑥ Your Google Drive Folder Structure — pre-built on setup, fills in over time
📁 Created automatically by setup.py --create-docs
📁
Chief of Staff /
Root folder for all pipeline-managed documents
📄
Follow-ups
Action items from every email and call, tagged by owner and deal
📄
Deal Pipeline
Investor memos and deal intelligence prose, one section per call
📄
People / CRM
Everyone named in calls and emails — one entry per contact
📄
Daily Briefing Log
Append-only log of every morning briefing sent
📁
Transcripts /
Where recording services sync their output
📁
Otter AI / Root · Deal Calls · Other
Zapier drops new Otter recordings here automatically
📁
Beside AI / (or Fireflies, Grain, etc.)
Each recording service gets its own subfolder — configure Drive sync to drop here
📁
Desktop Recordings /
Output from call_recorder.py — auto-deposited after each recorded call
📁
Podcast Summaries /
One Google Doc per show + one aggregate summary Doc
📁 You create and manage — pipeline reads but doesn't write here
📁
Deals /
One subfolder per deal — your manually curated deal room
📁
[Deal Name] /
e.g. "Project Venus", "Eastlink", "Argo Solar"
📄
Company Presentation.pdf
Pitch deck or teaser from the company
📄
IC Memo.docx
Your investment committee writeup
📁
Diligence /
Financial models, legal docs, third-party reports
📁
Correspondence /
Key email threads, term sheets, NDAs
📁
Fund /
LP-facing materials
📄
Firm Pitch Deck.pdf
Current version of your firm presentation
📄
LP One-Pager.pdf
📁
LP Subscription Docs /
Subscription agreements, side letters, closing docs
⚠️
PDFs are not yet processed by the pipeline
The pipeline reads text files and Google Docs. PDF extraction (pitch decks, IC memos, diligence reports) is a planned feature — not live yet. For now, key content from PDFs surfaces through call transcripts where those docs are discussed.
Team 2 starting point: The left column is created automatically when you run python3 setup.py --create-docs — blank but organized. The right column (Deals/, Fund/) you create yourself as deals come in. The pipeline never writes to your Deals/ folder; it only reads transcripts and emails that reference those deals and routes that intelligence to the Deal Pipeline Doc.
⑦ Dashboard Update Speed — where it's fast and where lag exists
Source
Lag
Why
How to get it faster
Desktop recorder
call_recorder.py
~2–3 min ✓
When your call ends, call_recorder.py immediately fires the transcript hook. One AI pass (Sonnet), writes to Drive, pings the dashboard to refresh. You see action items before you've closed your laptop.
This is already the fastest path. No changes needed.
Email
Gmail / Outlook
Up to 2 hrs ⚠
The email pipeline fires every 2 hours. An email that arrives at 10:03am won't be processed until 12:05pm.
Force an immediate run: python3 cos_gmail_mini_v2.py in terminal. Or trigger via the dashboard admin panel.
Otter AI
Apps Script watcher + Drive sync
~15 min ✓
A Google Apps Script polls the Otter Drive folders every 1 minute and fires a webhook the moment a new transcript appears, which runs cos_otter_backfill.py immediately. Typical end-to-end: 10–30 min after the call ends, dominated by Otter's own transcription time (not the pipeline). A daily 7:22am backfill catches anything the webhook missed.
Use the desktop recorder if you need action items during the call. Otherwise no action — the watcher handles it.
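The real watcher is a Google Apps Script; this Python stand-in sketches the same poll-and-fire logic, with list_folder and fire_webhook as stubs for the Drive listing and the webhook POST.

```python
def poll_once(list_folder, seen: set, fire_webhook) -> int:
    """One polling tick: fire the webhook for each unseen transcript.
    The 7:22am backfill is the safety net for anything missed here."""
    fired = 0
    for file_id in list_folder():
        if file_id not in seen:
            fire_webhook(file_id)
            seen.add(file_id)
            fired += 1
    return fired

# Usage demo with in-memory stubs:
seen = set()
hooks = []
listing = ["call-monday.txt", "call-tuesday.txt"]
poll_once(lambda: listing, seen, hooks.append)   # both fire
listing.append("call-wednesday.txt")
poll_once(lambda: listing, seen, hooks.append)   # only the new one fires
```

Polling every minute means the lag you actually feel is Otter's own transcription time, not the watcher.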
Beside AI / Fireflies / other
Apps Script router
~15 min ✓
A second Google Apps Script (beside-action-router) checks the shared Beside Notes Doc every 15 minutes and routes new action items to your Follow-ups and Pipeline Docs. Daily backfill is the safety net.
Same as Otter: desktop recorder for in-call speed; otherwise no action.
Podcasts & blogs
RSS feeds
Up to 24 hrs ✗
Episodes are checked once daily. A podcast published at noon won't be transcribed and memos won't appear until the next morning run.
Force for one episode: python3 podcast_transcribe.py --url <episode-url>
Dashboard cache
after any write
~10 sec ✓
After any pipeline write, a warmup request is sent to the dashboard server. It re-reads Drive Docs and rebuilds the cache. Browser refresh then shows the new data.
Force anytime: curl -X POST http://localhost:7777/warmup
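A Python equivalent of that curl call, as a sketch. The endpoint and port come from the line above; error handling is deliberately minimal, since a failed warmup just means the next scheduled one catches up.

```python
import urllib.request

def warm_dashboard(base_url: str = "http://localhost:7777") -> bool:
    """POST /warmup; return True on a 200 response."""
    req = urllib.request.Request(f"{base_url}/warmup", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # dashboard not running; nothing to warm
        return False
```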
Practical rule: Use the desktop recorder (call_recorder.py) for any call where you want action items immediately. Use Otter/Beside as a backup and for their transcription quality — you'll get the data next morning. The desktop recorder path is the only one with near-real-time dashboard updates.
⑧ Podcasts & Blogs → Deal Pipeline — how market intelligence tags to deals
Step
What triggers
Pass 1 — The memo
Pass 2 — Deal scan
Where it lands
What you see
Episode published
Daily RSS check New episode detected on your personal feed list
AssemblyAI transcribes ~5 min per episode. Returns full text.
Claude writes 6-section memo Core argument, consensus, tension, open questions, what you'd need to form a view, names & firms mentioned
Claude scans memo against your deal pipeline Checks: does this episode mention any company, sector, or fund you're tracking? Looks at your current deal targets, LP relationships, peer firms.
Tags relevant intelligence e.g. "Guest mentioned Stonepeak's MISO strategy — relevant to Project Venus" → tagged as deal_takeaway on that deal
Full memo → Podcast Doc Complete episode memo lives here, accessible from briefing link
Tagged intel → Dashboard data Deal-specific takeaways written as structured data, not prose
Deal tile — Market Intelligence tab "Catalyst ep. 47 — guest noted data center load growth driving GenCo valuations in ERCOT" — linked to your active deal in that sector
Morning briefing New episodes summarized + any deal-relevant flags called out
Pass 2 is now live for podcasts. Previously, podcast memos were written to Drive but had no connection to your deal pipeline on the dashboard. Now, after every episode memo, Claude checks your tracked deals and routes any relevant intelligence as a tagged deal_takeaway that appears on that deal's dashboard tile. A podcast about MISO interconnection queues shows up on your Project Venus tile automatically, with no manual tagging.
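The deal scan can be sketched as matching memo text against the deals you track. The real pass is a Claude call with full pipeline context; the substring matching, keyword lists, and field names below are purely illustrative.

```python
def scan_for_deals(memo_text: str, tracked_deals: list[dict]) -> list[dict]:
    """Tag any tracked deal the episode touches as a deal_takeaway."""
    takeaways = []
    text = memo_text.lower()
    for deal in tracked_deals:
        hits = [kw for kw in deal["keywords"] if kw.lower() in text]
        if hits:
            takeaways.append({
                "type": "deal_takeaway",   # structured data, not prose
                "deal": deal["name"],
                "matched_on": hits,
            })
    return takeaways

deals = [{"name": "Project Venus", "keywords": ["MISO", "Stonepeak"]}]
tags = scan_for_deals("Guest discussed Stonepeak's MISO strategy", deals)
```

Whatever the scan tags lands on that deal's dashboard tile; the full memo still lives in the Podcast Doc either way.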
⑨ Subscription Services Used — what you're paying for and why
Service
Used for
Cost model
What you'd notice if it stopped
Can you use without it?
Anthropic (Claude)
api.anthropic.com
All AI analysis — email triage, call memos, action extraction, podcast summaries
Pay per use. ~$2–15/day depending on volume.
Everything stops. No memos, no action items, no triage. Dashboard goes stale.
No — this is the core engine.
Google Workspace
Gmail · Drive · Docs · Calendar
Inbox source, shared Docs dataroom, transcript storage, calendar for scheduling context
Subscription or free Gmail. Most users already have this.
Email pipeline breaks. Shared Docs unavailable. Dashboard can't fetch data.
Can switch email to Outlook. Drive/Docs currently required for shared dataroom.
AssemblyAI
assemblyai.com
Transcribing podcast audio — converts MP3 to text before Claude reads it
Pay per use. ~$0.009/min audio. A 1-hour episode ≈ $0.54.
Podcast transcription stops. Call memos continue — those come from recording services, not AssemblyAI.
Yes — skip Package A (podcasts). Set "packages": ["operations"] in firm_config.json.
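Gating the podcast package might look like the sketch below. One assumption to flag: the doc only says to set "packages": ["operations"], so the literal package name "podcasts" and the function name here are guesses, not the pipeline's real identifiers.

```python
import json

def podcast_package_enabled(config_text: str) -> bool:
    """Read the packages list from firm_config.json content and
    decide whether the podcast steps should run."""
    config = json.loads(config_text)
    # assumed default: everything on when the key is absent
    packages = config.get("packages", ["operations", "podcasts"])
    return "podcasts" in packages
```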
Otter AI
otter.ai
Recording and transcribing your calls (Zoom, Teams, Meet)
Subscription. ~$10–20/mo per user.
No new Otter transcripts. Desktop recorder continues working. Switch to Beside/Fireflies.
Yes — use any recording service that syncs to Drive, or the built-in desktop recorder.
Beside AI / Fireflies / Grain
Alternative recorders
Same role as Otter — record and transcribe calls
Varies. Most have free tiers with limits.
That service's transcripts stop. Others continue if configured.
Yes — only need one recording service. Or none (desktop recorder only).
GitHub
github.com
Hosting the pipeline code (public) and your firm config (private)
Free public repos. Private repos: free tier (3 collaborators) or $4/mo.
Can't pull code updates. Config repo inaccessible to team. Pipeline keeps running from local copy.
Yes for daily ops — code is already on your machine. Needed for team config sync.
Tailscale
tailscale.com
Remote dashboard access — lets team members reach the dashboard from outside your office
Free up to 3 users. $6/user/mo after that.
Team members can only access dashboard on local network. You're unaffected.
Yes — if everyone is always on the same network. Or use any VPN.
Zapier
zapier.com
Automating Otter → Google Drive deposit after each recording
Free tier: 5 zaps, 100 tasks/mo. Paid: $20/mo for more.
Otter transcripts stop syncing to Drive automatically. Manual export workaround exists.
Yes — Beside/Fireflies have native Drive sync (no Zapier needed). Or export manually.
The only non-negotiable paid service is Anthropic. Everything else has a free alternative or can be skipped. Otter + Zapier can be replaced by Beside AI (native Drive sync, no Zapier). AssemblyAI can be skipped by disabling podcasts. Tailscale can be replaced by any VPN or skipped if your team is co-located.
⑩ Where Does Everything Live? — Three buckets, nothing overlaps
🌐 Public GitHub
Universal code — Python scripts, templates, docs. No firm identity. Anyone can use it.
github.com/ygontownik/Invite-Chief-of-Staff
🔒 Private team repo
Firm identity — your YAML configs. Shared only with your team. Firm 2 has its own separate private repo.
e.g. github.com/yourfirm/tcip-config (private)
💻 Machine only
OAuth tokens, processed file trackers, personal podcast feeds. Never committed anywhere.
~/credentials/  ·  firm_context.yaml (personal block)
File / Data
Lives where?
What it means for your team
What it means for Firm 2
firm_context.yaml
Identity, team, investment focus
🔒 Team repo
Push here so your team always has your latest team config — new hire, new peer firm, updated draft voice — on their next git pull. No Slack message needed.
They have their own private team repo with their own YAML. Your config gives them zero information about your firm, deals, or people.
firm_config.json
Google Doc IDs, email keywords
🔒 Team repo
The Doc IDs for your shared Follow-ups, Pipeline, and People docs live here. All team members need the same IDs so they write to the same docs.
They create their own 3 Google Docs via --create-docs. Their Doc IDs are physically different. Cross-write is impossible.
drive-docs.yaml
Drive folder IDs, doc registry
🔒 Team repo
Maps pipeline keys to Drive folder IDs — Otter AI root, transcript folders, etc. Shared so all team members' pipelines scan the same folders.
Their own Drive folders on their own Google accounts. No shared folders, no shared Drive space between firms.
OAuth tokens
~/credentials/*.pickle
💻 Machine
Each person authenticates with their own Google account. Tokens expire and refresh automatically. Losing them = 5-min re-auth.
Their tokens grant access only to their accounts. No crossover possible by design.
Processed file trackers
processed_emails.json, processed_cos_transcripts.json
💻 Machine
Each machine tracks what it's processed locally. Email duplicates are now caught at the doc level: before writing to any shared doc, the pipeline checks whether the email subject is already there, so even if two pipelines both process the same CC'd thread, only one entry lands in the doc.
They have their own trackers. The same doc-level dedup applies, though cross-firm writes can't happen anyway: they can't access your shared docs and you can't access theirs. Firm 2's docs are entirely separate.
Personal block of firm_context.yaml
Your podcasts, briefing preferences
💻 Machine
Your podcast feeds and briefing time stay on your machine. Don't push the personal block to the team repo — it would overwrite your teammates' feeds with yours.
Each person at Firm 2 sets their own feeds. No shared briefing content between firms.
Code improvements
Bug fixes, new features in .py files
🌐 Public
Bug fixes you push flow to your team on their next git pull. Fixes you don't push stay on your machine only.
They get it on their next git pull after a push. Universal improvements (per CONTRIBUTING.md) go public so all firms benefit. Firm-specific tweaks stay in their private team repo.
Your Google Drive data
Docs, transcripts, deal files
📁 Drive only
Lives in Google Drive, accessible only to accounts you've explicitly authorized. Not on GitHub in any form.
Firm 2's Drive is on different Google accounts. No shared database, no shared folder, no shared server between firms.
✅ Email duplicate gap — fixed. The pipeline now caches each shared doc's content at run start and checks whether an email's subject is already present before writing. If two team members are both CC'd on a deal thread and both pipelines run, only the first write lands — the second pipeline finds the subject in the doc and skips. Transcripts were already protected by the PROCESSED: header written server-side in each Drive doc.
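The doc-level dedup can be sketched as a writer that caches the doc body once at run start. The class and method names are illustrative; the real pipeline reads the shared Google Doc through the Docs API.

```python
class SharedDocWriter:
    """Fetch the shared doc body once per run, then dedup on subject."""

    def __init__(self, initial_body: str):
        # cached once at run start; the real pipeline reads this
        # from the Google Doc before processing begins
        self.body = initial_body

    def write_action_item(self, subject: str, row: str) -> bool:
        """Return True if written, False if a teammate's run got there first."""
        if subject in self.body:
            return False
        self.body += "\n" + row
        return True

doc = SharedDocWriter(
    "FOLLOW-UPS\nRe: Project Venus term sheet | owner: Alex"
)
# a teammate's pipeline already wrote this CC'd thread, so it's skipped:
dupe = doc.write_action_item("Re: Project Venus term sheet", "duplicate row")
# a genuinely new thread still lands:
fresh = doc.write_action_item("Re: Eastlink NDA", "Re: Eastlink NDA | owner: you")
```

Transcripts don't need this path because the PROCESSED: header written into each Drive doc already serves as their server-side marker.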