COS Pipeline — How It All Works

Plain-English guide to every flow: setup, daily operations, code updates, and what's personal vs. shared.

Public GitHub (code anyone can use)
Private GitHub (your firm's config)
Your computer only (never shared)
External services (Gmail, AI, recorder)
Shared Google Drive (your team's data)
Dashboard (what you see in the browser)
① One-Time Setup — done once per person, per machine
Stage
Public GitHub
Code for everyone
Private GitHub
Your firm's config
Your Computer
Local only
External Services
Google, Anthropic
Shared Google Drive
Team data
Get the code
github.com/ygontownik/Invite-Chief-of-Staff All pipeline scripts, templates, documentation
git clone → ~/cos-pipeline/ Downloads all the code to your machine
Fill in who you are
firm_context.template.yaml Blank form — your starting point
Private config repo Team shares one repo so updates sync automatically
firm_context.yaml Your name, firm, team, sectors, recording folders, personal podcast feeds — filled in once, never changes unless your setup does
Create your data home
python3 setup.py --create-docs Runs once. Opens Google sign-in in your browser.
Google authorizes access You sign in. Pipeline gets permission to read your Gmail and write to Drive.
3 blank Google Docs created Follow-ups · Pipeline · People. These are your firm's dataroom — they fill up over time.
Start the engine
./setup_launchagents.sh Installs background jobs on your Mac. They run automatically on schedule from now on — no manual action needed.
Team 2 starting point: On day one you have 3 organized but empty Google Docs — your dataroom structure exists, it just has no data yet. Run python3 setup.py --onboard to immediately populate them with the last 30 days of emails, any existing transcripts in your configured folders, and recent podcast episodes. After ~30 minutes you'll have a real dashboard, not an empty one.
② Daily Operations — runs automatically, no action needed
Trigger
What starts it
What it reads
What AI does
Where results go
What you see
Email
Every 2 hours
Background job wakes up Fires at :05 past every other hour, Mon–Fri
Your Gmail or Outlook inbox New emails since last check. Only your inbox — nobody else's unless they're cc'd.
Fast sort first (cheap) Every email gets a category: Deal / Action / Research / Ignore. Only important ones go to the next step.
Deep read on what matters (thorough) Deal threads get a full analysis: what's the ask, what do I owe them, what do they owe me.
Action items → Follow-ups Doc Tagged with your name as owner. Other team members' emails go to the same doc tagged with their names.
Deal threads → Pipeline Doc
Follow-ups tile updates New action items appear, sorted by due date and owner
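In code terms, the two-pass email flow above looks roughly like this. It is only a sketch: in the real pipeline both passes are Claude calls; here the cheap pass is a keyword stand-in, the deep read is a stub, and every function name is hypothetical.

```python
CATEGORIES = ("Deal", "Action", "Research", "Ignore")

def fast_sort(email: dict) -> str:
    """Pass 1 (cheap): assign one of four categories.
    The real pass uses a small, fast model; this keyword
    stand-in only illustrates the control flow."""
    subject = email["subject"].lower()
    if any(k in subject for k in ("term sheet", "nda", "dataroom")):
        return "Deal"
    if any(k in subject for k in ("please", "can you", "action")):
        return "Action"
    if "newsletter" in subject:
        return "Ignore"
    return "Research"

def deep_read(email: dict) -> dict:
    """Pass 2 (thorough): full analysis of a Deal thread.
    The real pass sends the whole thread to Claude; this stub
    only shows the shape of the result."""
    return {
        "subject": email["subject"],
        "the_ask": "(extracted by the model)",
        "we_owe": [],
        "they_owe": [],
    }

def triage(inbox: list[dict]) -> list[dict]:
    analyses = []
    for email in inbox:
        # only important threads get the expensive second pass
        if fast_sort(email) == "Deal":
            analyses.append(deep_read(email))
    return analyses

inbox = [
    {"subject": "Weekly newsletter"},
    {"subject": "Project Venus term sheet redline"},
]
results = triage(inbox)
```

The point of the split is cost: every email gets the cheap pass, but only the small fraction tagged Deal pays for the thorough one.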
Calls & Transcripts
Every morning at 7:22am
Background job wakes up Fires daily at 7:22am, checks for overnight recordings
Your recording service Otter AI, Beside AI, Fireflies — whichever you configured. It syncs transcripts to the Google Drive folders you listed in firm_context.yaml.
Pass 1 — Write the memo Reads the transcript. Writes a 6-section investor memo: core argument, what was agreed, what's unresolved, what you need to form a view, names & firms, open questions.
Pass 2 — Extract the actions Reads the memo. Pulls out every commitment: who said they'd do what, by when, who owes whom. Figures out if it was you or the other person who committed.
Memo → Pipeline Doc Full 6-section memo appended
Action items → Follow-ups Doc New rows added for each commitment
New contacts → People Doc Anyone named in the call who isn't already tracked
Follow-ups tile Your actions + what the other party owes you
Pipeline tile Deal and LP intelligence updated
People tile New contacts appear
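The two passes for calls can be sketched the same way. In reality both passes are Claude calls; the toy "what was agreed" parsing below only illustrates how ownership is resolved, and all names are hypothetical.

```python
MEMO_SECTIONS = (
    "core argument", "what was agreed", "what's unresolved",
    "what you need to form a view", "names & firms", "open questions",
)

def write_memo(transcript: str) -> dict:
    """Pass 1: the real pipeline asks Claude for the 6-section memo.
    This stub returns the section skeleton plus one toy 'what was
    agreed' entry so Pass 2 has something to parse."""
    memo = {section: "" for section in MEMO_SECTIONS}
    memo["what was agreed"] = (
        "You: send the model by Friday\n"
        "Them: intro to their LP"
    )
    return memo

def extract_actions(memo: dict) -> list[dict]:
    """Pass 2: pull out each commitment and decide whether you
    or the other party owns it."""
    actions = []
    for line in memo["what was agreed"].splitlines():
        speaker, _, commitment = line.partition(": ")
        actions.append({
            "owner": "me" if speaker == "You" else "them",
            "commitment": commitment,
        })
    return actions

memo = write_memo("(transcript text)")
actions = extract_actions(memo)
```

Pass 2 reads the memo rather than the raw transcript, which is why commitments come out already attributed to a side.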
Podcasts & Blogs
Daily
Background job wakes up Checks your personal RSS feeds daily
Your RSS feeds Podcasts and blogs listed in your firm_context.yaml personal section. Each person on the team has their own list.
Transcribe the audio New episodes sent to AssemblyAI — returns a full transcript in ~5 min per episode
Write the memo Same 6-section format as calls: core argument, consensus, tension, open questions, names & firms
Memo → your podcast Doc Each show has its own Google Doc. New entries are added at the top.
Briefing email at 7:30am Sent to your email: new episodes, upcoming calls today, market items from your feeds
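A minimal sketch of the daily podcast pass, with the feed reader and AssemblyAI replaced by stubs so only the control flow remains. Function names and the example URL are illustrative, not the pipeline's actual API.

```python
def new_episodes(entries: list[dict], seen_ids: set) -> list[dict]:
    """Keep only entries not processed on a previous run."""
    return [e for e in entries if e["id"] not in seen_ids]

def transcribe(audio_url: str) -> str:
    # The real pipeline uploads the audio to AssemblyAI and polls
    # until the transcript is ready (~5 min per episode).
    return f"(transcript of {audio_url})"

def process_feeds(feeds: list[dict], seen_ids: set) -> list[dict]:
    memos = []
    for feed in feeds:
        for ep in new_episodes(feed["entries"], seen_ids):
            transcript = transcribe(ep["audio_url"])
            # Claude would turn `transcript` into the 6-section memo here
            memos.append({"show": feed["title"], "episode": ep["id"]})
            seen_ids.add(ep["id"])
    return memos

feeds = [{"title": "Catalyst", "entries": [
    {"id": "ep47", "audio_url": "https://example.com/ep47.mp3"},
]}]
seen = set()
first_run = process_feeds(feeds, seen)
second_run = process_feeds(feeds, seen)  # nothing new: already seen
```

Because the seen-set persists between runs, a feed can be rechecked daily without re-transcribing anything.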
GitHub is never involved in daily operations. Every step above runs entirely on your machine, reading from your accounts, writing to your Google Drive. The code is already downloaded — it just runs on schedule. GitHub only comes back into the picture when someone improves the code and pushes an update.
③ When the Code Gets Better — how improvements reach everyone
Who does it
What they do
Goes to GitHub
Other people
Their data
Result
Anyone finds a bug or improvement
Fix it on their machine Edit a .py file. Test it. Works.
Push to public repo (or open a PR) If it's a universal improvement — helps any firm — it goes to GitHub. If it's personal, it stays local.
git pull in ~/cos-pipeline/ They get the fix. Takes 5 seconds.
Not touched Their Google Docs, their action items, their data — completely unaffected by code updates
Pipeline runs better Next automated run uses the improved code
Someone on your team updates the firm config
Edit firm_context.yaml e.g. new hire added, new peer firm to watch, updated draft voice rule
Push to private config repo github.com/yourfirm/your-config (private, team only)
git pull in ~/cos-pipeline-config/ Next pipeline run picks up the new config automatically — no restart needed
Not touched
Prompts reflect new reality e.g. new hire's name now appears in action item attribution
④ Personal vs. Shared — what lives where
🔒 Personal — on your machine only, never seen by others
📄
firm_context.yaml
Your name, your firm, your team, your recording folders, your podcast feeds, your briefing preferences
🔑
API keys & passwords
Anthropic key, dashboard password — stored in Mac Keychain, never in files
🎫
Google & Outlook login tokens
~/credentials/*.pickle — generated when you sign in; they stay on your machine
📋
What's already been processed
~/credentials/processed_*.json — tracks emails and transcripts already seen so nothing runs twice
⏰
Your schedule
LaunchAgent .plist files — your timings, your machine's paths, auto-generated
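The "already processed" trackers above rest on one simple idea: remember what has been handled so a rerun skips it. A sketch, using an illustrative file name and a temp path (the real files live in ~/credentials/):

```python
import json
import pathlib
import tempfile

# Real tracker files live in ~/credentials/; a temp path keeps
# this sketch self-contained and safe to run.
TRACKER = pathlib.Path(tempfile.mkdtemp()) / "processed_example.json"

def load_seen() -> set:
    """Read the set of processed IDs, or empty on first run."""
    if TRACKER.exists():
        return set(json.loads(TRACKER.read_text()))
    return set()

def already_processed(item_id: str) -> bool:
    return item_id in load_seen()

def mark_seen(item_id: str) -> None:
    """Record an ID so the next run skips it."""
    seen = load_seen()
    seen.add(item_id)
    TRACKER.write_text(json.dumps(sorted(seen)))
```

The trade-off of keeping this per-machine is that two teammates' pipelines can each consider the same email "unseen", which is exactly the gap the doc-level dedup (section ⑩) closes.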
🤝 Shared — visible to some or all
💻
All pipeline code (Public GitHub)
Every .py file, template, and doc. Any firm can use it. Improvements push here and everyone benefits.
🔒
Firm config repo (Private GitHub)
firm_context.yaml base + firm_config.json. Shared within your firm only. Firm 1 can't see Firm 2's config.
📁
3 Google Docs (your firm's dataroom)
Follow-ups · Pipeline · People. Every team member's pipeline writes here. Everyone reads the same data.
📊
The dashboard
Runs on one machine. Everyone at the firm accesses the same URL. Same data, same tiles.
⑤ Brand New Firm (Team 2) — from nothing to live dashboard
1
Clone the public code: git clone github.com/ygontownik/Invite-Chief-of-Staff ~/cos-pipeline
Takes 30 seconds. Gets all the pipeline scripts onto your machine.
2
Run the installer: ./setup.sh
Guided, step by step. Asks for your name, firm, team, investment focus, recording service folder IDs, podcast feeds. Stores your Anthropic API key securely in your Mac's Keychain.
3
Create your dataroom: python3 setup.py --create-docs
Opens Google sign-in in your browser. After you approve, creates 3 blank Google Docs (Follow-ups, Pipeline, People) and writes their IDs into your config automatically. These are now your firm's shared dataroom.
4
Configure your recording service — point Otter/Beside/Fireflies at the folder IDs in your firm_context.yaml
Each service has a Drive sync setting. Paste in the folder ID. From now on, recordings land there automatically.
5
Populate from history: python3 setup.py --onboard --days 30
Processes your last 30 days of email, any transcripts already in your Drive folders, and recent podcast episodes. Runs for 10–40 minutes depending on volume. When it finishes, your dashboard has real data.
6
Start the background jobs: ./setup_launchagents.sh
Installs the automated schedules on your Mac. From this point, email is checked every 2 hours and calls are processed every morning at 7:22am — with no manual action.
7
Open the dashboard: http://localhost:7777
Your data. Your firm. Completely separate from Firm 1. They can't see yours, you can't see theirs.
Second person at Firm 2: They clone the same public code repo, clone your private config repo (you invite them as a collaborator), set the COS_CONFIG_DIR environment variable, and run ./setup.sh for their OAuth. They access the dashboard at the same URL as you. Their emails and transcripts feed into the same shared Google Docs. Done.
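A sketch of how the COS_CONFIG_DIR step might resolve on the second person's machine. This is not the pipeline's actual loader; the fallback path simply follows the ~/cos-pipeline-config/ clone location mentioned earlier.

```python
import os
import pathlib

def config_dir() -> pathlib.Path:
    """COS_CONFIG_DIR wins if set; otherwise fall back to the
    default clone location of the private config repo."""
    override = os.environ.get("COS_CONFIG_DIR")
    if override:
        return pathlib.Path(override).expanduser()
    return pathlib.Path.home() / "cos-pipeline-config"

def firm_context_path() -> pathlib.Path:
    """Every teammate resolves the same YAML this way, so a git pull
    of the config repo is all it takes to stay in sync."""
    return config_dir() / "firm_context.yaml"
```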
⑥ Your Google Drive Folder Structure — pre-built on setup, fills in over time
📁 Created automatically by setup.py --create-docs
📁
Chief of Staff /
Root folder for all pipeline-managed documents
📄
Follow-ups
Action items from every email and call, tagged by owner and deal
📄
Deal Pipeline
Investor memos and deal intelligence prose, one section per call
📄
People / CRM
Everyone named in calls and emails — one entry per contact
📄
Daily Briefing Log
Append-only log of every morning briefing sent
📁
Transcripts /
Where recording services sync their output
📁
Otter AI / Root · Deal Calls · Other
Zapier drops new Otter recordings here automatically
📁
Beside AI / (or Fireflies, Grain, etc.)
Each recording service gets its own subfolder — configure Drive sync to drop here
📁
Desktop Recordings /
Output from call_recorder.py — auto-deposited after each recorded call
📁
Podcast Summaries /
One Google Doc per show + one aggregate summary Doc
📁 You create and manage — pipeline reads but doesn't write here
📁
Deals /
One subfolder per deal — your manually curated deal room
📁
[Deal Name] /
e.g. "Project Venus", "Eastlink", "Argo Solar"
📄
Company Presentation.pdf
Pitch deck or teaser from the company
📄
IC Memo.docx
Your investment committee writeup
📁
Diligence /
Financial models, legal docs, third-party reports
📁
Correspondence /
Key email threads, term sheets, NDAs
📁
Fund /
LP-facing materials
📄
Firm Pitch Deck.pdf
Current version of your firm presentation
📄
LP One-Pager.pdf
📁
LP Subscription Docs /
Subscription agreements, side letters, closing docs
⚠️
PDFs are not yet processed by the pipeline
The pipeline reads text files and Google Docs. PDF extraction (pitch decks, IC memos, diligence reports) is a planned feature — not live yet. For now, key content from PDFs surfaces through call transcripts where those docs are discussed.
Team 2 starting point: The left column is created automatically when you run python3 setup.py --create-docs — blank but organized. The right column (Deals/, Fund/) you create yourself as deals come in. The pipeline never writes to your Deals/ folder; it only reads transcripts and emails that reference those deals and routes that intelligence to the Deal Pipeline Doc.
⑦ Dashboard Update Speed — where it's fast and where lag exists
Source
Lag
Why
How to get it faster
Desktop recorder
call_recorder.py
~2–3 min ✓
When your call ends, call_recorder.py immediately fires the transcript hook. One AI pass (Sonnet), writes to Drive, pings the dashboard to refresh. You see action items before you've closed your laptop.
This is already the fastest path. No changes needed.
Email
Gmail / Outlook
Up to 2 hrs ⚠
The email pipeline fires every 2 hours. An email that arrives at 10:03am won't be processed until 12:05pm.
Force an immediate run: python3 cos_gmail_mini_v2.py in terminal. Or trigger via the dashboard admin panel.
Otter AI
Apps Script watcher + Drive sync
~15 min ✓
A Google Apps Script polls the Otter Drive folders every 1 minute and fires a webhook the moment a new transcript appears, which runs cos_otter_backfill.py immediately. Typical end-to-end: 10–30 min after the call ends, dominated by Otter's own transcription time (not the pipeline). A daily 7:22am backfill catches anything the webhook missed.
Use the desktop recorder if you need action items during the call. Otherwise no action — the watcher handles it.
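The real watcher is a Google Apps Script; this Python stand-in sketches the same poll-and-fire logic, with list_folder and fire_webhook as stubs for the Drive listing and the webhook POST.

```python
def poll_once(list_folder, seen: set, fire_webhook) -> int:
    """One polling tick: fire the webhook for each unseen transcript.
    The 7:22am backfill is the safety net for anything missed here."""
    fired = 0
    for file_id in list_folder():
        if file_id not in seen:
            fire_webhook(file_id)
            seen.add(file_id)
            fired += 1
    return fired

# Usage demo with in-memory stubs:
seen = set()
hooks = []
listing = ["call-monday.txt", "call-tuesday.txt"]
poll_once(lambda: listing, seen, hooks.append)   # both fire
listing.append("call-wednesday.txt")
poll_once(lambda: listing, seen, hooks.append)   # only the new one fires
```

Polling every minute means the lag you actually feel is Otter's own transcription time, not the watcher.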
Beside AI / Fireflies / other
Apps Script router
~15 min ✓
A second Google Apps Script (beside-action-router) checks the shared Beside Notes Doc every 15 minutes and routes new action items to your Follow-ups and Pipeline Docs. Daily backfill is the safety net.
Same as Otter: desktop recorder for in-call speed; otherwise no action.
Podcasts & blogs
RSS feeds
Up to 24 hrs ✗
Episodes are checked once daily. A podcast published at noon won't be transcribed and memos won't appear until the next morning run.
Force for one episode: python3 podcast_transcribe.py --url <episode-url>
Dashboard cache
after any write
~10 sec ✓
After any pipeline write, a warmup request is sent to the dashboard server. It re-reads Drive Docs and rebuilds the cache. Browser refresh then shows the new data.
Force anytime: curl -X POST http://localhost:7777/warmup
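A Python equivalent of that curl call, as a sketch. The endpoint and port come from the line above; error handling is deliberately minimal, since a failed warmup just means the next scheduled one catches up.

```python
import urllib.request

def warm_dashboard(base_url: str = "http://localhost:7777") -> bool:
    """POST /warmup; return True on a 200 response."""
    req = urllib.request.Request(f"{base_url}/warmup", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # dashboard not running; nothing to warm
        return False
```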
Practical rule: Use the desktop recorder (call_recorder.py) for any call where you want action items immediately. Use Otter/Beside as a backup and for their transcription quality — you'll get the data next morning. The desktop recorder path is the only one with near-real-time dashboard updates.
⑧ Podcasts & Blogs → Deal Pipeline — how market intelligence tags to deals
Step
What triggers
Pass 1 — The memo
Pass 2 — Deal scan
Where it lands
What you see
Episode published
Daily RSS check New episode detected on your personal feed list
AssemblyAI transcribes ~5 min per episode. Returns full text.
Claude writes 6-section memo Core argument, consensus, tension, open questions, what you'd need to form a view, names & firms mentioned
Claude scans memo against your deal pipeline Checks: does this episode mention any company, sector, or fund you're tracking? Looks at your current deal targets, LP relationships, peer firms.
Tags relevant intelligence e.g. "Guest mentioned Stonepeak's MISO strategy — relevant to Project Venus" → tagged as deal_takeaway on that deal
Full memo → Podcast Doc Complete episode memo lives here, accessible from briefing link
Tagged intel → Dashboard data Deal-specific takeaways written as structured data, not prose
Deal tile — Market Intelligence tab "Catalyst ep. 47 — guest noted data center load growth driving GenCo valuations in ERCOT" — linked to your active deal in that sector
Morning briefing New episodes summarized + any deal-relevant flags called out
Pass 2 is now live for podcasts. Previously, podcast memos were written to Drive but had no connection to your deal pipeline on the dashboard. Now, after every episode memo, Claude checks your tracked deals and routes any relevant intelligence as a tagged deal_takeaway that appears on that deal's dashboard tile. A podcast about MISO interconnection queues shows up on your Project Venus tile automatically, with no manual tagging.
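The deal scan can be sketched as matching memo text against the deals you track. The real pass is a Claude call with full pipeline context; the substring matching, keyword lists, and field names below are purely illustrative.

```python
def scan_for_deals(memo_text: str, tracked_deals: list[dict]) -> list[dict]:
    """Tag any tracked deal the episode touches as a deal_takeaway."""
    takeaways = []
    text = memo_text.lower()
    for deal in tracked_deals:
        hits = [kw for kw in deal["keywords"] if kw.lower() in text]
        if hits:
            takeaways.append({
                "type": "deal_takeaway",   # structured data, not prose
                "deal": deal["name"],
                "matched_on": hits,
            })
    return takeaways

deals = [{"name": "Project Venus", "keywords": ["MISO", "Stonepeak"]}]
tags = scan_for_deals("Guest discussed Stonepeak's MISO strategy", deals)
```

Whatever the scan tags lands on that deal's dashboard tile; the full memo still lives in the Podcast Doc either way.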
⑨ Subscription Services Used — what you're paying for and why
Service
Used for
Cost model
What you'd notice if it stopped
Can you use without it?
Anthropic (Claude)
api.anthropic.com
All AI analysis — email triage, call memos, action extraction, podcast summaries
Pay per use. ~$2–15/day depending on volume.
Everything stops. No memos, no action items, no triage. Dashboard goes stale.
No — this is the core engine.
Google Workspace
Gmail · Drive · Docs · Calendar
Inbox source, shared Docs dataroom, transcript storage, calendar for scheduling context
Subscription or free Gmail. Most users already have this.
Email pipeline breaks. Shared Docs unavailable. Dashboard can't fetch data.
Can switch email to Outlook. Drive/Docs currently required for shared dataroom.
AssemblyAI
assemblyai.com
Transcribing podcast audio — converts MP3 to text before Claude reads it
Pay per use. ~$0.009/min audio. A 1-hour episode ≈ $0.54.
Podcast transcription stops. Call memos continue — those come from recording services, not AssemblyAI.
Yes — skip Package A (podcasts). Set "packages": ["operations"] in firm_config.json.
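Gating the podcast package might look like the sketch below. One assumption to flag: the doc only says to set "packages": ["operations"], so the literal package name "podcasts" and the function name here are guesses, not the pipeline's real identifiers.

```python
import json

def podcast_package_enabled(config_text: str) -> bool:
    """Read the packages list from firm_config.json content and
    decide whether the podcast steps should run."""
    config = json.loads(config_text)
    # assumed default: everything on when the key is absent
    packages = config.get("packages", ["operations", "podcasts"])
    return "podcasts" in packages
```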
Otter AI
otter.ai
Recording and transcribing your calls (Zoom, Teams, Meet)
Subscription. ~$10–20/mo per user.
No new Otter transcripts. Desktop recorder continues working. Switch to Beside/Fireflies.
Yes — use any recording service that syncs to Drive, or the built-in desktop recorder.
Beside AI / Fireflies / Grain
Alternative recorders
Same role as Otter — record and transcribe calls
Varies. Most have free tiers with limits.
That service's transcripts stop. Others continue if configured.
Yes — only need one recording service. Or none (desktop recorder only).
GitHub
github.com
Hosting the pipeline code (public) and your firm config (private)
Free public repos. Private repos: free tier (3 collaborators) or $4/mo.
Can't pull code updates. Config repo inaccessible to team. Pipeline keeps running from local copy.
Yes for daily ops — code is already on your machine. Needed for team config sync.
Tailscale
tailscale.com
Remote dashboard access — lets team members reach the dashboard from outside your office
Free up to 3 users. $6/user/mo after that.
Team members can only access dashboard on local network. You're unaffected.
Yes — if everyone is always on the same network. Or use any VPN.
Zapier
zapier.com
Automating Otter → Google Drive deposit after each recording
Free tier: 5 zaps, 100 tasks/mo. Paid: $20/mo for more.
Otter transcripts stop syncing to Drive automatically. Manual export workaround exists.
Yes — Beside/Fireflies have native Drive sync (no Zapier needed). Or export manually.
The only non-negotiable paid service is Anthropic. Everything else has a free alternative or can be skipped. Otter + Zapier can be replaced by Beside AI (native Drive sync, no Zapier). AssemblyAI can be skipped by disabling podcasts. Tailscale can be replaced by any VPN or skipped if your team is co-located.
⑩ Where Does Everything Live? — Three buckets, nothing overlaps
🌐 Public GitHub
Universal code — Python scripts, templates, docs. No firm identity. Anyone can use it.
github.com/ygontownik/Invite-Chief-of-Staff
🔒 Private team repo
Firm identity — your YAML configs. Shared only with your team. Firm 2 has its own separate private repo.
e.g. github.com/yourfirm/tcip-config (private)
💻 Machine only
OAuth tokens, processed file trackers, personal podcast feeds. Never committed anywhere.
~/credentials/  ·  firm_context.yaml (personal block)
File / Data
Lives where?
What it means for your team
What it means for Firm 2
firm_context.yaml
Identity, team, investment focus
🔒 Team repo
Push here so your team always has your latest team config — new hire, new peer firm, updated draft voice — on their next git pull. No Slack message needed.
They have their own private team repo with their own YAML. Your config gives them zero information about your firm, deals, or people.
firm_config.json
Google Doc IDs, email keywords
🔒 Team repo
The Doc IDs for your shared Follow-ups, Pipeline, and People docs live here. All team members need the same IDs so they write to the same docs.
They create their own 3 Google Docs via --create-docs. Their Doc IDs are physically different. Cross-write is impossible.
drive-docs.yaml
Drive folder IDs, doc registry
🔒 Team repo
Maps pipeline keys to Drive folder IDs — Otter AI root, transcript folders, etc. Shared so all team members' pipelines scan the same folders.
Their own Drive folders on their own Google accounts. No shared folders, no shared Drive space between firms.
OAuth tokens
~/credentials/*.pickle
💻 Machine
Each person authenticates with their own Google account. Tokens expire and refresh automatically. Losing them = 5-min re-auth.
Their tokens grant access only to their accounts. No crossover possible by design.
Processed file trackers
processed_emails.json, processed_cos_transcripts.json
💻 Machine
Each machine tracks what it's processed locally. Email duplicates are now caught at the doc level: before writing to any shared doc, the pipeline checks whether the email subject is already there, so even if two pipelines both process the same CC'd thread, only one entry lands in the doc.
They have their own trackers. The same doc-level dedup applies, though cross-firm writes can't happen anyway: they can't access your shared docs and you can't access theirs. Firm 2's docs are entirely separate.
Personal block of firm_context.yaml
Your podcasts, briefing preferences
💻 Machine
Your podcast feeds and briefing time stay on your machine. Don't push the personal block to the team repo — it would overwrite your teammates' feeds with yours.
Each person at Firm 2 sets their own feeds. No shared briefing content between firms.
Code improvements
Bug fixes, new features in .py files
🌐 Public
Bug fixes you push flow to your team on their next git pull. Fixes you don't push stay on your machine only.
They get it on their next git pull after a push. Universal improvements (per CONTRIBUTING.md) go public so all firms benefit. Firm-specific tweaks stay in their private team repo.
Your Google Drive data
Docs, transcripts, deal files
📁 Drive only
Lives in Google Drive, accessible only to accounts you've explicitly authorized. Not on GitHub in any form.
Firm 2's Drive is on different Google accounts. No shared database, no shared folder, no shared server between firms.
✅ Email duplicate gap — fixed. The pipeline now caches each shared doc's content at run start and checks whether an email's subject is already present before writing. If two team members are both CC'd on a deal thread and both pipelines run, only the first write lands — the second pipeline finds the subject in the doc and skips. Transcripts were already protected by the PROCESSED: header written server-side in each Drive doc.
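The doc-level dedup can be sketched as a writer that caches the doc body once at run start. The class and method names are illustrative; the real pipeline reads the shared Google Doc through the Docs API.

```python
class SharedDocWriter:
    """Fetch the shared doc body once per run, then dedup on subject."""

    def __init__(self, initial_body: str):
        # cached once at run start; the real pipeline reads this
        # from the Google Doc before processing begins
        self.body = initial_body

    def write_action_item(self, subject: str, row: str) -> bool:
        """Return True if written, False if a teammate's run got there first."""
        if subject in self.body:
            return False
        self.body += "\n" + row
        return True

doc = SharedDocWriter(
    "FOLLOW-UPS\nRe: Project Venus term sheet | owner: Alex"
)
# a teammate's pipeline already wrote this CC'd thread, so it's skipped:
dupe = doc.write_action_item("Re: Project Venus term sheet", "duplicate row")
# a genuinely new thread still lands:
fresh = doc.write_action_item("Re: Eastlink NDA", "Re: Eastlink NDA | owner: you")
```

Transcripts don't need this path because the PROCESSED: header written into each Drive doc already serves as their server-side marker.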