From Email Chaos to AI-Powered Automation
The story of how I turned a 4-hour daily copy-paste nightmare into a system that processes emails while everyone sleeps. Complete with plot twists, debugging at 2 AM, and a healthy respect for the chaos of real-world data.
Chapter 01 - The Origin Story
Death by a Thousand Copy-Pastes
It started with a simple observation: Sarah, the operations manager at a pediatric therapy staffing agency, was spending 4-6 hours every single day copying and pasting data from emails into spreadsheets. Let that sink in. Half a workday. Every day. Forever.
Her inbox was drowning in referral emails from Service Coordinators across New York City, each containing case information for children needing early intervention services. Speech therapy. Occupational therapy. Physical therapy. Specialized instruction. Real kids, real families, real urgency.
The referrals looked something like this:
Subject: Fw: Staffing Referral - 5 Cases
Hi,
Looking for therapists for the following cases:
1) Child's Name: Maya Rodriguez
DOB: 3/15/2022
EI #: 1234567
Address: 456 Oak Street, Queens, NY 11101
Parent: Carmen Rodriguez (718) 555-0123
Services: ST 2x30, OT 1x45
2) Child's Name: Ethan Cooper
...
Thanks,
Jessica Chen
Service Coordinator
Bright Futures Services
(646) 555-4321
Each email could contain anywhere from 1 to 15 cases. Each case needed to be entered into a Google Sheet with 26 columns. Each mistake could mean a child not receiving therapy on time.
The ask was simple: "Can you make this faster?"
The reality was anything but simple.
The Problems
- 4-6 Hours Daily - Staff spent their mornings doing nothing but copying and pasting data from emails. By the time they finished, more emails had arrived.
- Inconsistent Formats - Every Service Coordinator formatted emails differently. Tables, bullet points, paragraphs, forwarded threads three levels deep. No standards, no structure.
- Human Error - Typos, missed fields, wrong frequencies. A "2x30" became "230" or got lost entirely. And when you're tired from 3 hours of data entry, mistakes happen.
- No Audit Trail - If something went wrong, there was no way to trace what happened. Which email did this case come from? Who entered it? When?
"I spend my mornings copying and pasting. By the time I finish, there are already new ones waiting."
Chapter 02 - The Domain
What Even Is Early Intervention?
Before I could build anything, I needed to understand what I was building it for. Early Intervention (EI) is a federally mandated program that provides therapeutic services to children ages 0-3 with developmental delays. In New York City, the system involves:
- Evaluating agencies that assess children for developmental delays
- Service Coordinators (SCs) who manage cases and connect families with providers
- Staffing agencies (like Big Minds Tiny Hands) that employ the actual therapists
- Providers who deliver services: Speech Therapy (ST), Occupational Therapy (OT), Physical Therapy (PT), Special Instruction (SI), and Applied Behavior Analysis (ABA)
When a Service Coordinator needs a therapist for a child, they email staffing agencies with the case details. The staffing agency then matches available therapists to cases based on location, availability, and service type.
The Data Challenge
What makes this problem deceptively hard:
- No standard format - Every SC formats their emails differently
- Multi-case emails - One email can contain 1-15 children
- Multi-service cases - One child might need ST, OT, AND PT (three separate rows)
- Forwarded emails - The office manager forwards emails, adding another layer of headers
- Embedded tables - Some SCs paste from Excel, creating HTML table nightmares
- Missing information - Not all fields are always present
The Constraint
The client didn't want to change their workflow. They wanted to keep using their Google Sheet. Whatever I built had to fit into their existing process, not replace it. This turned out to be crucial.
Chapter 03 - The Prototype
Version 1: The Regex Disaster
My first instinct was to build something quick in Python. Parse the emails with regex, dump to CSV, done. I gave myself a weekend. How hard could it be?
# How I thought it would work
import re

case_pattern = r"Child's Name:\s*(.+)\nDOB:\s*(.+)\nEI #:\s*(\d+)"
matches = re.findall(case_pattern, email_body)

Reality check: After 30 minutes of testing, I had covered exactly 3 of the 47 different email formats in the inbox.
Some emails wrote "Child Name:" instead of "Child's Name:". Some used "EI Number:" instead of "EI #:". Some didn't have labels at all - just names and numbers in a semi-structured paragraph. Some were HTML tables that my regex couldn't even begin to parse.
The emails were too varied. Regex wasn't going to cut it. I needed something that could understand context, not just patterns.
Lesson Learned
Don't underestimate real-world data variability. Test data is clean. Production data is chaos wearing a trench coat.
Chapter 04 - Enter AI
Version 2: The AI Gambit
If regex couldn't handle the variability, maybe AI could. I integrated Google's Gemini AI with a carefully crafted prompt:
Extract all case referrals from this email. Return JSON with:
- child_name
- case_id (EI number, 6-7 digits)
- dob
- address, borough, zip
- service_type (ST, OT, PT, SI, ABA)
- session_minutes, weekly_sessions
- parent_name, contact_phone
- sc_name, agency, sc_contact
First test: The AI extracted 8 cases perfectly. I was ecstatic. I texted my friends. I considered this problem solved.
Second test: The AI hallucinated a phone number that didn't exist in the email. Less ecstatic.
Third test: The AI returned the service coordinator's name as "Therapists 2 Go, PLLC" (that's an agency name, not a person). Not ecstatic at all.
The AI Extraction Journey
Getting AI extraction right was an iterative process spanning weeks. Here's what I discovered:
Problem 1: Hallucinated Data
AI would invent plausible-looking phone numbers. Not maliciously - it was just trying to be helpful. But a fake phone number for a family that needs therapy services? Not helpful.
Solution: Cross-reference all extracted data against the original email text. If it's not in the email, it doesn't exist.
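The cross-referencing step can be sketched roughly like this. The function name and the digit-matching heuristic are my illustration of the idea, not the production code:

```javascript
// Hypothetical anti-hallucination check: a phone-like value survives only
// if its digits literally appear somewhere in the source email.
function verifyAgainstSource(value, emailBody) {
  if (!value) return null;
  const digits = value.replace(/\D/g, '');
  // Strip separators from the body too, so "(718) 555-0123" matches "718-555-0123"
  const bodyDigits = emailBody.replace(/\D/g, '');
  return digits.length > 0 && bodyDigits.includes(digits) ? value : null;
}
```

Anything the check rejects is dropped rather than guessed at: a blank cell is recoverable, a plausible fake is not.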
Problem 2: Service Coordinator Confusion
The AI couldn't distinguish SC info from other information in the email. It would grab random names from the body and call them the "service coordinator."
Solution: Restructured the prompt to explicitly separate case data from signature extraction. The SC info comes from the email signature, not the body.
Problem 3: Multi-Service Cases
"ST 2x30, OT 1x45" should create TWO rows in the spreadsheet, one for each service. The AI would sometimes return it as one case with both services mashed together.
Solution: Added explicit instructions to return a services array, then split into separate case objects during post-processing.
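The post-processing split reduces to a small fan-out. This is a sketch (field names mirror the extraction JSON; the function name is illustrative):

```javascript
// Sketch: fan one multi-service case out into one sheet row per service.
function splitServices(caseObj) {
  return caseObj.services.map(svc => ({
    child_name: caseObj.child_name,
    service_type: svc.type,
    session_minutes: svc.minutes,
    weekly_sessions: svc.sessions
  }));
}
```

So "ST 2x30, OT 1x45" for one child yields two row objects, each carrying the shared child fields.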
Problem 4: Forwarded Emails
When Sarah forwards an email, the AI would extract HER signature, not the original SC's. The forwarder's info would end up in the spreadsheet instead of the actual service coordinator.
Solution: Added forwarded email detection and explicit instructions to find the ORIGINAL sender's signature, ignoring the forwarder's.
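The detection itself can be as simple as checking for forwarding markers before building the prompt. The patterns below are my guess at common Outlook markers, not the exact production heuristics:

```javascript
// Hypothetical forwarded-email check: if either marker is present, the
// prompt is told to locate the ORIGINAL sender's signature.
function isForwarded(subject, body) {
  return /^\s*fwd?:/i.test(subject) ||
         /-{3,}\s*(forwarded message|original message)/i.test(body);
}
```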
The final prompt structure evolved into a two-part extraction:
{
  "cases": [
    {
      "child_name": "Maya Rodriguez",
      "services": [
        { "type": "ST", "minutes": 30, "sessions": 2 },
        { "type": "OT", "minutes": 45, "sessions": 1 }
      ]
      // ... other fields
    }
  ],
  "sc_info": {
    // Extracted from email signature, shared across ALL cases
    "sc_name": "Jessica Chen",
    "sc_phone": "646-555-4321",
    "sc_agency": "Bright Futures Services"
  }
}

Chapter 05 - The Architecture
Three Pivots to Get It Right
Building the right architecture wasn't a straight line. It took three major pivots before I found something that actually worked reliably.
Pivot 1: The Excel Disaster
The client's original workflow used SharePoint Excel files. My first approach: sync directly to Excel via Microsoft Graph API. What could go wrong?
Everything.
- Excel files lock when someone has them open
- The API would randomly fail with "file in use" errors
- Concurrent edits caused merge conflicts that corrupted data
- SharePoint's sync delays meant changes took minutes to appear
I spent days fighting Microsoft's ecosystem before accepting defeat. The pivot: Google Sheets. Real-time collaborative editing, no file locks, instant updates, and an API that actually works. The client switched their workflow in a day and never looked back.
Pivot 2: The Sync Problem
My initial architecture looked clean on paper:
Version 2.0 Architecture: Outlook → AI Extraction → PostgreSQL → Google Sheets
The problem: that last arrow. The sync from database to Google Sheets was manual. Click a button, cases get added. But what if you click twice? Duplicates. What if you forget to click? Data isn't in the sheet.
I added an exported_to_sheet flag to track which cases had been exported. Now the export only pushed new cases.
New problem: The client was editing the Google Sheet directly (adding notes, marking cases as filled). If I ever needed to re-sync, I'd overwrite their edits.
Even newer problem: Sometimes the export would fail halfway through, leaving the database thinking cases were exported when they weren't.
This was getting messy.
Pivot 3: The "Append Only" Revelation
After several debugging sessions at 2 AM, I had an epiphany: why sync at all?
The Google Sheet is the source of truth for the client. They edit it, add formulas, color cells. I shouldn't be replacing data - I should only be adding new cases.
Final Architecture (v2.5): Outlook → AI Extraction → Google Sheets → PostgreSQL
Key insight: Append new cases directly to the sheet, immediately after extraction. Keep the database purely for tracking what's been processed (to prevent duplicates).
The Benefits
Sheet is always up to date immediately. No sync lag. No failed sync states. Client can edit freely - we never touch existing rows. Simpler code, fewer bugs. This was the turning point that made the system reliable.
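In code, "append, never update" reduces to mapping each case to a row array and sending it to the Sheets append endpoint. A sketch, where the column order and field names are illustrative rather than the client's actual 26-column layout:

```javascript
// Map a case object to a sheet row; only a few of the 26 columns are shown.
function caseToRow(c) {
  return [
    c.child_name,
    c.case_id,
    c.service_type,
    `${c.weekly_sessions}x${c.session_minutes}`, // e.g. "2x30"
    c.sc_name
  ];
}
// The row is then sent via sheets.spreadsheets.values.append(...), which
// adds below existing data and never touches rows the client has edited.
```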
Chapter 06 - The Garbage Data Saga
When AI Goes Rogue
With AI extraction working and the append-only architecture in place, I thought I was done. Then I looked at the actual data in production. Row 27 made me spit out my coffee.
Row 27: Agency = "childs area"
Row 32: Agency = "Child name Rodriguez, Maya DOB"
Row 74: Agency = "Referral for in Person Services"
What? That's not an agency. "Child name Rodriguez, Maya DOB" is case data. "Referral for in Person Services" is the email subject line. How did this end up in the agency column?
The Investigation
I spent hours tracing through logs. The AI was doing something unexpected: when it couldn't find a signature block with agency information, it would grab anything that looked vaguely agency-like from the email body. Subject lines, case data, random phrases.
The AI was trying to be helpful. It saw an empty agency field and thought "I should fill this in!" But its guesses were catastrophically wrong.
The Garbage Filter
The fix required building a garbage filter - a set of regex patterns to catch and reject invalid values:
const agencyGarbage = [
  // Email subject snippets
  /referral\s+for/i,
  /staffing\s+referral/i,
  /cases?\s+with/i,
  // Case data that leaked into agency field
  /child\s*'?s?\s*name/i,
  /\bDOB\b/i,
  /parent\s*'?s?\s*name/i,
  // Contains EI numbers or phone numbers
  /\d{5,7}/,  // EI number pattern
  /\d{10}/,   // Phone number pattern
  // Forwarder's agency (should find original sender)
  /big\s*minds/i,
  /tiny\s*hands/i
];

Similar filters for SC names:
const scNameGarbage = [
  /^therapists?$/i,
  /^coordinator$/i,
  /^services?$/i,
  /\s+to$/i,  // Catches truncated names like "Roots to"
];

Defense in Depth
But here's the thing about production systems: bugs find a way. The AI filter was catching 95% of garbage, but edge cases kept slipping through.
I implemented a two-layer defense:
Layer 1: ai.ts filters during extraction
Layer 2: route.ts sanitizes data right before writing to the sheet
Layer 1 filters during extraction. Layer 2 is a safety net that sanitizes data right before writing to the sheet. Even if the AI returns garbage and Layer 1 somehow misses it, Layer 2 catches it before it pollutes production data.
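The Layer-2 safety net is essentially a last-moment field scrub. A minimal sketch, using a trimmed subset of the patterns above (function and variable names are illustrative):

```javascript
// Layer-2 safety net sketch: blank any field that matches a garbage pattern
// right before the sheet write. Trimmed subset of the full pattern list.
const agencyGarbagePatterns = [
  /referral\s+for/i,
  /child\s*'?s?\s*name/i,
  /\d{5,7}/ // stray EI numbers
];

function sanitizeField(value, patterns) {
  if (!value) return '';
  return patterns.some(p => p.test(value)) ? '' : value.trim();
}
```

An empty cell in the sheet is an honest signal; garbage in the cell looks like data and gets trusted.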
The Lesson
A single validation layer will always have edge cases. Multiple layers catch what slips through. It's worth the extra code. Always.
Chapter 07 - Duplicates
The Duplicate Detection Challenge
Another fun problem: email threads.
When a Service Coordinator sends a referral, they might:
- Send the original email with 5 cases
- Reply to add 2 more cases
- Forward the whole thread when updating case details
The system would see the forwarded email as "new" and try to extract all 7 cases again - but 5 of them already exist! Without proper handling, we'd have duplicates everywhere.
Two-Tier Deduplication
I implemented a two-tier duplicate check:
Primary: case_id + service_type
// "EI #1039573 + ST" = unique identifier
if (existingCase.case_id === newCase.case_id &&
    existingCase.service_type === newCase.service_type) {
  return { isDuplicate: true };
}

Fallback: child_name + service_type + borough
Why a fallback? Some referrals don't include EI numbers. A child named "Noah Williams" getting OT services in Brooklyn should only have one row, even if the email doesn't include an EI number.
// For cases without EI numbers
if (existingCase.child_name === newCase.child_name &&
    existingCase.service_type === newCase.service_type &&
    existingCase.borough === newCase.borough) {
  return { isDuplicate: true };
}

Chapter 08 - Debugging
Logging: The Unsung Hero
One of the most valuable investments was building comprehensive logging. When something goes wrong at 11 PM and you need to figure out why, logs are everything.
The system now outputs structured logs with clear prefixes:
[PROCESS] ================ Starting extraction run: abc-123 ================
[PROCESS] Found 12 unread emails
[EMAIL #1] Subject: Fw: Staffing Referral - 5 Cases
[EMAIL #1] From: sarah@staffingagency.com
[EMAIL #1] SKIPPING - Already processed (extracted 5 cases)
[EMAIL #2] Subject: New OT Referral
[EMAIL #2] From: jessica@brightfutures.com
[AI #2] Extracting cases...
[AI #2] Extracted 3 cases
[CASE] Maya Rodriguez | ST | case_id=1234567
[CASE] SC: Jessica Chen | Agency: Bright Futures Services | Phone: 646-555-4321
[CASE] NEW - Appending to sheet...
[SHEET] Appending: Maya Rodriguez | ST | SC: Jessica Chen
[SHEET] SUCCESS
[PROCESS] ================ EXTRACTION COMPLETE ================
[PROCESS] Duration: 15.2s
[PROCESS] Cases extracted: 12
[PROCESS] Cases added to sheet: 10
[PROCESS] Duplicates skipped: 2
With these logs, I can trace exactly what happened to every email, every case, every field. When Sarah says "this case looks wrong," I can find the exact email, see what the AI extracted, and understand where things went sideways.
Pro Tip
The time invested in structured logging pays for itself the first time you're debugging a production issue at midnight. Don't skimp on logs.
Chapter 09 - The Stack
Technical Decisions & Why
| Component | Technology | Why |
|---|---|---|
| Backend | Express.js | Simple HTTP server, minimal overhead, perfect for API-only service |
| AI | Gemini 2.5 Flash | Fast, cheap, good at structured extraction |
| Email | Microsoft Graph API | Native Outlook integration |
| Sheets | Google Sheets API | Client's existing workflow |
| Hosting | Render.com | Free tier, 24/7 uptime, internal cron support |
| Cron | node-cron (internal) + cron-job.org (backup) | Dual-layer reliability - internal runs every 5min, external as safety net |
The Extraction Pipeline
- Internal cron job (node-cron) triggers /api/process every 5 minutes
- Microsoft Graph API fetches unread emails
- For each email:
  a. Check if already processed (skip if so)
  b. Run relevance filter (is this a referral?)
  c. Call Gemini AI with structured prompt
  d. Parse JSON response (with retry on malformed JSON)
  e. Clean case IDs (strip prefixes like "BW-1047126" → "1047126")
  f. Extract initials from case ID if child name missing
  g. Apply garbage filters to SC info
  h. Read existing sheet data for duplicate detection
  i. For each case:
     - Check duplicates in sheet (case_id+service_type OR child_name+service_type+borough)
     - Skip if duplicate found
     - Sanitize data (safety net filter)
     - Append to Google Sheet
     - Insert into database for tracking
  j. Mark email as read in Outlook (fire-and-forget)
- Log summary statistics
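Step (d), parsing with a retry path, can be sketched like this. The fence-stripping heuristic is my assumption about what "malformed JSON" usually means in practice with LLM output:

```javascript
// Sketch of step (d): strip the markdown fences the model sometimes wraps
// around its JSON, then parse; null tells the caller to retry the model call.
function parseModelJson(raw) {
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```\s*$/, '')
    .trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    return null; // caller retries the Gemini request
  }
}
```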
Chapter 10 - The Results
The Numbers Don't Lie
- 4h - Daily Time Saved
- 0 - Daily Clicks Required
- ~98% - Extraction Accuracy
| Metric | Before | After |
|---|---|---|
| Time per email | 15-20 minutes | 3-5 seconds |
| Daily processing | 4-6 hours (manual) | ~8 minutes (automated) |
| Data accuracy | ~92% (human error) | ~98% (AI + validation) |
| Missed cases | Common | Zero (auditable) |
| Audit trail | None | Complete |
Extraction Quality (Current Production)
- Child name completeness: 100%
- EI number completeness: 94%
- SC name completeness: 99%
- SC phone completeness: 70-80% (depends on email signatures)
- Agency completeness: 80-85%
The missing SC data isn't a system failure - it's a data availability issue. Not all Service Coordinators include complete contact information in their email signatures. You can't extract what doesn't exist.
The Real Win
It's not about the hours saved. It's about removing a soul-crushing task that made people dread Monday mornings. Nobody got into pediatric therapy staffing because they love copying data from emails.
Chapter 11 - Lessons Learned
What I'd Tell Past Me
- Real-World Data is Messy - No matter how clean your test data looks, production will surprise you. Build systems that degrade gracefully and log everything. The chaos is coming - be ready.
- Defense in Depth Works - A single validation layer will always have edge cases. Multiple layers catch what slips through. It's worth the extra code. Every. Single. Time.
- Append-Only is Powerful - Fighting sync conflicts is a losing battle. Design for append-only when possible - it eliminates entire categories of bugs. Let the user own their data; you just add to it.
- Logs Are Worth Their Weight in Gold - The time invested in structured logging pays for itself the first time you're debugging a production issue at midnight. Future you will thank present you.
- AI is a Tool, Not a Solution - Gemini can extract data, but it needs guidance, validation, and error handling. The "AI" part is maybe 20% of the system - the other 80% is traditional engineering. Don't let the AI hype fool you.
- Iterate with Real Users - Every major improvement came from Melina reporting an issue: "This agency name looks weird", "I'm seeing duplicates", "Why is this case missing?". Real usage reveals real bugs. Ship early, listen hard.
- Sometimes Less is More - The frontend was beautiful, but the client didn't need it. They just needed the data in their spreadsheet. Removing 13,000 lines of UI code made the system simpler, faster, and cheaper to run. Know when to delete.
- Redundancy is Reliability - One cron job is a single point of failure. Two cron jobs (internal + external backup) mean the system keeps running even if one fails. The extra 5 minutes to set up the backup saved hours of debugging later.
- The Right Tool for the Job - Next.js is great for full-stack React apps. But for a backend-only API with a cron job? Express is the right tool. Don't use a framework just because it's popular - use what fits the problem. The migration from Next.js to Express cut memory usage in half and eliminated all the OOM errors. Sometimes the best solution is the simplest one.
The best code is the code you don't have to write. The second best is code that handles real-world chaos gracefully.
Chapter 12 - The Simplification
When I Realized the Frontend Was Overkill
Six months into production, I had a realization: nobody was using the dashboard. The client opened their Google Sheet, saw the data, and that was it. The beautiful Next.js UI with charts, filters, and analytics? Unused.
Meanwhile, Vercel was sending me usage alerts. The free tier was generous, but we were pushing limits. Every page load, every API call, every static asset - it all counted. And for what? A dashboard nobody looked at.
The Wake-Up Call
"Why am I paying for a React app when the client just wants data in a spreadsheet?"
The Backend-Only Refactor
I created a new branch and started deleting. Dashboard? Gone. Cases page? Gone. Analytics? Gone. Settings UI? Gone. All the Radix UI components, Framer Motion animations, Tailwind styles - 13,000 lines of code, deleted.
What remained:
- The email processing API endpoint
- The AI extraction logic
- The Google Sheets integration
- The duplicate detection
- A simple custom server with an internal cron job
The result? A system that did exactly what it needed to do, nothing more. Smaller bundle. Faster startup. Lower costs. Better.
Internal Cron: No External Dependencies
Instead of relying on cron-job.org to ping the endpoint, I added node-cron directly to the server. Now the cron job runs inside the application:
import cron from 'node-cron';

// Runs every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  await fetch('http://localhost:3000/api/process', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer ...' }
  });
});

Simple. Self-contained. No external service needed. But I kept cron-job.org as a backup - because redundancy is reliability.
The Lesson
Build what's needed, not what's impressive. Sometimes the best feature is a feature you remove.
Chapter 13 - The Duplicate Problem
When the Same Email Ran Twice
The client called: "I'm seeing the same cases twice in the sheet. What's happening?"
I checked the logs. The same email had been processed twice - once at 9:15 AM, again at 9:20 AM. The system was working perfectly, but it had no memory of what it had already written to the sheet.
The database tracked processed emails, but if an email was marked unread again (maybe the client accidentally marked it unread, or Outlook synced weird), the system would process it again. And since we were writing directly to the sheet, duplicates appeared.
Sheet-Based Duplicate Detection
The solution: read the sheet before writing. Before appending new cases, the system now:
- Reads existing rows from the sheet (case_id, child_name, service_type, borough)
- Checks each new case against existing rows
- Uses two-tier matching:
- Primary: case_id + service_type (most reliable)
- Fallback: child_name + service_type + borough (for cases without EI numbers)
- Skips duplicates, only appends new cases
// Read existing sheet data
const existingRows = await readExistingSheetData();

// Filter duplicates (note: "case" is a reserved word in JavaScript,
// so the callback parameter is "c")
const uniqueCases = cases.filter(c => {
  // Primary: case_id + service_type
  if (c.case_id && c.service_type) {
    return !existingRows.some(row =>
      row.case_id === c.case_id &&
      row.service_type === c.service_type
    );
  }
  // Fallback: child_name + service_type + borough
  return !existingRows.some(row =>
    row.child_name === c.child_name &&
    row.service_type === c.service_type &&
    row.borough === c.borough
  );
});

Now when the same email runs twice, the system sees the cases already exist and skips them. Problem solved.
The Insight
Idempotency isn't just for APIs - it's for data pipelines too. The same input should never produce duplicate output, no matter how many times it runs.
Chapter 14 - The Case ID Mystery
When "BW-1047126" Wasn't a Case ID
Another client report: "The case IDs look wrong. I'm seeing 'BW-1047126' but it should just be '1047126'."
I looked at the data. Some case IDs had prefixes: "BW-1047126", "JM-900152", "RT-948655". The AI was extracting them exactly as written in the emails, but the client's system expected just the numbers.
More interesting: when the case ID was "BW-1047126", the child name was often missing. But "BW" looked like initials. Could the prefix be the child's initials?
The Clean Case ID Function
I added a function to strip prefixes and extract initials:
function cleanCaseId(caseId) {
  // Match patterns like "BW-1047126" or "JM900152"
  const match = caseId.match(/^([A-Z]{2,3})[-]?(\d{6,7})$/);
  if (match) {
    return {
      cleanedCaseId: match[2],      // "1047126"
      extractedInitials: match[1]   // "BW"
    };
  }
  return { cleanedCaseId: caseId, extractedInitials: null };
}

Now when the AI extracts "BW-1047126":
- The case_id becomes "1047126" (numeric only)
- If child_name is missing, it gets set to "BW" (the initials)
The AI prompt was also updated to encourage numeric-only case IDs and to use initials as names when full names aren't available. But the cleaning function acts as a safety net - even if the AI doesn't follow instructions perfectly, the data gets cleaned.
The Pattern
Real-world data has patterns you don't expect. "BW-1047126" isn't a case ID with a prefix - it's initials plus a case ID. Understanding the domain matters.
Chapter 15 - The Migration
Moving from Vercel to Render.com
Vercel was great, but the free tier had limits. And honestly, for a backend-only service that runs 24/7, Vercel's edge functions weren't the right fit. I needed a traditional server that could run continuously.
Enter Render.com. Free tier. 750 hours per month (enough for 24/7). Auto-restarts on crash. Perfect for a cron job that needs to run every 5 minutes, forever.
The Deployment Process
The migration was straightforward:
- Created a custom Express server (src/server-simple.ts) with internal cron
- Moved tsx from devDependencies to dependencies (needed at runtime)
- Updated Google Sheets auth to support environment variables (for cloud deployment)
- Configured Render.com:
  - Build command: npm install; npm run build
  - Start command: npm run start:ts
  - All environment variables from .env.local
- Set up external cron backup on cron-job.org (redundancy)
The Google Sheets Auth Fix
The original code only read credentials from a file. For cloud deployment, it needed to support environment variables:
async function getGoogleSheets() {
  // Try env var first (for cloud deployments)
  if (process.env.GOOGLE_SERVICE_ACCOUNT_KEY) {
    const credentials = JSON.parse(process.env.GOOGLE_SERVICE_ACCOUNT_KEY);
    return new google.auth.GoogleAuth({ credentials });
  }
  // Fallback to file (for local development)
  return new google.auth.GoogleAuth({
    keyFile: 'credentials/google-sheets-service-account.json'
  });
}

Now the same code works locally (with file) and in production (with env var). Simple.
The Result
The system now runs on Render.com, completely free. The internal cron job runs every 5 minutes. The external cron backup ensures reliability. The client's spreadsheet gets updated automatically. Zero maintenance. Zero cost. Zero problems.
The Win
Sometimes the best infrastructure decision is the simplest one. A traditional server running a cron job doesn't need edge functions or serverless magic - it just needs to run.
Chapter 16 - The Final Simplification
When Next.js Became Too Much
The system was working. Emails were processing. Cases were flowing into the spreadsheet. But then Render started sending me alerts: "Out of memory (used over 512MB)". The server was crashing on startup.
I checked the logs. The server was using 200-400MB of memory just to start up. Render's free tier has a 512MB limit. When the health check hit the server, Next.js would load all routes and middleware, causing a memory spike that exceeded the limit before the server could even finish initializing.
I tried everything. Increased Node.js heap size. Added memory logging. Implemented lightweight health checks. But Next.js itself was the problem - it's a full-stack framework designed for React applications. I was using it for a backend-only API that didn't need routing, middleware, or any of Next.js's features.
The Realization
I looked at the codebase. 13,000 lines of UI code that nobody used. A Next.js server that loaded React, routing, middleware, and a build system - just to run a simple cron job. This was overkill.
The client didn't need a dashboard. They didn't need a frontend. They just needed data in their spreadsheet. Why was I running a full-stack framework for a backend-only service?
The Wake-Up Call
"Why am I running Next.js when I just need an Express server with a cron job?"
The Migration
I created a new branch and started from scratch. Not a refactor - a complete rewrite:
- Deleted Next.js entirely - Removed all Next.js dependencies, config files, and the entire app directory
- Built a simple Express server - Just HTTP endpoints and a cron job, nothing more
- Extracted core logic - Moved processing functions to standalone modules (no framework dependencies)
- Removed Supabase - The client didn't want a database, so duplicate detection became in-memory Set-based checking
- Simplified everything - From 13,000 lines to ~1,400 lines of actual code
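The Set-based check that replaced the database looks roughly like this (the key shape is illustrative; in practice the Set is seeded from the sheet's existing rows on startup):

```javascript
// In-memory replacement for the database: duplicate keys live in a Set.
const seen = new Set();

function dedupKey(c) {
  // Primary key when an EI number exists, fallback key otherwise
  return c.case_id
    ? `${c.case_id}|${c.service_type}`
    : `${c.child_name}|${c.service_type}|${c.borough}`;
}

function isNewCase(c) {
  const key = dedupKey(c);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

The trade-off: the Set is lost on restart, which is why rebuilding it from the sheet (the source of truth) matters.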
// Before: Next.js server with routing, middleware, React
import next from 'next';
const app = next({ dev, hostname, port });
await app.prepare(); // Loads all routes, 200MB+ memory

// After: Simple Express server
import express from 'express';
import cron from 'node-cron';
const app = express();
app.listen(port); // 50MB memory, done

The Results
- 177MB - Memory Usage
- -50% - Memory Reduction
- 0 - OOM Errors
The new Express server uses 177MB of memory at startup. Down from 200-400MB. That's a 50% reduction. More importantly, it stays stable. No more crashes. No more OOM errors. The server starts in seconds, not minutes.
What Changed
Before: Next.js (Complex)
Next.js Server → API Routes → Processing Logic → Google Sheets
200-400MB memory, complex build process
After: Express (Simple)
Express Server → Processing Logic → Google Sheets
177MB memory, no build step needed
The cron job now calls the processing function directly - no HTTP overhead, no network calls, just a function invocation. Faster. Simpler. Better.
// Cron job calls processing directly (no HTTP)
cron.schedule('*/5 * * * *', async () => {
  const result = await runProcessingCore();
  console.log(`Processed ${result.cases_extracted} cases`);
});

The Cleanup
I deleted everything that wasn't needed:
- Next.js config files (next.config.ts, next-env.d.ts)
- All API routes (moved logic to standalone modules)
- Supabase integration (client didn't want a database)
- PostCSS config (no Tailwind needed)
- Vercel config (deploying to Render)
- Public assets (no frontend to serve)
What remained: just the essentials. Express server. Processing logic. Google Sheets integration. Cron job. That's it. 1,400 lines of code that do exactly what they need to do, nothing more.
The Lesson
Sometimes the best optimization is deletion. Removing 13,000 lines of unused code didn't just reduce memory - it made the system simpler, faster, and easier to maintain. Know when to throw code away.
The system now runs perfectly on Render's free tier. No memory issues. No crashes. Just a simple Express server doing its job, every 5 minutes, forever.
Epilogue
The System Lives On
As you read this, the system is running. Every 5 minutes, an internal cron job wakes up, checks for new emails, extracts case data with AI, detects duplicates, and appends to Google Sheets. No human intervention required.
The code is deployed on Render.com (free tier). Express server with node-cron. The internal cron runs every 5 minutes, with an external backup on cron-job.org for redundancy. Simple services, working in harmony, doing the work that used to consume hours of human time.
And somewhere, a staff member opens their laptop, sees their spreadsheet already populated with the morning's referrals, and gets to spend their energy on what actually matters: connecting therapists with the families who need them.
That's the whole point, isn't it?
- 5 - Architecture Pivots
- 15+ - Garbage Filter Patterns
- 3 - Validation Layers
- 13K - Lines of Code Deleted
Sometimes the best solutions come from deeply understanding the problem, iterating relentlessly, and being willing to throw away code that doesn't work. This project taught me that.
What started as "can you make this faster?" became a full-fledged AI-powered workflow automation system. And I loved every frustrating minute of building it.