From Email Chaos to AI-Powered Automation
The story of how I turned a 4-hour daily copy-paste nightmare into a system that processes emails while everyone sleeps. Complete with plot twists, debugging at 2 AM, and a healthy respect for the chaos of real-world data.
Chapter 01 - The Origin Story
Death by a Thousand Copy-Pastes
It started with a simple observation: Sarah, the operations manager at a pediatric therapy staffing agency, was spending 4-6 hours every single day copying and pasting data from emails into spreadsheets. Let that sink in. Half a workday. Every day. Forever.
Her inbox was drowning in referral emails from Service Coordinators across New York City, each containing case information for children needing early intervention services. Speech therapy. Occupational therapy. Physical therapy. Specialized instruction. Real kids, real families, real urgency.
The referrals looked something like this:
Subject: Fw: Staffing Referral - 5 Cases
Hi,
Looking for therapists for the following cases:
1) Child's Name: Maya Rodriguez
DOB: 3/15/2022
EI #: 1234567
Address: 456 Oak Street, Queens, NY 11101
Parent: Carmen Rodriguez (718) 555-0123
Services: ST 2x30, OT 1x45
2) Child's Name: Ethan Cooper
...
Thanks,
Jessica Chen
Service Coordinator
Bright Futures Services
(646) 555-4321
Each email could contain anywhere from 1 to 15 cases. Each case needed to be entered into a Google Sheet with 26 columns. Each mistake could mean a child not receiving therapy on time.
The ask was simple: "Can you make this faster?"
The reality was anything but simple.
The Problems
- 4-6 Hours Daily - Staff spent their mornings doing nothing but copying and pasting data from emails. By the time they finished, more emails had arrived.
- Inconsistent Formats - Every Service Coordinator formatted emails differently. Tables, bullet points, paragraphs, forwarded threads three levels deep. No standards, no structure.
- Human Error - Typos, missed fields, wrong frequencies. A "2x30" became "230" or got lost entirely. And when you're tired from 3 hours of data entry, mistakes happen.
- No Audit Trail - If something went wrong, there was no way to trace what happened. Which email did this case come from? Who entered it? When?
"I spend my mornings copying and pasting. By the time I finish, there are already new ones waiting."
Chapter 02 - The Domain
What Even Is Early Intervention?
Before I could build anything, I needed to understand what I was building it for. Early Intervention (EI) is a federally mandated program that provides therapeutic services to children ages 0-3 with developmental delays. In New York City, the system involves:
- Evaluating agencies that assess children for developmental delays
- Service Coordinators (SCs) who manage cases and connect families with providers
- Staffing agencies (like Big Minds Tiny Hands) that employ the actual therapists
- Providers who deliver services: Speech Therapy (ST), Occupational Therapy (OT), Physical Therapy (PT), Special Instruction (SI), and Applied Behavior Analysis (ABA)
When a Service Coordinator needs a therapist for a child, they email staffing agencies with the case details. The staffing agency then matches available therapists to cases based on location, availability, and service type.
The Data Challenge
What makes this problem deceptively hard:
- No standard format - Every SC formats their emails differently
- Multi-case emails - One email can contain 1-15 children
- Multi-service cases - One child might need ST, OT, AND PT (three separate rows)
- Forwarded emails - The office manager forwards emails, adding another layer of headers
- Embedded tables - Some SCs paste from Excel, creating HTML table nightmares
- Missing information - Not all fields are always present
The Constraint
The client didn't want to change their workflow. They wanted to keep using their Google Sheet. Whatever I built had to fit into their existing process, not replace it. This turned out to be crucial.
Chapter 03 - The Prototype
Version 1: The Regex Disaster
My first instinct was to build something quick in Python. Parse the emails with regex, dump to CSV, done. I gave myself a weekend. How hard could it be?
# How I thought it would work
import re

case_pattern = r"Child's Name:\s*(.+)\nDOB:\s*(.+)\nEI #:\s*(\d+)"
matches = re.findall(case_pattern, email_body)

Reality check: After 30 minutes of testing, I had covered exactly 3 of the 47 different email formats in the inbox.
Some emails wrote "Child Name:" instead of "Child's Name:". Some used "EI Number:" instead of "EI #:". Some didn't have labels at all - just names and numbers in a semi-structured paragraph. Some were HTML tables that my regex couldn't even begin to parse.
The emails were too varied. Regex wasn't going to cut it. I needed something that could understand context, not just patterns.
Lesson Learned
Don't underestimate real-world data variability. Test data is clean. Production data is chaos wearing a trench coat.
Chapter 04 - Enter AI
Version 2: The AI Gambit
If regex couldn't handle the variability, maybe AI could. I integrated Google's Gemini AI with a carefully crafted prompt:
Extract all case referrals from this email. Return JSON with:
- child_name
- case_id (EI number, 6-7 digits)
- dob
- address, borough, zip
- service_type (ST, OT, PT, SI, ABA)
- session_minutes, weekly_sessions
- parent_name, contact_phone
- sc_name, agency, sc_contact
First test: The AI extracted 8 cases perfectly. I was ecstatic. I texted my friends. I considered this problem solved.
Second test: The AI hallucinated a phone number that didn't exist in the email. Less ecstatic.
Third test: The AI returned the service coordinator's name as "Therapists 2 Go, PLLC" (that's an agency name, not a person). Not ecstatic at all.
The AI Extraction Journey
Getting AI extraction right was an iterative process spanning weeks. Here's what I discovered:
Problem 1: Hallucinated Data
AI would invent plausible-looking phone numbers. Not maliciously - it was just trying to be helpful. But a fake phone number for a family that needs therapy services? Not helpful.
Solution: Cross-reference all extracted data against the original email text. If it's not in the email, it doesn't exist.
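The cross-referencing step can be sketched roughly like this. The function name and the digit-matching heuristic are my illustration of the idea, not the production code:

```javascript
// Hypothetical anti-hallucination check: a phone-like value survives only
// if its digits literally appear somewhere in the source email.
function verifyAgainstSource(value, emailBody) {
  if (!value) return null;
  const digits = value.replace(/\D/g, '');
  // Strip separators from the body too, so "(718) 555-0123" matches "718-555-0123"
  const bodyDigits = emailBody.replace(/\D/g, '');
  return digits.length > 0 && bodyDigits.includes(digits) ? value : null;
}
```

Anything the check rejects is dropped rather than guessed at: a blank cell is recoverable, a plausible fake is not.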
Problem 2: Service Coordinator Confusion
The AI couldn't distinguish SC info from other information in the email. It would grab random names from the body and call them the "service coordinator."
Solution: Restructured the prompt to explicitly separate case data from signature extraction. The SC info comes from the email signature, not the body.
Problem 3: Multi-Service Cases
"ST 2x30, OT 1x45" should create TWO rows in the spreadsheet, one for each service. The AI would sometimes return it as one case with both services mashed together.
Solution: Added explicit instructions to return a services array, then split into separate case objects during post-processing.
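The post-processing split reduces to a small fan-out. This is a sketch (field names mirror the extraction JSON; the function name is illustrative):

```javascript
// Sketch: fan one multi-service case out into one sheet row per service.
function splitServices(caseObj) {
  return caseObj.services.map(svc => ({
    child_name: caseObj.child_name,
    service_type: svc.type,
    session_minutes: svc.minutes,
    weekly_sessions: svc.sessions
  }));
}
```

So "ST 2x30, OT 1x45" for one child yields two row objects, each carrying the shared child fields.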
Problem 4: Forwarded Emails
When Sarah forwards an email, the AI would extract HER signature, not the original SC's. The forwarder's info would end up in the spreadsheet instead of the actual service coordinator.
Solution: Added forwarded email detection and explicit instructions to find the ORIGINAL sender's signature, ignoring the forwarder's.
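The detection itself can be as simple as checking for forwarding markers before building the prompt. The patterns below are my guess at common Outlook markers, not the exact production heuristics:

```javascript
// Hypothetical forwarded-email check: if either marker is present, the
// prompt is told to locate the ORIGINAL sender's signature.
function isForwarded(subject, body) {
  return /^\s*fwd?:/i.test(subject) ||
         /-{3,}\s*(forwarded message|original message)/i.test(body);
}
```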
The final prompt structure evolved into a two-part extraction:
{
  "cases": [
    {
      "child_name": "Maya Rodriguez",
      "services": [
        { "type": "ST", "minutes": 30, "sessions": 2 },
        { "type": "OT", "minutes": 45, "sessions": 1 }
      ]
      // ... other fields
    }
  ],
  "sc_info": {
    // Extracted from email signature, shared across ALL cases
    "sc_name": "Jessica Chen",
    "sc_phone": "646-555-4321",
    "sc_agency": "Bright Futures Services"
  }
}

Chapter 05 - The Architecture
Three Pivots to Get It Right
Building the right architecture wasn't a straight line. It took three major pivots before I found something that actually worked reliably.
Pivot 1: The Excel Disaster
The client's original workflow used SharePoint Excel files. My first approach: sync directly to Excel via Microsoft Graph API. What could go wrong?
Everything.
- Excel files lock when someone has them open
- The API would randomly fail with "file in use" errors
- Concurrent edits caused merge conflicts that corrupted data
- SharePoint's sync delays meant changes took minutes to appear
I spent days fighting Microsoft's ecosystem before accepting defeat. The pivot: Google Sheets. Real-time collaborative editing, no file locks, instant updates, and an API that actually works. The client switched their workflow in a day and never looked back.
Pivot 2: The Sync Problem
My initial architecture looked clean on paper:
Version 2.0 Architecture: Outlook → AI Extraction → PostgreSQL → Google Sheets
The problem: that last arrow. The sync from database to Google Sheets was manual. Click a button, cases get added. But what if you click twice? Duplicates. What if you forget to click? Data isn't in the sheet.
I added an exported_to_sheet flag to track which cases had been exported. Now the export only pushed new cases.
New problem: The client was editing the Google Sheet directly (adding notes, marking cases as filled). If I ever needed to re-sync, I'd overwrite their edits.
Even newer problem: Sometimes the export would fail halfway through, leaving the database thinking cases were exported when they weren't.
This was getting messy.
Pivot 3: The "Append Only" Revelation
After several debugging sessions at 2 AM, I had an epiphany: why sync at all?
The Google Sheet is the source of truth for the client. They edit it, add formulas, color cells. I shouldn't be replacing data - I should only be adding new cases.
Final Architecture (v2.5): Outlook → AI Extraction → Google Sheets → PostgreSQL
Key insight: Append new cases directly to the sheet, immediately after extraction. Keep the database purely for tracking what's been processed (to prevent duplicates).
The Benefits
Sheet is always up to date immediately. No sync lag. No failed sync states. Client can edit freely - we never touch existing rows. Simpler code, fewer bugs. This was the turning point that made the system reliable.
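In code, "append, never update" reduces to mapping each case to a row array and sending it to the Sheets append endpoint. A sketch, where the column order and field names are illustrative rather than the client's actual 26-column layout:

```javascript
// Map a case object to a sheet row; only a few of the 26 columns are shown.
function caseToRow(c) {
  return [
    c.child_name,
    c.case_id,
    c.service_type,
    `${c.weekly_sessions}x${c.session_minutes}`, // e.g. "2x30"
    c.sc_name
  ];
}
// The row is then sent via sheets.spreadsheets.values.append(...), which
// adds below existing data and never touches rows the client has edited.
```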
Chapter 06 - The Garbage Data Saga
When AI Goes Rogue
With AI extraction working and the append-only architecture in place, I thought I was done. Then I looked at the actual data in production. Row 27 made me spit out my coffee.
Row 27: Agency = "childs area"
Row 32: Agency = "Child name Rodriguez, Maya DOB"
Row 74: Agency = "Referral for in Person Services"
What? That's not an agency. "Child name Rodriguez, Maya DOB" is case data. "Referral for in Person Services" is the email subject line. How did this end up in the agency column?
The Investigation
I spent hours tracing through logs. The AI was doing something unexpected: when it couldn't find a signature block with agency information, it would grab anything that looked vaguely agency-like from the email body. Subject lines, case data, random phrases.
The AI was trying to be helpful. It saw an empty agency field and thought "I should fill this in!" But its guesses were catastrophically wrong.
The Garbage Filter
The fix required building a garbage filter - a set of regex patterns to catch and reject invalid values:
const agencyGarbage = [
  // Email subject snippets
  /referral\s+for/i,
  /staffing\s+referral/i,
  /cases?\s+with/i,
  // Case data that leaked into agency field
  /child\s*'?s?\s*name/i,
  /\bDOB\b/i,
  /parent\s*'?s?\s*name/i,
  // Contains EI numbers or phone numbers
  /\d{5,7}/,  // EI number pattern
  /\d{10}/,   // Phone number pattern
  // Forwarder's agency (should find original sender)
  /big\s*minds/i,
  /tiny\s*hands/i
];

Similar filters for SC names:
const scNameGarbage = [
  /^therapists?$/i,
  /^coordinator$/i,
  /^services?$/i,
  /\s+to$/i,  // Catches truncated names like "Roots to"
];

Defense in Depth
But here's the thing about production systems: bugs find a way. The AI filter was catching 95% of garbage, but edge cases kept slipping through.
I implemented a two-layer defense:
Layer 1: ai.ts filters during extraction
Layer 2: route.ts sanitizes data right before writing to the sheet
Layer 1 filters during extraction. Layer 2 is a safety net that sanitizes data right before writing to the sheet. Even if the AI returns garbage and Layer 1 somehow misses it, Layer 2 catches it before it pollutes production data.
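The Layer-2 safety net is essentially a last-moment field scrub. A minimal sketch, using a trimmed subset of the patterns above (function and variable names are illustrative):

```javascript
// Layer-2 safety net sketch: blank any field that matches a garbage pattern
// right before the sheet write. Trimmed subset of the full pattern list.
const agencyGarbagePatterns = [
  /referral\s+for/i,
  /child\s*'?s?\s*name/i,
  /\d{5,7}/ // stray EI numbers
];

function sanitizeField(value, patterns) {
  if (!value) return '';
  return patterns.some(p => p.test(value)) ? '' : value.trim();
}
```

An empty cell in the sheet is an honest signal; garbage in the cell looks like data and gets trusted.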
The Lesson
A single validation layer will always have edge cases. Multiple layers catch what slips through. It's worth the extra code. Always.
Chapter 07 - Duplicates
The Duplicate Detection Challenge
Another fun problem: email threads.
When a Service Coordinator sends a referral, they might:
- Send the original email with 5 cases
- Reply to add 2 more cases
- Forward the whole thread when updating case details
The system would see the forwarded email as "new" and try to extract all 7 cases again - but 5 of them already exist! Without proper handling, we'd have duplicates everywhere.
Two-Tier Deduplication
I implemented a two-tier duplicate check:
Primary: case_id + service_type
// "EI #1039573 + ST" = unique identifier
if (existingCase.case_id === newCase.case_id &&
    existingCase.service_type === newCase.service_type) {
  return { isDuplicate: true };
}

Fallback: child_name + service_type + borough
Why a fallback? Some referrals don't include EI numbers. A child named "Noah Williams" getting OT services in Brooklyn should only have one row, even if the email doesn't include an EI number.
// For cases without EI numbers
if (existingCase.child_name === newCase.child_name &&
    existingCase.service_type === newCase.service_type &&
    existingCase.borough === newCase.borough) {
  return { isDuplicate: true };
}

Chapter 08 - Debugging
Logging: The Unsung Hero
One of the most valuable investments was building comprehensive logging. When something goes wrong at 11 PM and you need to figure out why, logs are everything.
The system now outputs structured logs with clear prefixes:
[PROCESS] ================ Starting extraction run: abc-123 ================
[PROCESS] Found 12 unread emails
[EMAIL #1] Subject: Fw: Staffing Referral - 5 Cases
[EMAIL #1] From: sarah@staffingagency.com
[EMAIL #1] SKIPPING - Already processed (extracted 5 cases)
[EMAIL #2] Subject: New OT Referral
[EMAIL #2] From: jessica@brightfutures.com
[AI #2] Extracting cases...
[AI #2] Extracted 3 cases
[CASE] Maya Rodriguez | ST | case_id=1234567
[CASE] SC: Jessica Chen | Agency: Bright Futures Services | Phone: 646-555-4321
[CASE] NEW - Appending to sheet...
[SHEET] Appending: Maya Rodriguez | ST | SC: Jessica Chen
[SHEET] SUCCESS
[PROCESS] ================ EXTRACTION COMPLETE ================
[PROCESS] Duration: 15.2s
[PROCESS] Cases extracted: 12
[PROCESS] Cases added to sheet: 10
[PROCESS] Duplicates skipped: 2
With these logs, I can trace exactly what happened to every email, every case, every field. When Sarah says "this case looks wrong," I can find the exact email, see what the AI extracted, and understand where things went sideways.
Pro Tip
The time invested in structured logging pays for itself the first time you're debugging a production issue at midnight. Don't skimp on logs.
Chapter 09 - The Stack
Technical Decisions & Why
| Component | Technology | Why |
|---|---|---|
| Backend | Express.js | Simple HTTP server, minimal overhead, perfect for API-only service |
| AI | Gemini 2.5 Flash | Fast, cheap, good at structured extraction |
| Email | Microsoft Graph API | Native Outlook integration |
| Sheets | Google Sheets API | Client's existing workflow |
| Hosting | Render.com | Free tier, 24/7 uptime, internal cron support |
| Cron | node-cron (internal) + cron-job.org (backup) | Dual-layer reliability - internal runs every 5min, external as safety net |
The Extraction Pipeline
- Internal cron job (node-cron) triggers /api/process every 5 minutes
- Microsoft Graph API fetches unread emails
- For each email:
  a. Check if already processed (skip if so)
  b. Run relevance filter (is this a referral?)
  c. Call Gemini AI with structured prompt
  d. Parse JSON response (with retry on malformed JSON)
  e. Clean case IDs (strip prefixes like "BW-1047126" → "1047126")
  f. Extract initials from case ID if child name missing
  g. Apply garbage filters to SC info
  h. Read existing sheet data for duplicate detection
  i. For each case:
     - Check duplicates in sheet (case_id+service_type OR child_name+service_type+borough)
     - Skip if duplicate found
     - Sanitize data (safety net filter)
     - Append to Google Sheet
     - Insert into database for tracking
  j. Mark email as read in Outlook (fire-and-forget)
- Log summary statistics
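Step (d), parsing with a retry path, can be sketched like this. The fence-stripping heuristic is my assumption about what "malformed JSON" usually means in practice with LLM output:

```javascript
// Sketch of step (d): strip the markdown fences the model sometimes wraps
// around its JSON, then parse; null tells the caller to retry the model call.
function parseModelJson(raw) {
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```\s*$/, '')
    .trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    return null; // caller retries the Gemini request
  }
}
```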
Chapter 10 - The Results
The Numbers Don't Lie
- 4h - Daily Time Saved
- 0 - Daily Clicks Required
- ~98% - Extraction Accuracy
| Metric | Before | After |
|---|---|---|
| Time per email | 15-20 minutes | 3-5 seconds |
| Daily processing | 4-6 hours (manual) | ~8 minutes (automated) |
| Data accuracy | ~92% (human error) | ~98% (AI + validation) |
| Missed cases | Common | Zero (auditable) |
| Audit trail | None | Complete |
Extraction Quality (Current Production)
- Child name completeness: 100%
- EI number completeness: 94%
- SC name completeness: 99%
- SC phone completeness: 70-80% (depends on email signatures)
- Agency completeness: 80-85%
The missing SC data isn't a system failure - it's a data availability issue. Not all Service Coordinators include complete contact information in their email signatures. You can't extract what doesn't exist.
The Real Win
It's not about the hours saved. It's about removing a soul-crushing task that made people dread Monday mornings. Nobody got into pediatric therapy staffing because they love copying data from emails.
Chapter 11 - Lessons Learned
What I'd Tell Past Me
- Real-World Data is Messy - No matter how clean your test data looks, production will surprise you. Build systems that degrade gracefully and log everything. The chaos is coming - be ready.
- Defense in Depth Works - A single validation layer will always have edge cases. Multiple layers catch what slips through. It's worth the extra code. Every. Single. Time.
- Append-Only is Powerful - Fighting sync conflicts is a losing battle. Design for append-only when possible - it eliminates entire categories of bugs. Let the user own their data; you just add to it.
- Logs Are Worth Their Weight in Gold - The time invested in structured logging pays for itself the first time you're debugging a production issue at midnight. Future you will thank present you.
- AI is a Tool, Not a Solution - Gemini can extract data, but it needs guidance, validation, and error handling. The "AI" part is maybe 20% of the system - the other 80% is traditional engineering. Don't let the AI hype fool you.
- Iterate with Real Users - Every major improvement came from Melina reporting an issue: "This agency name looks weird", "I'm seeing duplicates", "Why is this case missing?". Real usage reveals real bugs. Ship early, listen hard.
- Sometimes Less is More - The frontend was beautiful, but the client didn't need it. They just needed the data in their spreadsheet. Removing 13,000 lines of UI code made the system simpler, faster, and cheaper to run. Know when to delete.
- Redundancy is Reliability - One cron job is a single point of failure. Two cron jobs (internal + external backup) mean the system keeps running even if one fails. The extra 5 minutes to set up the backup saved hours of debugging later.
- The Right Tool for the Job - Next.js is great for full-stack React apps. But for a backend-only API with a cron job? Express is the right tool. Don't use a framework just because it's popular - use what fits the problem. The migration from Next.js to Express cut memory usage in half and eliminated all the OOM errors. Sometimes the best solution is the simplest one.
The best code is the code you don't have to write. The second best is code that handles real-world chaos gracefully.
Chapter 12 - The Simplification
When I Realized the Frontend Was Overkill
Six months into production, I had a realization: nobody was using the dashboard. The client opened their Google Sheet, saw the data, and that was it. The beautiful Next.js UI with charts, filters, and analytics? Unused.
Meanwhile, Vercel was sending me usage alerts. The free tier was generous, but we were pushing limits. Every page load, every API call, every static asset - it all counted. And for what? A dashboard nobody looked at.
The Wake-Up Call
"Why am I paying for a React app when the client just wants data in a spreadsheet?"
The Backend-Only Refactor
I created a new branch and started deleting. Dashboard? Gone. Cases page? Gone. Analytics? Gone. Settings UI? Gone. All the Radix UI components, Framer Motion animations, Tailwind styles - 13,000 lines of code, deleted.
What remained:
- The email processing API endpoint
- The AI extraction logic
- The Google Sheets integration
- The duplicate detection
- A simple custom server with an internal cron job
The result? A system that did exactly what it needed to do, nothing more. Smaller bundle. Faster startup. Lower costs. Better.
Internal Cron: No External Dependencies
Instead of relying on cron-job.org to ping the endpoint, I added node-cron directly to the server. Now the cron job runs inside the application:
import cron from 'node-cron';

// Runs every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  await fetch('http://localhost:3000/api/process', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer ...' }
  });
});

Simple. Self-contained. No external service needed. But I kept cron-job.org as a backup - because redundancy is reliability.
The Lesson
Build what's needed, not what's impressive. Sometimes the best feature is a feature you remove.
Chapter 13 - The Duplicate Problem
When the Same Email Ran Twice
The client called: "I'm seeing the same cases twice in the sheet. What's happening?"
I checked the logs. The same email had been processed twice - once at 9:15 AM, again at 9:20 AM. The system was working perfectly, but it had no memory of what it had already written to the sheet.
The database tracked processed emails, but if an email was marked unread again (maybe the client accidentally marked it unread, or Outlook synced weird), the system would process it again. And since we were writing directly to the sheet, duplicates appeared.
Sheet-Based Duplicate Detection
The solution: read the sheet before writing. Before appending new cases, the system now:
- Reads existing rows from the sheet (case_id, child_name, service_type, borough)
- Checks each new case against existing rows
- Uses two-tier matching:
- Primary: case_id + service_type (most reliable)
- Fallback: child_name + service_type + borough (for cases without EI numbers)
- Skips duplicates, only appends new cases
// Read existing sheet data
const existingRows = await readExistingSheetData();

// Filter duplicates (note: "case" is a reserved word in JavaScript,
// so the callback parameter is "c")
const uniqueCases = cases.filter(c => {
  // Primary: case_id + service_type
  if (c.case_id && c.service_type) {
    return !existingRows.some(row =>
      row.case_id === c.case_id &&
      row.service_type === c.service_type
    );
  }
  // Fallback: child_name + service_type + borough
  return !existingRows.some(row =>
    row.child_name === c.child_name &&
    row.service_type === c.service_type &&
    row.borough === c.borough
  );
});

Now when the same email runs twice, the system sees the cases already exist and skips them. Problem solved.
The Insight
Idempotency isn't just for APIs - it's for data pipelines too. The same input should never produce duplicate output, no matter how many times it runs.
Chapter 14 - The Case ID Mystery
When "BW-1047126" Wasn't a Case ID
Another client report: "The case IDs look wrong. I'm seeing 'BW-1047126' but it should just be '1047126'."
I looked at the data. Some case IDs had prefixes: "BW-1047126", "JM-900152", "RT-948655". The AI was extracting them exactly as written in the emails, but the client's system expected just the numbers.
More interesting: when the case ID was "BW-1047126", the child name was often missing. But "BW" looked like initials. Could the prefix be the child's initials?
The Clean Case ID Function
I added a function to strip prefixes and extract initials:
function cleanCaseId(caseId) {
  // Match patterns like "BW-1047126" or "JM900152"
  const match = caseId.match(/^([A-Z]{2,3})[-]?(\d{6,7})$/);
  if (match) {
    return {
      cleanedCaseId: match[2],      // "1047126"
      extractedInitials: match[1]   // "BW"
    };
  }
  return { cleanedCaseId: caseId, extractedInitials: null };
}

Now when the AI extracts "BW-1047126":
- The case_id becomes "1047126" (numeric only)
- If child_name is missing, it gets set to "BW" (the initials)
The AI prompt was also updated to encourage numeric-only case IDs and to use initials as names when full names aren't available. But the cleaning function acts as a safety net - even if the AI doesn't follow instructions perfectly, the data gets cleaned.
The Pattern
Real-world data has patterns you don't expect. "BW-1047126" isn't a case ID with a prefix - it's initials plus a case ID. Understanding the domain matters.
Chapter 15 - The Migration
Moving from Vercel to Render.com
Vercel was great, but the free tier had limits. And honestly, for a backend-only service that runs 24/7, Vercel's edge functions weren't the right fit. I needed a traditional server that could run continuously.
Enter Render.com. Free tier. 750 hours per month (enough for 24/7). Auto-restarts on crash. Perfect for a cron job that needs to run every 5 minutes, forever.
The Deployment Process
The migration was straightforward:
- Created a custom Express server (src/server-simple.ts) with internal cron
- Moved tsx from devDependencies to dependencies (needed at runtime)
- Updated Google Sheets auth to support environment variables (for cloud deployment)
- Configured Render.com:
  - Build command: npm install; npm run build
  - Start command: npm run start:ts
  - All environment variables from .env.local
- Set up external cron backup on cron-job.org (redundancy)
The Google Sheets Auth Fix
The original code only read credentials from a file. For cloud deployment, it needed to support environment variables:
async function getGoogleSheets() {
  // Try env var first (for cloud deployments)
  if (process.env.GOOGLE_SERVICE_ACCOUNT_KEY) {
    const credentials = JSON.parse(process.env.GOOGLE_SERVICE_ACCOUNT_KEY);
    return new google.auth.GoogleAuth({ credentials });
  }
  // Fallback to file (for local development)
  return new google.auth.GoogleAuth({
    keyFile: 'credentials/google-sheets-service-account.json'
  });
}

Now the same code works locally (with file) and in production (with env var). Simple.
The Result
The system now runs on Render.com, completely free. The internal cron job runs every 5 minutes. The external cron backup ensures reliability. The client's spreadsheet gets updated automatically. Zero maintenance. Zero cost. Zero problems.
The Win
Sometimes the best infrastructure decision is the simplest one. A traditional server running a cron job doesn't need edge functions or serverless magic - it just needs to run.
Chapter 16 - The Final Simplification
When Next.js Became Too Much
The system was working. Emails were processing. Cases were flowing into the spreadsheet. But then Render started sending me alerts: "Out of memory (used over 512MB)". The server was crashing on startup.
I checked the logs. The server was using 200-400MB of memory just to start up. Render's free tier has a 512MB limit. When the health check hit the server, Next.js would load all routes and middleware, causing a memory spike that exceeded the limit before the server could even finish initializing.
I tried everything. Increased Node.js heap size. Added memory logging. Implemented lightweight health checks. But Next.js itself was the problem - it's a full-stack framework designed for React applications. I was using it for a backend-only API that didn't need routing, middleware, or any of Next.js's features.
The Realization
I looked at the codebase. 13,000 lines of UI code that nobody used. A Next.js server that loaded React, routing, middleware, and a build system - just to run a simple cron job. This was overkill.
The client didn't need a dashboard. They didn't need a frontend. They just needed data in their spreadsheet. Why was I running a full-stack framework for a backend-only service?
The Wake-Up Call
"Why am I running Next.js when I just need an Express server with a cron job?"
The Migration
I created a new branch and started from scratch. Not a refactor - a complete rewrite:
- Deleted Next.js entirely - Removed all Next.js dependencies, config files, and the entire app directory
- Built a simple Express server - Just HTTP endpoints and a cron job, nothing more
- Extracted core logic - Moved processing functions to standalone modules (no framework dependencies)
- Removed Supabase - The client didn't want a database, so duplicate detection became in-memory Set-based checking
- Simplified everything - From 13,000 lines to ~1,400 lines of actual code
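The Set-based check that replaced the database looks roughly like this (the key shape is illustrative; in practice the Set is seeded from the sheet's existing rows on startup):

```javascript
// In-memory replacement for the database: duplicate keys live in a Set.
const seen = new Set();

function dedupKey(c) {
  // Primary key when an EI number exists, fallback key otherwise
  return c.case_id
    ? `${c.case_id}|${c.service_type}`
    : `${c.child_name}|${c.service_type}|${c.borough}`;
}

function isNewCase(c) {
  const key = dedupKey(c);
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

The trade-off: the Set is lost on restart, which is why rebuilding it from the sheet (the source of truth) matters.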
// Before: Next.js server with routing, middleware, React
import next from 'next';
const app = next({ dev, hostname, port });
await app.prepare(); // Loads all routes, 200MB+ memory

// After: Simple Express server
import express from 'express';
import cron from 'node-cron';
const app = express();
app.listen(port); // 50MB memory, done

The Results
- 177MB - Memory Usage
- -50% - Memory Reduction
- 0 - OOM Errors
The new Express server uses 177MB of memory at startup. Down from 200-400MB. That's a 50% reduction. More importantly, it stays stable. No more crashes. No more OOM errors. The server starts in seconds, not minutes.
What Changed
Before: Next.js (Complex)
Next.js Server → API Routes → Processing Logic → Google Sheets
200-400MB memory, complex build process
After: Express (Simple)
Express Server → Processing Logic → Google Sheets
177MB memory, no build step needed
The cron job now calls the processing function directly - no HTTP overhead, no network calls, just a function invocation. Faster. Simpler. Better.
// Cron job calls processing directly (no HTTP)
cron.schedule('*/5 * * * *', async () => {
  const result = await runProcessingCore();
  console.log(`Processed ${result.cases_extracted} cases`);
});

The Cleanup
I deleted everything that wasn't needed:
- Next.js config files (next.config.ts, next-env.d.ts)
- All API routes (moved logic to standalone modules)
- Supabase integration (client didn't want a database)
- PostCSS config (no Tailwind needed)
- Vercel config (deploying to Render)
- Public assets (no frontend to serve)
What remained: just the essentials. Express server. Processing logic. Google Sheets integration. Cron job. That's it. 1,400 lines of code that do exactly what they need to do, nothing more.
The Lesson
Sometimes the best optimization is deletion. Removing 13,000 lines of unused code didn't just reduce memory - it made the system simpler, faster, and easier to maintain. Know when to throw code away.
The system now runs perfectly on Render's free tier. No memory issues. No crashes. Just a simple Express server doing its job, every 5 minutes, forever.
Epilogue
The System Lives On
As you read this, the system is running. Every 5 minutes, an internal cron job wakes up, checks for new emails, extracts case data with AI, detects duplicates, and appends to Google Sheets. No human intervention required.
The code is deployed on Render.com (free tier). Express server with node-cron. The internal cron runs every 5 minutes, with an external backup on cron-job.org for redundancy. Simple services, working in harmony, doing the work that used to consume hours of human time.
And somewhere, a staff member opens their laptop, sees their spreadsheet already populated with the morning's referrals, and gets to spend their energy on what actually matters: connecting therapists with the families who need them.
That's the whole point, isn't it?
- 5 - Architecture Pivots
- 15+ - Garbage Filter Patterns
- 3 - Validation Layers
- 13K - Lines of Code Deleted
Sometimes the best solutions come from deeply understanding the problem, iterating relentlessly, and being willing to throw away code that doesn't work. This project taught me that.
What started as "can you make this faster?" became a full-fledged AI-powered workflow automation system. And I loved every frustrating minute of building it.