I Built an AI-Powered Content Pipeline That Turns URLs Into SEO-Ready Blog Posts (and It Almost Broke Me)

What started as a “quick automation project” turned into a multi-day odyssey of Docker demons, missing node modules, hallucinating GPT models, and a whole lot of stubborn perseverance. But in the end, I built a fully automated pipeline that rewrites web articles, generates SEO metadata, and drafts them in WordPress—all triggered by a single URL.

The Vision
I didn’t want to reinvent blogging. I just wanted something simple:
- Drop in a URL to a news article
- Extract the content
- Rewrite it using GPT in my brand voice
- Create a featured image with GPT-4.5
- Post the whole thing to WordPress with SEO meta title and description
- Email me the tweet and preview URL
Sounds easy, right? Right?
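To be fair, each step is simple in isolation. The rewrite, for instance, is a single API call; here's a minimal sketch of that step (assuming Node 18+ for built-in fetch, with the model name and system prompt as placeholders):

// Sketch of the rewrite step. Assumptions: Node 18+ (built-in fetch),
// OPENAI_API_KEY in the environment; model and prompt are placeholders.
async function rewriteInBrandVoice(articleText) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4o', // placeholder: use whichever model you have access to
      messages: [
        { role: 'system', content: 'Rewrite this article in my brand voice: plain, direct, a little irreverent.' },
        { role: 'user', content: articleText },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}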
I figured I could stitch it all together with n8n, a few OpenAI calls, and some light server glue. But like every founder who thought they were “just automating a thing,” I soon realized the gap between a great idea and a working system is filled with fire.
The Stack
- DigitalOcean Droplet: 2 GB at $12/mo, hosting both n8n and custom services
- n8n: The core automation engine (free, before the enterprise upgrade)
- OpenAI API: For rewriting articles, creating SEO metadata, and generating featured images
- Cheerio (later): For article content scraping
- WordPress REST API: To publish the content directly to my website
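A note on that last piece: WordPress will accept a draft over plain HTTP once you set up an application password. Publishing boils down to something like this sketch (the site URL is a stand-in, and the SEO fields depend on your SEO plugin, so I've left those out):

// Create a WordPress draft via the REST API.
// Assumptions: Node 18+ fetch, a WP application password in env vars,
// and example.com as a stand-in for the real site.
async function createWordPressDraft({ title, content, excerpt }) {
  const auth = Buffer.from(
    `${process.env.WP_USER}:${process.env.WP_APP_PASSWORD}`
  ).toString('base64');
  const res = await fetch('https://example.com/wp-json/wp/v2/posts', {
    method: 'POST',
    headers: {
      Authorization: `Basic ${auth}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ title, content, excerpt, status: 'draft' }),
  });
  return res.json(); // response includes the post ID and preview link
}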
At first, I figured I’d use Mercury Parser to extract the body content from any given URL. It’s a clean open-source project built for exactly that. Unfortunately, it’s also unmaintained and nearly impossible to install in 2025.
The Mercury Meltdown
Without a proper parser, GPT has no idea what part of the webpage to focus on. It might get the navigation menu, the cookie disclaimer, or an ad block instead of the actual story. Clean article extraction was critical to give the AI accurate input and prevent it from rewriting nonsense.
I tried four different Docker images claiming to offer a Mercury Parser API. All were either dead links, broken builds, or outdated forks. I finally tried to install the package myself using Node.js—and quickly descended into dependency hell:
- cheerio errors ("Cannot find module './lib/cheerio'")
- iconv-lite errors nested 4 layers deep
- Broken peer dependencies
- Outdated package.json entries
After nearly a full day of npm cache clearing, version pinning, and burning through RAM on my DigitalOcean droplet, I gave up. Mercury wasn’t coming back.
Cheerio to the Rescue
Instead of Mercury, I pivoted to a regex + cheerio approach using a Code node inside n8n. That way I could pull in the raw HTML from any page using a basic HTTP request, then extract the <article> or <main> content manually.
The final code ended up looking something like this:
const html = $json.body;
// Grab the <article> body if there is one, else <main>, else the whole page
const articleMatch = html.match(/<article[^>]*>([\s\S]*?)<\/article>/i);
const mainMatch = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i);
const rawContent = articleMatch?.[1] || mainMatch?.[1] || html;
const cleaned = rawContent // cleanup chain, representative of the final version
  .replace(/<script[\s\S]*?<\/script>/gi, '') // drop inline scripts
  .replace(/<style[\s\S]*?<\/style>/gi, '') // drop inline styles
  .replace(/<[^>]+>/g, ' ') // strip remaining tags
  .replace(/\s+/g, ' ') // collapse whitespace
  .trim();
return { json: { content: cleaned } };
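And when the regex comes up empty (plenty of sites wrap their articles in div soup), cheerio earns its spot in the stack with a proper parse of the same HTML. A sketch of that fallback, assuming cheerio has been whitelisted for the Code node via NODE_FUNCTION_ALLOW_EXTERNAL:

// Cheerio fallback for pages where the regex matches nothing.
// Assumption: cheerio whitelisted with NODE_FUNCTION_ALLOW_EXTERNAL=cheerio.
const cheerio = require('cheerio');
const $page = cheerio.load($json.body); // avoid shadowing n8n's own $
let text = '';
for (const selector of ['article', 'main', 'body']) {
  const node = $page(selector).first();
  if (node.length) {
    text = node.text();
    break;
  }
}
return { json: { content: text.replace(/\s+/g, ' ').trim() } };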