Ten Years Later: An Old Problem, Solved Differently Today — Using AI to Modernize Internal Tooling
Ten years ago, we ran into a problem. A client had a bunch of content locked into an old CMS, and through the process of building out their new site we realized that there wasn’t a nice way to get all that content moved across. So we wrote a tool to solve that problem. And then we wrote a blog post about that tool. Over the years, a lot of people have read that blog post, starred that GitHub repo, and hopefully found some value in all of it. So we decided to ask: what would this same problem look like today? Unfortunately (or fortunately?), it still holds up, and the overall approach still feels… right? Having an internal tool to help us solve a real-world problem faster than manual key-bashing is still the right call, one that saves time (and money) for both us and our clients.
All of which left us in a spot that probably isn’t all that unique. We’ve got a fairly outdated internal tool that still mostly solves our problems, but just never got properly maintained. Ten years ago, we would have done a tidy little cost analysis — “effort to update this” vs. “pain we feel each time we use it” — landed on some conclusion, and probably written a blog post about it. Today though?
The world of tomorrow, today!
Today, we’ve got a few more tools in the toolbox than we did back then. Namely: AI. And with AI, there are two paths we can take.
The first is the most obvious: ask AI to do the scrape for us, on demand, per project, every time we need it. Help is just a “Please scrape the content from <insert-old-site-here>.com and drop it into my project” away. There are some real wins here: the AI agent can be hyper-specific to each project’s content needs, you can work through individual issues in the context of that specific project, and there’s nothing to maintain, so that one-time content import truly becomes one-time. But as they say, time is money. Or, these days, time is tokens (and tokens are money), so there’s a better answer than redoing all that work every time we need a content scrape.
It turns out, we’ve already solved that problem. Build an internal tool once, reuse it as needed, maintain that project forev—oh. Sounds familiar. But once again, we can turn to AI — not to totally reinvent the wheel, but to bring our old internal tooling into a more modern context, with a much lower “effort to update this” than we would have needed a decade ago. We get to keep all the benefits of the “write once, reuse it a bunch” approach, combined with the small upfront time investment of a simple little prompt.
“Hey, Claude…”
Hey Claude! This is a pretty old codebase. I would like you to audit the existing code and suggest improvements that a) would provide tangible benefits to an end-user, b) would make the collection of scripts more portable, and c) would improve the overall security or reliability of the codebase.
Claude had a think — we use the Superpowers plugin to work through specs alongside Claude before it starts churning out code — and came back with a few ideas:
- A new CLI (allowing for non-browser-based interaction)
- Dependency removal to ditch the old jQuery reliance
- A unified config.json to handle all per-project setup without needing to run big find-and-replaces every time we used the script
We validated the new directory structure, then had some back-and-forth on the config.json itself. Claude’s initial proposal — like our original code — was limited to handling one post type at a time. As we reviewed the new configuration, I was able to suggest a way to define multiple post-type configurations at once (and propose some more human-friendly parameter names while we were at it).
This configuration file structure makes sense – one current limitation with the script is that it can only do one set of ‘post types’ at a time. Can we add in the ability to define multiple sources / selector rules / outputs at a time?
The short answer is: yes. A key insight is that posts from the same CMS almost always share the same HTML structure, so selectors are global defaults. What actually varies per-category is:
- Which URLs to crawl
- Which WordPress category to map to
- Occasionally a selector override if one section of the old site was structured differently
So rather than duplicating the full config per source, we’d keep global defaults and add a sources array for per-run variation:
{
  "source": {
    "domain": "https://www.example-source-site.com"
  },
  "selectors": {
    "page": "#content",
    "title": "#ctl00_mainContent_pageTitle",
    "content": "#ctl00_mainContent_pageContent",
    "attachmentLinks": "#content a[href^='doc.aspx']",
    "attachmentImages": "#content img[src^='image.aspx']"
  },
  "wordpress": {
    "postType": "page",
    "postStatus": "publish",
    "defaultAuthorSlug": "admin"
  },
  "sources": [
    {
      "name": "news",
      "categories": [4],
      "selectorOverrides": {}
    },
    {
      "name": "blog",
      "categories": [7],
      "selectorOverrides": {
        "title": "#content h1"
      }
    }
  ]
}
Each source produces its own output file (e.g., results-news.json, results-blog.json) derived from its name, so the WP-CLI importer can process them independently or all at once.
The added implementation complexity is minimal: both the browser version and CLI just loop over sources instead of using a single hardcoded URL list. selectorOverrides merges with global selectors, so it’s only needed when something actually differs.
Does this shape work for you?
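To make the merge concrete, here is a minimal JavaScript sketch. It assumes the crawler loads the config.json shown above into a plain object. The `resolveSources` function and the shape of the objects it returns are hypothetical, not the actual Commander Vee code. In an object spread, later keys win. A source’s `selectorOverrides` therefore replace only the selectors they name, and everything else falls back to the global defaults.

```javascript
// Hypothetical helper: expand the config's `sources` array into one job per
// source, each with its merged selectors and its own output file name.
function resolveSources(config) {
  return config.sources.map((source) => ({
    name: source.name,
    categories: source.categories ?? [],
    // Per-source overrides win; any selector not overridden falls back to
    // the global default. A new object is built, so globals are not mutated.
    selectors: { ...config.selectors, ...(source.selectorOverrides ?? {}) },
    // Output file derived from the source name, e.g. "news" -> "results-news.json"
    outputFile: `results-${source.name}.json`,
  }));
}

// Trimmed copy of the example config above.
const config = {
  selectors: {
    title: "#ctl00_mainContent_pageTitle",
    content: "#ctl00_mainContent_pageContent",
  },
  sources: [
    { name: "news", categories: [4], selectorOverrides: {} },
    { name: "blog", categories: [7], selectorOverrides: { title: "#content h1" } },
  ],
};

for (const job of resolveSources(config)) {
  console.log(job.outputFile, job.selectors.title);
}
// results-news.json #ctl00_mainContent_pageTitle
// results-blog.json #content h1
```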
After that, Claude was off and running — working through the updates we’d discussed, with the occasional check-in when it needed to verify an assumption or get permission to run a command. This process might have been a bit quicker without the Superpowers back-and-forth, but we find it does a great job of unpacking feature design requirements early on. The extra conversation pays off in the quality of output.
But does it work?
The final step in any code-writing exercise, AI or not, is to test it, and that’s doubly important with AI. We fired up the new tool to take it for a spin, and… it mostly worked. The output JSON was generated as expected and posts imported properly, but images were still referencing the old site. Hm. Back to Claude:
The WordPress importer side isn’t sideloading the attachments. It should download the attachments as referenced in the output JSON, and replace the old links in the content with the new ones.
Root cause: The function assumes $file->link is a relative path and prepends the domain, but the crawler outputs full absolute URLs. Let me also check the crawler to confirm this is the intended output format:
Root cause confirmed. The crawler stores raw src/href attributes — which are already absolute URLs. The fix is one line in the sideload function — use $file->link as-is when it’s already absolute, only prepend $domain for relative paths.
Less a bug in the code, and more a misunderstanding on Claude’s part — which is a pretty common pattern we run into with AI-generated code. With that fix in place, we re-ran the CLI and imported everything properly: posts, images sideloaded, attached, and referenced correctly. There’s certainly more to test — making sure the browser-based export still functions, validating the crawler on larger inputs — but for what we needed this time, it was plenty. It also proved the concept: the time spent configuring the crawler was significantly less than it would have been with the original version. Not too shabby for three hours of AI-assisted work.
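The fix comes down to telling absolute links from relative ones. The real importer is PHP running under WP-CLI. What follows is only a JavaScript sketch of the same logic, and the function names and upload paths are hypothetical.

The standard `URL` constructor keeps an absolute link (normalized) and resolves a relative one against the old domain. A second helper then swaps each old attachment URL in the post body for its new media URL once it has been sideloaded.

```javascript
// Sketch only: mirrors the importer's PHP logic in JavaScript.
// new URL(link, base) returns `link` (normalized) when it is already absolute,
// and resolves it against `base` only when it is relative.
function resolveAttachmentUrl(link, domain) {
  return new URL(link, domain).href;
}

// Hypothetical helper: after sideloading, replace each old attachment URL in
// the post body with its new media URL. `replacements` is a Map keyed by the
// absolute old URL. Assumes attribute values contain no HTML entities.
function rewriteContentLinks(html, domain, replacements) {
  return html.replace(/\b(src|href)=(["'])(.*?)\2/g, (match, attr, quote, value) => {
    let absolute;
    try {
      absolute = resolveAttachmentUrl(value, domain);
    } catch {
      return match; // leave unparseable attribute values untouched
    }
    const newUrl = replacements.get(absolute);
    return newUrl === undefined ? match : `${attr}=${quote}${newUrl}${quote}`;
  });
}

// Usage with placeholder URLs
const domain = "https://www.example-source-site.com";
const html = '<img src="image.aspx?id=3">';
const newUrls = new Map([
  ["https://www.example-source-site.com/image.aspx?id=3", "/wp-content/uploads/photo.jpg"],
]);
console.log(rewriteContentLinks(html, domain, newUrls));
// <img src="/wp-content/uploads/photo.jpg">
```

Each attribute value is resolved before lookup, so the content can mix relative and absolute links and still match a map keyed by absolute URLs.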
What AI actually did (and didn’t do) here
If I wanted to make a nice callback to the original post (and appease the SEO machine), I’d say something like “AI saved us 15 hours of development!” or “Automating the automated solution!” or “Project managers love this one simple trick!” But the real takeaway is that AI has dramatically lowered the barrier to entry for internal tooling. During development, we constantly run into uniquely-shaped, niche problems — specific to this client, this context, this code stack. In the past, it took something like our original 70-hour copy-paste problem to make building a tool like Commander Vee feel “worth the time.” AI has taken that barrier away almost entirely. A lunch hour, a couple of prompts, and that niche tool is ready before 3pm.
Is the output production-ready and set to ship to end users? Maybe not — but it doesn’t need to be. It just needs to let you move faster, so that you can write the production-ready code, instead of copy-pasting blog post content (or autofilling complex multi-step forms during UAT, or any of the other small problems that come up every day).
So — what’s next?
We’ll keep using Commander Vee — and we’ll keep adjusting it. Clients will always have content, and we’ll always need to move it somewhere. We’ll keep using AI and other tools to create great work. And hopefully one day we’ll get to write a blog post about the next iteration of this tool — if for no other reason than we’d get to call it Commander Vee: Version Three. And that’s a great rhyme.