I Built a Streamlit App to Clean Up Problematic Internal Links

As an SEO editor, my job revolves around optimizing the structure and content quality of our company website, Lifepal.co.id. One persistent issue keeps frustrating me: our internal links are a mess. Some return 404 errors, some are irrelevant, and others point to redirects.

Until now, we’ve been cleaning them manually: checking each internal link one by one, deleting those that lead to 404 pages, and updating those that return a 301 so they point to the final destination. With thousands of articles, that approach doesn’t scale, and I realized we needed a more efficient solution.

So I built a small Streamlit-based app to help my team and me clean up problematic internal links quickly. It automates the checking and rewriting of links, although the workflow is not yet fully automated: each article still has to be pasted in by hand.

At my company, we recently made major changes to our content structure.

  • First, we restructured the taxonomy for both product and article pages.
  • Second, we deleted many articles that were considered irrelevant or underperforming as part of our strategy to strengthen our website authority.

Another issue came from UTM parameters that had been added to internal links in the past. They were originally intended for campaign tracking, but when the target pages were deleted or moved, the UTM-tagged URLs stayed in our articles and turned into broken links.

As a result, many internal links that were once valid became redirects, and some even ended up returning 404 errors.

As someone who enjoys tinkering (especially since ChatGPT came along), I felt it was time to combine my SEO and programming skills. I won’t go into the technical development details, but here’s how it works:

  1. The user copy-pastes the article content (text mode) directly from WordPress for link inspection.

  2. The app extracts every URL within <a> tags in the text, then:

    • Removes any UTM parameters if present.
    • Sends an HTTP request and follows any redirects.
    • Classifies the final URL as valid, redirected, or a 404.
    • Decides whether the link should be kept, updated, or removed.

  3. The output is shown in a textbox containing cleaned-up HTML and a table listing the current URLs.
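
For readers who want a rough picture of those steps, here is a minimal Python sketch. It is not the app's actual code. The function names (`strip_utm`, `fetch`, `decide`, `clean_html`), the regex-based matching of double-quoted `href` attributes, and the extra "review" outcome for anything that is neither a 200 nor a 404 are all my own assumptions for illustration. It also leaves out the Streamlit interface and does not filter internal from external links.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumes <a ... href="..."> with a double-quoted href, as WordPress text mode usually emits.
A_TAG = re.compile(r'<a\b([^>]*?)href="([^"]*)"([^>]*)>(.*?)</a>',
                   re.IGNORECASE | re.DOTALL)


def strip_utm(url):
    """Drop utm_* query parameters and keep every other part of the URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def fetch(url):
    """Request the URL (urlopen follows redirects) and return (final_url, status)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.geturl(), resp.status
    except urllib.error.HTTPError as err:
        return url, err.code      # e.g. 404
    except urllib.error.URLError:
        return url, None          # DNS failure, timeout, refused connection


def decide(original, fetcher=fetch):
    """Return (action, target) where action is keep, update, remove, or review."""
    if not original.startswith(("http://", "https://")):
        return "keep", original   # leave mailto:, anchors, relative links alone
    final_url, status = fetcher(strip_utm(original))
    if status == 404:
        return "remove", None
    if status == 200:
        if final_url != original:
            return "update", final_url   # redirected, or UTM parameters stripped
        return "keep", original
    return "review", original     # hypothetical extra bucket for other statuses


def clean_html(html, fetcher=fetch):
    """Rewrite every <a> tag according to decide(); return (new_html, report)."""
    report = []

    def rewrite(match):
        before, href, after, text = match.groups()
        action, target = decide(href, fetcher)
        report.append((href, action, target))
        if action == "remove":
            return text           # unwrap the link but keep its anchor text
        if action == "update":
            return f'<a{before}href="{target}"{after}>{text}</a>'
        return match.group(0)

    return A_TAG.sub(rewrite, html), report
```

Calling `clean_html(pasted_html)` returns the rewritten HTML together with a list of `(original_url, action, new_url)` tuples, which roughly corresponds to the textbox and the table in step 3. The `fetcher` argument is there so the logic can be tried offline with a stub instead of real HTTP requests.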

The app makes the internal link cleanup process much faster and more structured. If you’re interested, you can try it here: https://inlinkcleaner.streamlit.app/