I have about 150 pages on my website, but close to 6,000 pages that Google has crawled. 2,500 pages are “Alternate page with proper canonical tag”, 1,000 pages “Not found (404)”, 1,000 pages “Page with redirect”, 1,000 pages “Crawled - currently not indexed”.
I previously used Shopify before switching. I had misconfigured Shopify in such a way that for each country I added, it created duplicate pages for every product (en-us, en-eu, …).
When I left Shopify, I set up redirects for every product, but apparently some pages were missed.
Is there any way to clear out the nearly 6,000 pages that shouldn’t have been crawled/are outdated? Maybe the “Removals” page (which I can’t seem to get to work)?
404, redirects, crawled but not indexed, duplicates with a canonical… none of these will show in search results.
These entries are useful for seeing which resources had rankings and/or were getting clicks and could benefit from redirects. The presence of these entries is an important feature of the console, so I'm having trouble imagining how removing them could be of benefit.
I assumed they were like programming warnings (not errors): something it was best to clear up. Not breaking issues, just misconfiguration or something.
It is entirely possible I have been looking at it wrong. Which would explain why I couldn’t find anything online.
They can be indicators of potential issues, but from the sounds of it, most if not all of them are just what you'd expect given the history you described.
Just set up redirects as needed to keep your old but popular URLs from being 404s and you should be good. The Performance section is where you'll find these. Select the "Pages" tab in the section below the graph and you can sort all your top pages by clicks or impressions. You can probably tell at a glance if they're now 404s, and there's a link for each page if you're not sure whether it's a dead link now that the site has moved.
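If there are too many top pages to click through by hand, you can script the check. Here's a minimal sketch (Python 3, stdlib only, assuming you've exported your top URLs from the Performance report into a list); the example URLs are placeholders, not real pages:

```python
# Sketch: classify old URLs by their HTTP status so you can see which
# ones still need a redirect. Uses http.client directly so redirects
# are NOT followed automatically (we want to see the 301/302 itself).
import http.client
import urllib.parse


def classify(status: int) -> str:
    """Map an HTTP status code to a rough category."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 404:
        return "missing"  # needs a redirect set up
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Issue a HEAD request and return the raw status code."""
    parts = urllib.parse.urlsplit(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder URLs -- replace with your exported top pages.
    old_urls = [
        "https://example.com/products/widget",
        "https://example.com/en-eu/products/widget",
    ]
    for url in old_urls:
        print(url, "->", classify(fetch_status(url)))
```

Any URL that comes back "missing" is a candidate for a new redirect; "redirect" entries are already handled.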
I think the only thing you need to be concerned about in the Indexing section is that all the pages that should be indexed are indexed. If you notice some that aren't, make changes as appropriate, then click the validation link and "Validate Fix" to let Google know it should recheck those pages ASAP because you've resolved the cause of their exclusion. Other than that, it's all good. The cruft will fall away eventually.
Thank you so much for explaining this! I didn’t know this and when I asked on Reddit, no one responded. So, thank you for taking the time to clear it up for me!