I have about 150 pages on my website, but close to 6,000 pages that Google has crawled. 2,500 pages are “Alternate page with proper canonical tag”, 1,000 pages “Not found (404)”, 1,000 pages “Page with redirect”, 1,000 pages “Crawled - currently not indexed”.
I previously used Shopify before switching. I had misconfigured Shopify in such a way that for each country I added, it created duplicate pages for every product (en-us, en-eu, …).
When I left Shopify, I set up redirects for every product, but apparently some pages were missed.
Is there any way to clear out the nearly 6,000 pages that shouldn’t have been crawled/are outdated? Maybe the “Removals” page (which I can’t seem to get to work)?
404, redirects, crawled but not indexed, duplicates with a canonical… none of these will show in search results.
These entries are useful for seeing which resources had rankings and/or were getting clicks and could benefit from redirects. The presence of these entries is an important feature of the console, so I'm having trouble imagining how removing them could be of benefit.
I assumed they were like programming warnings (not errors): something it was best to clear up. Not breaking issues, just misconfiguration or something.
It is entirely possible I have been looking at it wrong. Which would explain why I couldn’t find anything online.
They can be indicators of potential issues, but from the sounds of it, most if not all of them are just what you'd expect given the history you described.
Just set up redirects as needed to keep your old but popular URLs from being 404s and you should be good. The Performance section is where you'll find these. Select the "Pages" tab in the section below the graph and you can sort all your top pages by clicks or impressions. You can probably tell at a glance if they're now 404s, and there's a link for each page if you're not sure whether it's a dead link now that the site has moved.
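If there are too many top pages to click through by hand, you can script the check. Here's a minimal sketch (Python 3, stdlib only, assuming you've exported your top URLs from the Performance report into a list); the example URLs are placeholders, not real pages:

```python
# Sketch: classify old URLs by their HTTP status so you can see which
# ones still need a redirect. Uses http.client directly so redirects
# are NOT followed automatically (we want to see the 301/302 itself).
import http.client
import urllib.parse


def classify(status: int) -> str:
    """Map an HTTP status code to a rough category."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 404:
        return "missing"  # needs a redirect set up
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Issue a HEAD request and return the raw status code."""
    parts = urllib.parse.urlsplit(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder URLs -- replace with your exported top pages.
    old_urls = [
        "https://example.com/products/widget",
        "https://example.com/en-eu/products/widget",
    ]
    for url in old_urls:
        print(url, "->", classify(fetch_status(url)))
```

Any URL that comes back "missing" is a candidate for a new redirect; "redirect" entries are already handled.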
I think the only thing you need to be concerned about in the Indexing section is that all the pages that should be indexed are indexed. If you notice some that aren't, make changes as appropriate, then click the validation link and "Validate Fix" to let Google know it should recheck those pages ASAP because you've resolved the cause of their exclusion. Other than that, it's all good. The cruft will fall away eventually.
Thank you so much for explaining this! I didn’t know this and when I asked on Reddit, no one responded. So, thank you for taking the time to clear it up for me!