I recently published an article about why and how you should submit a sitemap to Google, so I decided to post an article about why you should NOT submit a sitemap.
Confused? I understand. Please read on.
An XML sitemap is a great and fast way to make sure that Google catches all the URLs on your website that you want to get indexed in Google.
However, if you do not pay attention to how the sitemap is configured and you do not check your internal linking structure, you could end up submitting a sitemap to Google that will hide internal linking problems on your site.
A problem with using XML sitemaps is that you might not discover orphan pages or internal SEO structure problems on your website.
If you do not submit a sitemap, Google will crawl your website the way you would expect: it visits a URL, notes down all the links on it, crawls those, notes down any new URLs, and so on.
Now, do you have your internal linking structure in place? Is your content adequately inter-linked? If not, some of your content might not be discoverable or crawlable. That way, you would know you have problems with your website's internal linking structure and can fix them.
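To make the link-following crawl described above concrete, here is a minimal sketch in Python. It is not how Googlebot is implemented; it only illustrates the idea of starting at one URL and following links. The site_links dictionary and its paths are made up for illustration and stand in for fetching each page and extracting its links.

```python
from collections import deque


def crawl(start_url, links):
    """Return every URL reachable from start_url by following links.

    `links` maps a URL to the list of URLs that page links to. It is a
    stand-in for downloading the page and reading its <a href> targets.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /old-product exists, but no other page links to it.
site_links = {
    "/": ["/blog", "/shop"],
    "/blog": ["/blog/post-1", "/"],
    "/shop": ["/shop/item-1"],
    "/old-product": ["/"],
}

reachable = crawl("/", site_links)
print(sorted(reachable))
# ['/', '/blog', '/blog/post-1', '/shop', '/shop/item-1']
```

Note that /old-product never shows up in the result: a crawler that only follows links has no way to reach it.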
If you use an SEO tool such as Screaming Frog (a favorite of mine), you can crawl the site both with and without the sitemap.
A regular crawl of your website, without the sitemap, would show whether you have problems with your internal links and are perhaps missing part of your site. For example, if a crawl discovers only a hundred pages when you know there should be 500+, you know there is a linking problem to track down.
Orphan pages: this term refers to pages that still return content but are not linked to from anywhere else on the site. These could be products that have gone out of stock or pages left over from older versions of the website, that is, content that for whatever reason is still linked to externally but no longer internally.
If you use a sitemap, Google and other search engines will find the pages that way, so all good, right? No, not really. Even if other sites link to those URLs, you would still have a problem.
Your content might still be indexed in Google and show up, but that does not mean those pages would rank well. If Google sees that you no longer link to your own content, it will evaluate the URL lower in its rankings as a result.
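One way to spot orphan pages is to compare the URLs listed in your sitemap with the URLs a link-following crawl actually found. The sketch below assumes you already have the sitemap XML as text and a set of crawled URLs (for example, exported from your crawling tool). The example.com URLs and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap's <urlset>."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(xml_text, crawled_urls):
    """URLs that are in the sitemap but were not found by following links."""
    return sorted(sitemap_urls(xml_text) - set(crawled_urls))


# Hypothetical sitemap with three URLs.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/blog</loc></url>"
    "<url><loc>https://example.com/old-product</loc></url>"
    "</urlset>"
)

# Hypothetical result of a crawl that ignored the sitemap.
crawled = {"https://example.com/", "https://example.com/blog"}

orphans = find_orphans(sample_sitemap, crawled)
print(orphans)  # ['https://example.com/old-product']
```

Any URL in the output is a candidate orphan page: either link to it from somewhere on the site, or decide whether it still belongs in the sitemap at all.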
Misusing crawl budget
The Google crawler has a lot of pages to visit, and the internet is growing fast: as of March 2017 it had 3,732 million users, about half the world population, according to Internet World Stats (https://www.internetworldstats.com/stats.htm).
There are 1,195,869,511 websites on the public web (internetlivestats.com/total-number-of-websites/), meaning you should expect to share Google's crawler with a lot of other sites that also have pages to be crawled.
Check your XML sitemap
So, keep in mind you have a crawl budget allocated by Google, and you want to make sure you do not include parts in your sitemap that you do not want to be indexed.
In most cases, you do not want to have your category pages indexed, unless you add valuable content to those pages and use them as central points for the main keyword terms on your site. If you do not, you most likely want to set your category pages to be noindexed.
If you noindex your category pages, do not point to them in your sitemap as well; doing so wastes the crawl resources Google gives you.
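A quick way to check for this is to look at the HTML of every URL in your sitemap and flag those that carry a noindex robots meta tag. The sketch below assumes you have already downloaded the pages into a dictionary of URL to HTML. The URLs, the HTML snippets, and the function names are made up for illustration. It only looks at meta tags, so a noindex sent via the X-Robots-Tag HTTP header would not be caught.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page has a robots meta tag containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def is_noindexed(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def noindexed_in_sitemap(pages):
    """`pages` maps each sitemap URL to its HTML; return the noindexed URLs."""
    return sorted(url for url, html in pages.items() if is_noindexed(html))


# Hypothetical pages fetched from the URLs listed in a sitemap.
pages = {
    "https://example.com/blog/post-1": "<html><head><title>Post</title></head></html>",
    "https://example.com/category/shoes": (
        '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    ),
}

print(noindexed_in_sitemap(pages))
# ['https://example.com/category/shoes']
```

Every URL this prints is telling Google two contradictory things: "please crawl this" via the sitemap and "do not index this" via the page itself.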
"Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl."
From the Google Webmaster Central Blog: https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html
So, as with most things in web development and SEO, there are pros and cons. Remember this when you make your choice about sitemaps. If you do use them, make sure there are internal links pointing to every page in your sitemap, or you might be creating problems for yourself.
If this helped you out, you should also check out 6 SEO Fundamentals That Everyone Should Utilize