This is what my robots.txt looks like:
User-Agent: *
Disallow: /info/privacy
Disallow: /info/cookies
Allow: /
Allow : /search/Jaén
Allow : /search/Tarragona
Allow : /search/Rioja
Sitemap: https://www.alquileres.xyz/sitemap.xml
But when I check the sitemap on websiteplanet, it says "Sitemap URL not defined in robots.txt" and that the sitemap does not return the correct Content-Type header. I checked in Postman and it is application/xml; I don't know if it should be a different content type.
Also, isn't it redundant to specify other URLs when I have Allow: /? My sitemap is eventually going to be massive; does my robots.txt need to be as well?
I'm working with Next.js.
I've tried generating both the sitemap and robots.txt from .ts files, as Next.js offers with its Metadata route files, and I've also tried placing the plain files in the app directory.
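For reference, this is roughly the TypeScript version I've tried; it's only a sketch of my setup assuming the App Router, using the same URLs as the robots.txt above:

// app/robots.ts - Next.js Metadata route that generates robots.txt (sketch)
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/info/privacy', '/info/cookies'],
      },
    ],
    sitemap: 'https://www.alquileres.xyz/sitemap.xml',
  }
}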
The Content-Type for robots.txt should be text/plain. Maybe this would be helpful for figuring out how to change it: Changing the header of the response on NodeJS using request
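For a Next.js app like the one in the question, one way to control that header directly (rather than relying on a static file) is a custom route handler. This is only a sketch assuming the App Router; the app/robots.txt/route.ts path and the placeholder directives are illustrations, not your real rules:

// app/robots.txt/route.ts - serve /robots.txt with an explicit Content-Type (sketch)
export async function GET(): Promise<Response> {
  // Placeholder directives; replace with your real rules.
  const body = 'User-Agent: *\nDisallow:\n\nSitemap: https://www.alquileres.xyz/sitemap.xml\n'

  return new Response(body, {
    headers: { 'Content-Type': 'text/plain' },
  })
}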
The default assumption that bots make is that everything on your site is crawlable. There is no need to include Allow rules unless they disagree with Disallow rules. In fact, in the original robots.txt spec, there was no Allow directive at all.
Allowing /search URLs worries me. If those are site-search results pages, that could be horrible for SEO. Search engines don't like sending users from their search results only to land on other search results. They consider it bad user experience and often penalize sites that allow their search results to be crawled and indexed. If your /search URLs represent a database query rather than free-form text entry to a search results page, I would recommend using some other word in your URL.

You should let your privacy and cookies policies get crawled. Search engines consider sites that have them to be higher quality. If you let them get crawled, they would rarely show up in the search results other than when somebody searches for them specifically, like "<your site> privacy policy".

I'd recommend you use this:
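Something minimal along these lines, with no Allow rules, nothing blocked, and the sitemap line kept from your current file (adjust it if you do still want to disallow anything):

User-Agent: *
Disallow:

Sitemap: https://www.alquileres.xyz/sitemap.xml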