Optimization of the Sitemap
What is a Sitemap?
A sitemap is a file that tells search engines and other web crawlers which pages can be found on your website (it is a map of the places on your domain). A sitemap is usually an XML file, and it should be placed in the root folder of the domain it describes. Every page should be listed in the sitemap. For each page, the sitemap must give the location of the page (loc) and can also give the date the page was last modified (lastmod), how often the page changes (changefreq) and how important the page is within the website (priority). A sitemap is not required for a website, but it helps automated bots understand the structure of your website.
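To make the four fields concrete, here is a minimal Python sketch of a single sitemap entry. The class name SitemapEntry and its to_xml method are hypothetical, chosen for illustration only. Because only loc is required, the other fields default to None and are simply left out of the output. The sketch assumes lastmod is a timezone-aware datetime, so that the UTC offset ends up in the timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from xml.sax.saxutils import escape

# Besides &, < and >, quotes must also be entity-escaped in sitemap URLs
_QUOTES = {"'": "&apos;", '"': "&quot;"}


@dataclass
class SitemapEntry:
    """Hypothetical model of one page in a sitemap; only loc is required."""
    loc: str
    lastmod: Optional[datetime] = None   # assumed to be timezone-aware
    changefreq: Optional[str] = None     # always, hourly, ..., never
    priority: Optional[float] = None     # 0.0 to 1.0

    def to_xml(self) -> str:
        """Render this entry as a <url> element, skipping fields that are None."""
        parts = [f"<loc>{escape(self.loc, _QUOTES)}</loc>"]
        if self.lastmod is not None:
            # W3C Datetime; the +hh:mm offset comes from the datetime's tzinfo
            parts.append(f"<lastmod>{self.lastmod.isoformat(timespec='seconds')}</lastmod>")
        if self.changefreq is not None:
            parts.append(f"<changefreq>{self.changefreq}</changefreq>")
        if self.priority is not None:
            parts.append(f"<priority>{self.priority:.1f}</priority>")
        return "<url>" + "".join(parts) + "</url>"


entry = SitemapEntry(
    "http://www.theinternetofbots.com",
    lastmod=datetime(2017, 5, 9, 12, 18, 44, tzinfo=timezone.utc),
    changefreq="yearly",
    priority=0.5,
)
print(entry.to_xml())
```

With these values the printed url element contains the same loc, lastmod (2017-05-09T12:18:44+00:00), changefreq and priority values used in the examples of this article.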
How is it used?
- The sitemap should be an XML document and must use UTF-8 encoding. This means it should start with the XML declaration <?xml version="1.0" encoding="UTF-8"?>
- The sitemap should list all the pages (URLs) of a website, wrapped in the urlset tag. Open this tag before you list the pages and close it at the end of the document. The urlset tag also refers to the namespace of the sitemap protocol standard: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> [...] </urlset>
- All information about a single page should be within a url tag: <url> [...] </url>
- Every url tag must have a loc tag containing the full URL of the page: <loc>http://www.theinternetofbots.com</loc>. Remember to close the loc tag and to use a full URL (starting with http:// or https://). The URL inside the loc tag must be shorter than 2048 characters.
- A url tag can have a lastmod tag containing the date and time the page was last modified: <lastmod>2017-05-09T12:18:44+00:00</lastmod>. This tag is optional; if you use it, remember to close it. The date is written as YYYY-MM-DD (year, month, day) and the time as hh:mm:ss (hours, minutes, seconds), followed by your timezone's offset from Coordinated Universal Time (UTC) as +hh:mm or -hh:mm (+00:00, or simply Z, means UTC itself). You can also give only the date, without the time.
- A url tag can have a changefreq tag stating how often the content of the page is likely to change: <changefreq>yearly</changefreq>. This tag is optional; if you use it, remember to close it. You cannot choose your own values and have to use one of the following: always, hourly, daily, weekly, monthly, yearly or never. The value "always" describes documents that change each time they are accessed; the value "never" describes archived URLs. The value is a hint, not a command: web crawlers will not necessarily crawl pages marked "always" more often.
- A url tag can have a priority tag stating the importance of the page relative to the other pages on your website: <priority>0.5</priority>. This tag is optional; if you use it, remember to close it. The value ranges from 0.0 to 1.0, and the default priority is 0.5.
- It doesn't matter which program you use to write a sitemap. Anything from Adobe Dreamweaver to Notepad will do, as long as you follow the rules above. The sitemap will then look something like this:
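<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.theinternetofbots.com</loc>
    <lastmod>2017-05-09T12:18:44+00:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>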
- It doesn't matter what you name the sitemap (as long as it is an XML file). The name "sitemap.xml" is the common default.
- Place the sitemap in the highest folder of the domain, so that all URLs it lists are in that folder or in folders below it. For www.theinternetofbots.com, the sitemap will be located at www.theinternetofbots.com/sitemap.xml.
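- A single sitemap cannot contain more than 50.000 URLs and cannot be larger than 50 MB. You can compress a sitemap with gzip to make the download smaller, but the 50 MB limit applies to the uncompressed file, so compression does not let you go above it; split a larger sitemap into several files instead.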
- If you need more than one sitemap file for a single site, list all of the sitemaps in a sitemap index file.
- If you are not sure how to make a sitemap correctly, use an online tool such as xml-sitemaps.com.
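The rules above (UTF-8, the urlset namespace, full URLs, at most 50.000 URLs and 50 MB uncompressed per file, and an index when there are several files) can be sketched with Python's standard library. This is a sketch under assumptions, not what any particular tool does: the function names, the file names sitemap-1.xml, sitemap-2.xml, ... and sitemap-index.xml, and the choice to return the files as bytes in memory instead of writing them to disk are all hypothetical. For brevity it only writes the required loc tag. It needs Python 3.8 or newer for the xml_declaration argument.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # per sitemap file
MAX_BYTES = 50 * 1024 * 1024   # 50 MB, measured on the uncompressed file


def build_urlset(urls: Iterable[str]) -> bytes:
    """Serialize page URLs as one UTF-8 <urlset> document (loc only)."""
    urlset = ET.Element("urlset", xmlns=NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = u  # ElementTree escapes & and <
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def build_index(sitemap_urls: Iterable[str]) -> bytes:
    """Serialize the locations of several sitemap files as a <sitemapindex>."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for u in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = u
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


def write_sitemaps(urls: List[str], base_url: str, gzip_files: bool = False) -> Dict[str, bytes]:
    """Split urls into files of at most 50.000 URLs; add an index if there is more than one file."""
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    files: Dict[str, bytes] = {}
    for n, chunk in enumerate(chunks, start=1):
        data = build_urlset(chunk)
        # Check the size limit before compressing: gzip does not lift it
        if len(data) > MAX_BYTES:
            raise ValueError(f"sitemap {n} is larger than 50 MB uncompressed")
        name = "sitemap.xml" if len(chunks) == 1 else f"sitemap-{n}.xml"
        if gzip_files:
            data = gzip.compress(data)
            name += ".gz"
        files[name] = data
    if len(chunks) > 1:
        sitemap_names = list(files)
        files["sitemap-index.xml"] = build_index(base_url + n for n in sitemap_names)
    return files
```

Each value in the returned dictionary can then be saved under its name in the root folder of the site. Note that ElementTree writes the XML declaration with single quotes (<?xml version='1.0' encoding='UTF-8'?>), which is equivalent to the double-quoted form.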
Why is it important?
- Web crawlers can use the sitemap to find and index the pages of your website. This helps a search engine understand the structure, changes and priorities of the website, which in turn can improve how it appears in search results.
- A sitemap can improve the chance that a web crawler finds all the pages of a website, instead of only one or a few.
What can I do to check if my sitemap is good?
- Make sure you have a sitemap for your website
- Make sure you have included every page of your website within the sitemap
- Make sure <lastmod> and <changefreq> reflect reality. Don't set an unrealistic changefreq in the hope of getting more visits from search engines, and don't leave a lastmod that is much older than the changefreq suggests.
- Use <priority> to accurately show which pages are more important than others.
- Make sure the sitemap is defined correctly (see "How is it used?").