What is Heritrix?

Heritrix is a bot for the website of the Biblioteca Nacional de España (bne.es). The page linked from its user agent states: "If you have linked directly to this page, your website is being collected by the National Library of Spain, for being subject of legal deposit." Heritrix cooperates with Bnf.fr_bot, the bot of the National Library of France. The two bots appear to work together, and each leaves its own user agent when crawling a page. It is not fully transparent, however, how Bnf.fr_bot and Heritrix differ or what these bots actually collect.
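Since both bots identify themselves through their user agent, a site owner could spot their visits by checking the User-Agent header of incoming requests. The following is only a minimal sketch: the function name `matching_bots` and the sample User-Agent strings are hypothetical, and it assumes that the bot names given above appear somewhere in the header.

```python
# Bot names taken from the description above; matching is case-insensitive.
BOT_MARKERS = ("heritrix", "bnf.fr_bot")


def matching_bots(user_agent: str) -> list[str]:
    """Return the bot markers found in a User-Agent header value."""
    ua = user_agent.lower()
    return [marker for marker in BOT_MARKERS if marker in ua]


# Hypothetical User-Agent strings, for illustration only.
sample_agents = [
    "Mozilla/5.0 (compatible; heritrix/1.0 +http://example.org/bot)",
    "Mozilla/5.0 (compatible; bnf.fr_bot; +http://example.org/bot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

for agent in sample_agents:
    print(matching_bots(agent), agent)
```

A substring match like this is deliberately loose. Any client can send an arbitrary User-Agent header, so a match only indicates what the visitor claims to be.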

BotRank for Heritrix (34%):

The Internet of Bots has evaluated Heritrix against 50 checkpoints, of which 17 have been confirmed as positive. These checkpoints evaluate the transparency and occurrence of a bot and don't necessarily say anything about its quality. Each positive checkpoint counts for 2 percentage points, so the BotRank is 17 × 2 = 34%. The details of the bot and the BotRank can be seen below. For more information on how the BotRank is made, you can visit the page Botrank.
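The calculation above can be sketched in a few lines. This assumes that every checkpoint carries equal weight, which is what multiplying 17 by 2 for 50 checkpoints suggests. The function name `bot_rank` is hypothetical and not part of any published tool.

```python
def bot_rank(positive: int, total: int = 50) -> float:
    """Return the share of positive checkpoints as a percentage.

    Assumes equal weight per checkpoint (an assumption, not a documented rule).
    """
    if not 0 <= positive <= total:
        raise ValueError("positive must be between 0 and total")
    return 100 * positive / total


# 17 of 50 checkpoints confirmed positive, as stated for Heritrix.
print(f"{bot_rank(17):.0f}%")  # prints "34%"
```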

User Agent(s) (4/5):

  1. Distinguishable: Yes
  2. Botname: Heritrix
  3. Email: Not mentioned
  4. Version: 1
  5. Mozilla: 1

Whois (0/5):

  1. Public: Unknown
  2. Organization: Not specified
  3. Country: Not specified
  4. City: Not specified
  5. Street: Not specified

Usage (5/5):

  1. Recommended: Yes
  2. Category: Web archive
  3. Free query: Yes
  4. Register: Yes
  5. Logo: Yes

Occurrence (3/15):

  1. Has visited during 3 of the 15 control months

Webdepth (1/15):

  1. Has visited 1 of the 15 control sites