CCBot

  • For a Web archive
  • Called Common Crawl
  • From the United States
  • By an unknown organization
  • Gets a score of 70%

What is CCBot?

CCBot is the crawler used by the Common Crawl website. The website describes its own database as follows: The Common Crawl corpus contains petabytes of data collected over the last 7 years. It contains raw web page data, extracted metadata and text extractions. These datasets can be downloaded for free. As to why CCBot is employed, the website offers the following information: Common Crawl is a non-profit organization dedicated to providing a copy of the internet to internet researchers, companies and individuals at no cost for the purpose of research and analysis. A visit by CCBot means that your domain will be included in the Common Crawl database.


BotRank for CCBot (70%):

The Internet of Bots has evaluated CCBot against 50 different checkpoints, of which 35 have been confirmed as positive. These checkpoints evaluate the transparency and occurrence of a bot and don't necessarily say anything about its quality. Each of the 50 checkpoints counts for 2 percentage points, so the BotRank is calculated as 35 × 2 = 70%. The details of the bot and the BotRank can be seen below. For more information on how the BotRank is calculated, you can visit the page Botrank.
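
The arithmetic behind the score can be sketched in a few lines of Python. This is only an illustration: the category scores are copied from the sections of this page, while the names `SCORES` and `botrank` are made up for the example and are not part of any official BotRank tooling.

```python
# Category scores as (points earned, points available), copied from this page.
SCORES = {
    "User Agent(s)": (3, 5),
    "Whois": (4, 5),
    "Weblinks": (2, 5),
    "Usage": (3, 5),
    "Occurrence": (14, 15),
    "Webdepth": (9, 15),
}


def botrank(scores):
    """Return the share of confirmed checkpoints as a whole-number percentage."""
    earned = sum(points for points, _ in scores.values())
    available = sum(maximum for _, maximum in scores.values())
    return round(100 * earned / available)


print(botrank(SCORES))  # 35 of 50 checkpoints -> 70
```

With 50 checkpoints in total, dividing by the number available gives the same result as multiplying the 35 confirmed checkpoints by 2.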

User Agent(s) (3/5):

  1. Distinguishable: Yes
  2. Botname: CCBot
  3. Email: Not mentioned
  4. Version: 2.0
  5. Mozilla: Not mentioned
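
Because the user agent is distinguishable by name and carries a version, a site owner could recognize CCBot in request logs with a simple pattern match. The sketch below assumes the bot announces itself with a `CCBot/<version>` token. That exact format is an assumption, since this page lists only the name and version, and the names `CCBOT_TOKEN` and `ccbot_version` are hypothetical.

```python
import re

# Assumption: the bot identifies itself with a "CCBot/<version>" token.
# The page lists only the name (CCBot) and version (2.0), not the full string.
CCBOT_TOKEN = re.compile(r"\bCCBot/(\d+(?:\.\d+)*)")


def ccbot_version(user_agent):
    """Return the CCBot version found in a User-Agent header, or None."""
    match = CCBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None
```

For example, `ccbot_version("CCBot/2.0")` returns `"2.0"`, while a typical browser string yields `None`. A user agent string can be spoofed, so a match on its own does not prove that the visitor really is CCBot.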

Whois (4/5):

  1. Public: Visit Whois
  2. Organization: Not specified
  3. Country: The United States
  4. City: Los Angeles
  5. Street: Yes

Weblinks (2/5):

  1. User agent: Visit Common Crawl
  2. Crawler: Not available
  3. Homepage: Visit Common Crawl
  4. Query: Not available
  5. Adding: Not available

Usage (3/5):

  1. Recommended: Yes
  2. Category: Web archive
  3. Free query: No
  4. Register: No
  5. Logo: Yes

Occurrence (14/15):

  1. Has visited during 14 of the 15 control months

Webdepth (9/15):

  1. Has visited 9 of the 15 control sites