NEW LIST: The most active crawlers and bots on the web
It's been a while since we last looked into the most active crawlers and bots on the web (2016). For our latest dive into the data, we looked at the numbers for H1 2018, January to June inclusive.
The data is sourced from thousands of websites built and hosted on the goMobi web publishing platform. Below is the breakdown of all the bots we saw.
The most active crawlers are page preview bots
Unsurprisingly, given its ubiquity, Facebook's externalhit bot is a regular visitor. We saw 7 different types of Facebook crawler in our data.
With over 8.7% of the total visits counted, "Facebookexternalhit" has been busy crawling pages that people share on the platform, but not quite as busy as in the first three months of the year, when its share was 12% of the overall hit count.
When a link is pasted, Facebook crawls the target page and pulls information such as the title, description and preview/featured images.
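To give a rough idea of what a preview crawler looks for, here is a minimal sketch that reads a page's Open Graph meta tags (`og:title`, `og:description`, `og:image`), falling back to the `<title>` element. It is an illustration only, not Facebook's actual implementation; the names `OpenGraphParser` and `extract_preview` are made up for this example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            # Keep the first value seen for each og: property.
            if prop.startswith("og:") and attrs.get("content"):
                self.og.setdefault(prop, attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return the fields a link preview would typically show."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.og.get("og:title") or parser.title.strip(),
        "description": parser.og.get("og:description", ""),
        "image": parser.og.get("og:image", ""),
    }
```

Calling `extract_preview(page_html)` on a page's HTML returns a small dictionary; empty strings mean the page did not declare that field.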
The Facebook crawler UAs seen most often are:
|6.5% (of total bot visits) | Facebook | Social Media Agent | Desktop bot|
|2.1% | Facebook | Social Media Agent | Desktop bot|
Although not seen in our data this time, "Facebot" is their advertising crawler, which may make an appearance on sites that use paid promotion on the platform.
You can read more about Facebook's crawlers and how to handle them here.
Bing more active than Google
The most active individual bot on our network was BingPreview. Combined, the 7 Bing bots account for 32.2% of all visits on our network.
BingPreview, "used to generate page snapshots", was closely followed by the standard crawler Bingbot. Interestingly, Bing's mobile crawler announces itself as an iPhone - the very first version, released way back in 2007.
You can read about all crawlers used by Bing, and exactly what each of them does here. Below are the user agents for the main three.
|9.8% | BingPreview | Search Engine | Desktop|
|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b|
|9.2% | Bingbot | Search Engine | Mobile|
|Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b|
|7.1% | Bingbot | Search Engine | Desktop|
|Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)|
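As a rough illustration, the three user agents above can be told apart with plain substring checks. This is a sketch under simple assumptions, not how Bing or DeviceAtlas classify traffic: the function name `classify_bing` is hypothetical, and the bot name is taken from the token in the string itself, which may differ from the label given in the table.

```python
def classify_bing(user_agent):
    """Return (bot_name, device) for a Bing user agent, or None if no Bing token is found."""
    ua = user_agent.lower()
    if "bingpreview" in ua:
        name = "BingPreview"
    elif "bingbot" in ua:
        name = "Bingbot"
    else:
        return None
    # Assumption: a UA that mentions iPhone or Mobile presents itself as a mobile device.
    device = "Mobile" if ("iphone" in ua or "mobile" in ua) else "Desktop"
    return name, device
```

User agents are trivial to spoof, so a match here only tells you what the client claims to be.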
Googlebot - less active, or just more efficient?
Given their dominance of all things search, it may be surprising to see Google so far down the list for 2018. However, if you've developed the most intelligent, efficient web-crawling and indexing system the planet has ever seen, as well as the most popular free analytics package, you might not need to crawl every site quite as often as your competitors.
That said, we did spot 146 variations of Google crawlers and bots in our data, up from 102 in the first three months of 2018. Given Google's integration with the wider web - think AMP, Structured Data etc - and their move to offer Google services (A.K.A. ads) on KaiOS earlier this year, there are numerous reasons why a Googlebot would visit your site.
Below are the most common Google crawlers, and their UAs, that we saw in the first half of 2018.
|5.0% | Googlebot | Search Engine | Mobile|
|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|2.7% | AdsBot Google | Advertising Bot | Desktop|
|2.2% | Googlebot | Advertising Bot | Mobile|
|Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)|
|2.2% | Googlebot | Search Engine (Images) | Desktop|
|1% | Googlebot | Web Preview Analytics | Desktop|
|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview Analytics) Chrome/41.0.2272.118 Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|0.3% | AppEngine Google | App Bot | Mobile|
|AppEngine-Google; (+http://code.google.com/appengine; appid: s~snapchat-proxy)|
|0.09% | Googlebot | Media Partners | Desktop|
One of the earliest internet pioneers, Yahoo has undergone major changes in recent years. In 1994 they were called "Jerry and David's guide to the World Wide Web", and simply listed other websites. There was no search offered, but in the year 2000 they integrated Google's product. By 2004, they had developed their own search functionality.
What we see in our data is similar to Google, but on a smaller scale. There is one main crawler - Yahoo! Slurp - and 8 variations, all with different jobs to perform as they navigate the internet.
Unlike Google, Yahoo doesn't appear to have a dedicated mobile web crawler, certainly not one operating at the same scale. Given Google's recent push to the mobile-first index, it's not surprising.
Yahoo's crawlers account for 1.5% of the total bot traffic included, down from 2.6% in Q1 2018. Here are the main crawlers, and their respective UAs.
|0.13% | Yahoo! Slurp | Search Engine | Desktop|
|Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)|
|0.02% | Yahoo! Pipes | Ad Monitoring | Desktop|
|Mozilla/5.0 (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)|
|0.01% | Yahoo! Pipes | Preview/Fetch client | Desktop|
|Mozilla/5.0 (compatible; Yahoo Link Preview; https://help.yahoo.com/kb/mail/yahoo-link-preview-SLN23615.html)|
Others to note
So we've looked at the "Big Four" and their crawlers and bots, and provided the most common user agents we see across our network. What else in the data is interesting?
With 4% of the overall traffic, Sogou Spider is the 6th busiest bot on the list. It's the web crawler for Beijing-based search technology provider, Sogou.com.
|4% | Sogou Spider | Search Engine | Desktop|
|Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)|
ScoutJet, the web crawler for IBM's Watson, clocks in with 1.6% of the total, while another Chinese search engine, so.com, is visible at 1.7% with its 360Spider bot.
|1.6% | ScoutJet | Search Engine | Desktop|
|Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)|
|1.7% | 360Spider | Search Engine | Desktop|
|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider|
With a DeviceAtlas Cloud Standard, Premium or Enterprise account, you can identify non-human traffic (robots, crawlers, checkers, download agents, spam harvesters and feed readers) in real time. You can then decide how to act on this information: block all undesired bots at the door, or just treat them differently from legitimate human visitors.
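To make the "block or treat differently" decision concrete, here is a minimal sketch that uses only substring matching on the crawler tokens listed in this article. It does not use the DeviceAtlas API; `BOT_TOKENS`, `identify_bot` and `decide` are hypothetical names, and the token list is an assumption drawn from the user agents shown above rather than a complete catalogue.

```python
# Tokens taken from the user agents shown in this article (lowercased).
# More specific tokens come first so that they win over broader ones.
BOT_TOKENS = {
    "facebookexternalhit": "Facebook",
    "bingpreview": "BingPreview",
    "bingbot": "Bingbot",
    "adsbot-google": "AdsBot Google",
    "googlebot": "Googlebot",
    "yahoo! slurp": "Yahoo! Slurp",
    "sogou web spider": "Sogou Spider",
    "scoutjet": "ScoutJet",
    "360spider": "360Spider",
}


def identify_bot(user_agent):
    """Return the bot name if a known token appears in the UA, else None."""
    ua = user_agent.lower()
    for token, name in BOT_TOKENS.items():
        if token in ua:
            return name
    return None


def decide(user_agent, blocked=frozenset()):
    """Pick an action: 'block', 'serve-bot' or 'serve-normal'."""
    name = identify_bot(user_agent)
    if name is None:
        return "serve-normal"  # no known bot token: treat as a human visitor
    if name in blocked:
        return "block"         # undesired bot: turn it away at the door
    return "serve-bot"         # known bot: e.g. serve a lighter page
```

The `blocked` set is where site policy lives; which bots belong in it is a decision for each site owner.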