The Most Active Bots On The Web - 2018

It's been a while since we looked into the most active web crawlers and bots on the web.

For our latest dive into the data, we looked at the numbers for Q1 2018, and discovered that the situation has changed quite a bit.

The data is sourced from thousands of websites built and hosted on the goMobi web publishing platform.

The most active crawler is no longer a search engine

Unsurprisingly, given the platform's omnipresence, Facebook's externalhit bot is a regular visitor. We saw 7 different types of Facebook crawler in our Q1 data.

With over 12% of the total bot visits counted, "facebookexternalhit" has been busy crawling pages that people share on the platform.

When a link is pasted, Facebook quickly crawls the target page and pulls information such as the title, description and preview/featured images.
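Much of that preview comes from Open Graph meta tags (og:title, og:description, og:image) when a page provides them. Below is a minimal sketch of pulling those tags out of a page with Python's standard html.parser. It only illustrates the idea and is not Facebook's own code. The sample page and its URLs are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        # Keep the first value seen for each og: property.
        if prop.startswith("og:") and content is not None:
            self.og.setdefault(prop, content)


def extract_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.og


# Hypothetical page used only for illustration.
page = """
<html><head>
  <title>Example page</title>
  <meta property="og:title" content="The Most Active Bots On The Web">
  <meta property="og:description" content="Q1 2018 crawler data">
  <meta property="og:image" content="https://example.com/preview.png">
</head><body></body></html>
"""

print(extract_open_graph(page))
```

If a page has no og: tags, this sketch returns an empty dict. A real crawler would then fall back to other signals, such as the <title> element.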

The Facebook crawler UAs seen most often are:

8.5% (of total bot visits) | Facebook | Social Media Agent | Desktop bot
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
3.5% | Facebook | Social Media Agent | Desktop bot
facebookexternalhit/1.1

Although not seen in our data this time, "Facebot" is their advertising crawler, which may make an appearance on sites that use paid promotion on the platform.
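Both common UAs above share the "facebookexternalhit" token, so a simple case-insensitive substring check catches either variant. The sketch below also checks for "Facebot". The token tuple is an assumption based on the strings in this post. It is not an exhaustive list. A UA string can also be spoofed, so treat a match as a hint rather than proof.

```python
# Assumed tokens, taken from the UA strings and crawler names above.
FACEBOOK_TOKENS = ("facebookexternalhit", "facebot")


def is_facebook_crawler(user_agent: str) -> bool:
    """Return True if the UA string carries a known Facebook crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in FACEBOOK_TOKENS)


samples = [
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "facebookexternalhit/1.1",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]
for ua in samples:
    print(is_facebook_crawler(ua), ua)
```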

You can read more about Facebook's crawlers and how to handle them here.

Bing more active than Google

The most active individual bot on our network was BingPreview. Combined, the 7 Bing bots account for 23.5% of all bot visits on our network.

BingPreview, "used to generate page snapshots", was closely followed by the standard crawler Bingbot (both desktop and mobile varieties).

You can read about all the crawlers Bing uses, and exactly what each of them does, here. Below are the user agents for the main three.

9.9% | BingPreview | Search Engine | Desktop
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
7.3% | Bingbot | Search Engine | Desktop
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
6.1% | Bingbot | Search Engine | Mobile
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
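A quick check on the figures: the three UAs listed above add up to 23.3% of bot visits. That leaves roughly 0.2% for the other four Bing bots within the 23.5% combined total. The published percentages are rounded, so the remainder is approximate.

```python
# Shares of total bot visits for the three Bing UAs listed above (percent).
listed_bing = {
    "BingPreview (desktop)": 9.9,
    "Bingbot (desktop)": 7.3,
    "Bingbot (mobile)": 6.1,
}
combined_bing = 23.5  # all 7 Bing bots, as reported in the text

# Round to one decimal place to match the precision of the source figures.
listed_total = round(sum(listed_bing.values()), 1)
remaining = round(combined_bing - listed_total, 1)

print(f"Listed three: {listed_total}%")       # Listed three: 23.3%
print(f"Other four Bing bots: {remaining}%")  # Other four Bing bots: 0.2%
```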

Googlebot - less active, or just more efficient?

Given their dominance of all things search, it may be surprising to see Google so far down the list for 2018. However, if you've developed the most intelligent, efficient web-crawling and indexing system the planet has ever seen, as well as the most popular free analytics package, you might not need to crawl every site quite as often as your competitors.

That said, we did spot 102 variations of Google crawlers and bots in our data. Given Google's integration with the wider web (AMP, structured data and so on), there are numerous reasons why a Googlebot would visit your site.

Below are the most common Google crawlers, and their UAs, that we saw in early 2018.

5.5% | Googlebot | Search Engine | Mobile
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
3% | AdsBot Google | Advertising Bot | Desktop
AdsBot-Google (+http://www.google.com/adsbot.html)
2.8% | Googlebot | Search Engine (Images) | Desktop
Googlebot-Image/1.0
1.6% | Googlebot | Search Engine | Desktop
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview Analytics) Chrome/41.0.2272.118 Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
1% | Googlebot | Mediapartners | Desktop
Mediapartners-Google
0.3% | AppEngine Google | App Bot | Mobile
AppEngine-Google; (+http://code.google.com/appengine; appid: s~snapchat-proxy)
0.3% | Google StructuredDataTestingTool | Testing Tool | Desktop
Mozilla/5.0 (compatible; Google-Site-Verification/1.0)
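Most of the Google UAs above carry a distinctive product token, so they can be sorted into the categories used in this list. In the minimal sketch below, the token-to-category table is an assumption drawn from these strings, not an official or complete list. Order matters: "Googlebot-Image" must be checked before the generic "Googlebot" token. As with any UA check, the string can be spoofed. Google documents a reverse DNS lookup for verifying genuine Googlebot traffic.

```python
from typing import Optional

# Checked in order, most specific first: "Googlebot-Image" must come before
# the generic "Googlebot" token, or image crawls would be filed as web search.
GOOGLE_TOKENS = [
    ("Googlebot-Image", "Search Engine (Images)"),
    ("AdsBot-Google", "Advertising Bot"),
    ("Mediapartners-Google", "Mediapartners"),
    ("AppEngine-Google", "App Bot"),
    ("Googlebot", "Search Engine"),
]


def classify_google_ua(user_agent: str) -> Optional[str]:
    """Return the category of the first matching Google token, or None."""
    for token, category in GOOGLE_TOKENS:
        if token in user_agent:
            return category
    return None


examples = [
    "Googlebot-Image/1.0",
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "Mediapartners-Google",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]
for ua in examples:
    print(classify_google_ua(ua), "<-", ua)
```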

Yahoo!

One of the internet's earliest pioneers, Yahoo has undergone major changes in recent years. In 1994 they were called "Jerry and David's guide to the World Wide Web", and simply listed other websites. There was no search offered at first, but in 2000 they integrated Google's search product. By 2004, they had developed their own search functionality.

What we see in our data is similar to Google, but on a smaller scale. There is one main crawler, Yahoo! Slurp, and 8 variations, all with different jobs to perform as they navigate the web.

Unlike Google, Yahoo doesn't appear to have a dedicated mobile web crawler, or at least not one operating at the same scale. Given that the move to a mobile-first index is Google's own initiative, this is not surprising.

Yahoo's crawlers account for 2.6% of the total bot traffic in our data. Here are the main crawlers and their respective UAs.

2.5% | Yahoo! Slurp | Search Engine | Desktop
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
0.03% | Yahoo! Pipes | Ad Monitoring | Desktop
Mozilla/5.0 (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)
0.01% | Yahoo! Pipes | Preview/Fetch client | Desktop
Mozilla/5.0 (compatible; Yahoo Link Preview; https://help.yahoo.com/kb/mail/yahoo-link-preview-SLN23615.html)
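One rough way to spot mobile crawlers in this data is to check for the "Mobile" token. It appears in the mobile Bingbot and mobile Googlebot UAs above but in none of the Yahoo UAs. The sketch below is a crude heuristic of our own, not a standard rule. For example, it would not flag the AppEngine-Google UA, which the Google list marks as mobile.

```python
def looks_like_mobile_crawler(user_agent: str) -> bool:
    """Crude heuristic: treat a crawler UA as mobile if it has a 'Mobile' token."""
    return "Mobile" in user_agent


yahoo_uas = [
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)",
    "Mozilla/5.0 (compatible; Yahoo Link Preview; https://help.yahoo.com/kb/mail/yahoo-link-preview-SLN23615.html)",
]
bing_mobile = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 "
    "(KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 "
    "(compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
)

print([looks_like_mobile_crawler(ua) for ua in yahoo_uas])  # [False, False, False]
print(looks_like_mobile_crawler(bing_mobile))               # True
```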

Others to note

So we've looked at the "Big Four" and their crawlers and bots, and provided the most common user agents we see across our network. What else in the data is interesting?

With 5% of the overall bot traffic, Sogou Spider is the 6th busiest bot on the list. It's the web crawler for Beijing-based search technology provider Sogou.com.

5% | Sogou Spider | Search Engine | Desktop
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
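The "6th busiest" figure can be checked by ranking the individual UAs listed in this post by their share of bot visits. The sketch below uses only those listed figures. It ranks UA strings individually, as in the "most active individual bot" comparison for BingPreview, rather than grouping them by company.

```python
# Individual UA shares of total bot visits (percent), from the lists in this post.
shares = {
    "BingPreview (desktop)": 9.9,
    "facebookexternalhit/1.1 (with URL)": 8.5,
    "Bingbot (desktop)": 7.3,
    "Bingbot (mobile)": 6.1,
    "Googlebot (mobile)": 5.5,
    "Sogou Spider": 5.0,
    "facebookexternalhit/1.1 (bare)": 3.5,
    "AdsBot-Google": 3.0,
    "Googlebot-Image": 2.8,
    "Yahoo! Slurp": 2.5,
    "360Spider": 2.2,
    "ScoutJet": 1.8,
}

# Sort UA names by share, busiest first.
ranking = sorted(shares, key=shares.get, reverse=True)
position = ranking.index("Sogou Spider") + 1
print(f"Sogou Spider ranks #{position} of {len(ranking)} listed UAs")
```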

ScoutJet, the web crawler for IBM's Watson, clocks in with 1.8% of the total, while another Chinese search engine, so.com, is visible at 2.2% with its 360Spider bot.

1.8% | ScoutJet | Search Engine | Desktop
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)
2.2% | 360Spider | Search Engine | Desktop
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider

With a DeviceAtlas Cloud Standard, Premium or Enterprise account, you can identify non-human traffic (robots, crawlers, checkers, download agents, spam harvesters and feed readers) in real time. You can then decide how to act on this information: block all undesired bots at the door, or simply treat them differently from legitimate human visitors.
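Once traffic is classified, choosing how to respond is a site-specific policy. Below is a minimal sketch of such a policy. The Detection type, the Action values and the "ExampleScraper" blocklist entry are all hypothetical. They stand in for whatever your detection service returns and are not the DeviceAtlas API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SERVE = "serve normally"
    SERVE_AS_BOT = "serve, but exclude from analytics and ads"
    BLOCK = "block at the door"


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection result; NOT a DeviceAtlas API type."""
    is_robot: bool
    bot_name: str = ""


# Hypothetical blocklist of bots this site does not want to serve at all.
BLOCKED_BOTS = frozenset({"ExampleScraper"})


def choose_action(detection: Detection) -> Action:
    """Map a detection result to a site-specific handling policy."""
    if not detection.is_robot:
        return Action.SERVE
    if detection.bot_name in BLOCKED_BOTS:
        return Action.BLOCK
    # Bots that are not blocked still get content, but are kept
    # out of human-visitor metrics.
    return Action.SERVE_AS_BOT


print(choose_action(Detection(is_robot=False)).value)
print(choose_action(Detection(is_robot=True, bot_name="Bingbot")).value)
print(choose_action(Detection(is_robot=True, bot_name="ExampleScraper")).value)
```

A blocklist keeps useful crawlers such as Bingbot and Googlebot flowing by default. A stricter site could invert this into an allowlist.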

Read more about bot detection and how it can help your business.

Get instant access to a DeviceAtlas Cloud trial

DeviceAtlas Cloud offers a great way to start detecting mobile device traffic to your site:

  • Optimize website content for mobile, tablet, and other devices
  • Improve website loading time and minimize page weight
  • Handle traffic from any device as you want

Get started with a DeviceAtlas Cloud trial today.
