Bot Analytics Series Part 2: The Importance of Monitoring Bot Traffic

Introduction

As discussed in part one of this blog series, bots represent around 40% of internet traffic. Analysing your bot traffic can help you understand how your site is being used (for example by preview bots) and lets you verify that it is being correctly indexed by search engines. Visibility into this traffic is also important for minimising its impact on bandwidth and servers and for mitigating the risk of nefarious bot activity. In this post, we will discuss the importance of monitoring bot traffic and outline how DeviceAtlas can help you to identify and measure it.

How can I view and monitor bot traffic?

One way to identify known bot traffic is through the International Spiders and Bots List, maintained by the Interactive Advertising Bureau.

Popular analytics software solutions such as Google Analytics automatically exclude traffic from known bots and spiders in order to focus on human traffic. However, understanding your bot traffic brings a number of benefits of its own.

Why should I include bot traffic in my reporting?

  • Reporting allows site owners to verify coverage by important bots e.g. Google, Bing, DuckDuckGo
  • Measuring bot traffic enables you to know how much of your traffic is bots versus human users
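
As a rough sketch of these two measurements, assume each logged request has already been labelled with a bot flag and, where available, a bot name. The `Request` record, its field names and the sample data are hypothetical and chosen only for illustration; they are not part of any DeviceAtlas API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    # Hypothetical record for one logged request; field names are assumptions.
    path: str
    is_bot: bool
    bot_name: Optional[str] = None


def bot_share(requests):
    """Return the fraction of requests flagged as bot traffic (0.0 if empty)."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r.is_bot) / len(requests)


def bot_coverage(requests):
    """Count requests per named bot, to check that expected crawlers visit."""
    return Counter(r.bot_name for r in requests if r.is_bot and r.bot_name)


# Made-up sample data, purely for demonstration.
sample = [
    Request("/", False),
    Request("/", True, "Googlebot"),
    Request("/about", True, "bingbot"),
    Request("/about", False),
    Request("/pricing", True, "Googlebot"),
]

print(f"bot share: {bot_share(sample):.0%}")
print(bot_coverage(sample).most_common())
```

A crawler you expect to see that is missing from the coverage counts is a prompt to check how the site is being indexed.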

Some analytics solutions can identify some, but not all, bot traffic. Identifying all bot traffic is challenging because some bots go to great lengths to avoid identification. The DeviceAtlas solution provides highly reliable identification of bots wherever the identification can be made from the HTTP headers, but there are malicious bots that deliberately try to evade this.

How can DeviceAtlas help?

A key benefit of DeviceAtlas is our bot identification capability. We have our own content serving network that incorporates methods to separate human visitors from robots. Good bots self-identify via their user agent string, and in these cases DeviceAtlas is able to provide the bot name.

Our bot identification analytics help customers answer questions about their traffic by analysing the ever-increasing number of events (HTTP requests, Workers requests, Spectrum events) that we log every day.

DeviceAtlas provides a set of virtual properties, one of which is an umbrella identifier of non-human traffic (isRobot). Where isRobot is true, a sub-classification identifies the nature of the bot through a range of further properties (isCrawler, isFilter, isDownloader, isFeedReader, the bot name, etc.).
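
The sketch below shows one way such properties could be turned into a single traffic label for reporting. It assumes the properties for a request are already available as a plain dictionary of booleans keyed by the names above. That dictionary shape, the `classify` helper and the order in which the flags are checked are assumptions made for this example, not the DeviceAtlas API.

```python
# Sub-classification flags named in the text. The checking order is an
# arbitrary choice for this sketch.
SUB_FLAGS = ("isCrawler", "isFilter", "isDownloader", "isFeedReader")


def classify(props):
    """Map a dict of boolean properties to a coarse traffic label."""
    # isRobot is the umbrella flag: without it, treat the request as human.
    if not props.get("isRobot", False):
        return "human"
    # Return the first sub-classification that is set.
    for flag in SUB_FLAGS:
        if props.get(flag, False):
            return flag[2:].lower()  # e.g. "isCrawler" -> "crawler"
    # A robot with no recognised sub-classification.
    return "other bot"


print(classify({"isRobot": True, "isCrawler": True}))
print(classify({"isRobot": False}))
```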

No other device intelligence solution in the market hosts websites itself. As a result, other solutions see only the HTTP headers from their traffic sources, without the context provided by visibility of the visitor's activity, and so cannot distinguish between bot and human visitors through direct measurement.

In addition, the DeviceAtlas API algorithm examines every character in the UA string, rather than just looking for tokens or patterns. As a result, it is sensitive to very minor changes in the UA string, such as spacing, single-character or casing changes, and this permits such traffic to be separately identified as bot traffic where applicable.
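
To illustrate why character-level sensitivity matters, the toy comparison below contrasts a loose token check with a strict whole-string comparison. It is not the DeviceAtlas algorithm, which is not described here. The reference UA string and both helper functions are assumptions made up for the example.

```python
# A browser UA string assumed, for illustration only, to be a known genuine value.
KNOWN_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def token_match(ua):
    """Loose check: looks only for a couple of expected tokens, ignoring case."""
    lowered = ua.lower()
    return "chrome/" in lowered and "safari/" in lowered


def exact_match(ua):
    """Strict check: every character must agree with the known string."""
    return ua == KNOWN_UA


# A near-copy with a casing change and a dropped space, the kind of small
# deviation a hand-built UA string might contain.
altered = KNOWN_UA.replace("Win64; x64", "win64;x64")

print(token_match(altered), exact_match(altered))
```

The token check accepts the altered string as an ordinary browser, while the strict comparison flags the deviation for further inspection.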

DeviceAtlas is a member of the Interactive Advertising Bureau (IAB) and subscribes to the IAB International Spiders and Bots List. DeviceAtlas is regularly tested against the IAB list to ensure completeness and accuracy. The IAB list contains accept list and deny list token information, but the DeviceAtlas approach is more robust than this due to the parsing approach described previously and is able to identify bots which the IAB approach cannot.

Bot traffic that we cannot track

  • Botnets: Where a desktop is compromised via malware, it can become part of a botnet. In this situation, it becomes a source of both human and bot traffic. Since the UA headers are identical in each case, DeviceAtlas classifies this as human traffic.
  • Masquerading bots: Where a bot masquerades perfectly as a desktop or mobile browser, with identical headers to the real device, it is not possible to classify it as a bot based on header inspection alone.

Conclusion

There are many approaches to analysing your website’s bot traffic. To find out more about DeviceAtlas’ solution, sign up for a free trial today.
