A Regex Approach to Analyzing User-Agents: Pros & Cons

Using regular expressions to analyze User-Agent headers is one of the most popular ways to understand device traffic. But is it the most effective approach? In this article, we look at some of the pros and cons of a regex approach to analyzing User-Agents.

Isabel Hughes - 15 Sep 2023
4 min read

Analyzing User-Agents

A regular expression, or regex, approach to analyzing User-Agent headers is commonplace among businesses that want to identify either their own device traffic or the traffic of their clients. Regexes use pattern matching to find specific keywords that can identify a given device. It is a hugely popular method of device identification, and many companies opt for this approach over a commercial solution. So let's dive into the pros and cons of it!

Pros of a Regex Approach

Financially cost effective

One of the main advantages of using a regex approach for User-Agent parsing is its relatively low financial cost. While commercial solutions involve a monthly or annual fee, regexes can be written and managed by developers in-house with no requirement for an external solution. This can save a significant share of the budget that might otherwise be spent on data partnerships.

Easily accessible

Regex-based User-Agent parsing can be used in nearly every programming language and information on implementation is readily available from online communities such as Stack Overflow, GitHub, and Reddit. This makes it quite straightforward for developers to identify the inputs needed to implement a regex solution in-house.

Useful for basic needs

User-Agent matching with regexes can be helpful for specific use cases, e.g. websites where simple mobile or tablet redirection is sufficient, or where precise detection of properties is not particularly important. Depending on specific requirements, the regex approach can be hugely useful.
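As a rough illustration of the simple redirection case described above, a minimal Python sketch might look like the following. The keyword patterns, the `redirect_target` function, and the `/tablet`, `/mobile`, and `/desktop` paths are all assumptions made for this example, not a recommended rule set.

```python
import re

# Illustrative keyword patterns only (assumed for this sketch);
# real traffic would need a much longer, regularly updated list.
TABLET_RE = re.compile(r"iPad|Tablet", re.IGNORECASE)
MOBILE_RE = re.compile(r"iPhone|Android.*Mobile|Windows Phone", re.IGNORECASE)


def redirect_target(user_agent: str) -> str:
    """Return a hypothetical site variant for a User-Agent string."""
    # Tablets are checked first so that a tablet UA that also
    # contains a mobile keyword is not sent to the mobile site.
    if TABLET_RE.search(user_agent):
        return "/tablet"
    if MOBILE_RE.search(user_agent):
        return "/mobile"
    return "/desktop"
```

For a site that only needs this coarse three-way split, a handful of patterns like these may be enough; anything that depends on exact device properties would need more than keyword checks.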

Cons of a Regex Approach

Accuracy

One of the biggest disadvantages of regex for User-Agent parsing is its level of accuracy compared with a commercial solution. A typical regex approach only looks for the presence of keywords such as ‘iPhone’ or ‘Android’ in the User-Agent, but this can produce mixed results. Being unable to distinguish between Android tablets and Android smartphones is one obvious weakness, and the presence of the iPhone token may offer as little value as the Mozilla token, since many devices pretend to be iPhones. User-Agent headers by their very nature do not conform to standard patterns, so a technique such as this is not future-proof. It will also fail silently: only the end-user will see the problem.
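The weaknesses described above can be reproduced with a few lines of Python. This is only a sketch: the `naive_label` function is hypothetical, and the three User-Agent strings are made up for illustration (the device names `ExamplePhone` and `ExampleTab 10` do not refer to real products).

```python
import re

IPHONE_RE = re.compile(r"iPhone")
ANDROID_RE = re.compile(r"Android")


def naive_label(user_agent: str) -> str:
    """Label a device purely by the presence of a keyword."""
    if IPHONE_RE.search(user_agent):
        return "iphone"
    if ANDROID_RE.search(user_agent):
        return "android"
    return "other"


# Made-up User-Agent strings, for illustration only.
PHONE_UA = ("Mozilla/5.0 (Linux; Android 13; ExamplePhone) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/116.0.0.0 Mobile Safari/537.36")
TABLET_UA = ("Mozilla/5.0 (Linux; Android 13; ExampleTab 10) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36")
# A non-Apple client that includes the iPhone token in its header.
SPOOF_UA = "ExampleBrowser/1.0 (Linux; Android 13; ExampleTab 10; like iPhone)"

for ua in (PHONE_UA, TABLET_UA, SPOOF_UA):
    print(naive_label(ua))
```

The phone and the tablet both come back as `android`, and the third string comes back as `iphone`. No exception is raised in any of these cases, which is what failing silently looks like in practice.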

User-Agent Client Hints

Until recently, considering a device's User-Agent header was all a developer needed to do in most cases to identify a device. In the last year or so, however, Google's User-Agent Client Hints (UA-CH) proposal has been rolled out in many browsers (including Chrome and Edge) and now affects a substantial and increasing proportion of web traffic. The net effect of this change is that looking at a device's User-Agent header alone will often not be sufficient to determine what it is: the full set of Sec-CH-UA-* headers must be considered collectively for a determination to be made.

This complicates the logic considerably as only some of these headers are present on the first request from a new visitor, and sometimes they contain inconsistent information. Simple regex-based solutions will need to be adapted to consider all of the headers, and also add logic around presence and priority of headers in order to retain accuracy. These changes move a regex-based header parsing solution from a quick and easy fix to a much more complex and involved project that requires a lot more expertise.
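To give a sense of the extra logic involved, here is a minimal Python sketch that prefers Client Hints when they are present and falls back to the User-Agent header when they are not. The `describe_device` function, its return shape, and the choice to let `Sec-CH-UA-Mobile` override the User-Agent are assumptions made for this example; a real solution would need rules for many more headers and conflicts.

```python
import re
from typing import Mapping, Optional

# Fallback keyword check, used only when no mobile hint was sent.
MOBILE_UA_RE = re.compile(r"Mobile|iPhone", re.IGNORECASE)


def _unquote(value: Optional[str]) -> Optional[str]:
    """Strip the double quotes that wrap string-valued client hints."""
    if value is None:
        return None
    return value.strip().strip('"') or None


def describe_device(headers: Mapping[str, str]) -> dict:
    """Combine UA-CH headers with the User-Agent header.

    Assumed priority for this sketch: a hint wins over the User-Agent.
    """
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")

    mobile_hint = h.get("sec-ch-ua-mobile")
    if mobile_hint is not None:
        # The header carries a structured boolean: "?1" or "?0".
        is_mobile = mobile_hint.strip() == "?1"
        source = "client-hints"
    else:
        is_mobile = bool(MOBILE_UA_RE.search(ua))
        source = "user-agent"

    return {
        "is_mobile": is_mobile,
        # These may be absent, for example on a visitor's first request.
        "platform": _unquote(h.get("sec-ch-ua-platform")),
        "model": _unquote(h.get("sec-ch-ua-model")),
        "source": source,
    }
```

Even this small version has to handle missing headers, quoted values, and a priority rule for disagreements, which hints at how quickly the logic grows once more hints are taken into account.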

Performance

Regular expressions can be slow to execute, especially when dealing with intricate patterns. If optimal performance is a crucial consideration for a business, then this approach may not be the most suitable choice.
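A typical regex-based parser tries its patterns one after another, so the work per request grows with the length of the rule list. The Python sketch below shows that linear scan, plus one common mitigation that is not part of the original discussion: caching results per User-Agent string. The rule list and function names are hypothetical.

```python
import re
from functools import lru_cache

# Illustrative rule list (assumed); real lists can be far longer.
RULES = [
    ("iphone", re.compile(r"iPhone")),
    ("ipad", re.compile(r"iPad")),
    ("android", re.compile(r"Android")),
    ("windows", re.compile(r"Windows NT")),
]


def classify(user_agent: str) -> str:
    """Try each precompiled pattern in order until one matches."""
    for label, pattern in RULES:
        if pattern.search(user_agent):
            return label
    return "unknown"


@lru_cache(maxsize=10_000)
def classify_cached(user_agent: str) -> str:
    """Memoize results so a repeated User-Agent skips the regex scan."""
    return classify(user_agent)
```

Caching only helps when the same header strings recur, and it does nothing for the first occurrence of each string, so it softens the cost rather than removing it.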

Maintenance

Regularly updating regex patterns is necessary to ensure that they remain current and encompass new devices that might not have been previously included. However, keeping up to date with the ever-changing landscape of devices, browsers, and operating systems is a real challenge. Managing this becomes even more demanding when factoring in millions of potential combinations that include language, locale, or side-loaded browsers. Furthermore, the introduction of User-Agent Client Hints has added the complexity of examining numerous additional HTTP headers to accurately identify a device, which was previously accomplished solely through the User-Agent header.

The Commercial Solution

A regex-based approach to device identification offers flexibility and direct control over the matching rules. However, commercial solutions may offer a more comprehensive and dynamic database of devices, freeing developers from constantly maintaining the solution. Commercial solutions can efficiently tackle the challenges posed by millions of possible device permutations, diverse languages, locales, and evolving HTTP headers like User-Agent Client Hints.

For complex and high-maintenance needs, a commercial solution often proves to be the smarter choice, ensuring accurate and up-to-date device identification without the headache of regex upkeep.

In Summary

Deciding whether to opt for a regex-based approach or a commercial solution for device identification will come down to specific needs and challenges. While regexes are relatively inexpensive and offer some degree of customization, a solution based on them can become cumbersome and inefficient for complex scenarios. On the other hand, commercial solutions offer a reliable, high-performance alternative, particularly suited for high-maintenance environments needing to stay on top of the diversifying device landscape.

The key lies in recognizing that both approaches have their merits, and the decision should align with the scale and complexity of the task. Ultimately, the goal remains the same: accurate and efficient device identification to ensure web accessibility for all.