Building a device database and User Agent strings

Building and maintaining your own device database is time-consuming, because each entry must include not just the device's name and detailed characteristics but, most importantly, its User Agent string. Read on to learn what User Agent strings are used for and how they're created.

What is a User Agent string?

The User Agent string is one of the HTTP headers sent by the browser at the start of each request. It works as an introduction, allowing the web server to decide what should and what shouldn't be sent to the device. Device databases used for device detection therefore also include User Agent strings.
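To make this concrete, here is a minimal sketch of how a server-side script might pull the User Agent string out of a raw HTTP/1.1 request. The function name `extract_user_agent` and the sample request (including the `example.com` host) are made up for illustration; real web servers and frameworks expose the header through their own APIs.

```python
from typing import Optional


def extract_user_agent(raw_request: str) -> Optional[str]:
    """Return the User-Agent header value from a raw HTTP/1.1 request, if any."""
    # The header section ends at the first blank line.
    head = raw_request.split("\r\n\r\n", 1)[0]
    # Skip the request line ("GET / HTTP/1.1") and scan the header lines.
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive.
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


# Hypothetical request, built around one of the UA strings from this article.
sample_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

print(extract_user_agent(sample_request))
```

The header is optional in practice, so the sketch returns `None` when a request doesn't carry one.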

User Agents are defined in the HTTP/1.1 standard (RFC 7231), which lists the following uses for the User-Agent header:

  • identifying the scope of reported interoperability problems,
  • working around or tailoring responses to avoid particular User Agent limitations,
  • analysing browser or operating system use.

For historical reasons, User Agent strings often contain information that isn't true of the requesting device. The UA header may contain keywords intended to make the web server return the content that the software or hardware maker wants.

We explored the origins of this problem in another blog post on DeviceAtlas.

While User Agent strings contain some information about the requesting device, they shouldn't contain any user-unique information such as IMEIs, device IDs or usernames. Information tied to an individual user is handled by other mechanisms, such as cookies.

How are User Agent strings born?

If you run your own website and analyse server logs, you have probably wondered why UA headers are sometimes nonsensical. At DeviceAtlas, we often come across strange User Agent strings such as:

Mozilla/4.0

&as_qdr=all

/usr/local/websense/af_mobile/user_agent/ipad

anonymous

User-Agent

silly_that_i_have_to_do_this

\'"\\'\");|]*{%0d%0a<%00>

 

All User Agent strings are built by software makers, who are expected to follow the HTTP standard for requesting online content. This software is mostly browsers, but also includes apps, bots, crawlers, etc. Of course, not every software developer adheres to the HTTP standard.
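As a rough illustration, the sketch below checks whether a string has the general shape the HTTP/1.1 grammar expects: a product, optionally followed by more products or parenthesised comments. It is a simplification (nested comments and quoted pairs are not handled, and comment contents are barely restricted), and the names `UA_RE` and `is_well_formed` are our own.

```python
import re

# Characters allowed in an HTTP "token" (tchar in the HTTP/1.1 grammar).
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
# product = token ["/" product-version]
PRODUCT = rf"{TCHAR}+(?:/{TCHAR}+)?"
# Simplified comment: parentheses with no nesting and no quoted pairs.
COMMENT = r"\([^()]*\)"
# User-Agent = product *( whitespace ( product / comment ) )
UA_RE = re.compile(rf"{PRODUCT}(?:[ \t]+(?:{PRODUCT}|{COMMENT}))*")


def is_well_formed(ua: str) -> bool:
    """Return True if the whole string matches the simplified grammar."""
    return UA_RE.fullmatch(ua) is not None


samples = [
    "Mozilla/4.0",
    "&as_qdr=all",
    "/usr/local/websense/af_mobile/user_agent/ipad",
    "anonymous",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)",
]
for ua in samples:
    print(is_well_formed(ua), ua)
```

Note that a string like "anonymous" passes this check: it is a syntactically valid product token without a version. Being well formed and being meaningful are two different things.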

User Agents are built with product tokens

The HTTP/1.1 standard explains that User Agent strings consist of one or more ‘product tokens’:

Product tokens are used to allow communicating applications to identify themselves by software name and version. Most fields using product tokens also allow sub-products which form a significant part of the application to be listed, separated by white space.

 

Each product token consists of a product name and its version, separated by a “/” sign, optionally followed by a comment in parentheses. Tokens are typically listed in decreasing order of significance, but this is up to the software maker.
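The structure described above can be sketched as a small tokenizer that splits a User Agent string into products, each with an optional version and an optional trailing comment. This is an illustration under simplifying assumptions (comments are not nested), and the `Product` type, the `parse_user_agent` function and the `ExampleBrowser/1.2` token are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Product(NamedTuple):
    name: str
    version: Optional[str]
    comment: Optional[str]


# Either a parenthesised comment (no nesting) or a run of non-space characters.
_PIECE = re.compile(r"\(([^()]*)\)|(\S+)")


def parse_user_agent(ua: str) -> List[Product]:
    """Split a User Agent string into products with optional comments."""
    products: List[Product] = []
    for match in _PIECE.finditer(ua):
        comment, token = match.group(1), match.group(2)
        if comment is not None:
            # Attach the comment to the product just before it. Comments with
            # no preceding product, or a second comment, are ignored here.
            if products and products[-1].comment is None:
                products[-1] = products[-1]._replace(comment=comment)
            continue
        name, sep, version = token.partition("/")
        products.append(Product(name, version if sep else None, None))
    return products


# "ExampleBrowser/1.2" is a made-up second product for illustration.
ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) ExampleBrowser/1.2"
for product in parse_user_agent(ua):
    print(product)
```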

Software makers can use these tokens to send browser-specific information. They also populate them with device-specific information read from the device’s ROM, such as the model ID and the operating system and its version.

Here are some examples of product tokens:

  • Mozilla/5.0 (Linux; U; Android 4.3; en-us; SGH-T999N Build/JSS15J)
  • Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)
  • Mozilla/5.0 (Linux; Android 4.4.3; XT1039 Build/KXB21.14-L1.31)

 

The device model identifiers in the above examples are SGH-T999N (Samsung Galaxy S3), iPhone, and XT1039 (Motorola Moto G).
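For strings shaped like the three examples above, the model identifier can be pulled out with a simple heuristic. The sketch below assumes the model precedes a "Build/" marker in Android-style comments, or is the first field of the comment for Apple devices. Both rules are conventions observed in these samples, not part of the HTTP standard, and the names `guess_model` and `APPLE_DEVICES` are hypothetical. Real device detection relies on a maintained database rather than rules like these.

```python
import re
from typing import Optional

_COMMENT = re.compile(r"\(([^()]*)\)")
# Android-style field, e.g. "SGH-T999N Build/JSS15J".
_BUILD = re.compile(r"^(?P<model>.+?)\s+Build/\S+$")
# Assumption: Apple devices name themselves in the first comment field.
APPLE_DEVICES = {"iPhone", "iPad", "iPod"}


def guess_model(ua: str) -> Optional[str]:
    """Guess the device model from the first comment of a User Agent string."""
    match = _COMMENT.search(ua)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(";")]
    for field in fields:
        build = _BUILD.match(field)
        if build:
            return build.group("model")
    if fields[0] in APPLE_DEVICES:
        return fields[0]
    return None


examples = [
    "Mozilla/5.0 (Linux; U; Android 4.3; en-us; SGH-T999N Build/JSS15J)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)",
    "Mozilla/5.0 (Linux; Android 4.4.3; XT1039 Build/KXB21.14-L1.31)",
]
for ua in examples:
    print(guess_model(ua))
```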

Meaningless User Agents can limit user experience

As you might already know, device detection solutions analyse User Agent strings and other HTTP headers to identify devices requesting websites. This information is used by many businesses all over the globe to enhance their online presence in a number of ways.

Given that the rules for User Agent strings are rather loose, UAs have no consistent structure. This world is still the 'Wild West': software developers and device makers are free to conceal device-specific information or to build nonsensical User Agent strings.

Of course, the sites that receive such a request won't know what to do with it, so users will likely get a fallback or default experience. It also means that the software or device is unidentifiable in a sea of analytics data, practically disappearing from the market.
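The fallback behaviour can be sketched with a toy lookup. Everything here is hypothetical: the two-entry `DEVICE_TABLE`, the `DEFAULT_PROFILE` and the naive substring matching stand in for a real device database and detection engine, which are far more involved.

```python
from typing import Dict, Union

Profile = Dict[str, Union[str, bool]]

# Toy device table keyed by model ID (entries taken from this article's examples).
DEVICE_TABLE: Dict[str, Profile] = {
    "SGH-T999N": {"name": "Samsung Galaxy S3", "is_mobile": True},
    "XT1039": {"name": "Motorola Moto G", "is_mobile": True},
}

# Served when the User Agent string matches nothing we know.
DEFAULT_PROFILE: Profile = {"name": "unknown", "is_mobile": False}


def profile_for(ua: str) -> Profile:
    """Return the profile of the first known model ID found in the string."""
    for model_id, profile in DEVICE_TABLE.items():
        if model_id in ua:
            return profile
    # Unidentifiable requests get the default experience.
    return DEFAULT_PROFILE


print(profile_for("Mozilla/5.0 (Linux; Android 4.4.3; XT1039 Build/KXB21.14-L1.31)"))
print(profile_for("silly_that_i_have_to_do_this"))
```

A nonsensical string such as "silly_that_i_have_to_do_this" ends up with the default profile, and in analytics it would be counted as "unknown".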

Get access to the most comprehensive device intelligence

If you’re looking for a comprehensive device database, look no further. DeviceAtlas offers a device database as part of its detection service, constantly updated and maintained by a team of data experts. New entries are added in a number of ways:

  • Mobile Network Operator and device maker partnerships
  • Free user and customer contributions
  • goMobi-generated mobile traffic

DeviceAtlas data scientists manage external partnerships and explore goMobi-generated mobile traffic data to ensure global device coverage. User and customer contributions also help maintain and update the device database.

 

Get started with a local device detection trial

DeviceAtlas is a high-speed device detection solution used by some of the largest companies in the online space to:

  • Optimize UX and conversion rate on mobile
  • Boost web performance
  • Target ads and analyze web traffic

Get started with a locally installed trial to test DeviceAtlas at no cost.
