
The speed imperative—DeviceAtlas and GPUs


Ronan Cremin - 22 May 2024
 
DeviceAtlas has some of the world's most demanding customers, and that's just how we like it. A who's-who of brands, they use DeviceAtlas to make sense of—and cater to—trillions of requests every month from their own customers. 
 
Regardless of the use case, be it content adaptation, targeting or analytics, it is vital that DeviceAtlas performs its role without slowing our customers down. This imperative drives a need for efficiency and speed that pervades our data and engineering organisations. While DeviceAtlas already delivers class-leading speed, faster is always better when it comes to serving online activity. Recent advances in GPU technology prompted an idea: could DeviceAtlas be made to work on a GPU and, if so, how fast could we make it go?
 
A quick test suggested that this was a workable idea. Developed entirely in-house, DeviceAtlas' modern API is unencumbered by external dependencies such as regular expression libraries, allowing it to be readily ported to a range of GPUs without being restricted by CPU-bound logic. While the overall algorithm has many rules and transformations, they all reduce to a series of simple steps that GPUs excel at. With the exception of our optional data file download logic, which can fetch a new data file each day, the complete algorithm can be implemented entirely on the GPU.
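To illustrate why this shape of work suits a GPU, here is a minimal CUDA sketch in which each thread resolves one pre-tokenised header set against a flat rule table held in device memory. The data layout, the rule structure and every identifier here are hypothetical illustrations; the actual DeviceAtlas algorithm and data structures are not shown.

#include <cuda_runtime.h>
#include <cstdint>

// Hypothetical flat rule: "if the token at position `pos` equals `value`,
// the request maps to `device_id`". A real ruleset is far richer, but the
// per-thread work remains a sequence of simple, branch-light comparisons.
struct Rule {
    uint32_t pos;
    uint32_t value;
    uint32_t device_id;
};

// One thread per request: no recursion, no dynamic allocation and no
// external libraries, which is what makes this kind of algorithm easy to
// run entirely on the GPU.
__global__ void lookup_kernel(const uint32_t *tokens,   // num_requests * tokens_per_request
                              int tokens_per_request,
                              const Rule *rules, int num_rules,
                              uint32_t *device_ids,      // one result per request
                              int num_requests)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_requests) return;

    const uint32_t *req = tokens + (size_t)i * tokens_per_request;
    uint32_t best = 0;                        // 0 = unrecognised

    for (int r = 0; r < num_rules; ++r) {
        uint32_t p = rules[r].pos;
        if (p < (uint32_t)tokens_per_request && req[p] == rules[r].value) {
            best = rules[r].device_id;        // last matching rule wins here
        }
    }
    device_ids[i] = best;
}

Because every thread walks the same rule table, warps stay largely convergent and the rule data is read in a broadcast-friendly pattern, which is exactly the kind of workload GPUs are built for.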
 
When the proof of concept looked promising enough to continue, we ported the algorithm fully to NVIDIA's CUDA. Once that was working well, we moved on to AMD's HIP and the OpenACC standard, allowing a wide variety of GPU hardware to be supported.
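For readers unfamiliar with HIP, it deliberately mirrors the CUDA runtime API, so a single source tree can often target both vendors through a thin portability layer. The snippet below is a generic example of that pattern, not DeviceAtlas' actual build setup.

// Generic CUDA/HIP portability shim (illustrative only).
#if defined(__HIP_PLATFORM_AMD__)
  #include <hip/hip_runtime.h>
  #define gpuMalloc        hipMalloc
  #define gpuMemcpy        hipMemcpy
  #define gpuMemcpyHtoD    hipMemcpyHostToDevice
  #define gpuDeviceSync    hipDeviceSynchronize
#else
  #include <cuda_runtime.h>
  #define gpuMalloc        cudaMalloc
  #define gpuMemcpy        cudaMemcpy
  #define gpuMemcpyHtoD    cudaMemcpyHostToDevice
  #define gpuDeviceSync    cudaDeviceSynchronize
#endif

// Kernel syntax (__global__, <<<...>>> launches) is the same under nvcc and
// hipcc, so the kernels themselves typically need no changes.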
 

Results

We tested DeviceAtlas on an ASUS ProArt GeForce RTX 4060 Ti with 16GB of RAM, hosted in a Linux server with an Intel Core i9-14900KF CPU. On this GPU we achieved lookup rates in excess of 50M recognitions per second on a mix of web traffic, more than an order of magnitude higher than can be expected from a similarly priced CPU. The GPU in question cost €462, an exceptionally good price-to-performance ratio.
 
It is notable that running DeviceAtlas this way frees up the CPU almost entirely for other work, allowing DeviceAtlas to be run on an already-busy server without getting in the way.  
 
The details:
  • GPU: ASUS ProArt GeForce RTX 4060 Ti with 16GB RAM
  • CPU: Intel Core i9-14900KF
  • DeviceAtlas properties: 27 of the most-used properties
  • Traffic: mixed web traffic consisting of 1M header sets played back repeatedly through the DeviceAtlas API
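For context on how a throughput figure like this might be measured, the sketch below shows one plausible shape of benchmark harness: a batch of pre-loaded header sets is copied to the GPU once, a lookup kernel is launched repeatedly, and recognitions per second are derived from CUDA event timings. The placeholder kernel, batch size and all identifiers are hypothetical; this is not the DeviceAtlas test harness.

// Hypothetical benchmark harness: replay a fixed batch of header sets
// through a lookup kernel and report recognitions per second.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>
#include <vector>

__global__ void lookup_kernel(const uint32_t *tokens, uint32_t *ids, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) ids[i] = tokens[i] % 1024;   // placeholder for the real lookup
}

int main() {
    const int n = 1000000;                  // 1M header sets, as in the test
    const int iterations = 100;             // played back repeatedly

    std::vector<uint32_t> host_tokens(n, 42u);
    uint32_t *d_tokens = nullptr, *d_ids = nullptr;
    cudaMalloc(&d_tokens, n * sizeof(uint32_t));
    cudaMalloc(&d_ids,    n * sizeof(uint32_t));
    cudaMemcpy(d_tokens, host_tokens.data(), n * sizeof(uint32_t),
               cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    for (int it = 0; it < iterations; ++it)
        lookup_kernel<<<blocks, threads>>>(d_tokens, d_ids, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double per_second = (double)n * iterations / (ms / 1000.0);
    printf("%.1f M recognitions/s\n", per_second / 1e6);

    cudaFree(d_tokens);
    cudaFree(d_ids);
    return 0;
}

Note that copying the headers to the device once and replaying them keeps the measurement focused on lookup throughput rather than PCIe transfer time, which matches the bulk-processing use cases discussed below.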
 

Use cases

While this level of performance is always welcome, this version of our library probably lends itself best to after-the-fact use cases such as web analytics, where already-captured bulk data must be processed, since a single server typically won't be able to serve this level of traffic over a network in real time. Alternatively, if the CPU of a server running DeviceAtlas risks becoming a bottleneck, adding a GPU to the mix can buy a lot of headroom. Regardless of use case, the order-of-magnitude gains from running DeviceAtlas on GPUs enable significant cost savings on any device intelligence workload.
 
Customers wishing to avail of this version of the DeviceAtlas library should get in contact for further details.