Analysing G-Forces using a Smartphone, Azure Event Hubs and Apache HBase & Hive in HDInsight

Over the years, I’ve been amazed by how an F1 team is able to accurately analyse the G-Forces experienced by a driver on every bend of every lap of a race. As a teenager, I hoped that someday I would get the opportunity to possess the tools and gadgetry needed to perform such detailed analysis of sensory information.

Today, our smartphones carry numerous sensors that measure luminosity, orientation relative to the earth’s magnetic field and more, along with the G-Force vector. By coupling a smartphone with a few other platforms and technologies, I was able to build a highly scalable and cost-effective solution that any enterprise could adapt to analyse millions of events every second in real time, while simultaneously storing the data in a NoSQL database for historical and potentially predictive analysis.

The Testing Ground

The closest I’ve come to experiencing what it might be like in an F1 car is on my bus ride to work! A particular stretch of the route includes several sharp turns and traffic signals, so depending on the driver’s mood on a given day, I could be in for a world of pain. The blue line in the map (from Lot 2 Christie Rd to 97 Waterloo Rd) shows this stretch. My goal was to measure the G-Forces my body experienced on this part of the route.

The Client

I wanted to use the sensor on my smartphone to capture & transmit the data, but I also wanted the client-side solution to be platform & device independent. So I built a native app for my Windows Phone and made sure it used plain HTTPS to transmit the data to the event gathering system. The same logic can be replicated on an Android or iOS device, or even on a wearable capable of communicating over HTTPS. Details on how to capture the G-Force vector on a Windows Phone are available here. Along with the vector, I also wrote some basic code to detect and transmit the current geographic coordinates (latitude and longitude).
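To make the platform-independent part concrete, here is a minimal Python sketch of what any client would need to do: serialise one reading as JSON and wrap it in an HTTPS POST. It is not the Windows Phone code. The field names, the endpoint URL and the authorisation value are all placeholders I have assumed for illustration; the real URL and token format depend on the event gathering system you send to.

```python
import json
import urllib.request


def build_event(timestamp, gx, gy, gz, latitude, longitude):
    """Serialise one sensor reading as a UTF-8 encoded JSON document."""
    reading = {
        "t": timestamp,          # time of the reading (hypothetical field names)
        "gx": gx,                # G-Force along the X axis
        "gy": gy,                # G-Force along the Y axis
        "gz": gz,                # G-Force along the Z axis
        "latitude": latitude,
        "longitude": longitude,
    }
    return json.dumps(reading).encode("utf-8")


def build_request(endpoint_url, auth_header, body):
    """Wrap the payload in an HTTPS POST request without sending it."""
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Illustrative values only, not measurements from the bus ride.
    body = build_event("2016-05-16T08:15:00Z", 0.12, -0.31, 0.98, -33.78, 151.12)
    request = build_request("https://example.invalid/events", "<token>", body)
    print(request.get_method(), request.full_url)
    print(body.decode("utf-8"))
    # urllib.request.urlopen(request) would actually transmit the event.
```

Because the payload is plain JSON over HTTPS, nothing here is tied to a particular phone platform.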

The Event Gathering System

I wanted an Event Gathering system that was highly scalable, but also cost effective and easy to set up & maintain. Apache Kafka & Microsoft Azure Event Hubs are, in my opinion, the two strongest candidates. The two systems are architecturally similar (from a publisher’s & consumer’s perspective) and, based on the benchmark tests I’ve come across, they seem to perform similarly under heavy workloads. If you are deciding between the two and expect an extremely large number of messages every second, I would strongly recommend testing both systems adequately so that you pick the one which satisfies your performance requirements.

The main reasons for choosing Event Hubs are –

  1. It is offered as a PaaS (Platform as a Service) by Microsoft Azure
  2. You pay for the throughput capacity you reserve and the number of events ingested, rather than for underlying infrastructure that you would otherwise have to provision and run yourself
  3. Creating it and scaling it up are not complex operations and do not require specialised product knowledge.

Real Time Analysis

To analyse the data in real time, I wrote a simple Event Processor (example available here) and added logic to it to detect sharp turns, heavy forward acceleration and heavy braking. The snapshot from the output window shown below indicates a few such occurrences.
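The detection logic can be as simple as comparing each reading against a threshold. The Python sketch below shows the idea and is not the Event Processor code itself. The 0.3 g thresholds are values I have assumed for illustration, and so is the axis convention: X is treated as the sideways axis, and a positive Y reading is treated as braking, in line with how the results are read later in this post.

```python
def detect_events(gx, gy, turn_threshold=0.3, longitudinal_threshold=0.3):
    """Return the labels of any notable events in a single reading.

    gx: sideways G-Force (assumed axis convention)
    gy: forward/backward G-Force, positive meaning braking (assumed)
    Both thresholds are illustrative, not tuned values.
    """
    events = []
    # A large sideways force in either direction indicates a sharp turn.
    if abs(gx) >= turn_threshold:
        events.append("sharp turn")
    # A large force along the direction of travel indicates heavy
    # braking or heavy acceleration, depending on its sign.
    if gy >= longitudinal_threshold:
        events.append("heavy braking")
    elif gy <= -longitudinal_threshold:
        events.append("heavy acceleration")
    return events


if __name__ == "__main__":
    # Illustrative readings only.
    for gx, gy in [(0.05, 0.02), (0.45, 0.10), (-0.35, -0.50), (0.00, 0.40)]:
        print(gx, gy, detect_events(gx, gy))
```

A single reading can carry more than one label, for example when the driver accelerates out of a bend.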

[Screenshot: Event Processor output window showing the detected events]

Historical Analysis

I added logic to the Event Processor to save the incoming records (after analysing them in real time and displaying them on the console) in a Microsoft HDInsight HBase cluster. Apache HBase is an open source, highly scalable, column-oriented NoSQL database. The data was exposed through an external table in Hive, the create statement for which is shown below.

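-- Expose the existing HBase table GForceData to Hive without copying the data.
-- The first mapping entry (:key) is the HBase row key; the rest are columnfamily:qualifier pairs.
CREATE EXTERNAL TABLE GForceDataTable(T STRING, Gforce_X FLOAT, Gforce_Y FLOAT, Gforce_Z FLOAT, Latitude FLOAT, Longitude FLOAT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,GForce:x,GForce:y,GForce:z,GeoCodes:latitude,GeoCodes:longitude")
TBLPROPERTIES ("hbase.table.name" = "GForceData");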
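The column mapping in the create statement implies a particular row layout in HBase: the timestamp is the row key, and each value sits in a cell named by column family and qualifier. The Python sketch below shows how one record would be laid out to match that mapping. It deliberately stops short of calling any HBase client library. The input field names are assumed, and storing each value as a string is also an assumption, chosen because the mapping above does not request binary storage.

```python
def to_hbase_row(record):
    """Translate one reading into (row_key, cells) matching the Hive mapping.

    record: dict with the hypothetical keys t, gx, gy, gz, latitude, longitude.
    The cell names follow the "columnfamily:qualifier" entries in
    hbase.columns.mapping; values are stored as strings (assumption).
    """
    row_key = record["t"]  # maps to :key, exposed in Hive as column T
    cells = {
        "GForce:x": str(record["gx"]),
        "GForce:y": str(record["gy"]),
        "GForce:z": str(record["gz"]),
        "GeoCodes:latitude": str(record["latitude"]),
        "GeoCodes:longitude": str(record["longitude"]),
    }
    return row_key, cells


if __name__ == "__main__":
    # Illustrative values only.
    key, cells = to_hbase_row({
        "t": "2016-05-16T08:15:00Z",
        "gx": 0.12, "gy": -0.31, "gz": 0.98,
        "latitude": -33.78, "longitude": 151.12,
    })
    print(key)
    for name, value in cells.items():
        print(name, value)
```

Keeping the G-Force values and the coordinates in separate column families mirrors the two families named in the mapping (GForce and GeoCodes).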

Once the data was saved, I was able to run a MapReduce job (using HiveQL) which produced the result set shown below. Given the number of sharp turns on the route, I’d expected to see high max and low min values for G-Force X. What surprised me was that the max G-Force Y value was the highest of all, which meant the driver had braked excessively that day!
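For readers without a cluster to hand, the same kind of summary (the minimum and maximum of each axis) can be sketched in a few lines of Python. This is not the HiveQL I ran, only an illustration of the aggregation, and the sample readings are invented rather than taken from my trip. In Hive the equivalent would be MIN and MAX aggregates over the Gforce_X, Gforce_Y and Gforce_Z columns.

```python
def summarise(records):
    """Return {axis: (min, max)} for the X, Y and Z G-Force values.

    records: iterable of dicts with the hypothetical keys gx, gy and gz.
    """
    summary = {}
    for axis in ("gx", "gy", "gz"):
        values = [record[axis] for record in records]
        summary[axis] = (min(values), max(values))
    return summary


if __name__ == "__main__":
    # Invented sample readings, for illustration only.
    sample = [
        {"gx": 0.10, "gy": 0.05, "gz": 0.98},
        {"gx": -0.40, "gy": 0.55, "gz": 1.02},
        {"gx": 0.35, "gy": -0.20, "gz": 0.95},
    ]
    for axis, (low, high) in summarise(sample).items():
        print(axis, low, high)
```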



Using a humble smartphone coupled with Azure Event Hubs & HDInsight, I was able to measure G-Forces experienced by my body on my bus ride to work, both in real time and historically. The architecture of the solution lends itself to processing millions of events per second, while making it extremely simple to build, deploy and manage.

I hope the G-Force is with me on my commute tomorrow!