Analysing G-Forces using a Smartphone, Azure Event Hubs and Apache HBase & Hive in HDInsight

Over the years, I’ve been amazed by how an F1 team is able to accurately analyse the G-Forces experienced by a driver on every bend of every lap of a race. As a teenager, I hoped that someday I would get the opportunity to possess the tools and gadgetry needed to perform such detailed analysis of sensory information.

Today, it’s amazing that our smartphones possess numerous sensors which measure luminosity, orientation relative to the earth’s magnetic field, etc. along with the G-Force vector. By coupling this with a few other platforms and technologies, I was able to build a solution that can be leveraged by any Enterprise to build a highly scalable and cost effective solution that can be used to analyse millions of events every second in real time and simultaneously store the data in a NoSQL database for historical and potentially predictive analysis.

The Testing Ground

The closest I’ve come to experiencing what it might be like in an F1 car is on my bus ride to work! A particular stretch of the route includes several sharp turns and signals, so depending on the driver’s mood on a given day, I could be in for a world of pain. The blue line in the map (from Lot 2 Christie Rd to 97 Waterloo Rd) shows this stretch. My goal was to measure how many G-Forces my body experienced on this part of the route.

The Client

I wanted to use the sensor on my smartphone to capture & transmit the data, however I also wanted to ensure the client-side solution was platform & device independent. So I built a native app for my Windows Phone, but ensured that it made use of HTTPS to transmit the data to the event gathering system. This same logic can be replicated on an Android or iOS device, or even a wearable capable of communicating over HTTPS. Details on how to capture the G-Force vector for a Windows Phone are available here. Along with the vector, I also wrote some basic code to detect and transmit the current geo-codes.

The Event Gathering System

I wanted to use an Event Gathering system that was highly scalable, but also cost effective and easy to set-up & maintain. Apache Kafka & Microsoft Azure Event Hubs are, in my opinion, the top two worthy candidates. The two systems are architecturally similar (from a publisher & consumer’s perspective) and based on benchmark tests I’ve come across, they seem to perform similarly under heavy workloads. I would strongly recommend that if you are deciding between the two, you test both systems adequately to ensure you pick the one which satisfies your performance requirements if you are expecting an extremely large number of messages to be processed every second.

The main reasons for choosing Event Hubs are –

  1. It is offered as a PaaS (Platform as a Service) by Microsoft Azure
  2. You only pay for the number of events consumed (not the time the system is up and running or the underlying infrastructure on which it runs)
  3. Creating it and scaling it up are not complex operations and do not require specialised product knowledge.

Real Time Analysis

To analyse the data in real time, I wrote a simple Event Processor (Example available here) and added logic to it to detect sharp turns and heavy forward acceleration and braking. The snapshot from the output window shown below indicates a few such occurrences.

Screen Shot 2016-05-16 at 9.57.03 PM

Historical Analysis

I added logic to the Event Processor to save the incoming records (after analysing it in real time and displaying it on the console) in a Microsoft HDInsight HBase cluster. Apache HBase is an open source, highly scalable, NoSQL column store database. The data was exposed through an external table in Hive, the create statement for which is shown below.

CREATE EXTERNAL TABLE GForceDataTable(T STRING, Gforce_X FLOAT, Gforce_Y FLOAT, Gforce_Z FLOAT, Latitude FLOAT, Longitude FLOAT)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key,GForce:x,GForce:y,GForce:z,GeoCodes:latitude,GeoCodes:longitude”)
TBLPROPERTIES (“” = “GForceData”);

Once the data was saved, I was able to run a MapReduce job (using HiveQL) which produced the result set shown below. Given the number of sharp turns on the route, I’d expected to see high max and low min values for G-Force X, however it was surprising to see that the max G-Force Y value was the highest among all, which meant the driver had used excessive braking that day!



Using a humble smartphone coupled with Azure Event Hubs & HDInsight, I was able to measure G-Forces experienced by my body on my bus ride to work, both in real time and historically. The architecture of the solution lends itself to processing millions of events per second, while making it extremely simple to build, deploy and manage.

I hope the G-Force is with me on my commute tomorrow!


Scripts to Force a Deadlock in SQL Server

As DBAs, we are constantly looking to proactively find and fix issues before they become major problems. And on most occasions, we are at the mercy of software vendors or in-house application developers who believe it is their God-given right to leave complex database issues to ‘someone else’. That’s where we come in.

Troubleshooting deadlocks is one such example. Unless you have a system in place for detecting and analysing deadlocks, you’re probably going to get an email with a snapshot of a screen in some application where a user has encountered a cryptic message which has the word ‘deadlock’ in it. You would then try and find ways to log details when the next deadlock occurs.

There are many possible ways to log the details of a deadlock such as setting trace flags 1204 & 1222, running a SQL trace with the TSQL_Locks template, or using third party detection tools like SQL Deadlock Detector from Lakeside SQL. There are plenty of posts on the internet which will give you adequate information on all of these techniques.

After you’ve decided on a particular technique, you should look to test the results of the chosen technique in a non-production environment. However, deadlocks are rarely easily reproducible through users running applications. More often than not, deadlocks require a certain level of concurrency and timing which users are rarely able to simulate, requiring you as the DBA to find programmatic ways to force a deadlock.

This post contains scripts I wrote recently to simulate a deadlock. The logic behind why the deadlock occurs has also been described below. I hope this assists you in evaluating your chosen method to log details on deadlocks.

Scripts to create required objects in the database

The script shown below contains the logic to create 2 simple tables containing 2 integer columns each. The id column in each table is the primary key (with a clustered index) and the col field in the second table references the id field in the first (i.e. there is a foreign key between the 2 tables on the col field in the second table).



Script to be used in Session #1

Once you’ve created the above mentioned tables, open up a New Query window and execute the following script.

INSERT dbo.Tbl1 (id, col) VALUES (2, 999)

This script will start a transaction and insert a record into table Tbl1. The new record will contain a primary key field value of 2. The transaction will remain open, i.e. it won’t be committed or rolled back (yet).

Script to be used in Session #2

Open up another New Query window and execute the following script.

INSERT dbo.Tbl2 (id, col) VALUES (111, 2)

This script will also start a new transaction. It will then insert a record into table Tbl2 with a value of 2 for the ‘col’ field, which references the primary key in Tbl1. Since the transaction which inserted a record with a primary key value of 2 into Tbl1 hasn’t committed or rolled back yet, the transaction on Tbl2 waits for the first transaction to finish. This transaction will also be left uncommitted. The primary key field value being inserted into Tbl2 is 111.

Script to be added to Session #1

Add the script shown below to the first Query window and execute it.

INSERT dbo.Tbl2 (id, col) VALUES (111, 555)

This script attempts to create a record with a primary key field value of 111 in Tbl2 as part of the open transaction in Session #1. At this stage, the SQL Engine will detect a deadlock and roll back one of the two open transactions. You should see the deadlock error message (such as the one shown below) in one of the two session windows.

The reason the deadlock occurred is quite simple. Session #1 was hanging on to a record in Tbl1 with a primary key field value of 2. Session #2 was hanging on to a record in Tbl2 with a referencing field value of 2 while it was waiting on the record in Tbl1 to be committed. Session #1 then executed a new SQL which inserted the primary key field value of 111 in Tbl2, which had to wait for Session #2 to either commit or rollback since it was also attempting to insert a record in Tbl2 with the same primary key value. Since both sessions were waiting on each other to proceed, the result was a deadlock which had to be resolved by SQL Server’s automatic deadlock detection feature.

I hope this blog post helps you in reviewing your chosen deadlock detection methodology, and eventually in smashing those pesky deadlocks in your production environment. If you have any questions or need any additional help for detecting and analysing deadlocks, leave a comment on this post.

Foreign Keys – Dependency Tracking

Every now and again, we come across the need to determine the relationships between a set of tables, and the order in which data should be inserted or deleted in them, in accordance with the rules of referential integrity. There is a dearth of information on the internet regarding this subject, so I decided to write my own script and publish it.

The script accepts a list of tables as an input parameter. It then works through the relationships between these tables and the tables they refer in the database. If the latter tables reference additional tables, then they get pulled in as well. Therefore, when the script completes, it prints the entire referential integrity ‘spider web’ of tables linked to the tables supplied in the input list.

It also prints the tables in order, i.e. the order in which data should be inserted. Therefore, the order in which data should be deleted is the opposite of the printed order.

The script’s code is shown below. The output is from the AdventureWorks database. You can download the script’s code from the attachment at the end of the article.

Multiple tables can be fed into the script. The example shown below uses only 1 table (Sales.Customer) from the AdventureWorks database.

The output of the script is shown below. It contains the list of referenced tables listed in order. The ‘lvl’ column indicates the level of depth in the referential integrity chain. The table passed in as the input (Sales.Customer) is not listed in the output. It can be assumed that it is present at level zero.

The script’s code can be downloaded from here –> Script.

Auditing Data Changes in SQL Server using Service Broker’s External Activator

Auditing Data Changes

Auditing of changes to data is an integral part of any Enterprise system’s database. Not only is this information crucial for tracking ‘who-did-what’, but at times, it is also a legal requirement.

In this article, I will attempt to demonstrate the use of Service Broker and it’s External Activator Service for auditing data changes. Before getting to the code that can be used to view this in action, I will validate the Service’s use by looking at other available methodologies. All code used here can be downloaded from a link provided at the end of the article.

So, lets first look at some techniques available in SQL Server 2008 for capturing information when data is modified.

1. SQL Server Audit –

Using database-level audit actions in SQL Server Audit, we can track which user is reading (running select statements) or performing DML actions (insert, update & delete) on chosen tables in the database. While this information can be extremely useful for a DBA, it is not very detailed from a data auditing perspective, i.e. it doesn’t give us ‘before-after’ snapshots of the modified data.

2. Change Tracking –

If we are interested in finding out which records in a table were added, deleted or modified since the last time we checked, Change Tracking is a wonderful mechanism for satisfying our requirements. However, like SQL Server Audit, this only tells us which records were modified, and not the details of what the modifications were.

Change Tracking can be extremely useful in relational databases which act as sources for data warehouses. Since ETL (Extract, transform & load) jobs require information on records to propagate to a warehouse, this method can be used to determine which records to read from the chosen tables.

3. Change Data Capture (CDC) –

While Change Tracking just tells us what was changed, Change Data Capture goes a step further and gives us the details of the data that was modified. However, it is not customizable to a great degree, and therefore, if we wanted to make use of it for our data auditing requirements, and hence we needed to track the user who made the change, we will have to create a field in the audited table for capturing the user’s details (either user name or user id, etc.) and ensure that this field is updated with every insert & update operation so that this info is propagated to the CDC tables. Hence, a delete operation cannot hold the user’s info in the CDC tables, unless we update the user’s info first and then delete the record.

Because of this, CDC falls short of a full blown auditing solution. One could argue that CDC was never built with data auditing in mind, and therefore it is not really a shortcoming of the methodology. CDC, like Change Tracking, is extremely powerful for data warehousing, as it frees us up from having to write any ‘infrastructure’ code to capture changes to data in a relational database.

Where to then?

So, how do we go about setting up the architecture to manage our auditing requirements? Can we have a system in place which will be highly customizable, as well as low on server load?

Some would say the answer is to make use of triggers which look at the inserted and deleted tables and then capture the required audit info in audit tables. This approach is certainly well suited for customization, as the logic for capturing the audit information is completely in the developer’s control.

For example, the user performing the operation can be easily captured with a simple function such as SUSER_NAME (or ORIGINAL_LOGIN in case the context has been switched). Moreover, with this approach, we can choose to audit information only if key fields in the table were audited by making use of the UPDATE(FieldName) function within the trigger’s body. This logic can be extended by using the COLUMNS_UPDATED function, which returns a bit mask representing the actual columns that were updated in the operation.

Having said that, a drawback to this approach is that all operations within the trigger’s body extend the duration of an end user’s transaction. I.e. When an end user using an application (which hits a table in a database) inserts, updates or deletes a record, the trigger’s operations become part of the user’s transaction. This is not very good from a performance point of view, as we do not want end users to be affected by auditing requirements.

Service Broker

This is where SQL Server’s Service Broker comes to the rescue. Service Broker gives us the ability to make these trigger operations asynchronous. I.e. By converting the information contained within the inserted and deleted tables to XML chunks and passing it to a Service Broker queue, a trigger can return control back to the user’s transaction, while Service Broker can take care of performing the operations required for auditing as part of a separate asynchronous thread.

It is important to note that even though it is asynchronous, Service Broker is a ‘reliable’ messaging system. I.e. the above mentioned auditing operation performed by Service Broker will only commit when the user transaction which initiated it commits. Service Broker also scales out extremely well, so we can have multiple instances of ‘listeners’ which will be activated by Service Broker based on its current load (Details about how to make use of Service Broker will be covered in the sections below).

1. Components

The main Service Broker components comprise of message types, contracts, queues and services.

A typical Service Broker ‘conversation’ is quite easy to understand. When such a ‘conversation’ is initiated between 2 services, a message is passed from the initiator service to the recipient service. This message resides in a queue, which belongs to the recipient service. The message is of a particular message type, which includes validation such as WELL_FORMED_XML. The conversation is governed by a contract of message types between the sender and recipient services.

Once Service Broker has processed the message in the queue via Activation (covered in the next section), the recipient service sends an acknowledgement back to the initiator service, and the conversation is completed.

2. Activation

Service Broker supports 2 forms of activation – Internal and External.

Internal Activation occurs when Service Broker makes use of stored procedures to process messages in its queues. External Activation occurs when Service Broker raises events which are monitored by external applications (outside the database). These events notify the application that messages have arrived in the monitored queue, and therefore processing can begin.

For Internal Activation, stored procedures can be written and assigned to queues as their readers. A value can be set on the queue which indicates the maximum number of readers that can be used in parallel when the load on the queue increases, i.e. when there is a large number of messages to be processed. This architecture is well suited towards scaling out.

However, with Internal Activation, all of the processing power needs to be drawn from the database’s server. If the messages passed to the queues are in XML format, performing complex manipulations on XML using SQL can be CPU intensive. For OLTP systems which cater to a large number of end user sessions and transactions, adding additional load to the production system’s database server is most certainly undesirable.

This leaves us with External Activation. External Activation can take the load of processing Service Broker messages away from the database server, as the applications which respond to Service Broker events can sit on an application server, separate from the database server. Moreover, when using XML messages, there is no question that an application written in C# or VB.Net can perform complex manipulations a lot faster than SQL. Also, multiple instances of the application can respond to events being raised from Service Broker, thereby allowing for a scale out solution (This will be demonstrated in the sample code in following sections).


So, we have looked at the mechanisms available in SQL Server for capturing data changes. We’ve also worked through the use of triggers with and without the use of Service Broker, and understood the pros and cons in the different types of Service Broker Activations.

From this, I hope you would have inferred that making use of triggers which communicate with Service Broker, which in turn makes use of External Activation is a powerful strategy for auditing data changes.

The next section will demonstrate this mechanism in action, along with details on how to work with the External Activator Service.

Ok, enough theory! Let’s cut some code!

In this section, I will attempt to demonstrate the use of Service Broker’s External Activator Service with some basic sample code. This code can be extended to suit individual auditing requirements. The scripts for the code can be downloaded from the attachment at the end of the article.

1. Databases

Using the above mentioned components, it becomes easy to split a single OLTP database which contains all tables (including tables for storing the audit information) into separate databases, such as one for the OLTP database and another for the audit info. Most OLTP systems generate a lot of data every day, and if a large part of such a system is audited, then the database can quickly grow in size.

Using External Activation allows for the audit database to be moved to a separate instance of SQL Server, ideally on a separate box, which is great for a busy OLTP database as the load involved in writing data back to the auditing tables is taken away from the production box.

So, with this in mind, let’s create 2 databases. As mentioned above, the auditing database should ideally be on a separate instance of SQL Server on a box other than the production SQL Server’s box. In the sample code, both databases are created on the same instance.

2. Tables

 To keep things simple, we’ll create a simple table in the OLTP database which will get audited for any data change (insert, update or delete). The audit information will get stored in the Audit database. We’ll create 2 tables (parent-child) which will store the audit information.

The design of the audit tables shown here is just a template, for you to use as a starting point in your design. It contains tables with some key fields that are usually needed in auditing. These include the time of the change, the user who made the change, the type of change, the fields in the table that were changed and the value of the field before and after the change. For large OLTP systems, most transaction tables will usually generate huge amounts of audit data, so it might be worthwhile having 2 tables per audited table instead of 2 generic audit tables, for quicker read access. This however incurs an overhead whereby new structures have to be created each time a new table is audited, so if this cost is undesirable, then the audit table structure(s) should probably be created in a generic way whereby all audit data is always stored in the same table(s).

In the sample code, 2 separate tables are created for the table being audited.

3. Service Broker

As mentioned in previous sections, Service Broker architecture makes use of a conversation between 2 services. The script below shows the objects that need to be created to get such a conversation going. If required, the destination service can be in a database other than the initiator service, or even in a separate instance of SQL Server by making use of Routes. In this article, all Service Broker objects are created in the database where the audit data originates.

The Service Broker conversation is initiated by the AUDIT_INITIATOR Service. The audit information, which is data from the inserted and deleted tables available from a trigger under the table being audited, is sent as an XML message from the AUDIT_INITIATOR Service to the AUDIT_TARGET Service. This message also contains additional information such as name of the user who performed the operation, the time of the operation, etc.

The code shown below creates all required database objects (the code can be downloaded from the attachment at the end of the article). After executing the script, once you add, edit or delete a record in the INVOICES table, a message should be created in the AUDIT_MESSAGES queue.

4. External Activator Service and Auditing Application

Now that we have all the database components in place, we need to create the application which will perform the auditing operation and save the audit information back to the AUDIT_DB database. Instances of this application will be spawned by the External Activator Service.

The External Activator Service is a freely available download from Microsoft (While the download says it is a SQL 2008 component, the service can also be used for a SQL 2005 database). The service is a windows service that should ideally be installed on a separate application terminal (i.e. not the database server). The service will be configured to point to the OLTP_DB database (configuration file is shown below). When there are messages in the AUDIT_MESSAGES queue, the service will spawn an instance of the Auditing application that will be written to process the XML messages.

The service ‘knows’ that there are messages to be processed because Service Broker raises a Queue Activation event (AUDIT_EVENT) when messages arrive. When this happens, a record is added to the EXTERNAL_ACTIVATION queue. The service keeps polling this queue, and when messages arrive, instances of the Auditing application are spawned.

The service can be configured to spawn multiple instances of the Auditing application, to cope with heavy load. The recommended maximum number of instances is the number of CPU cores on the terminal.

The first step is to download and install the External Activator Service. The download links are as follows –

X86 Package –

X64 Package –

IA64 Package –

If you have followed the default installation options, the service’s binaries, config and log files would have been installed in “C:\Program Files\Service Broker\External Activator” or “C:\Program Files (x86)\Service Broker\External Activator” depending on your machine’s platform. An explanatory document called SSBEA.doc is also provided as part of this intallation (under “Bin\en” folder), which provides detailed information about the component. The configuration file that needs to be set up is called EAService.config (under Config folder).

The configuration file with all the required settings is shown below (The code for this file can be downloaded from the attachment at the end of the article). Remember that the server name, database name, etc. are case sensitive, so be sure you enter them in the right case. If you are unsure of the case of the SQL Server’s name, run @@SERVERNAME on any database to get the value in the correct case. The server name needs to be changed in the highlighted sections.

Also make sure the account under which the External Activator service is running has adequate permissions to access and work with the OLTP_DB, since the sample application makes use of Integrated Security.

All that remains after this is to create the Auditing application. For this article, I’ve used the sample code provided by Microsoft’s Service Broker team from and modified it to process the XML audit messages (The code has been translated to VB.Net). The modified code is available as part of the download at the end of this article.

Once the code has been compiled as a console application (called AuditApp.exe) and placed in “C:\audit” folder, the External Activator service can be started (from Windows Services). If all settings are correct, the service will spawn an instance of the Auditing application when there are auditing messages to be processed. The application will pull the audit messages from the AUDIT_MESSAGES queue, process the XML and create records in the AUDIT_DB database’s INVOICES_AUDIT and INVOICES_AUDIT_DETAILS tables.

The snapshots shown below demonstrate the auditing for an Insert, Update and Delete operation on the same invoice. The results are cumulative, so the final snapshot has the results of all 3 operations.


In this article, we first went through the features available in SQL Server 2008 for auditing changes to data. We then verified the pros and cons with each of them, and then reviewed Service Broker’s capabilities. We then took a look at the 2 types of Activation used by Service Broker, and then talked about the External Activator service and the advantages in its use in auditing. Finally, we built some sample code and visualized the entire process.

I hope this article gives you enough information before you start building you own custom auditing solution. The purpose behind writing this article was to provide a starting point, from which you can tailor your own strategy as per your business requirements. If you have any questions or comments/recommendations, please send me an email or alternatively leave a comment at the end of this post.


The code used in this article can be downloaded from here

Use of Identity Property to Resolve Concurrency Issues

At times, simple technical solutions help resolve complex problems. This is especially true for an SQL programmer, who, if he knows what he’s doing, can provide massive performance gains with the simplest of changes.

I helped resolve one such performance issue in the past. I wrote about it in a technical publication on SQL Server Central. This article can be viewed at