So I was standing at Chatswood station, listening to songs and minding my own business, when my gaze fell upon one of those gigantic TV displays playing an advertisement for a random surfing channel. At the same time, Spotify, with its annoying yet brilliant form of targeted advertising, started playing a commercial for Bond University. This got me thinking: the amount of capital these companies spend on these digital billboards at the station must be enormous, yet the ads have no specific target audience. I mean, if you are a company that makes an anti-aging cream, you wouldn't want your ad played at a time and location where the majority of the audience is 18-25-year-olds, right?
There's a very simple solution to this issue, though. Imagine a display smart enough to connect with your phone, find out your preferences, likes and dislikes, and show an ad specifically for you.
Planning on taking up running? A Nike ad plays at the mall! Looking to buy a new laptop? A Dell ad is everywhere! Interested in pop music? Get the latest Katy Perry album at 20% off!
With the amount of metadata that our phones have been collecting from us, all it takes is an AI smart enough to filter it, and we have a system that can recognise the likes, dislikes, brand loyalty and brand preference of pretty much every individual on the planet! How is that for targeted advertising?
Although this technology raises a few questions in the privacy and ethical departments, you have to admit that just a few years down the road, maybe two, maybe five, we'll be looking at smart, personalised billboards that change what they display based on your likes and dislikes, sent from your phone.
Measure Data Usage of a Client/User in a Scalable manner @ server
The problem statement was very clear: we wanted to measure the data usage of users to understand the data efficiency of our product and its usage patterns.
This is inherently a hard problem to solve, and if you look around, everyone has built custom solutions for it, especially when you need to measure at an API/session level: the application information is embedded in the packet, while the measuring happens at an upper layer.
Fortunately, in our case the user and API information is available at the URI level :).
We use Netty as our application server, and we tried multiple ways to measure data usage at an API level but couldn't do so. We tried using a SimpleChannelHandler and built counters that were incremented based on the MessageEvent, but for various reasons it didn't work; that's for another post.
Anyway, as we grew as a business, the importance of this metric/feature increased drastically. I re-examined all the options and then had a Eureka moment: what about Nginx? Use Nginx as a pass-through proxy and measure the incoming and outgoing traffic.
Experiment: it was very simple. Set up an Nginx instance in front of the Netty server as a pass-through proxy and configure an appropriate log_format. The variables of importance are body_bytes_sent, request_length and bytes_sent. I ran a client alongside Wireshark, measured the API calls via both Wireshark and the Nginx log, and they were exactly the same. Problem solved.
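A minimal configuration along those lines might look like the sketch below. The format name, log path and upstream address are illustrative assumptions, not the original setup; the three variables are Nginx's built-ins.

```nginx
http {
    # $request_length  = bytes received from the client (request line + headers + body)
    # $bytes_sent      = total bytes sent back to the client (headers + body)
    # $body_bytes_sent = response body bytes only
    log_format usage '$remote_addr $request_method "$uri" '
                     'req=$request_length sent=$bytes_sent body=$body_bytes_sent';

    server {
        listen 80;
        access_log /var/log/nginx/usage.log usage;

        location / {
            # pass everything through to the Netty application server
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```

Since the user and API information is in the URI, logging $uri alongside the byte counters is enough to attribute traffic per user and per API.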
Next step: build a service that parses the Nginx log on each node and pushes the metrics to a data store. Make this data store accessible to the reporting module so that we can run all kinds of analytics and make the product better.
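The parsing half of that service could be sketched as follows. This is a minimal illustration, not the actual service: it assumes a hypothetical log line shape carrying the URI plus request_length and bytes_sent counters, and aggregates bytes in/out per API.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log line shape, e.g.:
# 10.0.0.5 GET "/api/v1/sync" req=412 sent=938 body=621
LINE_RE = re.compile(
    r'\S+ (?P<method>\S+) "(?P<uri>[^"]*)" '
    r'req=(?P<req>\d+) sent=(?P<sent>\d+) body=(?P<body>\d+)'
)

def aggregate(lines):
    """Sum calls and bytes in/out per (method, uri) across log lines."""
    totals = defaultdict(lambda: {"calls": 0, "bytes_in": 0, "bytes_out": 0})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't match the expected format
        key = (m["method"], m["uri"])
        totals[key]["calls"] += 1
        totals[key]["bytes_in"] += int(m["req"])
        totals[key]["bytes_out"] += int(m["sent"])
    return dict(totals)
```

In the real service these totals would be flushed periodically to the data store; here the aggregation is kept in memory to show the shape of the metric.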
Stats: while doing this experiment, I measured an overhead of around 300 bytes on each API call, which is due to headers. That leads me into the next problem: optimising our API usage.