Measuring the Data Usage of a Client/User at the Server, in a Scalable Manner
The problem statement was very clear: we wanted to measure each user's data usage to understand our product's data efficiency and usage patterns.
This is inherently a hard problem. If you look around, everyone has built a custom solution for it, especially when usage has to be measured per API or per session. The application information (which user, which API) is embedded in the packet payload, so attributing bytes to it means measuring at an upper layer rather than just counting bytes on the wire.
Fortunately, in our case the user and API information is available in the URI :).
We use Netty as our application server, and we tried several ways to measure data usage at the API level, without success. For example, we built a SimpleChannelHandler with counters that were incremented on each MessageEvent, but for various reasons it didn’t work; that’s a topic for another post.
Anyway, as the business grew, this metric became far more important. I revisited all the options and had a eureka moment: what about Nginx? Put Nginx in front of the server as a pass-through proxy and measure the incoming and outgoing traffic there.
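Experiment: it was very simple. Set up an Nginx instance in front of the Netty server as a pass-through proxy and configure an appropriate log_format. The variables that matter are request_length (request line, headers and body received), bytes_sent (everything sent to the client) and body_bytes_sent (what was sent to the client, not counting the response headers). I then ran a client alongside Wireshark and compared the byte counts of the API calls in Wireshark with those in the Nginx log; they were exactly the same. Problem solved.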
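To make the log side concrete, here is a minimal Python sketch that parses one access-log line into its byte counts. It assumes a hypothetical `log_format usage '$request_uri $request_length $bytes_sent $body_bytes_sent';`. The four variables are standard Nginx variables, but the format name, the field order and the sample line are made up for illustration.

```python
from typing import NamedTuple


class UsageRecord(NamedTuple):
    """One parsed line of the (assumed) `usage` access log."""
    request_uri: str
    request_length: int   # bytes received: request line + headers + body
    bytes_sent: int       # bytes sent: response headers + body
    body_bytes_sent: int  # bytes sent: response body only


def parse_usage_line(line: str) -> UsageRecord:
    """Parse a line written by the assumed format
    '$request_uri $request_length $bytes_sent $body_bytes_sent'."""
    uri, req_len, sent, body_sent = line.split()
    return UsageRecord(uri, int(req_len), int(sent), int(body_sent))


# Made-up sample line, not real traffic.
record = parse_usage_line("/api/u42/feed?page=2 412 1830 1530")
print(record.request_length + record.bytes_sent)  # total bytes for this call: 2242
```

A production format would probably also log fields such as $remote_addr and $time_local; they are left out here to keep the parsing short.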
Next step: build a service that parses the Nginx logs on the various nodes and pushes the metrics to a data source. Make this data source accessible to the reporting module so that we can run all kinds of analytics and make the product better.
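A minimal sketch of the aggregation step such a service might perform, under two labelled assumptions: log lines follow a hypothetical four-field format `'$request_uri $request_length $bytes_sent $body_bytes_sent'`, and URIs follow a hypothetical `/api/<user>/<api>/...` scheme (the post only says that user and API are identifiable from the URI). Pushing the totals to a data source is left out; the sketch just prints them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit

Key = Tuple[str, str]  # (user, api)


def user_and_api(request_uri: str) -> Key:
    """Extract (user, api) from a hypothetical URI scheme /api/<user>/<api>/..."""
    parts = urlsplit(request_uri).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "api":
        return ("unknown", "unknown")
    return (parts[1], parts[2])


def aggregate(lines: Iterable[str]) -> Dict[Key, Dict[str, int]]:
    """Sum calls and bytes in/out per (user, api) from lines in the assumed
    format '$request_uri $request_length $bytes_sent $body_bytes_sent'."""
    totals: Dict[Key, Dict[str, int]] = defaultdict(
        lambda: {"calls": 0, "bytes_in": 0, "bytes_out": 0}
    )
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines rather than failing the whole batch
        uri, request_length, bytes_sent, _body_bytes_sent = fields
        entry = totals[user_and_api(uri)]
        entry["calls"] += 1
        entry["bytes_in"] += int(request_length)
        entry["bytes_out"] += int(bytes_sent)
    return dict(totals)


# Made-up sample lines, not real traffic.
log = [
    "/api/u42/feed?page=1 400 1800 1500",
    "/api/u42/feed?page=2 412 1830 1530",
    "/api/u7/profile 350 900 610",
]
for (user, api), stats in aggregate(log).items():
    # In a real service these totals would be pushed to the metrics store.
    print(user, api, stats)
```

Aggregating on each node before pushing keeps the volume sent to the data source proportional to the number of (user, API) pairs rather than the number of requests.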
Stats: during this experiment, I measured an overhead of around 300 bytes per API call, caused by headers. That leads me to the next problem: optimising our API usage.
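Because $bytes_sent includes the response headers and $body_bytes_sent does not, their difference is the response-header size of each call. Below is a small sketch that averages it over a log, assuming the same hypothetical four-field format `'$request_uri $request_length $bytes_sent $body_bytes_sent'` and made-up sample lines. It covers the response side only; measuring request-header overhead would also need the request body size, for example from Nginx's $content_length variable.

```python
from typing import Iterable


def avg_response_header_bytes(lines: Iterable[str]) -> float:
    """Average response-header bytes per call ($bytes_sent - $body_bytes_sent)
    for lines in the assumed format
    '$request_uri $request_length $bytes_sent $body_bytes_sent'."""
    overheads = []
    for line in lines:
        _uri, _request_length, bytes_sent, body_bytes_sent = line.split()
        overheads.append(int(bytes_sent) - int(body_bytes_sent))
    if not overheads:
        return 0.0  # no calls logged, so no overhead to report
    return sum(overheads) / len(overheads)


# Made-up sample lines, not real traffic.
sample = [
    "/api/u42/feed?page=1 400 1800 1500",
    "/api/u7/profile 350 900 610",
]
print(avg_response_header_bytes(sample))  # 295.0
```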