Using Hadoop for Data Analytics

Question

I have a question about using Hadoop in one of my projects. The requirement is that we receive a batch of logs every day containing information about videos (when a video was played, when it stopped, which user played it, etc.).

We have to analyze these files and return statistics in response to an HTTP request. Example request: http://somesite/requestData?startDate=someDate&endDate=anotherDate. This request asks for the count of all videos played within a date range.

My question is: can we use Hadoop to solve this?

I have read in various articles that Hadoop is not real-time. To handle this scenario, should I use Hadoop in conjunction with MySQL?

My plan is to write a Map/Reduce job that stores the play count for each video for each day in MySQL. The Hadoop job can be scheduled to run once a day, and the MySQL data can then be used to serve requests in real time.
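To make the idea concrete, here is a minimal sketch of the job's logic as plain Python functions, in the style of a Hadoop Streaming mapper and reducer. The log layout (tab-separated timestamp, event, user ID, video ID) and the names `map_line`, `reduce_counts` and `run_job` are assumptions for illustration only; the real log format is not shown here and will likely differ.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Assumed (hypothetical) log layout, one event per line, tab-separated:
#   ISO timestamp, event name, user id, video id
#   e.g. "2018-09-27T10:15:00\tplay\tu42\tv7"
Key = Tuple[str, str]  # (day, video_id)


def map_line(line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((day, video_id), 1) for each 'play' event."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return  # skip malformed lines
    timestamp, event, _user_id, video_id = parts
    if event == "play":
        # The first 10 characters of an ISO timestamp are YYYY-MM-DD.
        yield (timestamp[:10], video_id), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the emitted values for each (day, video_id) key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map then reduce in-process; Hadoop would distribute these steps."""
    return reduce_counts(pair for line in lines for pair in map_line(line))
```

The resulting dictionary of per-day, per-video counts is what the daily job would write to MySQL.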

Is this approach correct? Is Hive useful here in any way? Please provide some guidance.

Frankie · Answer 1 · Sep 28, 2018

Yes, your approach is correct: you can create the per-day data with an MR job or Hive and store it in MySQL for serving in real time.
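As a rough illustration of the serving side, the sketch below stores per-day counts in a table and answers a date-range request with a single SUM query. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table name `daily_video_counts`, its columns, and the function names are hypothetical, and the upsert syntax would need adjusting for MySQL.

```python
import sqlite3
from typing import Dict, Tuple


def create_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table holding one row per day and video."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_video_counts ("
        " day TEXT NOT NULL,"        # YYYY-MM-DD, so string order = date order
        " video_id TEXT NOT NULL,"
        " plays INTEGER NOT NULL,"
        " PRIMARY KEY (day, video_id))"
    )


def load_counts(conn: sqlite3.Connection,
                counts: Dict[Tuple[str, str], int]) -> None:
    """Write the daily job's output; re-running a day overwrites its rows."""
    # INSERT OR REPLACE is SQLite syntax; MySQL would use
    # INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO instead.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_video_counts (day, video_id, plays)"
        " VALUES (?, ?, ?)",
        [(day, video_id, plays) for (day, video_id), plays in counts.items()],
    )
    conn.commit()


def plays_between(conn: sqlite3.Connection,
                  start_date: str, end_date: str) -> int:
    """Total plays for all videos between two dates, both inclusive."""
    row = conn.execute(
        "SELECT COALESCE(SUM(plays), 0) FROM daily_video_counts"
        " WHERE day BETWEEN ? AND ?",
        (start_date, end_date),
    ).fetchone()
    return row[0]
```

The HTTP handler for `requestData` would then only need to parse `startDate` and `endDate` and call something like `plays_between`, which stays fast because it reads pre-aggregated rows rather than raw logs.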

However, newer versions of Hive, when configured with Tez, can provide decent query performance. You could try storing your per-day data in Hive and serving it directly from there. If the query is a simple select, it should be fast enough.