In this blog, we are going to run a real-time Apache Storm project.
ICD9Data.com provides disease codes ranging from 000 to 999, each uniquely assigned to a disease. It is also available as a mobile app.
Suppose a mobile app used by doctors generates sensor data. This sensor data contains information such as device id, device network, user, latitude, longitude, time and disease code. For our scenario, we will consider only latitude, longitude, time and disease code.
This data is processed using a Storm Trident topology to detect a disease outbreak within a geographic location, for disease codes less than or equal to 322, at a particular time. If the count for a particular disease exceeds the threshold, in this case 2,000, the system sends an alert message with the name of the country, the disease code and the time. To simplify things for this scenario, we will map every diagnosis event to the closest country.
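To make the "closest country" idea concrete, here is a minimal plain-Java sketch. It is not the code from the project: the class name `ClosestCountry`, the lookup table and its rough centroid coordinates are all assumptions made for illustration, and the great-circle (haversine) distance is just one reasonable way to define "closest".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and coordinates are hypothetical, not taken from the project.
final class ClosestCountry {
    // Hypothetical lookup table: country name -> {latitude, longitude} of a rough centroid.
    static final Map<String, double[]> CENTROIDS = new LinkedHashMap<>();
    static {
        CENTROIDS.put("Turkey", new double[] {39.0, 35.0});
        CENTROIDS.put("India", new double[] {21.0, 78.0});
        CENTROIDS.put("Brazil", new double[] {-10.0, -52.0});
    }

    // Great-circle distance in kilometres between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
        return 2 * 6371.0 * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Returns the country whose centroid is nearest to the given point.
    static String closest(double lat, double lon) {
        String best = null;
        double bestKm = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : CENTROIDS.entrySet()) {
            double km = haversineKm(lat, lon, e.getValue()[0], e.getValue()[1]);
            if (km < bestKm) {
                bestKm = km;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With this table, `ClosestCountry.closest(41.0, 29.0)` (a point near Istanbul) returns `"Turkey"`. A real deployment would need a complete country list and probably a proper geocoding service, since nearest-centroid matching can misassign points near borders.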
In this scenario, we will use disease codes between 315 and 322. Refer to the table below for the diseases and their codes.
Below is the flow of the project, which is largely self-explanatory.
Step 1: The spout streams sensor data from the mobile app.
Step 2: The spout emits only latitude, longitude, time and disease code as a tuple.
Step 3: Tuples with a disease code less than or equal to 322 pass the filter and are sent forward.
Step 4: Depending on the latitude and longitude, a country is assigned.
Step 5: An hour is assigned to each tuple.
Step 6: Tuples are grouped by a key made up of country, dcode and time. The count is updated after each tuple is processed.
Step 7: Outbreak Trend State ensures that a tuple that has already been processed is not processed a second time.
Step 8: We have set a threshold of 2,000. If the count crosses the threshold, we consider it an outbreak, and an alert message is dispatched with the details of the outbreak.
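Steps 3 to 8 can be sketched in plain Java, independent of Storm, to show the counting logic on its own. This is a rough illustration under stated assumptions, not the project's Trident code: the class `OutbreakCounter` and its methods are hypothetical, time is assumed to arrive as epoch milliseconds, the country is assumed to be already assigned (Step 4), and the in-memory map stands in for the Trident state. The sketch does not model the exactly-once guarantee from Step 7; it only suppresses repeat alerts for the same key, which is a design choice made here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names are hypothetical and no Storm API is used.
final class OutbreakCounter {
    static final int MAX_DISEASE_CODE = 322;        // filter bound from Step 3
    static final long THRESHOLD = 2000;             // outbreak threshold from Step 8
    static final long MILLIS_PER_HOUR = 3_600_000L; // assumes time is epoch milliseconds

    private final Map<String, Long> counts = new HashMap<>(); // key -> running count
    private final Set<String> alerted = new HashSet<>();      // keys already alerted

    // Step 3: keep only disease codes less than or equal to 322.
    static boolean passesFilter(int diseaseCode) {
        return diseaseCode <= MAX_DISEASE_CODE;
    }

    // Step 5: bucket a timestamp into whole hours since the epoch.
    static long hourOf(long epochMillis) {
        return epochMillis / MILLIS_PER_HOUR;
    }

    // Steps 6 and 8: update the count for (country, dcode, hour) and return an
    // alert message the first time the count goes above the threshold, else null.
    String process(String country, int diseaseCode, long epochMillis) {
        if (!passesFilter(diseaseCode)) {
            return null;
        }
        String key = country + ":" + diseaseCode + ":" + hourOf(epochMillis);
        long count = counts.merge(key, 1L, Long::sum);
        if (count > THRESHOLD && alerted.add(key)) {
            return "Disease Outbreak Alert: " + key + " count=" + count;
        }
        return null;
    }
}
```

Feeding the same country, disease code and hour into `process` returns `null` for the first 2,000 events and an alert string on the next one. In the actual topology, this grouping and counting would be expressed with Trident's stream operations and a persistent state rather than a local `HashMap`.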
Download the Storm project.
First, start the Storm cluster, then run the Storm project using the commands below.
Command: cd storm/
Command: bin/storm jar /home/edureka/SensorDataAnalysis/SDAnalysis/target/sensor-data-0.0.1-SNAPSHOT.jar in.edureka.trident.topology.DiseaseAlertTopology
The count for disease code 320 in Turkey crossed the threshold of 2,000, and a Disease Outbreak Alert was sent.
This project is only meant to give you a practical view of Storm's Trident topology concept. All the inputs here are generated randomly. You can put your own business logic in the code according to your data and test it.
Got a question for us? Mention it in the comments section and we will get back to you.