Apache Storm is popular because of its real-time processing capabilities, and many organizations have adopted it as part of their systems for this very reason. Let’s take a look at how organizations are integrating Apache Storm.
Apache Storm Use Cases:
Storm is used to power a variety of Twitter systems such as real-time analytics, personalization, search and revenue optimization. Apache Storm integrates with the rest of Twitter’s infrastructure, which includes database systems like Cassandra and Memcached, the messaging infrastructure, Mesos, and the monitoring and alerting systems. Storm’s isolation scheduler makes it feasible to use the same cluster for both production and in-development applications, and provides an efficient way to do capacity planning.
Yahoo! is working on a next-generation platform that enables the merging of Big Data and low-latency processing. Though Hadoop is the primary technology used there for batch processing, Apache Storm enables stream processing of user events, content feeds, and application logs.
Infochimps uses Apache Storm as the backbone of one of its three cloud data services, Data Delivery Services (DDS), which employs Storm to provide a fault-tolerant and linearly scalable cloud service for enterprise data collection, transport, and complex in-stream processing. Just as Hadoop provides batch ETL and large-scale batch analytical processing, DDS provides real-time ETL and large-scale real-time processing.
Flipboard is a single place to explore, collect and share news that interests you. Flipboard uses Storm for a wide range of services such as content search, real-time analytics and custom magazine feeds. Apache Storm is integrated with an infrastructure that includes systems like ElasticSearch, Hadoop, HBase and HDFS to create a highly scalable data platform.
Ooyala is a venture-backed, privately held company that provides online video technology products and services for some of the world’s largest networks, brands and media companies. Ooyala has an analytics engine that processes over two billion analytics events each day, generated by nearly 200 million viewers worldwide who watch video on an Ooyala-powered player. Ooyala uses Apache Storm to provide its customers with real-time streaming analytics on consumer viewing behaviour and digital content trends. Storm permits swift mining of their online video data sets to deliver up-to-date business intelligence such as real-time viewing patterns, personalized content suggestions, programming guides and valuable insights on ways to increase revenue.
Taobao, with the help of Apache Storm, generates statistics from logs and extracts useful information from them in real time. Logs are read from persistent message queues into spouts and then passed through the topologies’ bolts to compute the required outcomes. Taobao’s input log count varies anywhere between 2 million and 1.5 billion entries per day.
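Conceptually, that spout-to-bolt flow can be sketched in plain Python (a hedged illustration of the data flow only; real Storm topologies use Storm's Java API, and the log format here is invented for the example):

```python
import queue
from collections import Counter

def log_spout(mq):
    """Spout stand-in: drain a persistent message queue, emitting log lines."""
    while True:
        try:
            yield mq.get_nowait()
        except queue.Empty:
            break

def count_bolt(lines):
    """Bolt stand-in: compute per-URL request counts from raw log lines."""
    counts = Counter()
    for line in lines:
        # Assume a simple "METHOD URL STATUS" log format for illustration.
        parts = line.split()
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts

# Simulate the upstream message queue with a few log lines.
mq = queue.Queue()
for line in ["GET /item/1 200", "GET /item/2 200", "GET /item/1 404"]:
    mq.put(line)

counts = count_bolt(log_spout(mq))
```

In a real topology the spout and bolt would run on separate workers, with Storm handling tuple routing and acking between them.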
Klout is an application that uses social media analytics to rank its users based on online social influence through the “Klout Score”, a numerical value between 1 and 100. Klout uses Apache Storm’s built-in Trident abstraction to create complex topologies that stream data from network collectors via Kafka, process it, and write it to HDFS.
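The Kafka-to-HDFS pipeline described above is a chain of stream transformations; a rough plain-Python analogue might look like the following (the collector events, scoring logic and sink are all hypothetical stand-ins, since the real implementation uses Storm's Java Trident API):

```python
def from_kafka():
    # Stand-in for a Kafka consumer: yields raw interaction events
    # collected from social networks.
    yield from [
        {"user": "alice", "likes": 3},
        {"user": "bob", "likes": 1},
        {"user": "alice", "likes": 2},
    ]

def aggregate_influence(events):
    # Sum interactions per user -- a crude stand-in for score computation.
    scores = {}
    for event in events:
        scores[event["user"]] = scores.get(event["user"], 0) + event["likes"]
    return scores

def write_to_sink(scores, sink):
    # Stand-in for an HDFS writer: append one tab-separated record per user.
    for user, score in sorted(scores.items()):
        sink.append(f"{user}\t{score}")

sink = []
write_to_sink(aggregate_influence(from_kafka()), sink)
```

Trident expresses the same shape declaratively (a stream, per-tuple functions, and a persistent aggregation) while adding exactly-once processing semantics on top of Storm.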
Wego is a comprehensive travel metasearch engine, operating worldwide and used by countless travelers to find more options, pay less and travel more. Wego compares and displays real-time flight schedules, hotel availability and prices from other travel sites around the globe. Here, Apache Storm streams real-time metasearch data from affiliates to end users. Storm’s topology concepts resolve concurrency issues and at the same time help Wego continuously integrate, dissect and clean the data. Additionally, the tools provided by Storm enable incremental updates that enhance their data.
Rocket Fuel delivers a leading media-buying platform at Big Data scale that harnesses the power of artificial intelligence (AI) to improve marketing ROI in digital media. They are building a real-time platform on top of Storm that mirrors the time-critical workflows of their existing Hadoop-based ETL pipeline. This platform tracks impressions, clicks, conversions, bid requests, etc. in real time.
NaviSite is using Apache Storm as part of their server event log monitoring and auditing system. Log messages from thousands of servers are sent to a RabbitMQ cluster, and Storm is used to compare each message with a set of regular expressions. If there is a match, the message is sent to a bolt that stores the data in MongoDB. At the moment, 5,000 to 10,000 messages per second are being handled; however, the existing RabbitMQ + Storm clusters have been tested at up to about 50,000 messages per second.
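The matching step at the heart of that pipeline is easy to sketch on its own. In this minimal Python illustration, the patterns and sample log lines are invented for the example, and the filter function stands in for the Storm bolt that decides which messages to forward to MongoDB:

```python
import re

# Hypothetical patterns an operations team might watch for in server logs.
PATTERNS = [
    re.compile(pattern)
    for pattern in [r"ERROR", r"authentication failure", r"disk full"]
]

def match_bolt(message):
    """Return True if the log message matches any watched pattern."""
    return any(p.search(message) for p in PATTERNS)

messages = [
    "INFO service started",
    "ERROR connection refused",
    "pam_unix: authentication failure; user=root",
]

# Messages that match would be emitted to the MongoDB-writing bolt.
matched = [m for m in messages if match_bolt(m)]
```

In the real system each message arrives as a tuple from a RabbitMQ-fed spout, and Storm parallelizes this matching across workers to reach the throughput figures quoted above.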
There are many more organizations implementing Apache Storm, and even more are expected to follow suit, as Apache Storm continues to be a leader in real-time analytics.
Check out our video and presentation on what Apache Storm is all about.