The surge in Big Data solutions has spawned the need for applications focused on data analytics. SAP's High-Performance Analytic Appliance, or HANA, is widely known as an in-memory, column-oriented relational database management system built for real-time analytics. HANA's capabilities include operational reporting, data warehousing, and predictive/text analysis of Big Data.
Hadoop, a widely used open-source technology, supports the analysis and processing of large sets of structured, semi-structured, and unstructured data. A platform that can manage such volumes of varied data is the need of the hour, and SAP HANA is seen as an ideal candidate.
SAP HANA – Origin and History
SAP HANA originates from developed or acquired technologies, including:
- TREX search engine: An in-memory column-oriented search engine
- P*TIME: An in-memory OLTP database acquired by SAP in 2005
- MaxDB: A relational database with an in-memory liveCache engine
In 2008, teams from SAP AG, working with the Hasso Plattner Institute and Stanford University, demonstrated an application architecture for real-time analytics and aggregation, referred to as 'Hasso's New Architecture.' Before the name HANA settled in, the product was known as 'New Database.' It was officially announced in May 2010. In November 2010, SAP AG announced the release of SAP HANA 1.0, an in-memory appliance for business applications and business intelligence that allows real-time response.
Why Combine SAP HANA with Apache Hadoop?
The SAP HANA platform for Big Data, with its analytics database, data services, and event stream processing software, can be combined with Apache Hadoop to help organizations acquire and harness Big Data at the speed of business:
- Gain fast, meaningful insight and run processes that are 10,000 to 100,000 times faster in memory
- Turn large volumes of data into insight more effectively
- Harness insight and mine massive volumes of data to find nuggets of relevant information
- Analyze streaming data in real time and store significant events for deeper analysis
- Virtualize access to data across different data stores, and gain insight without moving data
- Extract, transform, and load data across a variety of stores for a complete view of enterprise data
Future Plans of SAP
SAP has also announced SAP Real-Time Data Platform, which combines SAP HANA with SAP Sybase IQ and other SAP technologies as well as with non-SAP technologies, especially Hadoop, which is the focus of this post. SAP Real-Time Data Platform can be used for both analytics and online transaction processing (OLTP). When used alone, each technology delivers business value. When used together, however, they can combine, analyze, and process all the data a business has, providing deeper insights into the business and opening up new business opportunities.
To achieve the best balance of data technologies to solve its business problems, a business must take into account many factors. Besides the cost of hardware and software, it must consider development tools, the operational costs associated with meeting its own service levels, and how it will fulfill its policies concerning security, high availability, secure backup, and recovery.
Using Hadoop Along With SAP Solutions and Technology
- There are some major differences between these technologies. At a high level, Hadoop uses commodity servers to handle data sizes in the petabyte and potentially the exabyte range, far beyond the 100 TB range (or less) that SAP HANA and conventional relational database management systems (RDBMS) typically handle.
- On the other hand, current versions of Hadoop are significantly slower than a conventional RDBMS, and much slower than SAP HANA, taking minutes or hours to deliver analytic results. However, they handle arbitrary data structures more easily, usually at much lower hardware storage cost per terabyte.
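The "arbitrary data structures" point is often called schema-on-read: a Hadoop-style job imposes structure at processing time instead of at load time, so one input can mix formats that a fixed relational schema would reject. A minimal Python sketch (the log lines are made up for illustration):

```python
import json

def parse_record(line):
    """Schema-on-read: interpret each raw line at processing time.

    A Hadoop-style job applies structure when the data is read, so a
    single input can mix JSON events, key=value pairs, and free text.
    A schema-on-write RDBMS would reject rows that do not fit its
    predefined columns at load time.
    """
    line = line.strip()
    if line.startswith("{"):                       # JSON event
        return json.loads(line)
    if "=" in line:                                # key=value pairs
        return dict(pair.split("=", 1) for pair in line.split())
    return {"raw": line}                           # unstructured fallback

raw_lines = [
    '{"user": "alice", "action": "login"}',
    'user=bob action=view page=home',
    'malformed free-text entry',
]
records = [parse_record(line) for line in raw_lines]
```

The flexibility comes at a price: every read pays the parsing cost, which is one reason Hadoop's batch analytics are slower than querying data already structured inside an RDBMS or SAP HANA.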
Contrasts Between RDBMS, SAP HANA/In-Memory, and Hadoop
Choosing the data technology to use in an OLTP or analytical solution requires understanding the differences between the choices. The table below highlights the main differences between a conventional RDBMS, an in-memory database (specifically SAP HANA), and Hadoop.
Note: The table is not product specific, but a generalization:

| | Conventional RDBMS | SAP HANA (in-memory) | Hadoop |
|---|---|---|---|
| Typical data volume | Up to ~100 TB | Up to ~100 TB | Petabytes to exabytes |
| Storage | Disk | Main memory (column store) | Distributed file system on commodity servers |
| Analytic response time | Seconds to minutes | Real time | Minutes to hours (batch) |
| Data structure | Structured | Structured | Arbitrary, including unstructured |
| Relative hardware cost per TB | Moderate | High | Low |
Technology is undergoing rapid innovation and the details shown in the table above are sure to change. However, it is worth noting that the following key differentiating characteristics of each database type will likely continue to hold true in the future:
- RDBMS will continue to be an acceptable solution for many problems, especially for straightforward OLTP where response time is not critical.
- SAP HANA is best suited when speed is important, for example for real-time data updates and analytics, the volume of data is not excessively large, and the cost is justifiable given the business need.
- Hadoop is better suited when the data volume is very large or the data is difficult for other database technologies to store (for example, unstructured text). Because Hadoop works on a distributed file system, its processing speed is considerably slower.
Choosing the Best Data Technology
When deciding on the best balance of technologies to manage your business challenges, you will have to make some trade-offs.
Hadoop is open-source software: it has no licensing fee and can run on low-cost commodity servers. However, the total cost of running a Hadoop cluster can be significant once you consider the hundreds or potentially thousands of servers that need to be managed. In striking the best balance, keep in mind that the relative performance and costs of the different components are also changing.
For example, the cost of memory is steadily decreasing, and memory is also getting faster. As a result, the cost of the hardware required to store a terabyte of data in memory will probably also decrease, which might make SAP HANA the better technology for a specific situation. Moreover, if your application requires real-time analysis, an in-memory computing technology, specifically SAP HANA, is likely the only one that will meet the need.
- HANA and Hadoop are very good friends.
- HANA is a great place to store high-value, often used data, and Hadoop is a great place to persist information for archival and retrieval in new ways, especially information which you don’t want to structure in advance, like web logs or other large information sources.
- Holding this information in an in-memory database has relatively little value.
- You can connect HANA to Hadoop and run batch jobs in Hadoop that load more information into HANA, on which you can then perform super-fast aggregations in HANA. This is a very cooperative existence.
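The batch-then-load pattern above can be sketched in a few lines of Python. Here the Hadoop-side batch job is simulated by a pure-Python reduce step that collapses raw web logs into a small aggregate, and the load step uses SAP's `hdbcli` driver; the host, credentials, and `PAGE_VIEWS` table are placeholders, and the load function is only illustrative since it needs a live HANA system:

```python
from collections import Counter

def aggregate_page_views(log_lines):
    """Reduce step of a Hadoop-style batch job: collapse raw web-log
    lines into a small (page, view_count) result set worth keeping
    in HANA's memory."""
    counts = Counter()
    for line in log_lines:
        # each line: "<timestamp> <user> <page>"
        _, _, page = line.split()
        counts[page] += 1
    return sorted(counts.items())

def load_into_hana(rows):
    """Push aggregated rows into SAP HANA. Requires a live HANA system
    and the hdbcli package; connection details and the PAGE_VIEWS
    table are placeholder assumptions for illustration."""
    from hdbcli import dbapi
    conn = dbapi.connect(address="hana-host", port=30015,
                         user="SYSTEM", password="secret")
    cur = conn.cursor()
    cur.executemany("INSERT INTO PAGE_VIEWS (PAGE, VIEWS) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

logs = [
    "2014-01-01T10:00 alice /home",
    "2014-01-01T10:01 bob /home",
    "2014-01-01T10:02 alice /pricing",
]
summary = aggregate_page_views(logs)
# load_into_hana(summary)  # uncomment with a real HANA connection
```

The division of labor is the point: Hadoop cheaply churns through the raw, unstructured bulk, while HANA holds only the high-value aggregate for real-time queries.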
According to Gartner, Big Data is one of the top technology trends impacting information infrastructure in 2014. SAP HANA is already helping businesses unlock this information by addressing one very important aspect of Big Data, fast access to and real-time analytics of very large data sets, which allows managers and executives to understand their business at the speed of thought. The net result is that by putting SAP HANA and Hadoop together, you have the potential to handle really big data, really fast.
Got a question for us? Mention it in the comments section and we will get back to you.