BigData in MPI

0 votes

Please let me know what is MPI in bigdata. 

Feb 13 in Big Data Hadoop by Ganesh
76 views

1 answer to this question.

0 votes

MPI is a communication protocol for programming parallel computers. MPI's goals are high performance, scalability, and portability. The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances).MPI has been widely used in High-Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step in bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, where processing and communication of a large number of key-value pair instances are needed through distributed computation models such as MapReduce, Iteration, and Streaming. We abstract the characteristics of key-value communication patterns into a bipartite communication model, which reveals four distinctions from MPI: Dichotomic, Dynamic, Data-centric, and Diversified features. Utilizing this model, we propose the specification of a minimalistic extension to MPI. An open source communication library, DataMPI, is developed to implement this specification. Performance experiments show that DataMPI has significant advantages in performance and flexibility while maintaining high productivity, scalability, and fault tolerance of Hadoop.

answered Feb 13 by Disha

Related Questions In Big Data Hadoop

0 votes
0 answers

why do we need MaPReduce in BigData Hadoop?

What is the need of MapReduce in ...READ MORE

Jun 3 in Big Data Hadoop by Rajnish
36 views
0 votes
0 answers

How to run Hadoop in Docker containers?

I want to incorporate Hadoop in Docker ...READ MORE

Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 10,730 points
95 views
0 votes
1 answer

How Impala is fast compared to Hive in terms of query response?

Impala provides faster response as it uses MPP(massively ...READ MORE

answered Mar 21, 2018 in Big Data Hadoop by nitinrawat895
• 10,730 points
353 views
0 votes
1 answer

Is Kafka and Zookeeper are required in a Big Data Cluster?

Apache Kafka is one of the components ...READ MORE

answered Mar 22, 2018 in Big Data Hadoop by nitinrawat895
• 10,730 points
445 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
3,371 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
406 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,753 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,225 views
0 votes
7 answers

How to run a jar file in hadoop?

I used this command to run my ...READ MORE

answered Dec 10, 2018 in Big Data Hadoop by Dasinto
6,117 views
0 votes
5 answers