Most viewed questions in Apache Spark

+5 votes
11 answers

Concatenate columns in Apache Spark DataFrame

It's late, but this is how you can ...

Mar 21 in Apache Spark by anonymous
30,792 views
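Here is a minimal sketch of one common approach. It assumes spark-sql is on the classpath and uses a local SparkSession with hypothetical `first` and `last` columns. `concat` joins columns as-is, while `concat_ws` takes a separator and skips null inputs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("concat-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val people = Seq(("Ada", "Lovelace"), ("Alan", "Turing")).toDF("first", "last")

// concat joins the columns as-is (null if any input is null);
// concat_ws puts the separator between the non-null inputs.
val joined = people
  .withColumn("full", concat(col("first"), lit(" "), col("last")))
  .withColumn("full_ws", concat_ws(" ", col("first"), col("last")))

val fullNames = joined.select("full").as[String].collect().toSeq
val fullNamesWs = joined.select("full_ws").as[String].collect().toSeq
spark.stop()
```

If either input can be null, `concat` yields null for that row. For that reason, `concat_ws` is usually the safer choice for display strings.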
0 votes
1 answer

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3, ...

Jun 5, 2018 in Apache Spark by Shubham
• 13,300 points
27,785 views
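Here is a minimal sketch of list-based filtering with `Column.isin`. It assumes a local SparkSession and a hypothetical `score` column. The list uses only the values visible in the answer teaser; the rest of that list is elided and is not reconstructed here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("isin-demo").getOrCreate()
import spark.implicits._

// Hypothetical data; the column names are assumptions.
val df = Seq(("a", 9.8), ("b", 5.0), ("c", 7.0), ("d", 3.0)).toDF("id", "score")

// Values to match (Scala infers List[Double]).
val notFollowingList = List(9.8, 7, 6, 3)

// isin takes varargs, so the list is expanded with `: _*`.
val matching = df.filter(col("score").isin(notFollowingList: _*))
// Negating with `!` keeps the rows whose score is NOT in the list.
val nonMatching = df.filter(!col("score").isin(notFollowingList: _*))

val matchingIds = matching.select("id").as[String].collect().sorted.toSeq
val nonMatchingIds = nonMatching.select("id").as[String].collect().sorted.toSeq
spark.stop()
```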
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...

Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar
23,627 views
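Here is a minimal sketch of the UDF route, assuming a local SparkSession. The `coder` body and the `amount`/`category` column names are hypothetical. They were chosen only to match the `Int => String` shape shown in the answer teaser.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("udf-demo").getOrCreate()
import spark.implicits._

// Hypothetical data.
val df = Seq(("x", 50), ("y", 150)).toDF("id", "amount")

// A plain Scala function with the same type as `coder`; the logic is an assumption.
val coder: (Int => String) = v => if (v < 100) "low" else "high"

// Wrap the function as a UDF so it can be applied to a Column.
val coderUdf = udf(coder)

// withColumn adds the derived column to every row.
val withCategory = df.withColumn("category", coderUdf(col("amount")))
val categories = withCategory.select("category").as[String].collect().toSeq
spark.stop()
```

For simple conditions like this one, the built-in `when(...).otherwise(...)` functions avoid a UDF entirely. Spark's optimizer cannot look inside a UDF.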
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi, I hope this will help for ...

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
20,986 views
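Here is a minimal sketch using `DataFrame.na.fill` with a per-column map of replacement values. It assumes a local SparkSession and hypothetical `qty` and `label` columns.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("fill-demo").getOrCreate()
import spark.implicits._

// Hypothetical data; Option values become nullable columns.
val df = Seq(
  ("a", Some(1), Some("x")),
  ("b", None, None)
).toDF("id", "qty", "label")

// Keys are column names; values are the replacements for nulls in those columns.
val filled = df.na.fill(Map("qty" -> 0, "label" -> "unknown"))

val rows = filled.orderBy("id").collect()
  .map(r => (r.getString(0), r.getInt(1), r.getString(2)))
  .toSeq
spark.stop()
```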
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD and DataFrame: 1. Release ...

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
15,608 views
0 votes
4 answers

How to change the SparkSession configuration in PySpark?

You can dynamically load properties. First create ...

Dec 10, 2018 in Apache Spark by Vini
14,752 views
+2 votes
4 answers

Use the length function in substring in Spark

You can use the function expr: val data ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,240 points
14,029 views
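Here is a minimal sketch of the `expr` approach. It assumes a local SparkSession and a hypothetical `text` column. In many Spark versions, the `substring` helper in `org.apache.spark.sql.functions` accepts only fixed `Int` positions. A per-row length is therefore easiest to express in SQL through `expr`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("substr-demo").getOrCreate()
import spark.implicits._

// Hypothetical data.
val df = Seq("spark.", "scala.").toDF("text")

// SQL substring(str, pos, len) is 1-based; length(text) - 1 drops the last character of each row.
val trimmed = df.withColumn("trimmed", expr("substring(text, 1, length(text) - 1)"))

val result = trimmed.select("trimmed").as[String].collect().toSeq
spark.stop()
```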
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println)

Dec 10, 2018 in Apache Spark by Kuber
11,043 views
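Here is a minimal sketch, assuming a local SparkSession and a hypothetical two-line RDD. `rdd.foreach(println)` runs on the executors, so on a cluster its output lands in the executor logs rather than on the driver console. Collecting first, or taking a few elements, prints on the driver.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("print-demo").getOrCreate()
val sc = spark.sparkContext

// Hypothetical RDD split into 2 partitions.
val rdd = sc.parallelize(Seq("first line", "second line"), 2)

// collect() brings every element to the driver; only suitable for small RDDs.
val all = rdd.collect()
all.foreach(println)

// take(n) fetches just the first n elements, which is safer for large RDDs.
val firstOne = rdd.take(1)
firstOne.foreach(println)
spark.stop()
```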
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

The images below are self-explanatory for reduceByKey ...

Apr 22 in Apache Spark by Gunjan Kumar
10,092 views
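The difference mostly comes down to map-side combining. The sketch below is a simplified model in plain Scala collections, not Spark's implementation. Each inner `Seq` stands in for a hypothetical partition, and the sketch compares how many records each approach would shuffle. In Spark itself, the two forms would be `pairs.reduceByKey(_ + _)` and `pairs.groupByKey().mapValues(_.sum)`.

```scala
// Hypothetical (word, 1) pairs spread across two "partitions".
val partitions: Seq[Seq[(String, Int)]] = Seq(
  Seq(("a", 1), ("b", 1), ("a", 1)),
  Seq(("a", 1), ("b", 1), ("b", 1))
)

// groupByKey-style: every record is shuffled as-is, and values are summed after grouping.
val shuffledByGroup: Seq[(String, Int)] = partitions.flatten
val groupResult: Map[String, Int] =
  shuffledByGroup.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// reduceByKey-style: each partition first combines values per key (map-side combine),
// so at most one record per key per partition is shuffled.
val shuffledByReduce: Seq[(String, Int)] = partitions.flatMap { part =>
  part.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(_ + _) }
}
val reduceResult: Map[String, Int] =
  shuffledByReduce.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(_ + _) }
```

Both approaches give the same totals. In this model, though, the reduce-style path moves 4 records instead of 6, and the gap grows with the number of repeated keys per partition.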
0 votes
2 answers

Sorting rows in descending order in Spark SQL

df.orderBy(org.apache.spark.sql.functions.col("columnname").desc)

Jan 8 in Apache Spark by Ram Reddymasi
7,650 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...

May 23, 2018 in Apache Spark by nitinrawat895
• 10,670 points
6,659 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the links mentioned below for ...

Dec 31, 2018 in Apache Spark by anonymous
5,985 views
0 votes
1 answer

Changing column position in a Spark DataFrame

Yes, you can reorder the DataFrame elements. You need ...

Apr 19, 2018 in Apache Spark by Ashish
• 2,630 points
4,593 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,240 points
3,581 views
0 votes
1 answer

Spark - repartition() vs coalesce()

It avoids a full shuffle. If it's ...

Oct 11, 2018 in Apache Spark by nitinrawat895
• 10,670 points
3,373 views
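Here is a minimal sketch comparing partition counts, assuming a local SparkSession. Adaptive query execution is switched off only to keep the reported counts predictable. The starting count of 8 partitions is arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitions-demo")
  .config("spark.sql.adaptive.enabled", "false") // keep partition counts deterministic
  .getOrCreate()

// 1000 rows in 8 partitions (hypothetical sizes).
val df = spark.range(0, 1000, 1, 8).toDF("n")

// coalesce merges existing partitions without a full shuffle; it can only reduce the count.
val coalescedParts = df.coalesce(2).rdd.getNumPartitions
// repartition always performs a full shuffle and can raise or lower the count.
val repartitionedParts = df.repartition(16).rdd.getNumPartitions
// Asking coalesce for more partitions than exist leaves the count unchanged.
val coalesceUp = df.coalesce(16).rdd.getNumPartitions
val initialParts = df.rdd.getNumPartitions
spark.stop()
```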
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in Spark: fold is a very powerful ...

Aug 22, 2018 in Apache Spark by samarth295
• 2,190 points
3,322 views
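In Spark, `rdd.fold(zero)(op)` first folds each partition starting from `zero`, then folds the per-partition results, again starting from `zero`. The sketch below models that contract with plain Scala collections; the partition layout is hypothetical. It shows why `zero` should be the identity element of `op`.

```scala
// Hypothetical data split into two "partitions".
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))

// Models fold's contract: zero is used once per partition and once more when merging results.
def simulatedFold(parts: Seq[Seq[Int]], zero: Int)(op: (Int, Int) => Int): Int =
  parts.map(_.foldLeft(zero)(op)).foldLeft(zero)(op)

// With the identity for +, the result is the plain sum: 10.
val withIdentity = simulatedFold(partitions, 0)(_ + _)
// With a non-neutral zero, it leaks in once per partition plus once at the merge: 13.
val withNonNeutral = simulatedFold(partitions, 1)(_ + _)
```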
0 votes
1 answer

How to find the number of elements present in the array in a Spark DataFrame column?

You can select the column and apply ...

Jun 5, 2018 in Apache Spark by Shubham
• 13,300 points
3,302 views
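Here is a minimal sketch using the built-in `size` function. It assumes a local SparkSession and a hypothetical array column named `values`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size}

// Local session for illustration only; assumes spark-sql is on the classpath.
val spark = SparkSession.builder().master("local[*]").appName("size-demo").getOrCreate()
import spark.implicits._

// Hypothetical data with one non-empty and one empty array.
val df = Seq(("a", Seq(1, 2, 3)), ("b", Seq.empty[Int])).toDF("id", "values")

// size returns the number of elements in an array (or map) column.
val counts = df.select(col("id"), size(col("values")).as("n"))
  .orderBy("id")
  .collect()
  .map(r => (r.getString(0), r.getInt(1)))
  .toSeq
spark.stop()
```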
0 votes
1 answer

map vs mapValues in Spark

There is a difference between the two: mapValues ...

Jun 29, 2018 in Apache Spark by nitinrawat895
• 10,670 points
2,729 views
0 votes
3 answers

I don't understand the reason behind Spark RDD being immutable.

There are a few reasons for keeping RDD ...

Apr 18 in Apache Spark by santlal561987@gmail.com
2,561 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...

May 25, 2018 in Apache Spark by nitinrawat895
• 10,670 points
2,415 views
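Here is a plain Scala sketch of the `maxBy` idea. The array is hypothetical and has the same `(key, value)` shape as the teaser, whose real contents are elided. On a pair RDD, the same selection can be made without collecting the data, either with `rdd.max()(Ordering.by[(String, Int), Int](_._2))` or with the `reduce` form below.

```scala
// Hypothetical pair array in the same shape as the answer's example.
val a = Array(("a", 1), ("b", 2), ("c", 3))

// maxBy returns the element whose selected field (the value, _._2) is largest.
val maxByValue = a.maxBy(_._2)

// The same selection via reduce, keeping the pair with the larger value at each step.
val maxByReduce = a.reduce((x, y) => if (x._2 >= y._2) x else y)
```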
0 votes
1 answer

When not to use foreachPartition and mapPartitions?

With mapPartitions() or foreachPartition(), you can only ...

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,360 points
2,337 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using the saveAsObjectFile and saveAsTextFile methods. ...

May 29, 2018 in Apache Spark by Shubham
• 13,300 points
2,261 views
0 votes
1 answer

Is it better to have one large parquet file or lots of smaller parquet files?

Ideally, you would use Snappy compression (the default) ...

May 23, 2018 in Apache Spark by nitinrawat895
• 10,670 points
2,179 views
0 votes
2 answers

map() vs flatMap() in Spark

Spark's map function expresses a one-to-one transformation. ...

Jun 17 in Apache Spark by vishal
• 160 points
2,089 views
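Spark's RDD `map` and `flatMap` follow the same contract as the Scala collection methods of the same name. A plain Scala sketch, with no Spark needed and hypothetical sample lines, therefore shows the difference.

```scala
// Hypothetical input lines.
val lines = Seq("hello world", "apache spark")

// map: exactly one output per input (here, each line becomes one array of words).
val mapped: Seq[Array[String]] = lines.map(_.split(" "))

// flatMap: each input may yield zero or more outputs, which are flattened into one sequence.
val flatMapped: Seq[String] = lines.flatMap(_.split(" "))
```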
0 votes
1 answer

What is the difference between Apache Spark SQLContext and HiveContext?

Spark 2.0+: Spark 2.0 provides native window functions ...

May 25, 2018 in Apache Spark by nitinrawat895
• 10,670 points
2,020 views
0 votes
1 answer

Difference between createOrReplaceTempView and registerTempTable

createOrReplaceTempView() creates/replaces a local temp view with the DataFrame provided. The lifetime of this ...

Apr 25, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,945 views
0 votes
3 answers

Lineage Graph in Spark

Whenever a series of transformations are performed ...

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
1,895 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,841 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less CSV file ...

Jul 13 in Apache Spark by Puneet
1,706 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...

Jun 19, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,611 views
0 votes
1 answer

Difference between SparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,610 views
0 votes
1 answer

How to convert an RDD object to a DataFrame in Spark

SQLContext has a number of createDataFrame methods ...

May 30, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,447 views
0 votes
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[Row] is called rdd, you ...

Jul 9, 2018 in Apache Spark by zombie
• 3,690 points
1,427 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...

May 29, 2018 in Apache Spark by Shubham
• 13,300 points
1,369 views
0 votes
1 answer

How to add third-party Java JARs for use in PySpark?

You can add external jars as arguments ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,347 views
0 votes
1 answer

Efficient way to read specific columns from a Parquet file in Spark

As Parquet is a column-based storage ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,317 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...

Apr 19, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,299 views
0 votes
1 answer

How can I remove headers from dataframe?

You can use filter to do this. ...

Feb 14 in Apache Spark by Aryan
1,204 views
0 votes
1 answer

How to stop messages from being displayed on the Spark console?

In your log4j.properties file you need to ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,184 views
0 votes
1 answer

When running Spark on YARN, do I need to install Spark on all nodes of the YARN cluster?

No, it is not necessary to install ...

Jun 14, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,165 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...

Jul 4 in Apache Spark by Dhara dhruve
1,023 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...

Jul 10 in Apache Spark by Tina
959 views
0 votes
1 answer

Cache tables in Apache Spark SQL

Caching the tables puts the whole table ...

May 4, 2018 in Apache Spark by Data_Nerd
• 2,360 points
774 views
0 votes
1 answer

Reading a text file through a Spark DataFrame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...

Jul 24 in Apache Spark by Suri
768 views
0 votes
1 answer

Query regarding appending " to a string in Scala

You can perform this task in two ...

Jul 10 in Apache Spark by Esha
711 views
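Here are two common options, sketched in plain Scala with a hypothetical `word` value. They are not necessarily the two ways the truncated answer describes.

```scala
// Hypothetical input string.
val word = "spark"

// Option 1: escape the double quote inside an ordinary string literal and concatenate.
val quoted1 = "\"" + word + "\""

// Option 2: put the escaped quotes in a format string.
val quoted2 = "\"%s\"".format(word)

println(quoted1) // prints "spark" including the quotes
```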
0 votes
1 answer

Query regarding Operator Overloading in Scala

All prefix operators' symbols are predefined: +, -, ...

Jul 10 in Apache Spark by Karan
688 views
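Here is a short plain Scala sketch with a hypothetical `Vec` class. Binary operators are ordinary methods with symbolic names. Prefix operators are declared as `unary_` methods, and only `+`, `-`, `!` and `~` can be used that way.

```scala
// Hypothetical 2D vector type used only to illustrate operator methods.
case class Vec(x: Int, y: Int) {
  // Binary operators are plain methods with symbolic names.
  def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  def *(k: Int): Vec = Vec(x * k, y * k)
  // Prefix operators are declared as unary_<op> (note the space before the colon).
  def unary_- : Vec = Vec(-x, -y)
}

val v = Vec(1, 2)
val sum = v + Vec(3, 4) // same as v.+(Vec(3, 4))
val scaled = v * 3      // same as v.*(3)
val negated = -v        // same as v.unary_-
```

Operator precedence follows the first character of the method name. For example, `v + v * 2` evaluates `v * 2` first.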
0 votes
1 answer

What are the real-time issues in Spark?

Some of the issues I have faced ...

Mar 18 in Apache Spark by Sharman
676 views
0 votes
1 answer

Getting error while connecting zookeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...

May 24, 2018 in Apache Spark by Shubham
• 13,300 points
626 views
0 votes
0 answers

Why doesn't my Spark YARN client run on all available worker machines?

I am running an application on Spark ...

Feb 22 in Apache Spark by Uzair Ahmad

edited Feb 22 by Omkar
612 views
0 votes
1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...

May 24, 2018 in Apache Spark by Shubham
• 13,300 points
602 views