Most viewed questions in Apache Spark

+5 votes
11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

Mar 21 in Apache Spark by anonymous
36,414 views
0 votes
1 answer

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as following: var notFollowingList=List(9.8,7,6,3, ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,350 points
35,242 views
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar 30,280 views
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi i hope this will help for ...READ MORE

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
26,253 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
18,221 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
18,017 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,280 points
17,111 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println) READ MORE

Dec 10, 2018 in Apache Spark by Kuber
14,364 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

Below Images are self explainatry for reducebykey ...READ MORE

Apr 22 in Apache Spark by Gunjan Kumar
13,854 views
0 votes
2 answers

Sorting rows in descending order in Spark SQL

df.orderBy(org.apache.spark.sql.functions.col("columnname").desc) READ MORE

Jan 8 in Apache Spark by Ram Reddymasi
9,278 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...READ MORE

May 23, 2018 in Apache Spark by nitinrawat895
• 10,800 points
8,509 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

Dec 31, 2018 in Apache Spark by anonymous
7,320 views
0 votes
1 answer

Changing Column position in spark dataframe

Yes, you can reorder the dataframe elements. You need ...READ MORE

Apr 19, 2018 in Apache Spark by Ashish
• 2,630 points
5,230 views
0 votes
2 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17 in Apache Spark by vishal
• 160 points
4,974 views
0 votes
1 answer

How to find the number of elements present in the array in a Spark DataFame column?

You can select the column and apply ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,350 points
4,506 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,280 points
4,145 views
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in spark Fold is a very powerful ...READ MORE

Aug 22, 2018 in Apache Spark by samarth295
• 2,190 points
4,119 views
0 votes
1 answer

Spark - repartition() vs coalesce()

It avoids a full shuffle. If it's ...READ MORE

Oct 11, 2018 in Apache Spark by nitinrawat895
• 10,800 points
3,949 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less csv file ...READ MORE

Jul 13 in Apache Spark by Puneet
3,539 views
0 votes
1 answer

map vs mapValues in Spark

There is a difference between the two: mapValues ...READ MORE

Jun 29, 2018 in Apache Spark by nitinrawat895
• 10,800 points
3,414 views
0 votes
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24 in Apache Spark by Suri
3,312 views
0 votes
3 answers

I don't understand the reason behind Spark RDD being immutable.

There are few reasons for keeping RDD ...READ MORE

Apr 18 in Apache Spark by santlal561987@gmail.com
2,993 views
0 votes
1 answer

Is it better to have one large parquet file or lots of smaller parquet files?

Ideally, you would use snappy compression (default) ...READ MORE

May 23, 2018 in Apache Spark by nitinrawat895
• 10,800 points
2,876 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

May 29, 2018 in Apache Spark by Shubham
• 13,350 points
2,821 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...READ MORE

May 25, 2018 in Apache Spark by nitinrawat895
• 10,800 points
2,743 views
0 votes
1 answer

When not to use foreachPartition and mapPartition?

With mapPartion() or foreachPartition(), you can only ...READ MORE

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,360 points
2,689 views
0 votes
1 answer

Difference between createOrReplaceTempView and registerTempTable

createOrReplaceTempView() creates/replaces a local temp view with the dataframe provided. Lifetime of this ...READ MORE

Apr 25, 2018 in Apache Spark by kurt_cobain
• 9,280 points
2,609 views
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

Jul 9, 2018 in Apache Spark by zombie
• 3,710 points
2,500 views
0 votes
1 answer

How can I remove headers from dataframe?

You can use filter to do this. ...READ MORE

Feb 14 in Apache Spark by Aryan
2,418 views
0 votes
3 answers

Lineage Graph in Spark

Whenever a series of transformations are performed ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
2,388 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+ Spark 2.0 provides native window functions ...READ MORE

May 25, 2018 in Apache Spark by nitinrawat895
• 10,800 points
2,217 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,280 points
2,107 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

Jun 19, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,996 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10 in Apache Spark by Tina
1,982 views
0 votes
1 answer

Difference between sparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,794 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

May 29, 2018 in Apache Spark by Shubham
• 13,350 points
1,769 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

May 30, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,674 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,631 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...READ MORE

Apr 19, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,622 views
0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,280 points
1,508 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,280 points
1,470 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

Jul 4 in Apache Spark by Dhara dhruve
1,364 views
0 votes
1 answer

Difference between cogroup and full outer join in spark

Please go through the below explanation : Full ...READ MORE

Jul 13 in Apache Spark by Kiran
1,323 views
0 votes
1 answer

When running Spark on Yarn, do I need to install Spark on all nodes of Yarn Cluster?

No, it is not necessary to install ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 10,800 points
1,310 views
0 votes
1 answer

Cache() vs persist() in Spark

The cache() is used only the default storage level ...READ MORE

Mar 8 in Apache Spark by Raj
1,229 views
0 votes
1 answer

How to work with Matrix Multiplication in Apache Spark?

Hey, You can follow this below solution for ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,420 points
1,084 views
0 votes
1 answer

what are the spark real time issues ?

Some of the issues I have faced ...READ MORE

Mar 18 in Apache Spark by Sharman
986 views
0 votes
0 answers

Why doesn't my Spark Yarn client runs on all available worker machines?

I am running an application on Spark ...READ MORE

Feb 22 in Apache Spark by Uzair Ahmad

edited Feb 22 by Omkar 956 views
0 votes
1 answer

cache tables in apache spark sql

Caching the tables puts the whole table ...READ MORE

May 4, 2018 in Apache Spark by Data_Nerd
• 2,360 points
942 views
0 votes
1 answer