Most answered questions in Apache Spark

0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar 23,807 views
+5 votes
11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

Mar 21 in Apache Spark by anonymous
30,955 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println) READ MORE

Dec 10, 2018 in Apache Spark by Kuber
11,094 views
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi i hope this will help for ...READ MORE

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
21,138 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

Below Images are self explainatry for reducebykey ...READ MORE

Apr 22 in Apache Spark by Gunjan Kumar
10,171 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
14,843 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,260 points
14,110 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
15,718 views
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in spark Fold is a very powerful ...READ MORE

Aug 22, 2018 in Apache Spark by samarth295
• 2,190 points
3,338 views
0 votes
3 answers

I don't understand the reason behind Spark RDD being immutable.

There are few reasons for keeping RDD ...READ MORE

Apr 18 in Apache Spark by santlal561987@gmail.com
2,569 views
+1 vote
3 answers

Which cluster type should I choose for Spark?

According to me, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 10,690 points
141 views
0 votes
3 answers

Lineage Graph in Spark

Whenever a series of transformations are performed ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
1,904 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

Dec 31, 2018 in Apache Spark by anonymous
6,017 views
0 votes
2 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17 in Apache Spark by vishal
• 160 points
2,123 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,690 points
134 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

Aug 16, 2018 in Apache Spark by zombie
• 3,690 points
505 views
0 votes
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

Jul 9, 2018 in Apache Spark by zombie
• 3,690 points
1,445 views
0 votes
2 answers

Sorting rows in descending order in Spark SQL

df.orderBy(org.apache.spark.sql.functions.col("columnname").desc) READ MORE

Jan 8 in Apache Spark by Ram Reddymasi
7,691 views
0 votes
2 answers

Parquet Files Advantages

Parquet is a columnar format supported by ...READ MORE

Jul 3, 2018 in Apache Spark by zombie
• 3,690 points
331 views
0 votes
2 answers

map() and flatmap()

map(): Return a new distributed dataset formed by ...READ MORE

Jul 3, 2018 in Apache Spark by zombie
• 3,690 points
84 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

Jul 4 in Apache Spark by Dhara dhruve
1,031 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,260 points
3,590 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,260 points
1,849 views
0 votes
1 answer

How to convert a json file structure with values in single quotes to quoteless ?

You can do this by turning off ...READ MORE

Oct 4 in Apache Spark by Jisha
63 views
0 votes
1 answer

Primary keys in Apache Spark

I found the following solution to be ...READ MORE

Sep 11 in Apache Spark by ravikiran
• 4,560 points
51 views
0 votes
1 answer

How do I connect to a HIVE Meta store through a program in SparkSQL?

In spark 2.0.+ it should look something ...READ MORE

Sep 5 in Apache Spark by ravikiran
• 4,560 points
55 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vise versa

Try including the package while starting the ...READ MORE

Aug 26 in Apache Spark by Karan
64 views
0 votes
1 answer

How to extract record from one RDD using another RDD

Hey, you can use "contains" filter to extract ...READ MORE

Aug 23 in Apache Spark by Karan
48 views
0 votes
1 answer

Spark: Can we add column to dataframe?

Yes we can add a column using withColumn with ...READ MORE

Aug 9 in Apache Spark by Shirish
46 views
0 votes
1 answer

Monitoring Spark application

Spark-submit jobs are also run from client/edge ...READ MORE

Aug 9 in Apache Spark by Umesh
36 views
0 votes
1 answer

Primary keys in Apache Spark

import sqlContext.implicits._ import org.apache.spark.sql.Row import org.apache.spark.sql.types.{StructType, StructField, LongType} val df ...READ MORE

Aug 9 in Apache Spark by ravikiran
• 4,560 points
65 views
0 votes
1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
150 views
0 votes
1 answer

How to start spark history server?

Hi, You can use this command to start ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
36 views
0 votes
1 answer

What is Spark UI and how to monitor a spark job?

Hey, Jobs- to view all the spark jobs Stages- ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
56 views
0 votes
1 answer

How to handle data shuffle in Spark

Hi, You can do it using map partition ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
49 views
0 votes
1 answer

How Foreach Operation works in Apache Spark?

Hi, foreach() operation is an action. It does not ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
39 views
0 votes
1 answer

How SortBykey() operation works in Spark?

Hey, sortByKey() is a transformation. It returns an RDD sorted ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
78 views
0 votes
1 answer

In how many modes Apache spark can run?

Hey, You can launch spark application in four ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
26 views
0 votes
1 answer

How to launch spark application in cluster mode in Spark?

Hi, To launch spark application in cluster mode, ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
40 views
0 votes
1 answer

How to create paired RDD using subString method in Spark?

Hi, If you have a file with id ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
50 views
0 votes
1 answer

what is Paired RDD and how to create paired RDD in Spark?

Hi, Paired RDD is a distributed collection of ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
565 views
0 votes
1 answer

By default how many partitions are created in RDD in Apache spark?

Well, it depends on the block of ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
48 views
0 votes
1 answer

Join in RDD using keys

Suppose you have two dataset results( id, ...READ MORE

Aug 2 in Apache Spark by Trisha
33 views
0 votes
1 answer

Scala: save filtered data row by row using saveAsTextFile

Try this code, it worked for me: val ...READ MORE

Aug 2 in Apache Spark by Karan
40 views
0 votes
1 answer

What is Hive on Spark?

Hi, Hive contains significant support for Apache Spark, ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
22 views
0 votes
1 answer

Can anyone explain the sparse vector in Spark?

Hey, A sparse vector is used for storing ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
56 views
0 votes
1 answer

Scala - Error in Inheritance: <console>:: error: not found: value

You need to declare the variable which ...READ MORE

Aug 1 in Apache Spark by Karan
100 views
0 votes
1 answer

Pyspark dataframe with random values

Hey @Esha, you can use this code. ...READ MORE

Aug 1 in Apache Spark by Zed
166 views
0 votes
1 answer

Spark + Hive connectivity

The problem is probably with the command. ...READ MORE

Aug 1 in Apache Spark by Rishni
40 views
0 votes
1 answer

Scala: Convert text file data into ORC format using data frame

Converting text file to Orc: Using Spark, the ...READ MORE

Aug 1 in Apache Spark by Esha
127 views