Trending questions in Apache Spark

0 votes
0 answers

One Hot Encoding in Apache Spark

The following code that I wrote for ...READ MORE

Feb 11, 2020 in Apache Spark by Manish
• 120 points
1,581 views
0 votes
1 answer
0 votes
1 answer

How to create multiple producers in apache kafka?

Hi@akhtar, To create multiple producer you have to ...READ MORE

Feb 6, 2020 in Apache Spark by MD
• 95,140 points
1,102 views
0 votes
1 answer

What is the difference between spark streaming and spark structured streaming?

Hi@akhtar Generally, Spark streaming  is used for real time ...READ MORE

Feb 4, 2020 in Apache Spark by MD
• 95,140 points
1,040 views
0 votes
1 answer

Why do we use sc.parallelize?

Spark revolves around the concept of a ...READ MORE

Jul 11, 2019 in Apache Spark by Suman
9,722 views
0 votes
1 answer

Caused by: java.lang.NumberFormatException: Empty String

Hi@akhtar, As we know text files are in ...READ MORE

Jan 31, 2020 in Apache Spark by MD
• 95,140 points
862 views
+1 vote
0 answers

How to create a list of RDDs(or RDD of RDDs, if possible) from a single JavaRDD<List<Integers>> in Java?

Hi, I have the input RDD as a ...READ MORE

Jan 11, 2020 in Apache Spark by itsroops
• 130 points
1,638 views
0 votes
1 answer

Does spark streaming provides checkpoint?

Hi@akhtar, Yes, Spark streaming uses checkpoint. Checkpoint is ...READ MORE

Feb 4, 2020 in Apache Spark by MD
• 95,140 points
300 views
0 votes
0 answers

not able to get output in spark streaming??

Hi everyone, I tried to count individual words ...READ MORE

Feb 4, 2020 in Apache Spark by akhtar
• 38,180 points
215 views
0 votes
1 answer

Is Spark Sql provides indexing to improve processing speed?

Hi@akhtar, There is no concept of indexing in ...READ MORE

Feb 4, 2020 in Apache Spark by MD
• 95,140 points
154 views
0 votes
1 answer

What are Dstreams?

Hi@akhtar, Dstreams are the basic abstraction that is ...READ MORE

Feb 4, 2020 in Apache Spark by MD
• 95,140 points
149 views
0 votes
1 answer

Cannot create directory /hive/xzxz/_temporary/0. Name node is in safe mode.

Hi@akhtar, Here you are trying to save csv ...READ MORE

Feb 3, 2020 in Apache Spark by MD
• 95,140 points
158 views
0 votes
0 answers

Error: Package: R-core-devel-3.6.0-1el7.x86_64 (epel) Requires: pcre2-devel

Hi, I am getting this error when try ...READ MORE

Jan 31, 2020 in Apache Spark by Hasid
• 370 points
228 views
0 votes
2 answers

How to execute a function in apache-scala?

Function Definition : def test():Unit{ var a=10 var b=20 var c=a+b } calling ...READ MORE

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy
191 views
+2 votes
1 answer

Spark code takes too much time to run on cluster

Hi @asif, Share with us please the application ...READ MORE

Jan 21, 2020 in Apache Spark by Alexandru
• 510 points
289 views
+1 vote
0 answers

how to access hive view using spark2

We do not have access to hive ...READ MORE

Dec 29, 2019 in Apache Spark by anonymous
• 130 points
897 views
0 votes
2 answers

Difference between createOrReplaceTempView and registerTempTable

I am pretty sure createOrReplaceTempView just replaced ...READ MORE

Sep 18, 2020 in Apache Spark by Nathan Mott
7,811 views
+2 votes
1 answer

Type mismatch error in scala

Hello, Your problem is here: val df_merge_final = df_merge .withColumn("version_key", ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points
3,267 views
0 votes
1 answer

Cannot load file to spark: "org.apache.spark.sql.AnalysisException: Path does not exist"

Since the file is in HDFS so ...READ MORE

Jul 31, 2019 in Apache Spark by Tina
6,434 views
0 votes
1 answer

Spark: java.sql.SQLException: No suitable driver

The missing driver is the JDBC one ...READ MORE

Jul 24, 2019 in Apache Spark by John
6,485 views
0 votes
1 answer

How can I remove headers from dataframe?

You can use filter to do this. ...READ MORE

Feb 14, 2019 in Apache Spark by Aryan
13,163 views
0 votes
1 answer

Difference between cogroup and full outer join in spark

Please go through the below explanation : Full ...READ MORE

Jul 13, 2019 in Apache Spark by Kiran
6,356 views
+1 vote
1 answer

Cannot resolve Error In Spark when filter records with two where condition

Try df.where($"cola".isNotNull && $"cola" =!= "" && !$"colb".isin(2,3)) your ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points

edited Dec 13, 2019 by Alexandru 1,065 views
0 votes
1 answer

Pyspark dataframe with random values

Hey @Esha, you can use this code. ...READ MORE

Aug 1, 2019 in Apache Spark by Zed
5,149 views
+1 vote
2 answers

Spark: Can we add column to dataframe?

Yes we can add columns to the ...READ MORE

Oct 24, 2019 in Apache Spark by Siva
• 160 points
3,040 views
+1 vote
2 answers

How do I get number of columns in each line from a delimited file??

Instead of spliting on '\n'. You should ...READ MORE

Aug 7, 2019 in Apache Spark by ashish
2,521 views
+1 vote
1 answer

How to convert a json file structure with values in single quotes to quoteless ?

You can do this by turning off ...READ MORE

Oct 4, 2019 in Apache Spark by Jisha
1,885 views
+1 vote
2 answers

sparkstream.textfilstreaming(localpathdirectory). I am getting empty results

Hey @c.kothamasu You should copy your file to ...READ MORE

Nov 7, 2019 in Apache Spark by Manas
169 views
0 votes
1 answer

How to create RDD from existing RDD in scala?

scala> val rdd1 = sc.parallelize(List(1,2,3,4,5))                           -  Creating ...READ MORE

Feb 28, 2020 in Apache Spark by anonymous
382 views
0 votes
1 answer

what is Paired RDD and how to create paired RDD in Spark?

Hi, Paired RDD is a distributed collection of ...READ MORE

Aug 2, 2019 in Apache Spark by Gitika
• 65,870 points
4,148 views
0 votes
1 answer

Date formats : how to cast string to date?

Try this, it should work: > from pyspark.sql.functions ...READ MORE

Jul 29, 2019 in Apache Spark by Niall
4,313 views
0 votes
1 answer

How to work with Matrix Multiplication in Apache Spark?

Hey, You can follow this below solution for ...READ MORE

Jul 31, 2019 in Apache Spark by Gitika
• 65,870 points
4,230 views
0 votes
1 answer

Spark, Scala: Load custom delimited file

You can load a DAT file into ...READ MORE

Jul 16, 2019 in Apache Spark by Shri
4,841 views
+1 vote
1 answer

Spark: java.io.FileNotFoundException

Hello, From the error I get that the ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points
1,502 views
0 votes
1 answer

How SortBykey() operation works in Spark?

Hey, sortByKey() is a transformation. It returns an RDD sorted ...READ MORE

Aug 2, 2019 in Apache Spark by Gitika
• 65,870 points
3,986 views
0 votes
1 answer

Join in RDD using keys

Suppose you have two dataset results( id, ...READ MORE

Aug 2, 2019 in Apache Spark by Trisha
3,899 views
0 votes
1 answer

Spark: Error while instantiating "org.apache.spark.sql.hive.HiveSessionState"

Seems like you have not started the ...READ MORE

Jul 25, 2019 in Apache Spark by Rohit
4,224 views
0 votes
1 answer

How do I connect to a HIVE Meta store through a program in SparkSQL?

In spark 2.0.+ it should look something ...READ MORE

Sep 5, 2019 in Apache Spark by ravikiran
• 4,620 points
2,330 views
0 votes
1 answer

What does the command df.registerTempTable() do?

df.registerTempTable(“airports”) This command is used to register ...READ MORE

Jul 14, 2019 in Apache Spark by James
4,525 views
0 votes
1 answer

What is the use of App class in Scala?

Hi, Scala provides a helper class, called App, that ...READ MORE

Jul 31, 2019 in Apache Spark by Gitika
• 65,870 points
3,592 views
+1 vote
1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

Aug 6, 2019 in Apache Spark by Gitika
• 65,870 points
3,211 views
0 votes
1 answer

Removing the header of a text file in SparkRDD

1) First we loaded the data to ...READ MORE

Jul 31, 2019 in Apache Spark by Namitha
3,517 views
+1 vote
2 answers

What is sparkContext?

SparkContext sets up internal services and establishes ...READ MORE

Dec 5, 2019 in Apache Spark by anonymous
1,044 views
0 votes
1 answer

"main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

1. We will check whether master and ...READ MORE

Jul 29, 2019 in Apache Spark by Yogi
3,312 views
0 votes
1 answer

How to launch spark application in cluster mode in Spark?

Hi, To launch spark application in cluster mode, ...READ MORE

Aug 2, 2019 in Apache Spark by Gitika
• 65,870 points
3,103 views
0 votes
1 answer

What is Action in Spark?

Hi, Actions are RDD’s operation, that value returns ...READ MORE

Jul 3, 2019 in Apache Spark by Gitika
• 65,870 points
4,253 views
0 votes
1 answer

Spark error: Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable.

Give  read-write permissions to  C:\tmp\hive folder Cd to winutils bin folder ...READ MORE

Jul 11, 2019 in Apache Spark by Rajiv
3,890 views
+1 vote
1 answer

Primary keys in Apache Spark

import sqlContext.implicits._ import org.apache.spark.sql.Row import org.apache.spark.sql.types.{StructType, StructField, LongType} val df ...READ MORE

Aug 9, 2019 in Apache Spark by ravikiran
• 4,620 points
2,596 views