Most voted questions in Apache Spark

+5 votes
11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
62,277 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...READ MORE

Jun 20, 2020 in Apache Spark by suvasish
7,513 views
+2 votes
1 answer

Spark code takes too much time to run on cluster

Hi @asif, Share with us please the application ...READ MORE

Jan 21, 2020 in Apache Spark by Alexandru
• 510 points
289 views
+2 votes
1 answer

Type mismatch error in scala

Hello, Your problem is here: val df_merge_final = df_merge .withColumn("version_key", ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points
3,267 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 68,054 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,390 points
33,289 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,140 points
499 views
+1 vote
1 answer

How to read .mp4 (video file) stored at HDFS using pyspark?

Hi@Amey, You can enable WebHDFS to do this ...READ MORE

May 29, 2020 in Apache Spark by MD
• 95,140 points
436 views
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi@Amey, It depends on your use case. Both ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,140 points
543 views
+1 vote
1 answer

How to convert pyspark Dataframe to pandas Dataframe?

Hi@akhtar, To convert pyspark dataframe into pandas dataframe, ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,140 points
4,934 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi@akhtar, Here you are trying to read a ...READ MORE

Feb 3, 2020 in Apache Spark by MD
• 95,140 points
5,712 views
+1 vote
0 answers

How to create a list of RDDs(or RDD of RDDs, if possible) from a single JavaRDD<List<Integers>> in Java?

Hi, I have the input RDD as a ...READ MORE

Jan 11, 2020 in Apache Spark by itsroops
• 130 points
1,638 views
+1 vote
1 answer

How to assign a column in Spark Dataframe (PySpark) as a Primary Key?

spark do not have any concept of ...READ MORE

Jan 12, 2020 in Apache Spark by Sirish
• 160 points
3,072 views
+1 vote
0 answers

how to access hive view using spark2

We do not have access to hive ...READ MORE

Dec 29, 2019 in Apache Spark by anonymous
• 130 points
897 views
+1 vote
2 answers

sparkstream.textfilstreaming(localpathdirectory). I am getting empty results

Hey @c.kothamasu You should copy your file to ...READ MORE

Nov 7, 2019 in Apache Spark by Manas
169 views
+1 vote
1 answer

Is there any efficient way of dealing null values during concat functionality of pyspark.sql version 2.3.4?

When you concatenate any string with a ...READ MORE

Nov 6, 2019 in Apache Spark by Rishi
11,403 views
+1 vote
1 answer

How to convert a json file structure with values in single quotes to quoteless ?

You can do this by turning off ...READ MORE

Oct 4, 2019 in Apache Spark by Jisha
1,885 views
+1 vote
1 answer

Cannot resolve Error In Spark when filter records with two where condition

Try df.where($"cola".isNotNull && $"cola" =!= "" && !$"colb".isin(2,3)) your ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points

edited Dec 13, 2019 by Alexandru 1,065 views
+1 vote
0 answers

Difference Between rdd dataframe dataset [closed]

Sep 12, 2019 in Apache Spark by Rajesh pagadala

closed Sep 13, 2019 by Omkar 292 views
+1 vote
0 answers

What is the use case of map and flatMap? [closed]

What is the major use case for ...READ MORE

Aug 24, 2019 in Apache Spark by anonymous
• 130 points

closed Aug 26, 2019 by Omkar 596 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vise versa

Try including the package while starting the ...READ MORE

Aug 26, 2019 in Apache Spark by Karan
1,110 views
+1 vote
1 answer

How to extract record from one RDD using another RDD

Hey, you can use "contains" filter to extract ...READ MORE

Aug 23, 2019 in Apache Spark by Karan
1,057 views
+1 vote
2 answers

Spark: Can we add column to dataframe?

Yes we can add columns to the ...READ MORE

Oct 24, 2019 in Apache Spark by Siva
• 160 points
3,041 views
+1 vote
1 answer

Primary keys in Apache Spark

import sqlContext.implicits._ import org.apache.spark.sql.Row import org.apache.spark.sql.types.{StructType, StructField, LongType} val df ...READ MORE

Aug 9, 2019 in Apache Spark by ravikiran
• 4,620 points
2,596 views
+1 vote
1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

Aug 6, 2019 in Apache Spark by Gitika
• 65,870 points
3,211 views
+1 vote
1 answer

By default how many partitions are created in RDD in Apache spark?

Well, it depends on the block of ...READ MORE

Aug 2, 2019 in Apache Spark by Gitika
• 65,870 points
1,302 views
+1 vote
1 answer

Scala: Convert text file data into ORC format using data frame

Converting text file to Orc: Using Spark, the ...READ MORE

Aug 1, 2019 in Apache Spark by Esha
1,590 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
28,481 views
+1 vote
1 answer

Scala: CSV file to Save data into HBase

Check the reference code mentioned below: def main(args: ...READ MORE

Jul 25, 2019 in Apache Spark by Hari
682 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
20,305 views
+1 vote
1 answer

How to install Scala Build Tool (SBT) on ubuntu?

Hey, To install SBT on Ubuntu first you need ...READ MORE

Jul 23, 2019 in Apache Spark by Gitika
• 65,870 points
763 views
+1 vote
1 answer

Need to load 40 GB data to elasticsearch using spark

Did you find any documents or example ...READ MORE

Nov 5, 2019 in Apache Spark by Begum
589 views
+1 vote
1 answer

Spark: java.io.FileNotFoundException

Hello, From the error I get that the ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points
1,502 views
+1 vote
1 answer

How do I turn off INFO Logging in Spark?

Hi, You need to edit one property in ...READ MORE

Jul 12, 2019 in Apache Spark by ravikiran
• 4,620 points

edited Dec 20, 2020 by MD 2,827 views
+1 vote
1 answer

How to add package com.databricks.spark.avro in spark?

Start spark shell using below line of ...READ MORE

Jul 10, 2019 in Apache Spark by Jishnu
3,272 views
+1 vote
1 answer

Error: value textfile is not a member of org.apache.spark.SparkContext

Hi, Regarding this error, you just need to change ...READ MORE

Jul 4, 2019 in Apache Spark by Gitika
• 65,870 points
2,039 views
+1 vote
2 answers

What is sparkContext?

SparkContext sets up internal services and establishes ...READ MORE

Dec 5, 2019 in Apache Spark by anonymous
1,044 views
+1 vote
1 answer

What is reduce() action in Spark?

Hey, It takes a function that operates on two ...READ MORE

Jul 2, 2019 in Apache Spark by Gitika
• 65,870 points
2,868 views
+1 vote
1 answer

_spark_metadata/0 doesn't exist while Compacting batch 9 Structured streaming error

Please check https://kb.databricks.com/streaming/file-sink-str ...READ MORE

Nov 20, 2019 in Apache Spark by anonymous
1,112 views
+1 vote
2 answers

How do I get number of columns in each line from a delimited file??

Instead of spliting on '\n'. You should ...READ MORE

Aug 7, 2019 in Apache Spark by ashish
2,521 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
25,799 views
+1 vote
1 answer

Facing out-of-memory errors in Spark driver

I am guessing that the configuration set ...READ MORE

Feb 22, 2019 in Apache Spark by Rishab
626 views
+1 vote
1 answer

Spark interview

Preparing for an interview? We have something ...READ MORE

Feb 7, 2019 in Apache Spark by Edureka
• 2,960 points
283 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,660 points
35,276 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
43,794 views
+1 vote
1 answer

getting null values in spark dataframe while reading data from hbase

Can you share the screenshots for the ...READ MORE

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
1,134 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
42,769 views