Trending questions in Apache Spark

+5 votes
11 answers

Concatenate columns in Apache Spark DataFrame

It's late, but this is how you can ...READ MORE

Mar 21 in Apache Spark by anonymous
34,226 views
0 votes
11 answers

How to create a new column with a function in a Spark DataFrame?

val coder: (Int => String) = v ...READ MORE

Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar
27,690 views
0 votes
1 answer

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3, ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,310 points
32,054 views
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi, I hope this will help for ...READ MORE

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
23,996 views
0 votes
2 answers

sparkstream.textFileStream(localpathdirectory): I am getting empty results

Hey @c.kothamasu, you should copy your file to ...READ MORE

Nov 7 in Apache Spark by Manas
37 views
0 votes
1 answer

How to convert a JSON file structure with single-quoted values to unquoted values?

You can do this by turning off ...READ MORE

Oct 4 in Apache Spark by Jisha
107 views
0 votes
0 answers

"Cannot resolve" error in Spark when filtering records with two where conditions

Spark 1.6, Scala, Maven: I have created a ...READ MORE

Sep 30 in Apache Spark by anonymous
• 120 points
55 views
0 votes
4 answers

How to change the SparkSession configuration in PySpark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
16,717 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

The images below are self-explanatory for reduceByKey ...READ MORE

Apr 22 in Apache Spark by Gunjan Kumar
12,277 views
0 votes
2 answers

Spark: Can we add a column to a DataFrame?

Yes, we can add columns to the ...READ MORE

Oct 24 in Apache Spark by Siva
• 140 points
65 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
17,273 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less csv file ...READ MORE

Jul 13 in Apache Spark by Puneet
2,728 views
0 votes
0 answers

Difference between RDD, DataFrame, and Dataset [closed]

Sep 12 in Apache Spark by Rajesh pagadala

closed Sep 13 by Omkar
94 views
0 votes
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24 in Apache Spark by Suri
2,205 views
0 votes
1 answer

Need to load 40 GB of data into Elasticsearch using Spark

Did you find any documents or example ...READ MORE

Nov 5 in Apache Spark by Begum
127 views
0 votes
1 answer

Primary keys in Apache Spark

I found the following solution to be ...READ MORE

Sep 11 in Apache Spark by ravikiran
• 4,560 points
80 views
0 votes
1 answer

How do I connect to a Hive metastore programmatically in Spark SQL?

In Spark 2.0+ it should look something ...READ MORE

Sep 5 in Apache Spark by ravikiran
• 4,560 points
95 views
+1 vote
1 answer

_spark_metadata/0 doesn't exist while Compacting batch 9 Structured streaming error

Please check https://kb.databricks.com/streaming/file-sink-stre ...READ MORE

2 days ago in Apache Spark by anonymous
247 views
+2 votes
4 answers

Use the length function in substring in Spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,260 points
15,881 views
+1 vote
1 answer

How to convert a JSON file to an Avro file and vice versa

Try including the package while starting the ...READ MORE

Aug 26 in Apache Spark by Karan
122 views
0 votes
0 answers

What is the use case of map and flatMap? [closed]

What is the major use case for ...READ MORE

Aug 24 in Apache Spark by anonymous
• 120 points

closed Aug 26 by Omkar
89 views
0 votes
1 answer

How to extract records from one RDD using another RDD

Hey, you can use "contains" filter to extract ...READ MORE

Aug 23 in Apache Spark by Karan
57 views
+1 vote
0 answers

Type mismatch error in Scala

import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.sql.hive.HiveContext import org.apache.spark.sql.functions.{col, ...READ MORE

Aug 16 in Apache Spark by anonymous
286 views
0 votes
1 answer

How to work with Matrix Multiplication in Apache Spark?

Hey, you can follow the solution below for ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,360 points
759 views
0 votes
1 answer

What is a paired RDD and how to create a paired RDD in Spark?

Hi, a paired RDD is a distributed collection of ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
639 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10 in Apache Spark by Tina
1,541 views
0 votes
1 answer

How to read data from a text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,360 points
337 views
0 votes
1 answer

Pyspark dataframe with random values

Hey @Esha, you can use this code. ...READ MORE

Aug 1 in Apache Spark by Zed
493 views
0 votes
1 answer

Primary keys in Apache Spark

import sqlContext.implicits._ import org.apache.spark.sql.Row import org.apache.spark.sql.types.{StructType, StructField, LongType} val df ...READ MORE

Aug 9 in Apache Spark by ravikiran
• 4,560 points
114 views
0 votes
1 answer

Monitoring Spark application

Spark-submit jobs are also run from client/edge ...READ MORE

Aug 9 in Apache Spark by Umesh
46 views
0 votes
1 answer

What is the Spark UI and how to monitor a Spark job?

Hey, Jobs: to view all the Spark jobs; Stages: ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,360 points
101 views
0 votes
1 answer

How to handle data shuffle in Spark

Hi, You can do it using map partition ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,360 points
72 views
0 votes
1 answer

How to start the Spark history server?

Hi, You can use this command to start ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,360 points
44 views
0 votes
1 answer

Scala: Convert text file data into ORC format using a DataFrame

Converting a text file to ORC: Using Spark, the ...READ MORE

Aug 1 in Apache Spark by Esha
246 views
0 votes
1 answer

How does the sortByKey() operation work in Spark?

Hey, sortByKey() is a transformation. It returns an RDD sorted ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
180 views
0 votes
1 answer

Scala - Error in Inheritance: <console>:: error: not found: value

You need to declare the variable which ...READ MORE

Aug 1 in Apache Spark by Karan
223 views
0 votes
1 answer

Spark Error: StackOverflowError : Exception in thread "main" java.lang.StackOverflowError at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply

Hey, it already has SparkContext.union, and it does know how to ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,360 points
242 views
0 votes
1 answer

Can anyone explain the sparse vector in Spark?

Hey, A sparse vector is used for storing ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
130 views
0 votes
1 answer

Spark: java.sql.SQLException: No suitable driver

The missing driver is the JDBC one ...READ MORE

Jul 24 in Apache Spark by John
484 views
0 votes
1 answer

Difference between cogroup and full outer join in spark

Please go through the below explanation : Full ...READ MORE

Jul 13 in Apache Spark by Kiran
945 views
0 votes
1 answer

How does the foreach() operation work in Apache Spark?

Hi, the foreach() operation is an action. It does not ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
88 views
0 votes
1 answer

How to create a paired RDD using the substring method in Spark?

Hi, If you have a file with id ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
88 views
0 votes
1 answer

Spark: Error while instantiating "org.apache.spark.sql.hive.HiveSessionState"

Seems like you have not started the ...READ MORE

Jul 25 in Apache Spark by Rohit
436 views
0 votes
1 answer

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29 in Apache Spark by Jackie
248 views
0 votes
1 answer

How to launch a Spark application in cluster mode?

Hi, to launch a Spark application in cluster mode, ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,360 points
72 views