Trending questions in Apache Spark

+5 votes
11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

Mar 21 in Apache Spark by anonymous
33,238 views
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4 in Apache Spark by anonymous

edited Apr 5 by Omkar 26,484 views
0 votes
1 answer

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as following: var notFollowingList=List(9.8,7,6,3, ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,300 points
30,741 views
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi i hope this will help for ...READ MORE

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
23,118 views
0 votes
2 answers

sparkstream.textfilstreaming(localpathdirectory). I am getting empty results

Hey @c.kothamasu You should copy your file to ...READ MORE

5 days ago in Apache Spark by Manas
16 views
0 votes
1 answer
0 votes
1 answer

How to convert a json file structure with values in single quotes to quoteless ?

You can do this by turning off ...READ MORE

Oct 4 in Apache Spark by Jisha
96 views
0 votes
0 answers

Cannot resolve Error In Spark when filter records with two where condition

SPARK 1.6, SCALA, MAVEN i have created a ...READ MORE

Sep 30 in Apache Spark by anonymous
• 120 points
51 views
0 votes
2 answers

Spark: Can we add column to dataframe?

Yes we can add columns to the ...READ MORE

Oct 24 in Apache Spark by Siva
• 140 points
61 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
16,120 views
0 votes
0 answers

Difference Between rdd dataframe dataset [closed]

Sep 12 in Apache Spark by Rajesh pagadala

closed Sep 13 by Omkar 87 views
0 votes
1 answer

Primary keys in Apache Spark

I found the following solution to be ...READ MORE

Sep 11 in Apache Spark by ravikiran
• 4,560 points
78 views
0 votes
1 answer

Need to load 40 GB data to elasticsearch using spark

Did you find any documents or example ...READ MORE

Nov 5 in Apache Spark by Begum
108 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
16,825 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

Below Images are self explainatry for reducebykey ...READ MORE

Apr 22 in Apache Spark by Gunjan Kumar
11,624 views
0 votes
1 answer

How do I connect to a HIVE Meta store through a program in SparkSQL?

In spark 2.0.+ it should look something ...READ MORE

Sep 5 in Apache Spark by ravikiran
• 4,560 points
79 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less csv file ...READ MORE

Jul 13 in Apache Spark by Puneet
2,382 views
0 votes
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24 in Apache Spark by Suri
1,706 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vise versa

Try including the package while starting the ...READ MORE

Aug 26 in Apache Spark by Karan
97 views
0 votes
0 answers

What is the use case of map and flatMap? [closed]

What is the major use case for ...READ MORE

Aug 24 in Apache Spark by anonymous
• 120 points

closed Aug 26 by Omkar 84 views
0 votes
1 answer

How to extract record from one RDD using another RDD

Hey, you can use "contains" filter to extract ...READ MORE

Aug 23 in Apache Spark by Karan
54 views
+1 vote
0 answers

Type mismatch error in scala

import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.sql.hive.HiveContext import org.apache.spark.sql.functions.{col, ...READ MORE

Aug 16 in Apache Spark by anonymous
227 views
0 votes
1 answer

what is Paired RDD and how to create paired RDD in Spark?

Hi, Paired RDD is a distributed collection of ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
611 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,260 points
15,337 views
0 votes
1 answer

How to work with Matrix Multiplication in Apache Spark?

Hey, You can follow this below solution for ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,340 points
614 views
0 votes
1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
263 views
0 votes
1 answer
0 votes
1 answer

Primary keys in Apache Spark

import sqlContext.implicits._ import org.apache.spark.sql.Row import org.apache.spark.sql.types.{StructType, StructField, LongType} val df ...READ MORE

Aug 9 in Apache Spark by ravikiran
• 4,560 points
96 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10 in Apache Spark by Tina
1,342 views
0 votes
1 answer

Monitoring Spark application

Spark-submit jobs are also run from client/edge ...READ MORE

Aug 9 in Apache Spark by Umesh
44 views
0 votes
1 answer

Pyspark dataframe with random values

Hey @Esha, you can use this code. ...READ MORE

Aug 1 in Apache Spark by Zed
383 views
0 votes
1 answer

What is Spark UI and how to monitor a spark job?

Hey, Jobs- to view all the spark jobs Stages- ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
85 views
0 votes
1 answer

How to handle data shuffle in Spark

Hi, You can do it using map partition ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
63 views
0 votes
1 answer

How to start spark history server?

Hi, You can use this command to start ...READ MORE

Aug 6 in Apache Spark by Gitika
• 25,340 points
42 views
0 votes
1 answer

Scala: Convert text file data into ORC format using data frame

Converting text file to Orc: Using Spark, the ...READ MORE

Aug 1 in Apache Spark by Esha
208 views
0 votes
1 answer

How SortBykey() operation works in Spark?

Hey, sortByKey() is a transformation. It returns an RDD sorted ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
139 views
0 votes
1 answer

Scala - Error in Inheritance: <console>:: error: not found: value

You need to declare the variable which ...READ MORE

Aug 1 in Apache Spark by Karan
176 views
0 votes
1 answer

Spark Error: StackOverflowError : Exception in thread "main" java.lang.StackOverflowError at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply

Hey, It already has SparkContent.union and it does know how to ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,340 points
199 views
0 votes
1 answer

Can anyone explain the sparse vector in Spark?

Hey, A sparse vector is used for storing ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
115 views
0 votes
1 answer

How to create paired RDD using subString method in Spark?

Hi, If you have a file with id ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
77 views
0 votes
1 answer

How Foreach Operation works in Apache Spark?

Hi, foreach() operation is an action. It does not ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
64 views
0 votes
1 answer

Scala: save filtered data row by row using saveAsTextFile

Try this code, it worked for me: val ...READ MORE

Aug 2 in Apache Spark by Karan
59 views
0 votes
1 answer

How to launch spark application in cluster mode in Spark?

Hi, To launch spark application in cluster mode, ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
54 views
0 votes
1 answer

By default how many partitions are created in RDD in Apache spark?

Well, it depends on the block of ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
55 views
0 votes
1 answer

Join in RDD using keys

Suppose you have two dataset results( id, ...READ MORE

Aug 2 in Apache Spark by Trisha
36 views
0 votes
1 answer

Spark:error:throws stack overflow when union a lot.

Hey, Use SparkContext.union(...) instead to union many RDDs at once You ...READ MORE

Jul 31 in Apache Spark by Gitika
• 25,340 points
119 views
0 votes
1 answer

In how many modes Apache spark can run?

Hey, You can launch spark application in four ...READ MORE

Aug 2 in Apache Spark by Gitika
• 25,340 points
30 views