How to select all columns with group by?

0 votes

How to select all columns with group by in spark

df.select(*).groupby("id").agg(sum("salary"))

I tried using select but could not make it work.

Feb 18, 2019 in Apache Spark by Ishan
213 views

1 answer to this question.

0 votes

You can use the following to print all the columns:

resultset = df.groupBy("id").sum("salary");
joinedDS = studentDataset.join(resultset, "id");
answered Feb 18, 2019 by Omkar
• 69,040 points

Related Questions In Apache Spark

0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...READ MORE

answered Mar 28 in Apache Spark by GAURAV
• 140 points
258 views
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 51,972 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

answered Aug 16, 2018 in Apache Spark by zombie
• 3,750 points
2,659 views
0 votes
1 answer

How to limit the cores being used by a cluster?

You can set the maximum number of ...READ MORE

answered Mar 11, 2019 in Apache Spark by Raj
73 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,920 points
5,314 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,920 points
781 views
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyF ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
32,409 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,310 points
2,003 views
0 votes
1 answer

How to increase the amount of data to be transferred to shuffle service at the same time?

The amount of data to be transferred ...READ MORE

answered Mar 1, 2019 in Apache Spark by Omkar
• 69,040 points
141 views
0 votes
1 answer

How to find the number of null contain in dataframe?

Hey there! You can use the select method of the ...READ MORE

answered May 3, 2019 in Apache Spark by Omkar
• 69,040 points
603 views