How to select all columns with group by?

0 votes

How do I select all columns along with a group by in Spark? This is what I tried:

df.select(*).groupby("id").agg(sum("salary"))

I tried using select but could not make it work.

Feb 18, 2019 in Apache Spark by Ishan
91 views

1 answer to this question.

0 votes

You can aggregate on the grouping key first and then join the result back to the original DataFrame, which keeps all the columns:

resultset = df.groupBy("id").sum("salary")
joinedDS = df.join(resultset, "id")
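
For completeness, here is a minimal runnable PySpark sketch of the same idea; the sample rows, the "name" column, and the "total_salary" alias are invented for illustration. It aggregates on the grouping key, then joins the result back to the original DataFrame so every original column stays available. A window-function alternative (not part of the answer above) is shown at the end, which avoids the join entirely.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("groupby-all-columns").getOrCreate()

# Toy data; schema and values are assumptions for the example.
df = spark.createDataFrame(
    [(1, "Alice", 1000), (1, "Bob", 2000), (2, "Carol", 3000)],
    ["id", "name", "salary"],
)

# 1) Aggregate on the grouping key only ...
agg_df = df.groupBy("id").agg(F.sum("salary").alias("total_salary"))

# 2) ... then join the aggregate back to the original DataFrame so
#    every original column appears next to the per-group sum.
df.join(agg_df, on="id").show()

# Alternative: a window function computes the per-group sum
# without a join and keeps all columns directly.
w = Window.partitionBy("id")
df.withColumn("total_salary", F.sum("salary").over(w)).show()

The join approach materializes one aggregate row per group and duplicates it across the matching rows; the window approach does the same in one pass and is usually the more idiomatic choice when you just need the group total alongside every row.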
answered Feb 18, 2019 by Omkar
• 69,000 points
