Spark: Can we add a column to a DataFrame?

+1 vote
Can we add a column to a DataFrame? If yes, please share the code.
Aug 9, 2019 in Apache Spark by Chirag
114 views

2 answers to this question.

+1 vote

Yes. Use withColumn together with a column expression. The example below adds a column D that is 0 when B is null or empty and 1 otherwise.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._ // for `when`

val sqlContext = new SQLContext(sc) // `sc` is the SparkContext, e.g. the one spark-shell provides
import sqlContext.implicits._ // for `toDF` and $""

val df = sc.parallelize(Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5)))
    .toDF("A", "B", "C")

// D is 0 when B is null or empty, 1 otherwise
val newDf = df.withColumn("D", when($"B".isNull or $"B" === "", 0).otherwise(1))

newDf.show() prints:

+---+----+---+---+
|  A|   B|  C|  D|
+---+----+---+---+
|  4|blah|  2|  1|
|  2|    |  3|  0|
| 56| foo|  3|  1|
|100|null|  5|  0|
+---+----+---+---+
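
The same withColumn call also covers constant and derived columns. Here is a minimal sketch, assuming Spark 2.x or later (where SparkSession is the usual entry point instead of SQLContext) and a spark-shell or notebook session. The app name and the column names source and A_plus_C are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: Spark 2.x+; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder()
  .appName("add-column-sketch") // hypothetical app name
  .master("local[*]")           // local mode, for experimenting only
  .getOrCreate()
import spark.implicits._ // for `toDF`

// Same sample data as in the answer above.
val df = Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5))
  .toDF("A", "B", "C")

val withExtras = df
  .withColumn("source", lit("demo"))           // constant value on every row
  .withColumn("A_plus_C", col("A") + col("C")) // computed from existing columns

withExtras.show()
```

withColumn returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a new value as shown.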
answered Aug 9, 2019 by Shirish
+1 vote

Yes, we can add columns to an existing DataFrame. Note that the example below uses pandas rather than Spark: it assigns a Python list as a new column.

import pandas as pd

data = {'Name': ['Indis', 'Sachin', 'Rohit', 'Dhoni'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Team', 'Opener', 'Hitman', 'Keeper']}

df = pd.DataFrame(data)

address = ['India', 'Mumbai', 'Chennai', 'Patna']
df['Address'] = address

df
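
The snippet above builds a pandas DataFrame, where a list can be assigned positionally as a column. A Spark DataFrame has no positional row index, so that assignment does not carry over directly. One possible Spark (Scala) sketch is to put the new values in a second DataFrame keyed on an existing column and join the two. Using Name as the key, the app name, and the value names players, addresses and withAddress are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder()
  .appName("add-address-sketch") // hypothetical app name
  .master("local[*]")            // local mode, for experimenting only
  .getOrCreate()
import spark.implicits._ // for `toDF`

// Same data as the pandas example.
val players = Seq(
  ("Indis", 5.1, "Team"),
  ("Sachin", 6.2, "Opener"),
  ("Rohit", 5.1, "Hitman"),
  ("Dhoni", 5.2, "Keeper")
).toDF("Name", "Height", "Qualification")

// New values, keyed on Name (assumed unique here).
val addresses = Seq(
  ("Indis", "India"),
  ("Sachin", "Mumbai"),
  ("Rohit", "Chennai"),
  ("Dhoni", "Patna")
).toDF("Name", "Address")

// A left join keeps every player, even one without a matching address.
val withAddress = players.join(addresses, Seq("Name"), "left")

withAddress.show()
```

If the key column is not unique, the join would duplicate rows, so this approach only fits data with a reliable key.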


answered Oct 24, 2019 by Siva
• 160 points
