Spark: Can we add a column to a DataFrame?

+1 vote
Can we add a column to a DataFrame? If so, please share the code.
Aug 9 in Apache Spark by Chirag
79 views

2 answers to this question.

+1 vote

Yes, you can add a column with withColumn, combined with a column function such as when. The example below is for reference.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc is the SparkContext provided by spark-shell
import sqlContext.implicits._ // for `toDF` and $""
import org.apache.spark.sql.functions._ // for `when`

val df = sc.parallelize(Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5)))
    .toDF("A", "B", "C")

// D is 0 when B is null or empty, otherwise 1
val newDf = df.withColumn("D", when($"B".isNull or $"B" === "", 0).otherwise(1))

newDf.show() prints:

+---+----+---+---+
|  A|   B|  C|  D|
+---+----+---+---+
|  4|blah|  2|  1|
|  2|    |  3|  0|
| 56| foo|  3|  1|
|100|null|  5|  0|
+---+----+---+---+
answered Aug 9 by Shirish
+1 vote

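Yes, we can add columns to an existing DataFrame. Note that the example below uses pandas rather than Spark: it adds a column to a pandas DataFrame by assigning a list to a new column name.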

import pandas as pd

data = {'Name': ['Indis', 'Sachin', 'Rohit', 'Dhoni'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Team', 'Opener', 'Hitman', 'Keeper']}
df = pd.DataFrame(data)

# The list must contain exactly one value per row of df
address = ['India', 'Mumbai', 'Chennai', 'Patna']
df['Address'] = address

print(df)


answered Oct 24 by Siva
