Spark: Can we add column to dataframe?

+1 vote
Can we add column to dataframe? If yes, please share the code.
Aug 9, 2019 in Apache Spark by Chirag
373 views

2 answers to this question.

+1 vote

Yes we can add a column using withColumn with the function as shown below for your reference.

val sqlContext = new SQLContext(sc)

import sqlContext.implicits._ // for `toDF` and $""

import org.apache.spark.sql.functions._ // for `when`


val df = sc.parallelize(Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5)))

    .toDF("A", "B", "C")

val newDf = df.withColumn("D", when($"B".isNull or $"B" === "", 0).otherwise(1))

newDf.show() shows

+---+----+---+---+

| A| B| C| D|

+---+----+---+---+

| 4|blah| 2| 1|

| 2| | 3| 0|

| 56| foo| 3| 1|

|100|null| 5| 0|

+---+----+---+---+
answered Aug 9, 2019 by Shirish
+1 vote

Yes we can add columns to the existing data frame in Spark

import pandas as pd

data = {'Name': ['Indis', 'Sachin', 'Rohit', 'Dhoni'],

        'Height': [5.1, 6.2, 5.1, 5.2],

        'Qualification': ['Team', 'Opener', 'Hitman', 'Keeper']}  

df = pd.DataFrame(data)

address = ['India', 'Mumbai', 'Chennai', 'Patna']

df['Address'] = address

df

on Spark Online Training

answered Oct 24, 2019 by Siva
• 160 points

Related Questions In Apache Spark

0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 42,930 views
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

answered Jul 9, 2018 in Apache Spark by zombie
• 3,750 points
4,872 views
+1 vote
1 answer
0 votes
1 answer

Changing Column position in spark dataframe

Yes, you can reorder the dataframe elements. You need ...READ MORE

answered Apr 19, 2018 in Apache Spark by Ashish
• 2,630 points
6,612 views
+1 vote
2 answers
0 votes
1 answer

How to find the number of null contain in dataframe?

Hey there! You can use the select method of the ...READ MORE

answered May 3, 2019 in Apache Spark by Omkar
• 69,000 points
481 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

answered May 3, 2018 in Apache Spark by kurt_cobain
• 9,310 points
22,743 views
0 votes
3 answers

How to connect Spark to a remote Hive server?

JDBC is not required here. Create a hive ...READ MORE

answered Mar 8, 2019 in Big Data Hadoop by Vijay Dixon
• 190 points
2,316 views
+1 vote
1 answer

How to add package com.databricks.spark.avro in spark?

Start spark shell using below line of ...READ MORE

answered Jul 10, 2019 in Apache Spark by Jishnu
1,205 views
0 votes
1 answer

How to add package com.databricks.spark.avro in spark?

Start spark shell using below line of ...READ MORE

answered Jul 23, 2019 in Apache Spark by Ritu
726 views