How to assign a column in a PySpark DataFrame as a Primary Key

+1 vote
I've just converted an AWS Glue DynamicFrame into a Spark DataFrame using the .toDF() method. I now need to assign a column as the primary key. How do I do that? Please help!
Jan 9, 2020 in Apache Spark by anonymous
• 150 points
11,994 views
What you could do is create a DataFrame in PySpark, set the column as the primary key, and then insert the values into the PySpark DataFrame.
Hi Kalgi! I do not see a way to set a column as a primary key in PySpark. Can you please share the details (code) of how that is done? Thanks!

1 answer to this question.

+1 vote

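Spark does not have any concept of a primary key, because Spark is a computation engine, not a database. A DataFrame has no constraints, so nothing stops a column from containing NULLs or duplicate values.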

Hope this helps!


Thanks.

answered Jan 12, 2020 by Sirish
• 160 points
Yes, I just read a few articles and came to the conclusion that you cannot set a primary key in Apache Spark.

Related Questions In Apache Spark

+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...

answered Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 87,324 views
0 votes
1 answer

How to create a not null column in case class in spark

Hi@Deepak, In your test class you passed empid ...

answered May 14, 2020 in Apache Spark by MD
• 95,440 points
4,525 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...

answered Jul 4, 2019 in Apache Spark by Dhara dhruve
5,660 views
0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...

answered Dec 14, 2020 in Apache Spark by Gitika
• 65,910 points
121,288 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...

answered May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,684 views
0 votes
1 answer

How to find the number of null contain in dataframe?

Hey there! You can use the select method of the ...

answered May 3, 2019 in Apache Spark by Omkar
• 69,210 points
4,605 views
0 votes
1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...

answered Oct 30, 2020 in Apache Spark by MD
• 95,440 points
2,759 views