Facing issue while reading tsv file in pyspark

0 votes
from pyspark.sql.types import StructType,StructField,StringType,IntegerType
schema = StructType([StructField("id_code", IntegerType()),StructField("description", StringType())])
df=spark.read.csv("C:/Users/HP/Downloads/`connection_type`.tsv",schema=schema)
df.show();
+-------+-----------+
|id_code|description|
+-------+-----------+
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
+-------+-----------+

If i read it simply without applying any schema.

df=spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv",sep="/t")
df.show()
+----------------+
|             _c0|
+----------------+
| 0 Not Specified|
| 1 Modem|
| 2 LAN/Wifi|
| 3 Unknown|
4 Mobile Carrier|
+----------------+
It is not coming in a proper way.Can anyone please help me on this.My sample file is .tsv file and it has below records.
0   Specified
1   Modemwifi
2   LAN/Wifi
3   Unknown
4   Mobile user
Sep 26, 2020 in Apache Spark by khyati
• 190 points
2,525 views

1 answer to this question.

0 votes

Hi@khyati,

You are getting this type of output because of your schema. Just check the schema properly and then read the file. According to your output, it is reading only one column. So analyze your dataset first then use your commands.

After that it will work.

To know more about it, get your Pyspark certification today and become expert.

Thanks.

answered Sep 28, 2020 by MD
• 95,460 points

Related Questions In Apache Spark

+1 vote
1 answer

getting null values in spark dataframe while reading data from hbase

Can you share the screenshots for the ...READ MORE

answered Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,350 points
2,337 views
0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

answered Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,350 points
7,888 views
0 votes
1 answer

Getting error while connecting zookeeper in Kafka - Spark Streaming integration

I guess you need provide this kafka.bootstrap.servers ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 13,490 points
2,818 views
+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,490 points
8,496 views
0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...READ MORE

answered Dec 14, 2020 in Apache Spark by Gitika
• 65,770 points
125,787 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

answered Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz 8,717 views
0 votes
1 answer

Error reading avro dataset in spark

For avro, you need to download and ...READ MORE

answered Feb 4, 2019 in Apache Spark by Omkar
• 69,220 points
2,283 views
0 votes
1 answer

where can i get spark-terasort.jar and not .scala file, to do spark terasort in windows.

Hi! I found 2 links on github where ...READ MORE

answered Feb 13, 2019 in Apache Spark by Omkar
• 69,220 points
1,358 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

answered Apr 7, 2020 in Apache Spark by MD
• 95,460 points
6,430 views
0 votes
1 answer
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP