Facing an issue while reading a TSV file in PySpark

Question

from pyspark.sql.types import StructType,StructField,StringType,IntegerType
schema = StructType([StructField("id_code", IntegerType()),StructField("description", StringType())])
df=spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv",schema=schema)
df.show()
+-------+-----------+
|id_code|description|
+-------+-----------+
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
+-------+-----------+

If I read it without applying any schema, I get this:

df=spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv",sep="/t")
df.show()
+----------------+
|             _c0|
+----------------+
| 0 Not Specified|
|         1 Modem|
|      2 LAN/Wifi|
|       3 Unknown|
|4 Mobile Carrier|
+----------------+
It is not coming out properly. Can anyone please help me with this? My sample file is a .tsv file and it has the records below.
0   Specified
1   Modemwifi
2   LAN/Wifi
3   Unknown
4   Mobile user

MD · Answer 1 · Sep 28, 2020

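Hi @khyati,

You are getting this output because the separator is not set correctly, so the file does not match your schema. In the second read you passed sep="/t", which is a forward slash followed by the letter t; the tab character is written "\t". Since "/t" never occurs in the file, Spark puts each whole line into a single column, _c0. In the first read you did not pass a separator at all, so Spark falls back to the default comma and again sees only one field per line. That field cannot be parsed as an IntegerType and there is no second field for description, so both columns come back as null. So analyze your dataset first, then pass the matching separator together with your schema.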
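
One way to analyze the dataset is to look at the raw first line and count the candidate delimiters in it. This is only a rough sketch in plain Python: the helper name count_delimiters and the temporary file are made up for illustration, and the sample line assumes the real file is tab-separated.

```python
import os
import tempfile


def count_delimiters(path, candidates=("\t", ",", ";", "|")):
    """Print the raw first line and count how often each candidate delimiter occurs in it."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")
    # repr() makes invisible characters such as tabs show up as \t
    print(repr(first_line))
    return {c: first_line.count(c) for c in candidates}


# Hypothetical stand-in for connection_type.tsv, assuming tab-separated records
tmp_path = os.path.join(tempfile.mkdtemp(), "connection_type.tsv")
with open(tmp_path, "w", encoding="utf-8") as f:
    f.write("0\tSpecified\n1\tModemwifi\n")

counts = count_delimiters(tmp_path)
print(counts)
```

If the tab count is 1 and the others are 0, the file has two tab-separated columns per line, which matches the two-field schema.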

After that it will work.

Thanks.