Facing issue while reading tsv file in pyspark

0 votes
from pyspark.sql.types import StructType,StructField,StringType,IntegerType
schema = StructType([StructField("id_code", IntegerType()),StructField("description", StringType())])
df=spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv",schema=schema)
df.show()
+-------+-----------+
|id_code|description|
+-------+-----------+
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
|   null|       null|
+-------+-----------+

If I read it without applying any schema, I get this:

df=spark.read.csv("C:/Users/HP/Downloads/connection_type.tsv",sep="/t")
df.show()
+----------------+
|             _c0|
+----------------+
| 0 Not Specified|
|         1 Modem|
|      2 LAN/Wifi|
|       3 Unknown|
|4 Mobile Carrier|
+----------------+
The output is not coming out properly. Can anyone please help me with this? My sample file is a .tsv file with the records below.
0   Specified
1   Modemwifi
2   LAN/Wifi
3   Unknown
4   Mobile user
Sep 26 in Apache Spark by khyati
• 160 points
55 views

1 answer to this question.

0 votes
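Hi @khyati,

You are getting this output because Spark is reading each line as a single column, as your second output shows. In the first read no separator is set, so the default comma is used; in the second, sep="/t" is a slash followed by "t", not the tab escape "\t". With only one column parsed, the values cannot be matched to your two-column schema, so every field comes back as null. Analyze how your dataset is delimited first, then read it with a matching separator and schema.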
answered Sep 28 by MD
• 64,820 points
