PDF to CSV file format conversion

0 votes
I have a requirement to load multiple .pdf files from an HDFS location. I want to convert all of the PDF files to CSV format and load them into Hive tables. I searched on Google but could not find code for converting PDF to CSV. Can you please help me load all the PDF files and convert them to CSV? If possible, please share the code.
Jul 9, 2019 in Big Data Hadoop by Shri
1,203 views

1 answer to this question.

0 votes

You can convert the PDF files to CSV with an external tool; several are available online. After that, upload the CSV files to HDFS and create a Hive table to hold the data. The example below creates a Hive table for semicolon-separated, double-quoted CSV data and loads a CSV file into it.

create table users_data(
  userid varchar(10),
  location varchar(100),
  age varchar(5))
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties("separatorChar" = "\;", "quoteChar" = "\"")
stored as textfile;

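The following statement loads the CSV file into the table above. Because the LOCAL keyword is not used, the path refers to HDFS, and the file is moved into the table's storage location: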

load data inpath 'BX-Users.csv' into table users_data; 
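
The conversion step that comes before these statements depends heavily on how the PDFs are laid out, so the following is only a minimal sketch under stated assumptions. It assumes each PDF has already been turned into layout-preserved plain text by an external tool (for example `pdftotext -layout` from Poppler), that table columns in that text are separated by two or more spaces, and that the columns match the users_data table above. The helper names `text_lines_to_rows` and `rows_to_csv` are hypothetical, chosen for illustration.

```python
import csv
import io
import re

# Assumption: columns in the extracted text are separated by 2+ spaces.
COLUMN_GAP = re.compile(r"\s{2,}")


def text_lines_to_rows(lines, expected_columns=3):
    """Split text lines into rows; skip lines that do not have the expected column count."""
    rows = []
    for line in lines:
        fields = COLUMN_GAP.split(line.strip())
        if len(fields) == expected_columns:
            rows.append(fields)
    return rows


def rows_to_csv(rows):
    """Serialize rows using ';' as separator and '"' as quote character,
    matching the serdeproperties of the users_data table."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=";",
        quotechar='"',
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample lines standing in for text extracted from a PDF.
    sample = [
        "1     nyc, new york, usa      18",
        "Page 1 of 1",
        "2     stockton, california, usa      NULL",
    ]
    print(rows_to_csv(text_lines_to_rows(sample)), end="")
```

Lines such as page footers are dropped because they do not split into three fields. PDFs with merged cells, wrapped text, or scanned images would need a dedicated table-extraction tool instead of this approach.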


answered Jul 9, 2019 by Esha
