PDF to CSV file format conversion

0 votes
I have a requirement to load multiple .pdf files from an HDFS location. I want to convert all of these PDF files to CSV format and load them into Hive tables. I searched on Google but could not find code for converting PDF to CSV. Can you please help me load all the PDF files and convert them to CSV? If possible, please share the code.
Jul 9 in Big Data Hadoop by Shri
47 views

1 answer to this question.

0 votes

Hive cannot read PDF files directly, so first convert the PDF files to CSV with an external tool (several are available online). After that, upload the resulting CSV files to HDFS and create a Hive table to hold the data. The example below creates a Hive table and loads a CSV file into it.

create table users_data(
  userid varchar(10),
  location varchar(100),
  age varchar(5))
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties("separatorChar" = "\;", "quoteChar" = "\"")
stored as textfile;

The following query loads the CSV file into the table above. Because the local keyword is not used, the path refers to a file on HDFS:

load data inpath 'BX-Users.csv' into table users_data; 
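
For the conversion step itself, the following is a minimal Python sketch rather than a definitive implementation. It assumes the third-party pdfplumber package is installed, that the PDFs contain real text tables (scanned pages would need OCR instead), and that the files have already been copied from HDFS to the local file system, for example with hdfs dfs -get. The function names extract_table_rows and write_rows_as_csv are hypothetical helpers introduced here for illustration. The output uses ';' as the separator and '"' as the quote character so that it matches the serde properties of the users_data table.

```python
import csv


def extract_table_rows(pdf_path):
    """Yield table rows (lists of strings) from every page of one PDF.

    Assumption: the third-party pdfplumber package is installed and the
    PDF contains text-based tables that pdfplumber can detect.
    """
    import pdfplumber  # imported lazily so the CSV helper works without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                for row in table:
                    # Empty cells come back as None; collapse embedded
                    # newlines so each record stays on one text line.
                    yield ["" if cell is None else " ".join(cell.split())
                           for cell in row]


def write_rows_as_csv(rows, csv_path):
    """Write rows with ';' as separator and '"' as quote character.

    Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(
            handle,
            delimiter=";",
            quotechar='"',
            quoting=csv.QUOTE_ALL,   # quote every field
            lineterminator="\n",
        )
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

A call such as write_rows_as_csv(extract_table_rows("users.pdf"), "users.csv") would convert one file (both file names are placeholders). The resulting CSV can then be uploaded with hdfs dfs -put and loaded with the load data query shown above. How well the tables are detected depends heavily on the layout of the PDF, so the output should be checked before loading.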


answered Jul 9 by Esha
