How do I output the results of a HiveQL query to CSV?

Question

We would like to write the results of a Hive query to a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?

Gitika · Answer 1 · Nov 20, 2020

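A slight modification (adding the LOCAL keyword) will store the data in a local directory. Without LOCAL, Hive treats the path as a directory in HDFS, which is why the output never shows up on the local filesystem.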

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;

When I run a similar query, here's what the output looks like.

[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug  9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0 
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
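
The fields in that listing appear to run together, most likely because Hive's default field delimiter for this kind of output is Ctrl-A (octal 001), which does not print. As a rough sketch, the part files could be merged into a single comma-separated file like this. The function name merge_parts_to_csv is hypothetical, and the sketch assumes that no field contains a comma and that the output file lives outside the part-file directory.

```shell
# Hypothetical helper: merge Hive part files from a directory into one CSV.
# Assumes Hive's default field delimiter (Ctrl-A, octal 001) and that
# no field contains a comma. The output file must not be inside "$dir".
merge_parts_to_csv() {
  dir=$1   # directory holding part files such as 000000_0
  out=$2   # path of the CSV file to write
  # Concatenate every (non-hidden) part file and swap Ctrl-A for a comma.
  cat "$dir"/* | tr '\001' ',' > "$out"
}
```

For the directory above, this might be called as merge_parts_to_csv /home/lvermeer/temp /home/lvermeer/books.csv, where the output path is only an example.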

Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

hive -e 'select books from table' > /home/lvermeer/temp.tsv

That gives me a tab-separated file that I can use. Hope that is useful for you as well.
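
If a true comma-separated file is needed rather than a tab-separated one, the output can be passed through a small filter before it is written. The following is a minimal sketch, not a full CSV writer: the function name tsv_to_csv is hypothetical, and it assumes that fields contain no embedded tabs or newlines. Fields containing a comma or a double quote are quoted.

```shell
# Hypothetical helper: convert tab-separated stdin to CSV on stdout.
# Assumes fields contain no embedded tabs or newlines.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      # Quote any field that contains a comma or a double quote,
      # doubling embedded double quotes as CSV expects.
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force awk to rebuild the record using OFS
    print
  }'
}
```

With this in place, the command above could become: hive -e 'select books from table' | tsv_to_csv > /home/lvermeer/temp.csv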

akhtar · Answer 2 · Dec 18, 2020

Hi,

The path in INSERT OVERWRITE DIRECTORY names a directory, not a file, so there is no need to specify a file extension. Just give the path to your directory, as shown below.

insert overwrite directory '/home/output' select books from table;

Also, note that INSERT OVERWRITE DIRECTORY removes all the existing files under the specified directory and then writes the results as part files.
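
Because this form writes to HDFS rather than to the local disk, the part files still have to be copied out before they can be opened locally. One way this might be done is with hdfs dfs -getmerge, which concatenates the files of an HDFS directory into a single local file. The wrapper name fetch_hive_output and the local path in the usage line are assumptions for illustration. The fields will still use Hive's default delimiter unless the query specifies another one.

```shell
# Hypothetical helper: copy all part files of an HDFS output directory
# into one local file. Requires the hdfs client on the PATH.
fetch_hive_output() {
  hdfs_dir=$1     # HDFS directory written by INSERT OVERWRITE DIRECTORY
  local_file=$2   # local file that receives the merged contents
  hdfs dfs -getmerge "$hdfs_dir" "$local_file"
}
```

For the query above, a call could look like fetch_hive_output /home/output /tmp/output.csv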