How to create smaller table from big table in HIVE?

0 votes

I currently have a Hive table that has 1.5 billion rows. I would like to create a smaller table (using the same table schema) with about 1 million rows from the original table. Ideally, the new rows would be randomly sampled from the original table, but getting the top 1M or bottom 1M of the original table would be ok, too. How would I do this?

Sep 24, 2018 in Big Data Hadoop by slayer
• 29,040 points
87 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

You could probably best use Hive's built-in sampling methods.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1m ROWS) t;

This syntax was introduced in Hive 0.11. If you are running an older version of Hive, you'll be confined to using the PERCENT syntax like so.

INSERT OVERWRITE TABLE my_table_sample 
SELECT * FROM my_table 
TABLESAMPLE (1 PERCENT) t;
answered Sep 24, 2018 by digger
• 27,620 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to create a Hive table from sequence file stored in HDFS?

There are two SerDe for SequenceFile as ...READ MORE

answered Dec 17, 2018 in Big Data Hadoop by Omkar
• 66,910 points
275 views
0 votes
1 answer

How to create a parquet table in hive and store data in it from a hive table?

Please use the code attached below for ...READ MORE

answered Jan 28 in Big Data Hadoop by Omkar
• 66,910 points
383 views
0 votes
1 answer

How to create a managed table in Hive?

You can use this command: create table employee(Name ...READ MORE

answered Dec 14, 2018 in Big Data Hadoop by Omkar
• 66,910 points
189 views
0 votes
1 answer

How to save Spark dataframe as dynamic partitioned table in Hive?

Hey, you can try something like this: df.write.partitionBy('year', ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 66,910 points
476 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,450 points
1,843 views
0 votes
10 answers

hadoop fs -put command?

copy command can be used to copy files ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Sujay
9,135 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
656 views
0 votes
1 answer
+2 votes
5 answers

How to transpose/pivot data in hive?

Below is also a way for Pivot SELECT ...READ MORE

answered Oct 12, 2018 in Big Data Hadoop by Rahul
2,642 views
0 votes
1 answer

How to execute python script in hadoop file system (hdfs)?

If you are simply looking to distribute ...READ MORE

answered Sep 19, 2018 in Big Data Hadoop by digger
• 27,620 points
1,029 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.