How to Access Hive via Python?

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONPATH=$PYTHONPATH:/usr/lib/hive/lib/py

I can then do the imports as listed in the link, with the exception of from hive import ThriftHive, which actually needs to be:

from hive_service import ThriftHive

Next, the port in the example was 10000, which caused the program to hang when I tried it. Switching to 9083, the default Hive metastore Thrift port, stopped the hanging.

So I set it up like so:

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive  # generated client found via the PYTHONPATH entry above

try:
    # Wrap the raw socket in a buffered transport and speak the binary protocol
    transport = TSocket.TSocket('<node-with-metastore>', 9083)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()
    client.execute("CREATE TABLE test(c1 int)")
    transport.close()
except Thrift.TException as tx:
    print('%s' % (tx.message))

I received the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 68, in execute
    self.recv_execute()
  File "/usr/lib/hive/lib/py/hive_service/ThriftHive.py", line 84, in recv_execute
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'execute'

But inspecting the ThriftHive.py file reveals the method execute within the Client class.

How may I use Python to access Hive?

Oct 9, 2018 in Big Data Hadoop by digger
• 27,630 points

recategorized Oct 9, 2018 by Omkar 2,315 views

1 answer to this question.

The easiest way is to use PyHive.

To install you'll need these libraries:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

After installation, you can connect to Hive like this:

from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
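
Since the question mentions a hang on one port, it can help to confirm that something is listening before handing the connection to PyHive. The following is a minimal sketch, not a definitive recipe: port_is_open and connect_to_hive are hypothetical helper names, and 10000 is assumed because it is the usual HiveServer2 port (9083 is the metastore port, which PyHive does not talk to). The check only proves TCP reachability; it cannot detect an authentication or protocol mismatch.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def connect_to_hive(host, port=10000, username=None):
    """Open a PyHive connection after confirming the port accepts TCP connections.

    Hypothetical helper: the default port 10000 is an assumption about your cluster.
    """
    if not port_is_open(host, port):
        raise ConnectionError("Nothing is accepting connections on %s:%d" % (host, port))
    # Imported lazily so the port check above works even without PyHive installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username)
```

With this in place, conn = connect_to_hive("YOUR_HIVE_HOST", username="YOU") fails fast with a clear message instead of hanging when the port is wrong.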

Now that you have the Hive connection, there are a couple of ways to use it. You can query it directly:

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
    use_result(result)  # use_result stands in for your own row handling
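
Note that fetchall() loads the whole result set into memory. For large tables, a sketch like the one below pulls rows in batches with fetchmany(), which PyHive cursors provide as part of the Python DB-API. The helper name iter_rows and the batch size are illustrative assumptions, and sqlite3 is used purely as a stand-in connection so the example runs without a cluster.

```python
import sqlite3


def iter_rows(cursor, query, batch_size=1000):
    """Execute `query` and yield rows one by one, taking at most
    `batch_size` rows from the cursor at a time."""
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        for row in batch:
            yield row


# Stand-in for a Hive connection (assumption: any DB-API 2.0 cursor behaves the same way).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
conn.executemany("INSERT INTO hive_table VALUES (?)", [(i,) for i in range(5)])

rows = list(iter_rows(conn.cursor(),
                      "SELECT cool_stuff FROM hive_table ORDER BY cool_stuff",
                      batch_size=2))
print(rows)
```

With a real PyHive connection, you would pass conn.cursor() from hive.Connection instead of the sqlite3 cursor.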

...or use the connection to build a pandas DataFrame:

import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
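
If you want column-labelled records without going through read_sql, the cursor's description attribute (part of the DB-API, and available on PyHive cursors) carries the column names. The sketch below is one possible way to pair them with the rows; rows_as_dicts is a hypothetical helper, and sqlite3 again stands in for Hive so the example is runnable as-is.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair every remaining row on an executed cursor with its column names."""
    # The first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in for a Hive connection (assumption: a PyHive cursor is used the same way).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
conn.executemany("INSERT INTO hive_table VALUES (?)", [(1,), (2,)])

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table ORDER BY cool_stuff")
records = rows_as_dicts(cursor)
print(records)
```

A list of dicts like this can be passed straight to pd.DataFrame(records).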

answered Oct 9, 2018 by Omkar
• 67,290 points
