How to add third party java jars for use in PySpark?


For my project, I need some third-party database client libraries in Java, and I want to access them through

java_gateway.py

For example, to make the client class (not a JDBC driver!) available to the Python client via the Java gateway:

java_import(gateway.jvm, "org.mydatabase.MyDBClient")
It is not clear where to add the third-party libraries to the JVM classpath. I tried adding them to compute-classpath.sh, but that did not seem to work; I get:

 Py4jError: Trying to call a package
Also, comparing with Hive: the Hive jar files are NOT loaded via compute-classpath.sh, which makes me suspicious. There seems to be some other mechanism setting up the JVM-side classpath.

Can someone help?

Thanks in advance!

Jul 4, 2018 in Apache Spark by Shubham

1 answer to this question.


You can add external jars as arguments when launching PySpark:

pyspark --jars file1.jar,file2.jar
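
Once the shell is up with the jar on the classpath, you can import the class from the question on the JVM side. A minimal sketch, assuming the client class ships in a jar named mydb-client.jar (a placeholder name) and that you are in the PySpark shell, where sc is predefined:

from py4j.java_gateway import java_import

# Shell was started with the jar on the classpath, e.g.:
#   pyspark --jars /path/to/mydb-client.jar --driver-class-path /path/to/mydb-client.jar
# (mydb-client.jar is a placeholder; passing --driver-class-path as well helps on
# older Spark versions where --jars alone did not reach the driver's classpath)

# `sc._gateway` exposes the Py4J gateway behind the shell's SparkContext,
# mirroring the java_import call from the question:
java_import(sc._gateway.jvm, "org.mydatabase.MyDBClient")

# After the import, the class is reachable by its short name instead of
# raising "Py4jError: Trying to call a package":
client = sc._gateway.jvm.MyDBClient()

If you would rather not pass jars on the command line, the same effect can be had by setting the standard Spark properties spark.jars (driver and executor classpaths) or spark.driver.extraClassPath in spark-defaults.conf.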

answered Jul 4, 2018 by nitinrawat895
