Hadoop security GroupMappingServiceProvider exception for Spark job via Dataproc API

0 votes

While runnin a spark job on a google dataproc cluster I am stuck at following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMapping not org.apache.hadoop.security.GroupMappingServiceProvider
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2330)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:108)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:102)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:450)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:310)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:277)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:833)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at com.my.package.spark.SparkModule.provideJavaSparkContext(SparkModule.java:59)
    at com.my.package.spark.SparkModule$$ModuleAdapter$ProvideJavaSparkContextProvidesAdapter.get(SparkModule$$ModuleAdapter.java:140)
    at com.my.package.spark.SparkModule$$ModuleAdapter$ProvideJavaSparkContextProvidesAdapter.get(SparkModule$$ModuleAdapter.java:101)
    at dagger.internal.Linker$SingletonBinding.get(Linker.java:364)
    at spark.Main$$InjectAdapter.get(Main$$InjectAdapter.java:65)
    at spark.Main$$InjectAdapter.get(Main$$InjectAdapter.java:23)
    at dagger.ObjectGraph$DaggerObjectGraph.get(ObjectGraph.java:272)
    at spark.Main.main(Main.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMapping not org.apache.hadoop.security.GroupMappingServiceProvider
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2324)
    ... 31 more

Job configuration:

Region: global
Cluster my-cluster
Job type: Spark
Jar files: gs://bucket/jars/spark-job.jar
Main class or jar: spark.Main
Arguments:
Properties:
spark.driver.extraClassPath: /path/to/google-api-client-1.20.0.jar
spark.driver.userClassPathFirst: true

I have no problem running it this way on the command line:

spark-submit --conf "spark.driver.extraClassPath=/path/to/google-api-client-1.20.0.jar" --conf "spark.driver.userClassPathFirst=true" --class spark.Main /path/to/spark-job.jar

But the UI/API does not allow you to pass both the class name and jar, so it looks like this instead:

spark-submit --conf spark.driver.extraClassPath=/path/to/google-api-client-1.20.0.jar --conf spark.driver.userClassPathFirst=true --class spark.Main --jars /tmp/1f4d5289-37af-4311-9ccc-5eee34acaf62/spark-job.jar /usr/lib/hadoop/hadoop-common.jar

I can't figure out if it is a problem with providing the extraClassPath or if the spark-job.jar and the hadoop-common.jar are somehow conflicting.

Mar 22, 2018 in Big Data Hadoop by Ashish
• 2,630 points
103 views

1 answer to this question.

0 votes
One of the reason behin you getting this error might be by the combination of userClassPathFirst and /usr/lib/hadoop/hadoop-common.jar being the jar Dataproc specifies to spark-submit. As far as I know in some cases, the instance of GroupMappingServiceProvider from the user class loader will be used and in others the instance from the system class loader will be used. As a class loaded from one class loader is not equal to the same class loaded from another class loader, you would end up with this exception.

Instead of userClassPathFirst, it would  make sense to instead relocate the conflicting classes using something like maven shade. Let me know in case it does not work out.
answered Mar 22, 2018 by nitinrawat895
• 10,110 points

Related Questions In Big Data Hadoop

0 votes
1 answer
0 votes
2 answers

How does Hadoop/Spark is used for building large analytics report?

The best possible framework for this task ...READ MORE

answered Aug 7, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
140 views
0 votes
1 answer

Which is the Real Time Monitoring tool/API for Hadoop?

If you're using Yarn, there's a rest ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,810 points
92 views
0 votes
1 answer
0 votes
1 answer

Hadoop MapReduce wordcount "type job is not applicable for the arguments" error

The combiner class is not required in ...READ MORE

answered Dec 19, 2018 in Big Data Hadoop by Omkar
• 67,120 points
27 views
+10 votes
11 answers

Hadoop “Unable to load native-hadoop library for your platform” warning

modify the glibc version.CentOS provides safe softwares ...READ MORE

answered Sep 10, 2018 in Big Data Hadoop by bug_seeker
• 15,310 points
8,333 views
0 votes
1 answer

Relationship between Spark, Hadoop and Cassandra?

Spark is a distributed in memory processing ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by nitinrawat895
• 10,110 points
99 views