Hadoop security GroupMappingServiceProvider exception for Spark job via Dataproc API

0 votes

While runnin a spark job on a google dataproc cluster I am stuck at following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMapping not org.apache.hadoop.security.GroupMappingServiceProvider
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2330)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:108)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:102)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:450)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:310)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:277)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:833)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at com.my.package.spark.SparkModule.provideJavaSparkContext(SparkModule.java:59)
    at com.my.package.spark.SparkModule$$ModuleAdapter$ProvideJavaSparkContextProvidesAdapter.get(SparkModule$$ModuleAdapter.java:140)
    at com.my.package.spark.SparkModule$$ModuleAdapter$ProvideJavaSparkContextProvidesAdapter.get(SparkModule$$ModuleAdapter.java:101)
    at dagger.internal.Linker$SingletonBinding.get(Linker.java:364)
    at spark.Main$$InjectAdapter.get(Main$$InjectAdapter.java:65)
    at spark.Main$$InjectAdapter.get(Main$$InjectAdapter.java:23)
    at dagger.ObjectGraph$DaggerObjectGraph.get(ObjectGraph.java:272)
    at spark.Main.main(Main.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMapping not org.apache.hadoop.security.GroupMappingServiceProvider
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2324)
    ... 31 more

Job configuration:

Region: global
Cluster my-cluster
Job type: Spark
Jar files: gs://bucket/jars/spark-job.jar
Main class or jar: spark.Main
spark.driver.extraClassPath: /path/to/google-api-client-1.20.0.jar
spark.driver.userClassPathFirst: true

I have no problem running it this way on the command line:

spark-submit --conf "spark.driver.extraClassPath=/path/to/google-api-client-1.20.0.jar" --conf "spark.driver.userClassPathFirst=true" --class spark.Main /path/to/spark-job.jar

But the UI/API does not allow you to pass both the class name and jar, so it looks like this instead:

spark-submit --conf spark.driver.extraClassPath=/path/to/google-api-client-1.20.0.jar --conf spark.driver.userClassPathFirst=true --class spark.Main --jars /tmp/1f4d5289-37af-4311-9ccc-5eee34acaf62/spark-job.jar /usr/lib/hadoop/hadoop-common.jar

I can't figure out if it is a problem with providing the extraClassPath or if the spark-job.jar and the hadoop-common.jar are somehow conflicting.

Mar 23, 2018 in Big Data Hadoop by Ashish
• 2,650 points

1 answer to this question.

0 votes
One of the reason behin you getting this error might be by the combination of userClassPathFirst and /usr/lib/hadoop/hadoop-common.jar being the jar Dataproc specifies to spark-submit. As far as I know in some cases, the instance of GroupMappingServiceProvider from the user class loader will be used and in others the instance from the system class loader will be used. As a class loaded from one class loader is not equal to the same class loaded from another class loader, you would end up with this exception.

Instead of userClassPathFirst, it would  make sense to instead relocate the conflicting classes using something like maven shade. Let me know in case it does not work out.
answered Mar 23, 2018 by nitinrawat895
• 11,380 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Hadoop security GroupMappingServiceProvider exception for Spark job via Dataproc API

If you don't want to turn off ...READ MORE

answered Jul 2, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
0 votes
2 answers

How does Hadoop/Spark is used for building large analytics report?

The best possible framework for this task ...READ MORE

answered Aug 7, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
0 votes
1 answer

Which is the Real Time Monitoring tool/API for Hadoop?

If you're using Yarn, there's a rest ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,830 points
0 votes
1 answer
0 votes
1 answer

Hadoop MapReduce wordcount "type job is not applicable for the arguments" error

The combiner class is not required in ...READ MORE

answered Dec 19, 2018 in Big Data Hadoop by Omkar
• 69,230 points
+11 votes
11 answers

Hadoop “Unable to load native-hadoop library for your platform” warning

modify the glibc version.CentOS provides safe softwares ...READ MORE

answered Sep 10, 2018 in Big Data Hadoop by bug_seeker
• 15,520 points
0 votes
1 answer

Relationship between Spark, Hadoop and Cassandra?

Spark is a distributed in memory processing ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP