Advanced Hadoop — How to cross principals/realms to access secured Kafka in a Spark application

Jeffrey Chen
2 min read · May 24, 2020


In the general case, a Spark application can use delegation tokens to access a secured Hadoop ecosystem. But if you want to cross principals/realms, for example to reach Kafka brokers that live in a different Kerberos realm, you need to perform the authentication one more time inside your Spark application. Here is how to achieve that.

1. Customize the krb5.conf file. Define the realm of your Kafka servers alongside your Hadoop realm.
[libdefaults]
default_realm = My.HADOOP.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
forwardable = yes
allow_weak_crypto = true
renew_lifetime = 7d

[realms]
My.HADOOP.COM = {
  kdc = krb-kdc-master.analysis.jp.local
  kdc = krb-kdc-slave.analysis.jp.local
  admin_server = krb-kadmin.analysis.jp.local:2749
}
My.KAFKA.COM = {
  kdc = kafka-kdc.hnd1.bdd.local
  kdc = kafka-kdc.hnd1.bdd.local
  admin_server = kafka-adm.hnd1.bdd.local:2749
}

[domain_realm]
kafka-srv01.hnd1.bdd.local = My.KAFKA.COM
kafka-srv02.hnd1.bdd.local = My.KAFKA.COM
kafka-srv03.hnd1.bdd.local = My.KAFKA.COM
kafka-srv04.hnd1.bdd.local = My.KAFKA.COM
kafka-srv05.hnd1.bdd.local = My.KAFKA.COM
...

2. Customize the jaas.conf file. The KafkaClient section defines the principal and keytab that the Kafka client uses to log in to the Kafka realm.

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  useKeyTab=true
  principal="kafkaPrimary@My.KAFKA.COM"
  keyTab="./kafkaPrimary.keytab"
  renewTicket=true
  storeKey=true
  serviceName="kafka"
  debug=true;
};
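Because this JAAS configuration is loaded through the JVM options set in step 3, the Spark application itself only needs to tell the Kafka source to use SASL. Below is a minimal Structured Streaming sketch; it assumes the spark-sql-kafka connector is on the classpath, and the bootstrap servers, topic name and SASL_PLAINTEXT protocol are placeholders you should adapt to your cluster (for example, use SASL_SSL if your brokers require TLS).

import org.apache.spark.sql.SparkSession

object SecureKafkaRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("secure-kafka-read").getOrCreate()

    // The Krb5LoginModule from jaas.conf is picked up via
    // -Djava.security.auth.login.config, so only the SASL options are needed here.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-srv01.hnd1.bdd.local:9092") // placeholder broker/port
      .option("subscribe", "my-topic")                                      // placeholder topic
      .option("kafka.security.protocol", "SASL_PLAINTEXT")                  // or SASL_SSL if TLS is enabled
      .option("kafka.sasl.kerberos.service.name", "kafka")                  // must match serviceName in jaas.conf
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}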

3. Customize the spark-submit command.

export SECURE_KAFKA_OPTS="-Djava.security.krb5.conf=./krb5.conf -Djava.security.auth.login.config=./jaas.conf"

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 8g \
  --keytab hadoop_user.keytab \
  --principal hadoop_user@My.HADOOP.COM \
  --files "/path/to/custom/krb5.conf,/path/to/custom/jaas.conf,/path/to/kafkaPrimary.keytab" \
  --conf "spark.driver.extraJavaOptions=${SECURE_KAFKA_OPTS}" \
  --conf "spark.executor.extraJavaOptions=${SECURE_KAFKA_OPTS}" \
  sparkApp.jar
  • --keytab and --principal: These two parameters are used by the Spark client itself, so they should carry the authentication information for the Hadoop cluster.
  • --files: Uploads the custom krb5.conf, jaas.conf and keytab files. These files are placed in the working directory of your Spark application.
  • --conf: Sets the Java options for both the driver and the executors. Because the custom files are placed in the working directory, the application can reference them by relative path (see the sketch after this list).
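If the relative paths do not resolve, a quick sanity check is to log what actually landed in the container's working directory before the Kafka source starts. This is just a hypothetical diagnostic snippet to drop into the driver code, not part of the required setup.

import java.io.File

// List what --files actually shipped into the YARN container's working directory.
// krb5.conf, jaas.conf and kafkaPrimary.keytab should all appear here.
new File(".").listFiles().foreach(f => println(s"staged file: ${f.getName}"))

// The JVM options reference these relative paths, so they must exist:
println(s"krb5.conf exists: ${new File("./krb5.conf").exists()}")
println(s"jaas.conf exists: ${new File("./jaas.conf").exists()}")
println(s"keytab exists:    ${new File("./kafkaPrimary.keytab").exists()}")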

I hope you can now access data from Kafka in your Spark applications. If you still have any questions, please leave a comment.
