Advanced Hadoop — How to cross principals/realms to access secured Kafka in a Spark application

Jeffrey Chen
2 min read · May 24, 2020


In the general case, a Spark application can use delegation tokens to access a secured Hadoop ecosystem. But if you want to cross principals/realms, for example to reach Kafka brokers that live in a different Kerberos realm, you need to perform the authentication one more time inside your Spark application. Here is how to achieve that.

1. Customize the krb5.conf file. Define the realm of your Kafka servers alongside your Hadoop realm.
[libdefaults]
default_realm = My.HADOOP.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
forwardable = yes
allow_weak_crypto = true
renew_lifetime = 7d

[realms]
My.HADOOP.COM = {
  kdc = krb-kdc-master.analysis.jp.local
  kdc = krb-kdc-slave.analysis.jp.local
  admin_server = krb-kadmin.analysis.jp.local:2749
}
My.KAFKA.COM = {
  kdc = kafka-kdc.hnd1.bdd.local
  kdc = kafka-kdc.hnd1.bdd.local
  admin_server = kafka-adm.hnd1.bdd.local:2749
}

[domain_realm]
kafka-srv01.hnd1.bdd.local = My.KAFKA.COM
kafka-srv02.hnd1.bdd.local = My.KAFKA.COM
kafka-srv03.hnd1.bdd.local = My.KAFKA.COM
kafka-srv04.hnd1.bdd.local = My.KAFKA.COM
kafka-srv05.hnd1.bdd.local = My.KAFKA.COM
...

2. Customize the jaas.conf file. The KafkaClient section defines the principal and keytab that the Kafka client uses to log in to the Kafka realm.

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  useKeyTab=true
  principal="kafkaPrimary@My.KAFKA.COM"
  keyTab="./kafkaPrimary.keytab"
  renewTicket=true
  storeKey=true
  serviceName="kafka"
  debug=true;
};
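Because this JAAS configuration is loaded through the JVM options set in step 3, the Spark application itself only needs to tell the Kafka source to use SASL. Below is a minimal Structured Streaming sketch; it assumes the spark-sql-kafka connector is on the classpath, and the bootstrap servers, topic name and SASL_PLAINTEXT protocol are placeholders you should adapt to your cluster (for example, use SASL_SSL if your brokers require TLS).

import org.apache.spark.sql.SparkSession

object SecureKafkaRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("secure-kafka-read").getOrCreate()

    // The Krb5LoginModule from jaas.conf is picked up via
    // -Djava.security.auth.login.config, so only the SASL options are needed here.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-srv01.hnd1.bdd.local:9092") // placeholder broker/port
      .option("subscribe", "my-topic")                                      // placeholder topic
      .option("kafka.security.protocol", "SASL_PLAINTEXT")                  // or SASL_SSL if TLS is enabled
      .option("kafka.sasl.kerberos.service.name", "kafka")                  // must match serviceName in jaas.conf
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}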

3. Customize the spark-submit command.

export SECURE_KAFKA_OPTS="-Djava.security.krb5.conf=./krb5.conf -Djava.security.auth.login.config=./jaas.conf"

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 8g \
  --keytab hadoop_user.keytab \
  --principal hadoop_user@My.HADOOP.COM \
  --files "/path/to/custom/krb5.conf,/path/to/custom/jaas.conf,/path/to/kafkaPrimary.keytab" \
  --conf "spark.driver.extraJavaOptions=${SECURE_KAFKA_OPTS}" \
  --conf "spark.executor.extraJavaOptions=${SECURE_KAFKA_OPTS}" \
  sparkApp.jar
  • --keytab and --principal: These two parameters are used by the Spark client itself, so they should carry the authentication information for the Hadoop cluster.
  • --files: Uploads the custom krb5.conf, jaas.conf and keytab files. These files are placed in the working directory of your Spark application.
  • --conf: Sets the Java options for both the driver and the executors. Because the custom files are placed in the working directory, the application can reference them by relative path (see the sketch after this list).
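If the relative paths do not resolve, a quick sanity check is to log what actually landed in the container's working directory before the Kafka source starts. This is just a hypothetical diagnostic snippet to drop into the driver code, not part of the required setup.

import java.io.File

// List what --files actually shipped into the YARN container's working directory.
// krb5.conf, jaas.conf and kafkaPrimary.keytab should all appear here.
new File(".").listFiles().foreach(f => println(s"staged file: ${f.getName}"))

// The JVM options reference these relative paths, so they must exist:
println(s"krb5.conf exists: ${new File("./krb5.conf").exists()}")
println(s"jaas.conf exists: ${new File("./jaas.conf").exists()}")
println(s"keytab exists:    ${new File("./kafkaPrimary.keytab").exists()}")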

I hope you can now access data from Kafka in your Spark applications. If you still have any questions, please leave a comment.
