I have a Go client that uses librdkafka with SASL GSSAPI (Kerberos). It works fine when running on a Debian Docker image.
However, it does not work on an Alpine image. Below is the Alpine Dockerfile (only the relevant parts are shown):
FROM golang:1.17.11-alpine3.16 AS go_kafka_base
RUN apk update && apk add --no-cache git bash alpine-sdk linux-headers musl-dev g++
RUN apk add --no-cache bsd-compat-headers cyrus-sasl
RUN apk add --no-cache openssl krb5 cyrus-sasl-dev cyrus-sasl-gssapiv2
RUN git clone https://github.com/edenhill/librdkafka.git && cd librdkafka && git checkout v2.2.0 && ./configure --prefix /usr --install-deps && make && make install
# second stage
FROM go_kafka_base as builder
WORKDIR /app
#some parts are absent
RUN make build-app -W build-deps GOOS=linux GOENVS="CGO_ENABLED=1" BUILDFLAGS="${BUILDFLAGS}" LDFLAGS="${LDFLAGS}" TAGS='dynamic'
#other parts not shown here
When I run the application, I get this error in the client application logs:
LIBSASL|rdkafka#producer-1| [thrd:sasl_ssl://kafka-1:19092/bootstrap]: sasl_ssl://kafka-1:19092/bootstrap: GSSAPI Error: Miscellaneous failure (see text) (Matching credential (kafka/[email protected]) not found)
In the Kerberos (KDC) log, I see the error below:
Oct 06 11:50:45 kerberos krb5kdc[16](info): TGS_REQ (6 etypes {18 17 20 19 16 23}) 172.28.0.5: LOOKING_UP_SERVER: authtime 0, [email protected] for kafka/[email protected], Server not found in Kerberos database
In the Kafka broker log, I see this:
2023-10-05 15:24:54 [2023-10-05 13:24:54,694] DEBUG [SslTransportLayer channelId=172.28.0.4:19092-172.28.0.5:43798-16 key=channel=java.nio.channels.SocketChannel[connected local=/172.28.0.4:19092 remote=/172.28.0.5:43798], selector=sun.nio.ch.EPollSelectorImpl@34fa9b29, interestOps=1, readyOps=0] SSL peer is not authenticated, returning ANONYMOUS instead (org.apache.kafka.common.network.SslTransportLayer)
2023-10-05 15:24:54 [2023-10-05 13:24:54,694] DEBUG [SslTransportLayer channelId=172.28.0.4:19092-172.28.0.5:43798-16 key=channel=java.nio.channels.SocketChannel[connected local=/172.28.0.4:19092 remote=/172.28.0.5:43798], selector=sun.nio.ch.EPollSelectorImpl@34fa9b29, interestOps=1, readyOps=0] SSL handshake completed successfully with peerHost '172.28.0.5' peerPort 43798 peerPrincipal 'User:ANONYMOUS' cipherSuite 'TLS_AES_256_GCM_SHA384' (org.apache.kafka.common.network.SslTransportLayer)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Set SASL server state to HANDSHAKE_OR_VERSIONS_REQUEST during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Handling Kafka request API_VERSIONS during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Set SASL server state to HANDSHAKE_REQUEST during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Handling Kafka request SASL_HANDSHAKE during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Using SASL mechanism 'GSSAPI' provided by client (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,695] DEBUG Creating SaslServer for kafka/[email protected] with mechanism GSSAPI (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 Found KeyTab /etc/kafka/keytab/broker1.keytab for kafka/[email protected]
2023-10-05 15:24:54 Found ticket for kafka/[email protected] to go to krbtgt/[email protected] expiring on Fri Oct 06 13:18:42 GMT 2023
2023-10-05 15:24:54 [2023-10-05 13:24:54,696] DEBUG Set SASL server state to AUTHENTICATE during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:24:54 [2023-10-05 13:24:54,698] DEBUG [SocketServer listenerType=ZK_BROKER, nodeId=1] Connection with /172.28.0.5 disconnected (org.apache.kafka.common.network.Selector)
2023-10-05 15:24:54 java.io.EOFException
2023-10-05 15:24:54 at org.apache.kafka.common.network.SslTransportLayer.read(SslTransportLayer.java:619)
The connection is dropped right after "Set SASL server state to AUTHENTICATE during authentication".
For comparison, when the client runs on Debian, the Kafka broker log shows the handshake moving forward:
2023-10-05 15:34:09 [2023-10-05 13:34:09,562] DEBUG Set SASL server state to HANDSHAKE_OR_VERSIONS_REQUEST during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 [2023-10-05 13:34:09,562] DEBUG Handling Kafka request API_VERSIONS during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 [2023-10-05 13:34:09,563] DEBUG Set SASL server state to HANDSHAKE_REQUEST during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 [2023-10-05 13:34:09,563] DEBUG Handling Kafka request SASL_HANDSHAKE during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 [2023-10-05 13:34:09,563] DEBUG Using SASL mechanism 'GSSAPI' provided by client (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 [2023-10-05 13:34:09,564] DEBUG Creating SaslServer for kafka/[email protected] with mechanism GSSAPI (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 Found KeyTab /etc/kafka/keytab/broker1.keytab for kafka/[email protected]
2023-10-05 15:34:09 Found ticket for kafka/[email protected] to go to krbtgt/[email protected] expiring on Fri Oct 06 13:18:42 GMT 2023
2023-10-05 15:34:09 [2023-10-05 13:34:09,567] DEBUG Set SASL server state to AUTHENTICATE during authentication (org.apache.kafka.common.security.authenticator.SaslServerAuthenticator)
2023-10-05 15:34:09 Entered Krb5Context.acceptSecContext with state=STATE_NEW
2023-10-05 15:34:09 [2023-10-05 13:34:09,641] INFO [Admin Manager on Broker 1]: Error processing create topic request CreatableTopic(name='__transaction_state', numPartitions=50, replicationFactor=3, assignments=[], configs=[CreateableTopicConfig(name='compression.type', value='uncompressed'), CreateableTopicConfig(name='cleanup.policy', value='compact'), CreateableTopicConfig(name='min.insync.replicas', value='2'), CreateableTopicConfig(name='segment.bytes', value='104857600'), CreateableTopicConfig(name='unclean.leader.election.enable', value='false')]) (kafka.server.ZkAdminManager)
2023-10-05 15:34:09 org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
2023-10-05 15:34:09 Looking for keys for: kafka/[email protected]
2023-10-05 15:34:09 Added key: 17version: 1
2023-10-05 15:34:09 Added key: 18version: 1
2023-10-05 15:34:09 >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
2023-10-05 15:34:09 Using builtin default etypes for permitted_enctypes
2023-10-05 15:34:09 default etypes for permitted_enctypes: 18 17 20 19 16 23.
2023-10-05 15:34:09 >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
2023-10-05 15:34:09 MemoryCache: add 1696512849/564599/E73F519252DFB1419DB40C5DFBBEC103F1AB871D8C0B6702106C0A171F24B38D/[email protected] to [email protected]|kafka/[email protected]
2023-10-05 15:34:09 >>> KrbApReq: authenticate succeed.
This looks like a DNS / Kerberos service principal name resolution issue: when the client runs on Alpine, the principal name does not seem to be resolved to kafka/[email protected]; kafka/[email protected] is used instead, without m_net.
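One way to test the name-resolution hypothesis above is to run the same small program in both the Debian and the Alpine container and compare the output. The following is only a sketch: the helper names `principalFor` and `canonicalNames` are made up for illustration, the realm is passed in as an argument rather than taken from the real setup, and Go's resolver may not behave exactly like the libc resolver that the Kerberos library inside librdkafka uses, so a matching result here does not rule the hypothesis out.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// principalFor builds a Kerberos service principal string of the form
// service/host@REALM, dropping any trailing dot left by a DNS answer.
// (Hypothetical helper, for illustration only.)
func principalFor(service, host, realm string) string {
	return fmt.Sprintf("%s/%s@%s", service, strings.TrimSuffix(host, "."), realm)
}

// canonicalNames does a forward lookup of host followed by a reverse lookup
// of each returned address. This only approximates what a Kerberos library
// does when hostname canonicalization is enabled.
func canonicalNames(host string) ([]string, error) {
	addrs, err := net.LookupHost(host)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, addr := range addrs {
		ptrs, err := net.LookupAddr(addr)
		if err != nil {
			continue // no PTR record for this address
		}
		names = append(names, ptrs...)
	}
	return names, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: princcheck <broker-host> <REALM>")
		return
	}
	host, realm := os.Args[1], os.Args[2]

	// Principal built from the hostname exactly as configured in the client.
	fmt.Println("as configured:", principalFor("kafka", host, realm))

	names, err := canonicalNames(host)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	// Principals built from whatever reverse DNS returns for that host.
	for _, n := range names {
		fmt.Println("after reverse DNS:", principalFor("kafka", n, realm))
	}
}
```

Running it with the broker hostname (for example `kafka-1`) and the realm in each container shows whether the two images produce different host parts, such as a name with or without the `m_net` suffix.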
Finally, I have already tried setting the following on the client side (in the krb5 configuration file), as explained in https://web.mit.edu/Kerberos/krb5-latest/doc/admin/princ_dns.html:
dns_canonicalize_hostname = false
or
dns_canonicalize_hostname = fallback
Both give the same error. I am running out of options here. :-(
It is worth mentioning that the only difference between the Debian and Alpine images is the Dockerfile; there are no changes to the client configuration files or the Kafka server stack.
Does anyone know what might be causing this error when running the application on Alpine?