So I have a cluster on Google Kubernetes Engine and I use spark-submit to run Spark jobs. (I don't call spark-submit directly; I launch the submission from Java code, but both paths end up invoking the same Scala class, SparkSubmit.)
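For reference, the submission I trigger from Java is equivalent to a spark-submit command of roughly this shape (the endpoint, image, and class names here are placeholders):

spark-submit \
  --master k8s://https://<k8s-apiserver-endpoint> \
  --deploy-mode cluster \
  --name my-job \
  --class com.example.MyJob \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/jars/my-job.jar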
In my case, there are two clusters I can connect to from my laptop using the gcloud command, e.g.
gcloud container clusters get-credentials cluster-1
gcloud container clusters get-credentials cluster-2
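As far as I understand, each get-credentials call writes an entry into ~/.kube/config and also switches the active context to that cluster, which I can check with kubectl:

kubectl config get-contexts        # list all contexts; the active one is starred
kubectl config current-context     # print only the active context
kubectl config use-context <name>  # switch the active context back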
When I connect to cluster-1 and spark-submit targets cluster-1, it works. But after I run the second gcloud command, still submitting to cluster-1, it fails, and the following stack trace appears (abridged):
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:194)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
I've been searching for a while without success. My best guess is that when spark-submit launches, it looks for some kind of Kubernetes credential on the local machine, and the context switch caused by the second gcloud command messed that up.
I'm just curious: when we do spark-submit, how exactly does the remote K8s server know who I am? What's the auth process involved in all this?
Thank you in advance.
If you want to see what the gcloud container clusters get-credentials cluster-1 command does, you can start from scratch and look at the content of ~/.kube/config. Something in there is probably not matching, or conflicting, perhaps in the users or contexts. You may have credentials for both clusters but be using the context for cluster-1 to access cluster-2; that would explain the PKIX failure, since the client would then validate the server certificate against the CA of the wrong cluster.

The structure of the ~/.kube/config file should be something like this.
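A minimal sketch, assuming two GKE clusters in the same project and zone; the endpoints, certificate data, and the project/zone parts of the names are placeholders:

apiVersion: v1
kind: Config
clusters:
- name: gke_my-project_us-central1-a_cluster-1
  cluster:
    server: https://<cluster-1-endpoint>
    certificate-authority-data: <base64-encoded CA certificate>
# ...plus a second cluster entry for cluster-2
users:
- name: gke_my-project_us-central1-a_cluster-1
  user:
    auth-provider:
      name: gcp
# ...plus a second user entry for cluster-2
contexts:
- name: gke_my-project_us-central1-a_cluster-1
  context:
    cluster: gke_my-project_us-central1-a_cluster-1
    user: gke_my-project_us-central1-a_cluster-1
# ...plus a second context entry for cluster-2
current-context: gke_my-project_us-central1-a_cluster-2

Note the current-context line at the end: the second gcloud command flips it to cluster-2, and that is what the client library picks up by default.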
In the code, it looks like it uses the io.fabric8.kubernetes.client.KubernetesClient library. For example, see KubernetesDriverBuilder.scala.
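That client, it seems, auto-configures itself from the local kubeconfig (~/.kube/config, or whatever KUBECONFIG points to), so the active context determines which CA certificate and token the submission uses; that is essentially how the remote API server knows who you are. If the context keeps getting switched out from under you, Spark also accepts the submission credentials explicitly through its spark.kubernetes.authenticate.submission.* properties. A sketch, with placeholder paths and names:

spark-submit \
  --master k8s://https://<cluster-1-endpoint> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.authenticate.submission.caCertFile=/path/to/cluster-1-ca.pem \
  --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/path/to/cluster-1-token \
  --class com.example.MyJob \
  local:///opt/spark/jars/my-job.jar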