I'm running a Java Spark app on Kubernetes (minikube) using spark-on-k8s-operator. Java code snippet below:
static void minIOReadTester2()
{
    System.out.println("*************** Getting spark session ****************");
    SparkSession spark = SparkSession.builder()
            .appName("spark-with-nats-readMinIO")
            .config("fs.s3a.access.key", "minioadmin")
            .config("fs.s3a.secret.key", "minioadmin")
            .config("fs.s3a.endpoint", "http://minio-service.minio-dev:9000")
            .config("fs.s3a.connection.ssl.enabled", "true")
            .config("fs.s3a.path.style.access", "true")
            .config("fs.s3a.attempts.maximum", "1")
            .config("fs.s3a.connection.establish.timeout", "5000")
            .config("fs.s3a.connection.timeout", "10000")
            .getOrCreate();
    System.out.println("*************** Got spark session ****************");
    Dataset<Row> dftest = spark.read().load("s3a://testbucket/outputdelta");
    System.out.println("*** COUNT *** : " + dftest.count());
    dftest.show(20, false);
}
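
For debugging, I'm thinking of adding a check like this right after getOrCreate() to print the master the driver actually sees (a minimal sketch; spark.conf() exposes the effective runtime config):

// Hypothetical diagnostic, not part of my original code:
// print the effective master and deploy mode the driver received
System.out.println("spark.master = " + spark.conf().get("spark.master", "<unset>"));
System.out.println("spark.submit.deployMode = " + spark.conf().get("spark.submit.deployMode", "<unset>"));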
SparkApplication CRD YAML file:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-app
  namespace: minio-dev
  labels:
    app: spark-app
spec:
  type: Java
  mode: cluster # Deploy in cluster mode
  image: dockerhub/spark-minio:latest
  imagePullPolicy: Always
  mainClass: com.test.MinIOTester
  mainApplicationFile: local:///app/spark-learning-0.0.1-SNAPSHOT.jar
  sparkVersion: 3.3.3
  deps:
    jars:
      - local:///app/libs/nats-spark-connector-balanced_2.12-1.1.4.jar
      - local:///app/libs/jnats-2.17.1.jar
  driver:
    cores: 1
    memory: 1024m
    serviceAccount: sparksvcnew
    labels:
      version: 3.3.3
  executor:
    cores: 1
    instances: 2
    memory: 1024m
    serviceAccount: sparksvcnew
    labels:
      version: 3.3.3
  imagePullSecrets:
    - dockerhub-secret
Among the dependencies I'm including the nats-spark-connector jars because I'll need to connect to NATS in the future. I followed all the steps in the spark-operator quick start guide and ran the Spark app. The driver pod starts successfully, but the executor pods never get created: the executor code runs on the driver pod itself and produces the result there! In the logs I can see the lines below:
24/02/01 13:51:46 INFO Executor: Starting executor ID driver on host spark-app-driver
24/02/01 13:51:46 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
Also, running kubectl get events -n minio-dev shows that only the driver was created and no executors:
7m41s Normal SparkApplicationSubmitted sparkapplication/spark-app SparkApplication spark-app was submitted successfully
7m34s Normal SparkDriverRunning sparkapplication/spark-app Driver spark-app-driver is running
7m16s Normal SparkDriverCompleted sparkapplication/spark-app Driver spark-app-driver completed
7m16s Normal SparkApplicationCompleted sparkapplication/spark-app SparkApplication spark-app completed
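
To see whether the operator's spark-submit is actually passing a Kubernetes master to the driver, I'm also planning to inspect the driver pod spec and its ConfigMap (as I understand it, spark-submit normally writes spark.master into a spark.properties file that is mounted into the driver from a generated ConfigMap; resource names below are from my setup):

kubectl get pod spark-app-driver -n minio-dev -o jsonpath='{.spec.containers[0].args}'
kubectl get configmap -n minio-dev -o name | grep spark-app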
The Spark UI also shows spark.master as local[*], which means it is not treating Kubernetes as the master. Any idea where the master value is being overridden to local? That would explain why no executor pods are created and why the executor code runs on the driver pod itself.
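
For completeness: as a sanity check (not a fix I want to keep), I understand the master can be forced from code using the k8s:// URL scheme. A hypothetical sketch, assuming the standard in-cluster API server address kubernetes.default.svc and my existing executor image; I'd much rather find out why the operator-provided master isn't reaching the session:

// Hypothetical sanity check only: force Kubernetes as master from code.
// kubernetes.default.svc is the standard in-cluster API server address.
SparkSession spark = SparkSession.builder()
        .appName("spark-with-nats-readMinIO")
        .master("k8s://https://kubernetes.default.svc:443")
        .config("spark.kubernetes.container.image", "dockerhub/spark-minio:latest")
        .getOrCreate();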