We are using DataStax Enterprise (DSE), which has support for Spark and the Spark Job Server.
We have a 3-node OLAP Cassandra cluster; each node has an 8-core processor and 32 GB of RAM.
With the above configuration, I was able to run at most 4 Spark jobs in parallel with low latency.
I am submitting jobs to a pre-created Spark context via the Spark Job Server API. The built-in Spark Job Server also has a limit of 8 jobs submitted in parallel at once, and it starts rejecting jobs as soon as more are submitted.
I need a mechanism in which at most 4 Spark jobs are processed at any moment in time, and the remaining jobs stay in a queue until one of the running jobs finishes and a slot becomes available.
How can I achieve this on DSE Cassandra with the built-in Spark, or in the normal Spark way?
I am not aware of how to put a queueing mechanism on a Spark cluster, or which parameters I can use to control it.
DSE Spark has its own resource manager and doesn't have Mesos or YARN functionality, through which I guess we could submit jobs to a particular queue.
DataStax Enterprise (DSE) ships with the open-source Spark Job Server for clusters running the DSE Analytics workload. Specifically, DSE 6.8.12 ships with Spark Job Server v0.8.0.
The queueing mechanism you've asked for is not a feature available in the Job Server.
In any case, the Job Server simply provides a REST API for submitting jobs to a Spark cluster. It is the Spark master's role to coordinate and manage Spark applications, requesting resources (CPUs, memory) from Spark worker nodes to execute jobs.
If there are insufficient resources available (not enough CPUs, memory, or both) then the submitted Spark application will wait in the queue until resources become available.
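One way to make that queueing behaviour come out to "at most 4 jobs at a time" is to cap the resources each Spark application is allowed to claim, so that only four applications can hold resources at once and any further application simply waits. The sketch below is illustrative only and assumes a Spark standalone-style master honouring the standard spark.cores.max property (which is how DSE's own resource manager generally behaves); the application name and the 6-core / 6 GB figures are assumptions derived from your 3 x 8-core / 32 GB cluster, not recommended values.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative sketch, not a DSE-specific API: cap this application's share
    // of the cluster so that at most four such applications can run concurrently.
    // Assumption: 3 nodes x 8 cores = 24 cores offered to Spark, so 6 cores per
    // application leaves exactly 4 concurrent slots; a fifth application stays
    // in the WAITING state until a slot frees up.
    val conf = new SparkConf()
      .setAppName("capped-analytics-job")   // hypothetical name
      .set("spark.cores.max", "6")          // standalone-mode cap on total cores per application
      .set("spark.executor.memory", "6g")   // keep memory within the per-node worker budget
    val sc = new SparkContext(conf)
    // ... run the job against sc as usual ...

If you pre-create contexts through the Job Server instead, the same properties would be supplied as context configuration when the context is created rather than in job code; the jobs then simply run against whichever context they are submitted to.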
If you think that Spark applications are taking up too much memory or CPU on a node, consider lowering the amount of resources available to worker nodes by setting the relevant Spark worker options in dse.yaml. For details, see "Configuring Spark nodes" in the DataStax Enterprise documentation. Cheers!
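For orientation only, the relevant section of dse.yaml looks roughly like the excerpt below; the option names and values here are an assumption based on recent DSE releases, so verify them against the dse.yaml reference for your exact version.

    # dse.yaml (excerpt) -- illustrative values, verify against your DSE version
    resource_manager_options:
      worker_options:
        cores_total: 0.7     # portion of each node's cores offered to Spark workers
        memory_total: 0.6    # portion of each node's memory offered to Spark workers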