Spark: Reporting Total and Available Memory of the Cluster


I'm running a Spark job on Amazon EMR, and I would like to keep reporting the total and free memory of the cluster from within the program itself. Is there any method in the Spark API which provides information about the cluster's memory?
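
(For reference, the Spark API does expose SparkContext.getExecutorMemoryStatus, a developer API that reports, per executor, the maximum memory available for caching and the amount remaining. A minimal sketch, assuming a Scala driver; app and object names are placeholders:)

// Sketch: poll executor memory from inside the driver.
// Note: the values reflect block-manager (storage) memory, not total cluster RAM.
import org.apache.spark.{SparkConf, SparkContext}

object MemoryReport {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("memory-report"))

    // Map from "host:port" to (max memory for caching, remaining memory), in bytes
    sc.getExecutorMemoryStatus.foreach { case (executor, (max, free)) =>
      println(f"$executor%-30s total=${max / 1e6}%.0f MB free=${free / 1e6}%.0f MB")
    }

    sc.stop()
  }
}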

1 Answer

Answered by Sandeep Das:

You can use spark.metrics.conf.

How to use it: set spark.metrics.conf in your Spark configuration file:

spark.metrics.conf = /path/to/metrics.properties 
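
(If you build the SparkConf in code rather than editing the conf file, the same property can also be set programmatically. A sketch; the app name and path below are placeholders:)

import org.apache.spark.SparkConf

// Point Spark's metrics system at the properties file from code
val conf = new SparkConf()
  .setAppName("memory-report")  // hypothetical app name
  .set("spark.metrics.conf", "/path/to/metrics.properties")  // placeholder path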

At that path, create a metrics.properties file. In that file, specify the metrics you want from the Spark application; you can also specify the output format and the polling interval.

For example, here I am collecting the data in CSV format every 1 minute:

# Write driver metrics to CSV files
# (use the "*." prefix instead of "driver." to apply to all instances)
driver.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Directory into which the CsvSink writes its output
driver.sink.csv.directory=/Path/at/which/data/will/be/dumped

# Polling period for the CsvSink, and its unit
driver.sink.csv.period=1
driver.sink.csv.unit=minutes
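
(Note that a sink only writes out metrics from the sources that are registered. Spark's bundled metrics.properties.template suggests enabling the JvmSource to get JVM memory figures per instance, as a sketch:)

# Enable JVM metrics (heap usage, etc.) for the driver and executors
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource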

You can find the full documentation at: https://spark.apache.org/docs/latest/monitoring.html#metrics