Spark: Reporting Total and Available Memory of the Cluster


I'm running a Spark job on Amazon EMR, and I would like to keep reporting the total and free memory of the cluster from within the program itself. Is there any method in the Spark API which provides information about the cluster's memory?
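
(For reference, the Spark API does expose SparkContext.getExecutorMemoryStatus, a developer API that reports, per executor, the maximum memory available for caching and the amount remaining. A minimal sketch, assuming a Scala driver; app and object names are placeholders:)

// Sketch: poll executor memory from inside the driver.
// Note: the values reflect block-manager (storage) memory, not total cluster RAM.
import org.apache.spark.{SparkConf, SparkContext}

object MemoryReport {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("memory-report"))

    // Map from "host:port" to (max memory for caching, remaining memory), in bytes
    sc.getExecutorMemoryStatus.foreach { case (executor, (max, free)) =>
      println(f"$executor%-30s total=${max / 1e6}%.0f MB free=${free / 1e6}%.0f MB")
    }

    sc.stop()
  }
}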

1 Answer

Answered by Sandeep Das:

You can use spark.metrics.conf.

How to use it: set spark.metrics.conf in your Spark configuration file:

spark.metrics.conf = /path/to/metrics.properties 
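
(If you build the SparkConf in code rather than editing the conf file, the same property can also be set programmatically. A sketch; the app name and path below are placeholders:)

import org.apache.spark.SparkConf

// Point Spark's metrics system at the properties file from code
val conf = new SparkConf()
  .setAppName("memory-report")  // hypothetical app name
  .set("spark.metrics.conf", "/path/to/metrics.properties")  // placeholder path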

At that path, create a metrics.properties file. In that file, specify the metrics you want from the Spark application; you can also specify the output format and the polling interval.

For example, here I am collecting the data in CSV format every 1 minute:

# Write driver metrics to CSV files
# (use the "*." prefix instead of "driver." to apply to all instances)
driver.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Directory into which the CsvSink writes its output
driver.sink.csv.directory=/Path/at/which/data/will/be/dumped

# Polling period for the CsvSink, and its unit
driver.sink.csv.period=1
driver.sink.csv.unit=minutes
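
(Note that a sink only writes out metrics from the sources that are registered. Spark's bundled metrics.properties.template suggests enabling the JvmSource to get JVM memory figures per instance, as a sketch:)

# Enable JVM metrics (heap usage, etc.) for the driver and executors
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource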

You can find the full documentation at: https://spark.apache.org/docs/latest/monitoring.html#metrics