How to configure EMR Serverless to log Spark applications correctly to stdout and stderr


I am currently running Scala Spark applications on EMR Serverless, and all of the logs are written to stderr at info level. According to this page, that appears to be the default for Spark:

https://github.com/apache/spark/blob/v3.3.0/conf/log4j2.properties.template

I have been able to adjust the logging level using these lines based on the info here https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/log4j2.html

    "rootLogger.level": "warn",
    "logger.myapp.name": "com.myapp",
    "logger.myapp.level": "info"

but everything is still coming out to stderr. How would I configure this to have my application logs appear in stdout instead? I've tried the below configuration but it doesn't have any effect.

    {
      "classification": "spark-executor-log4j2",
      "configurations": [],
      "properties": {
        "rootLogger.level": "warn",
        "appender.myapp.target": "stdout",
        "appender.myapp.layout.type": "PatternLayout",
        "appender.myapp.name": "myapp",
        "appender.myapp.type": "Console",
        "logger.myapp.name": "com.myapp",
        "logger.myapp.level": "info",
        "logger.myapp.appenderRef.myappout.ref": "myapp",
        "logger.myapp.appenderRefs": "myappout"
      }
    }

VonC On

Define a Console appender that targets stdout, and give it a PatternLayout to control the log message format.

Then associate that appender with your logger: the appenderRef must point to the name of the appender you defined. Finally, attach the same appender to the root logger, as in this configuration:

    {
      "classification": "spark-executor-log4j2",
      "configurations": [],
      "properties": {
        "rootLogger.level": "warn",
        "rootLogger.appenderRef.stdout.ref": "stdout",

        "appender.stdout.type": "Console",
        "appender.stdout.name": "stdout",
        "appender.stdout.layout.type": "PatternLayout",
        "appender.stdout.layout.pattern": "%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n",
        "appender.stdout.target": "SYSTEM_OUT",

        "logger.myapp.name": "com.myapp",
        "logger.myapp.level": "info",
        "logger.myapp.additivity": "false",
        "logger.myapp.appenderRef.stdout.ref": "stdout"
      }
    }

I have set logger.myapp.additivity to false, to prevent the logs from being propagated to the root logger, ensuring that they only go to the appenders defined in logger.myapp.

The stdout appender is associated with both the root logger and myapp logger to make sure logs from both the root and myapp loggers are sent to stdout.

The root logger is set to the warn level and associated with the stdout appender, so that all messages at warn level or higher from classes not matched by another logger also go to stdout.
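
To make the interaction between rootLogger.level, logger.myapp.level and additivity more concrete, here is a simplified model in plain Scala. It is only a sketch, not Log4j's implementation: LoggerModel, LoggerConfig, resolve, isEnabled and appendersFor are hypothetical names, and the model assumes just a root logger plus loggers configured directly by name.

```scala
// Simplified model of how a logger name is matched to a configured logger.
// Hypothetical illustration only; this is not Log4j's actual code.
object LoggerModel {
  // Severity order for the threshold check: a lower rank is more severe.
  private val rank: Map[String, Int] =
    Seq("error", "warn", "info", "debug", "trace").zipWithIndex.toMap

  final case class LoggerConfig(
      name: String,
      level: String,
      appenders: Seq[String],
      additive: Boolean = true
  )

  // Mirrors the properties above: rootLogger.* and logger.myapp.*
  val root: LoggerConfig = LoggerConfig("", "warn", Seq("stdout"))
  val configured: Seq[LoggerConfig] =
    Seq(LoggerConfig("com.myapp", "info", Seq("stdout"), additive = false))

  // The configured logger with the longest matching dot-separated prefix
  // wins; classes that match nothing fall back to the root logger.
  def resolve(className: String, loggers: Seq[LoggerConfig] = configured): LoggerConfig =
    loggers
      .filter(c => className == c.name || className.startsWith(c.name + "."))
      .sortBy(c => -c.name.length)
      .headOption
      .getOrElse(root)

  // An event passes when it is at least as severe as the resolved level.
  def isEnabled(className: String, eventLevel: String): Boolean =
    rank(eventLevel) <= rank(resolve(className).level)

  // With additivity enabled, the event is also handed to the root appenders.
  def appendersFor(className: String, loggers: Seq[LoggerConfig] = configured): Seq[String] = {
    val own = resolve(className, loggers)
    if (own.additive && own.name.nonEmpty) own.appenders ++ root.appenders
    else own.appenders
  }

  def main(args: Array[String]): Unit = {
    println(isEnabled("com.myapp.jobs.Etl", "info"))            // matched by com.myapp
    println(isEnabled("org.apache.spark.SparkContext", "info")) // falls back to root
    println(appendersFor("com.myapp.jobs.Etl"))
  }
}
```

In this model, leaving additivity at its default of true would return the stdout appender twice for com.myapp classes, once from the logger and once from the root, which is the duplication that setting additivity to false avoids.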


Update, following the asker's comment: "This doesn't seem to work: I am getting various error messages when trying to start the job:"

    Override property '%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n' is not supported.
    Override property 'SYSTEM_OUT' is not supported.
    Override property 'rootLogger.appenderRef.stdout.ref' is not supported.

These errors indicate that EMR Serverless accepts only a limited set of Log4j2 override properties, so redirecting output from stderr to stdout may not be possible through the log4j2 classifications alone.

As an alternative, you can handle this at the application level rather than in the logging configuration: write the messages you want on stdout directly to stdout from your code, and leave the logger output on stderr.

Here is a conceptual example in Scala. It uses only the SLF4J API; in Spark 3.3 that API is backed by Log4j2, so the logger call still follows your existing log4j2 configuration:

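    import org.slf4j.LoggerFactory

    object MyApp {
      // Logger for the application's package; its output follows the log4j2 configuration.
      val logger = LoggerFactory.getLogger("com.myapp")

      def main(args: Array[String]): Unit = {
        // Written straight to the process's standard output stream.
        System.out.println("Logging to STDOUT")
        // Routed through the logging backend, which writes to stderr by default.
        logger.info("Logging to STDERR")
      }
    }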

Messages logged via System.out.println will go to stdout, and messages logged via the logger will go to stderr (based on your existing log4j2 configuration).
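
If you go this route, a small wrapper can keep the stdout lines in roughly the same shape as the pattern layout shown earlier. The following is a minimal sketch using only the standard library; StdoutLog, its methods and StdoutLogDemo are hypothetical names, not part of Spark, SLF4J or EMR Serverless.

```scala
import java.io.PrintStream
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper that writes application messages to stdout.
object StdoutLog {
  private val timestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Builds one line as "<timestamp> <level padded to 5> <source> - <message>".
  def format(level: String, source: String, message: String, now: LocalDateTime): String =
    f"${timestamp.format(now)} $level%-5s $source - $message"

  // The target stream is a parameter so it can be swapped out in tests;
  // by default it is the process's standard output.
  def info(source: String, message: String, out: PrintStream = System.out): Unit =
    out.println(format("INFO", source, message, LocalDateTime.now()))

  def warn(source: String, message: String, out: PrintStream = System.out): Unit =
    out.println(format("WARN", source, message, LocalDateTime.now()))
}

object StdoutLogDemo {
  def main(args: Array[String]): Unit = {
    StdoutLog.info("com.myapp.MyApp", "Job started")
    StdoutLog.warn("com.myapp.MyApp", "Input partition was empty")
  }
}
```

Keep in mind that each JVM has its own stdout: lines printed from code running inside tasks end up in that executor's stdout rather than the driver's.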

Try this programmatic approach and check whether it works in your setup. It is a workaround, but given the restrictions on the override properties, it may be one of the viable options.