I am currently running Scala Spark applications on EMR Serverless, and all of the logs are written to stderr at info level. Looking at this page, it seems like this is the default for Spark:
https://github.com/apache/spark/blob/v3.3.0/conf/log4j2.properties.template
I have been able to adjust the logging level using these lines based on the info here https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/log4j2.html
"rootLogger.level": "warn",
"logger.myapp.name": "com.myapp",
"logger.myapp.level": "info",
but everything is still coming out to stderr. How would I configure this to have my application logs appear in stdout instead? I've tried the below configuration but it doesn't have any effect.
{
  "classification": "spark-executor-log4j2",
  "configurations": [],
  "properties": {
    "rootLogger.level": "warn",
    "appender.myapp.target": "stdout",
    "appender.myapp.layout.type": "PatternLayout",
    "appender.myapp.name": "myapp",
    "appender.myapp.type": "Console",
    "logger.myapp.name": "com.myapp",
    "logger.myapp.level": "info",
    "logger.myapp.appenderRef.myappout.ref": "myapp",
    "logger.myapp.appenderRefs": "myappout"
  }
}
Try defining the appender to point to stdout, and make sure that you correctly define PatternLayout to control the log message format.
Also, associate the defined appender with the logger. The appenderRef should point to the defined appender.
Then, set up the root logger to use the correct appender, as seen here.
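A minimal sketch of an override along these lines, reusing the classification and the com.myapp logger from the question. The appender name stdout, the layout pattern, and the rootLogger.appenderRef key are assumptions chosen for illustration, and it is not confirmed that EMR Serverless honors every one of these properties. Note that Log4j 2's Console appender takes SYSTEM_OUT or SYSTEM_ERR as its target value.

```json
{
  "classification": "spark-executor-log4j2",
  "configurations": [],
  "properties": {
    "rootLogger.level": "warn",
    "rootLogger.appenderRef.stdout.ref": "stdout",
    "appender.stdout.type": "Console",
    "appender.stdout.name": "stdout",
    "appender.stdout.target": "SYSTEM_OUT",
    "appender.stdout.layout.type": "PatternLayout",
    "appender.stdout.layout.pattern": "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
    "logger.myapp.name": "com.myapp",
    "logger.myapp.level": "info",
    "logger.myapp.additivity": "false",
    "logger.myapp.appenderRef.stdout.ref": "stdout"
  }
}
```

The same properties would need to be repeated under the spark-driver-log4j2 classification if driver logs should be handled the same way.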
I have set logger.myapp.additivity to false to prevent the logs from being propagated to the root logger, ensuring that they go only to the appenders defined in logger.myapp.
The stdout appender is associated with both the root logger and the myapp logger, so that logs from both are sent to stdout.
The root logger is configured with a warn level and associated with the stdout appender, so that all logs at level warn or higher from classes not matched by other loggers go to stdout.
Since the set of configurable properties is limited, it might not be straightforward to log to stdout instead of stderr directly using the log4j2 properties overrides available in EMR Serverless.
As an alternative, given the restrictions of EMR Serverless, you could capture the logs written to stderr in your application and then write them to stdout. This approach is more of an application-level solution than a logging configuration solution.
In your Scala Spark application, you can capture the logs programmatically and then print them to stdout. Here is a conceptual example in Scala where SLF4J and Logback libraries are used to handle logging:
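A minimal sketch of that idea, assuming slf4j-api is on the classpath (as it normally is in a Spark runtime, where it is typically bound to Log4j 2 rather than Logback). The object name MyApp, the helper toStdout, and the messages are hypothetical names used only for illustration.

```scala
import org.slf4j.LoggerFactory

// Hypothetical application object; MyApp, toStdout and the messages are
// illustrative names, not part of Spark or EMR Serverless.
object MyApp {
  // SLF4J logger: its output follows the runtime's logging configuration.
  private val logger = LoggerFactory.getLogger("com.myapp.MyApp")

  // Writes an application message directly to stdout, bypassing the logger.
  def toStdout(level: String, message: String): Unit =
    System.out.println(s"$level $message")

  def main(args: Array[String]): Unit = {
    logger.info("sent through the logger")      // destination set by the log4j2 config
    toStdout("INFO", "sent straight to stdout") // always written to System.out
  }
}
```

Keeping the stdout write in one small helper means the call sites can be switched back to the logger later if a configuration-only solution turns out to work.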
Messages logged via System.out.println will go to stdout, and messages logged via the logger will go to stderr (based on your existing log4j2 configuration).
Please try handling it programmatically as suggested and see if it works for your setup. It's a workaround, but given the restrictions, it might be one of the viable solutions.