Sample log4j.properties
# Set root logger level to ERROR to suppress WARN, INFO, and DEBUG messages.
log4j.rootCategory=ERROR, console
# Define the console appender (where logs will be printed to the console)
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set specific log levels for Spark components.
# This is often more effective than just the root logger for Spark-specific messages.
# Set the log level for the org.apache.spark package to ERROR.
log4j.logger.org.apache.spark=ERROR
# You can also set levels for more specific Spark components if needed.
# For example, to silence excessive DAGScheduler INFO messages, you could try:
# log4j.logger.org.apache.spark.scheduler.DAGScheduler=WARN
# Similarly, for BlockManager messages:
# log4j.logger.org.apache.spark.storage.BlockManager=WARN
# If you still see INFO messages from other libraries (e.g., Netty, Hadoop), you can add more specific loggers.
# For example, to reduce Netty INFO messages:
# log4j.logger.org.apache.spark.network.netty=WARN
# log4j.logger.io.netty=WARN
# Example of setting level for Hadoop (if you see Hadoop related INFO messages)
# log4j.logger.org.apache.hadoop=WARN
Explanation of common settings
log4j.rootCategory
- Defines the root logger.
- First part (WARN, ERROR, etc.): Logging level for the root logger. Messages at or above this level will be processed.
- Second part (console): Appenders to use for the root logger (in this case, ‘console’, which is defined by the log4j.appender.console settings).
log4j.appender.console
- Defines an appender named ‘console’.
- org.apache.log4j.ConsoleAppender: Specifies that this appender writes to the console.
- log4j.appender.console.target=System.out: Output stream for the console appender (System.out is standard output).
- log4j.appender.console.layout: Defines the layout for log messages in this appender.
  - org.apache.log4j.PatternLayout: Uses a conversion pattern to format the log messages.
- log4j.appender.console.layout.ConversionPattern: The actual pattern for formatting.
  - %d{yy/MM/dd HH:mm:ss}: Date and time, written as a two-digit year/month/day followed by a 24-hour time.
  - %p: Log level of the message (WARN, ERROR, etc.).
  - %c{1}: Logger (category) name, shortened to its last component, which is usually the class name.
  - %m: The log message itself.
  - %n: Platform-specific line separator.
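To make the conversion pattern concrete, the following Python sketch mimics what `%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n` produces for a single log event. It is only an illustration, not Log4j's implementation: the function name `render`, the sample timestamp, and the sample message are all hypothetical, and `%n` is approximated with `"\n"`.

```python
from datetime import datetime


def render(timestamp: datetime, level: str, logger_name: str, message: str) -> str:
    """Approximate the pattern '%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n'."""
    date_part = timestamp.strftime("%y/%m/%d %H:%M:%S")  # %d{yy/MM/dd HH:mm:ss}
    short_name = logger_name.rsplit(".", 1)[-1]          # %c{1}: last name component
    # %p is the level, %m the message, %n the line separator ("\n" assumed here).
    return f"{date_part} {level} {short_name}: {message}\n"


# Hypothetical event, used only to show the shape of the output line.
line = render(
    datetime(2024, 1, 15, 10, 30, 0),
    "WARN",
    "org.apache.spark.scheduler.DAGScheduler",
    "Sample message",
)
print(line, end="")
# prints: 24/01/15 10:30:00 WARN DAGScheduler: Sample message
```

Note how `%c{1}` keeps only `DAGScheduler` from the full logger name, which keeps console lines short at the cost of hiding the package.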
log4j.logger.package.name
- Sets the log level for a specific package (or class).
- e.g., log4j.logger.org.apache.spark=WARN: Sets the level for all loggers under the org.apache.spark package to WARN.
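The way a package-level setting applies to every logger beneath it can be sketched in a few lines of Python. This is a simplified model of the logger hierarchy, not Log4j code: the names `effective_level` and `is_logged` are hypothetical, the numeric ranks are illustrative (only their order matters), and the configuration below assumes a root level of ERROR with org.apache.spark set to WARN.

```python
from typing import Dict

# Illustrative ranks; only the relative order matters.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def effective_level(logger_name: str, configured: Dict[str, str], root_level: str) -> str:
    """Walk up the dotted logger name until a configured ancestor is found."""
    name = logger_name
    while name:
        if name in configured:
            return configured[name]
        name = name.rpartition(".")[0]  # drop the last component; "" when none is left
    return root_level  # nothing matched, so the root logger's level applies


def is_logged(event_level: str, logger_name: str,
              configured: Dict[str, str], root_level: str) -> bool:
    """A message is processed when its level is at or above the effective level."""
    return LEVELS[event_level] >= LEVELS[effective_level(logger_name, configured, root_level)]


# Assumed configuration: log4j.rootCategory=ERROR and log4j.logger.org.apache.spark=WARN
configured = {"org.apache.spark": "WARN"}
root_level = "ERROR"

for level, logger in [
    ("INFO", "org.apache.spark.scheduler.DAGScheduler"),
    ("WARN", "org.apache.spark.scheduler.DAGScheduler"),
    ("WARN", "org.apache.hadoop.fs.FileSystem"),
]:
    print(level, logger, is_logged(level, logger, configured, root_level))
```

Under these assumptions the Spark logger inherits WARN from its package entry, while the Hadoop logger has no matching entry and falls back to the root level.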
Sample use
log_file=~/software/pyspark-env/log4j.properties
spark-submit \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file://${log_file}" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file://${log_file}" \
src/test_2.py
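If the command is assembled from a Python launcher script rather than typed in a shell, the same arguments could be built as sketched below. The helper name `spark_submit_args` is hypothetical, and the sketch assumes a POSIX-style path and a Spark build that reads Log4j 1.x properties files through `-Dlog4j.configuration`, as the shell example above does. It only builds the argument list; running it is left to the caller.

```python
from pathlib import Path
from typing import List


def spark_submit_args(log_file: str, script: str) -> List[str]:
    """Build a spark-submit argument list that points both JVMs at one log4j file."""
    # expanduser() resolves a leading "~"; as_uri() yields a file:// URI for an absolute path.
    uri = Path(log_file).expanduser().as_uri()
    option = f"-Dlog4j.configuration={uri}"
    return [
        "spark-submit",
        "--conf", f"spark.driver.extraJavaOptions={option}",
        "--conf", f"spark.executor.extraJavaOptions={option}",
        script,
    ]


if __name__ == "__main__":
    # Same paths as the shell example above.
    args = spark_submit_args("~/software/pyspark-env/log4j.properties", "src/test_2.py")
    print(" ".join(args))
```

Passing the list to `subprocess.run(args)` would avoid shell quoting issues, since each `--conf` value stays a single argument.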