Spark threw an error and removed all broadcast pieces, but still reported a broadcast timeout


Spark version: 3.1.3

The following is the stack trace.

[WARN] org.apache.spark.storage.BlockManager - Putting block broadcast_211_piece0 failed due to exception java.nio.file.FileSystemException: /tmp/blockmgr-d4fb1377-363e-48b4-8c8f-717e15ac5c48/33: No space left on device.
[ERROR] org.apache.spark.broadcast.TorrentBroadcast - Store broadcast broadcast_211 fail, remove all pieces of the broadcast
[ERROR] org.apache.spark.sql.execution.exchange.BroadcastExchangeExec - Could not execute broadcast in 3600 secs.
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:194)
    at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:515)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeBroadcast$1(SparkPlan.scala:193)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:189)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:203)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareRelation(BroadcastHashJoinExec.scala:217)
    at org.apache.spark.sql.execution.joins.HashJoin.codegenInner(HashJoin.scala:449)
    at org.apache.spark.sql.execution.joins.HashJoin.codegenInner$(HashJoin.scala:448)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:40)
    at org.apache.spark.sql.execution.joins.HashJoin.doConsume(HashJoin.scala:357)
    at org.apache.spark.sql.execution.joins.HashJoin.doConsume$(HashJoin.scala:355)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:40)
    at org.apache.spark.sql.execution.CodegenSupport.constructDoConsumeFunction(WholeStageCodegenExec.scala:221)
    at org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:192)
    at org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:149)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.consume(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:605)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:741)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:148)
    at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.joins.HashJoin.doProduce(HashJoin.scala:352)
    at org.apache.spark.sql.execution.joins.HashJoin.doProduce$(HashJoin[2024-03-12 13:43:54,247]

I understand that the driver did not have enough disk space to store the broadcast block, so Spark logged the error about removing all pieces of the broadcast. I expected the broadcast to be aborted after that error, but it seems Spark carried on with the broadcast and then threw the broadcast timeout error (I set the broadcast timeout to 1 hour). Why did this happen?
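
To make the expectation above concrete: the trace shows the timeout coming from `FutureTask.get` with a timeout, called in `BroadcastExchangeExec.doExecuteBroadcast`. The following is a minimal JDK-only sketch of how that call behaves. It is not Spark code, and the class name, method name, and the "stuck" task are hypothetical. It only illustrates that a waiter sees the failure right away if the task itself throws, and sees `TimeoutException` only if the task has neither completed nor failed by the deadline.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureTimeoutSketch {

    // Waits on the task with a timeout, as FutureTask.get does in the trace,
    // and reports which outcome the waiting thread observes.
    static String outcome(FutureTask<String> task, long timeoutMillis)
            throws InterruptedException {
        try {
            return "completed: " + task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The task threw; the original exception is the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            // The task neither completed nor failed before the deadline.
            return "timed out";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the task throws, so the waiter is told about the failure promptly.
        FutureTask<String> failing = new FutureTask<>(() -> {
            throw new java.io.IOException("No space left on device");
        });
        new Thread(failing).start();
        System.out.println(outcome(failing, 2000));

        // Case 2 (hypothetical): the error is only logged and the task keeps
        // running, so the waiter learns nothing until the timeout expires.
        FutureTask<String> stuck = new FutureTask<>(() -> {
            System.err.println("store failed, error logged only");
            Thread.sleep(60_000);
            return "finished too late";
        });
        Thread worker = new Thread(stuck);
        worker.setDaemon(true); // do not keep the JVM alive for the sleeping task
        worker.start();
        System.out.println(outcome(stuck, 200));
    }
}
```

If Spark's broadcast future had completed exceptionally, the waiting thread would be expected to see that failure rather than a timeout. A timeout after the full hour therefore suggests that, in this run, the logged storage error did not complete the future being waited on. Whether that is what happens inside Spark 3.1.3 is an assumption this sketch cannot confirm.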
