DolphinScheduler: why is task instance null?


Describe the question: I have a workflow that is executed once every hour. Some tasks that start at the scheduled time cannot complete normally and stay in the running state indefinitely. The worker log shows this error:

[ERROR] 2020-04-26 13:10:03.627 org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread:[261] - task instance is null. task id : 245

DolphinScheduler version 1.2.0:

dolphinscheduler-master.log

[INFO] 2020-04-26 13:10:03.461 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerThread:[119] - start master exec thread , split DAG ...
[INFO] 2020-04-26 13:10:03.465 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[296] - prepare process :158 end
[INFO] 2020-04-26 13:10:03.468 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[792] - add task to stand by list: sparkPi
[INFO] 2020-04-26 13:10:03.468 org.apache.dolphinscheduler.common.queue.TaskQueueFactory:[45] - task queue impl use zookeeper
[INFO] 2020-04-26 13:10:03.468 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[801] - remove task from stand by list: sparkPi
[INFO] 2020-04-26 13:10:03.470 org.apache.dolphinscheduler.dao.ProcessDao:[769] - start submit task : sparkPi, instance id:158, state: RUNNING_EXEUTION,
[INFO] 2020-04-26 13:10:03.474 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[99] - check task:2_158_2_0_-1 not exist in task queue
[INFO] 2020-04-26 13:10:03.628 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[99] - check task:2_158_2_245_-1 not exist in task queue
[INFO] 2020-04-26 13:10:03.628 org.apache.dolphinscheduler.dao.ProcessDao:[973] - task ready to queue: TaskInstance{id=245, name='sparkPi', taskType='SHELL', processDefinitionId=9, processInstanceId=158, processInstanceName='null', taskJson='{"depList":[],"dependence":"{}","forbidden":false,"id":"tasks-36923","maxRetryTimes":0,"name":"sparkPi","params":"{"rawScript":"sshpass -p hadoop ssh -o StrictHostKeyChecking=no [email protected] \"sh /data/hadoop/liucf/dsTest/sparkPi.sh\"","localParams":[],"resourceList":[]}","preTasks":"[]","retryInterval":1,"runFlag":"NORMAL","taskInstancePriority":"MEDIUM","taskTimeoutParameter":{"enable":false,"interval":0},"timeout":"{"enable":false,"strategy":""}","type":"SHELL","workerGroupId":-1}', state=SUBMITTED_SUCCESS, submitTime=Sun Apr 26 13:10:03 CST 2020, startTime=Sun Apr 26 13:10:03 CST 2020, endTime=null, host='null', executePath='null', logPath='null', retryTimes=0, alertFlag=NO, flag=YES, processInstance=null, processDefine=null, pid=0, appLink='null', flag=YES, dependency=null, duration=null, maxRetryTimes=0, retryInterval=1, taskInstancePriority=MEDIUM, processInstancePriority=MEDIUM, workGroupId=-1}
[INFO] 2020-04-26 13:10:03.628 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[126] - add task : /dolphinscheduler/tasks_queue/2_158_2_245_-1 to tasks queue , result success
[INFO] 2020-04-26 13:10:03.628 org.apache.dolphinscheduler.dao.ProcessDao:[975] - master insert into queue success, task : sparkPi
[INFO] 2020-04-26 13:10:03.629 org.apache.dolphinscheduler.dao.ProcessDao:[786] - submit task :sparkPi state:SUBMITTED_SUCCESS complete, instance id:158 state: RUNNING_EXEUTION

dolphinscheduler-worker.log
[INFO] 2020-04-26 13:10:03.607 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[211] - consume tasks: [2_158_2_245_-1],there still have 0 tasks need to be executed
[ERROR] 2020-04-26 13:10:03.627 org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread:[261] - task instance is null. task id : 245
[WARN] 2020-04-26 13:10:03.627 org.apache.dolphinscheduler.server.worker.runner.FetchTaskThread:[188] - remove task queue : 2_158_2_245_-1 due to taskInstance is null
[INFO] 2020-04-26 13:10:03.627 org.apache.dolphinscheduler.common.queue.TaskQueueZkImpl:[278] - consume task /dolphinscheduler/tasks_queue/2_158_2_245_-1
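For reference, the queue node name that both logs mention (2_158_2_245_-1) appears to encode the task's identifiers. A small sketch of how it seems to break down, with the field order inferred from these logs (process instance priority, process instance id, task instance priority, task instance id, worker group id), not from official documentation:

```shell
# Hypothetical helper: split a DolphinScheduler 1.2.x ZooKeeper queue node
# name into its apparent components. Field order is an assumption inferred
# from the logs above (MEDIUM priority = 2, instance id 158, task id 245,
# default worker group -1).
parse_task_node() {
  old_ifs=$IFS
  IFS='_'
  set -- $1          # split "2_158_2_245_-1" on underscores into $1..$5
  IFS=$old_ifs
  echo "process instance: $2, task instance: $4, worker group: $5"
}

parse_task_node "2_158_2_245_-1"
# prints: process instance: 158, task instance: 245, worker group: -1
```

This makes it easy to map a stray queue node back to the rows in the metadata database, which is useful for the cleanup steps in the answer below.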


There is 1 best solution below

liveForExperience

Some tips for this situation:

  • Clear the task queue in ZooKeeper under the path /dolphinscheduler/tasks_queue.
  • Change the state of the stuck task instance to failed (integer value: 6) in the database.
  • Rerun the workflow using "recover from failed".
  • Version 1.2.1 fixed this problem, so you can also upgrade to the latest release.
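The steps above can be sketched roughly as follows. This is a manual-recovery sketch, not an official procedure: the zkCli.sh and SQL lines must be run against your own cluster, and the table/column names (t_ds_task_instance, state) are assumptions about the 1.2.x schema; verify them before running anything.

```shell
# Build the ZooKeeper queue node path for a stuck task, using the node-name
# layout seen in the logs (priorities, instance id, task id, worker group).
queue_node_path() {
  # $1=processInstancePriority $2=processInstanceId
  # $3=taskInstancePriority    $4=taskInstanceId  $5=workerGroupId
  echo "/dolphinscheduler/tasks_queue/${1}_${2}_${3}_${4}_${5}"
}

# 1. Remove the stale node from the ZooKeeper task queue, e.g.:
#      zkCli.sh -server <zk-host>:2181 rmr "$(queue_node_path 2 158 2 245 -1)"
# 2. Mark the task instance failed (state 6) in the metadata database, e.g.:
#      UPDATE t_ds_task_instance SET state = 6 WHERE id = 245;
# 3. In the web UI, rerun the workflow with "recover from failed".
queue_node_path 2 158 2 245 -1
# prints: /dolphinscheduler/tasks_queue/2_158_2_245_-1
```

Matching the path against the one the worker removed in the log (/dolphinscheduler/tasks_queue/2_158_2_245_-1) is a quick sanity check before deleting anything.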