Recently, I tried to upgrade our cluster from 2.6.5 to 3.1.3, but the upgrade failed. I then tried to roll back to the older version, but some strange things happened. Now our cluster's DataNodes can't send their block reports to the active NameNode. The DataNode throws this exception:

[image: DataNode exception]

and a large number of blocks are reported as corrupted. Before the upgrade, everything in our cluster was OK. The DataNode throws this exception all the time, and the NameNode Web UI shows:

"There are xxx missing blocks. The following files may be corrupted"

I suspected the HA NameNode setup was the issue, so I checked the NameNode's log, which shows:

[image: NameNode log]

The HA cluster looks OK, but the DataNode's block report still can't determine which NameNode is active.
