In lab 2B, I implemented log replication, but I could not pass this test. In TestFailAgree2B, suppose S1 was the initial leader, with term 1. Then S2 was disconnected, and because it heard no heartbeats it kept timing out and incrementing its term. At some point, S2 reconnected to the network while S1 received Start(106). S1 sent AppendEntries (AE) RPCs to S2 and S3 (the entry for 106 had term 1) and got a response from S2 saying its term was out of date. So S1 stepped down to follower, then became a candidate and won the election again in a higher term. However, the latest entry in the log was still 106, whose term was 1, and the new leader was not allowed to commit it. So the whole Raft cluster stopped making progress and the test eventually timed out. How can I fix this tricky problem?
If I do not check log[N].term == currentTerm when advancing the leader's commitIndex (which is sent to followers as leaderCommit), I can pass all the tests in 2B. It seems the 2B tests are not that strict, which leads to trickier problems in lab 2C (I am stuck on the Figure 8 and unreliable Figure 8 tests).
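There's a good chance something is wrong with the tests if what you have described is exactly what's happening. A Raft leader doesn't (and shouldn't) commit entries from previous terms by counting replicas; such entries only become committed indirectly, once an entry from the leader's current term is committed. So if no new entry ever arrives, the cluster cannot make progress at all. There's no trick to this.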
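To make the rule concrete, here is a minimal sketch of the Figure 2 commit rule in Go, replaying the scenario from the question. The `Leader` and `LogEntry` types, the dummy entry at index 0, and including the leader itself in `matchIndex` are assumptions for illustration, not the lab's required layout. It shows that an old-term entry stays uncommitted even when fully replicated, and that committing one current-term entry commits it too.

```go
package main

import "fmt"

// LogEntry is a hypothetical entry type; the lab lets you define your own.
type LogEntry struct {
	Term    int
	Command interface{}
}

// Leader holds only the fields this sketch needs (hypothetical layout).
type Leader struct {
	currentTerm int
	log         []LogEntry // log[0] is a dummy so real indices start at 1
	commitIndex int
	matchIndex  []int // one slot per peer, including the leader itself
}

// advanceCommitIndex sets commitIndex = N if N > commitIndex, a majority of
// matchIndex[i] >= N, and log[N].Term == currentTerm.
func (rf *Leader) advanceCommitIndex() {
	for n := len(rf.log) - 1; n > rf.commitIndex; n-- {
		if rf.log[n].Term != rf.currentTerm {
			// Terms never decrease along the log, so every lower index is
			// also from an older term: none may be committed by counting.
			break
		}
		count := 0
		for _, m := range rf.matchIndex {
			if m >= n {
				count++
			}
		}
		if count > len(rf.matchIndex)/2 {
			rf.commitIndex = n // also commits every earlier entry indirectly
			return
		}
	}
}

func main() {
	// 106 was appended in term 1 and is on every server; leader is now in term 3.
	rf := &Leader{
		currentTerm: 3,
		log:         []LogEntry{{Term: 0}, {Term: 1, Command: 106}},
		commitIndex: 0,
		matchIndex:  []int{1, 1, 1},
	}
	rf.advanceCommitIndex()
	fmt.Println("before new entry:", rf.commitIndex) // stuck at 0

	// One term-3 entry reaches a majority (leader + one follower).
	rf.log = append(rf.log, LogEntry{Term: 3, Command: 107})
	rf.matchIndex = []int{2, 2, 0}
	rf.advanceCommitIndex()
	fmt.Println("after new entry:", rf.commitIndex) // 2, so 106 is committed too
}
```

Scanning down from the end of the log lets the loop stop at the first older-term entry, since nothing below it can be committed by counting either.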
You could force the previous entry to be committed by having the new leader append an idempotent log entry in its current term (something along the lines of an empty request), but that shifts the log indices the MIT tests expect, and if you're working on the MIT tests, you aren't really allowed to modify the tests.
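For reference, a no-op entry on election might look like the Go sketch below. `NoOp`, `becomeLeader`, and the `Raft` fields shown are hypothetical names; the service would need to skip `NoOp` when applying. As noted above, this changes the indices the lab 2B tests check, so it only illustrates the idea.

```go
package main

import "fmt"

// LogEntry is a hypothetical entry type.
type LogEntry struct {
	Term    int
	Command interface{}
}

// NoOp is a hypothetical marker command the service would ignore when applying.
type NoOp struct{}

// Raft holds only the fields this sketch needs.
type Raft struct {
	me          int
	currentTerm int
	log         []LogEntry // log[0] is a dummy entry
	nextIndex   []int
	matchIndex  []int
}

// becomeLeader initializes leader state, then appends a no-op in the new term.
// Once the no-op reaches a majority, all earlier entries commit indirectly.
func (rf *Raft) becomeLeader() {
	// Figure 2: nextIndex starts at last log index + 1, matchIndex at 0.
	// Setting this before the append means the no-op goes out in the first AE.
	for i := range rf.nextIndex {
		rf.nextIndex[i] = len(rf.log)
		rf.matchIndex[i] = 0
	}
	rf.log = append(rf.log, LogEntry{Term: rf.currentTerm, Command: NoOp{}})
	rf.matchIndex[rf.me] = len(rf.log) - 1 // the leader trivially has its own entry
}

func main() {
	rf := &Raft{
		me:          0,
		currentTerm: 3,
		log:         []LogEntry{{Term: 0}, {Term: 1, Command: 106}},
		nextIndex:   make([]int, 3),
		matchIndex:  make([]int, 3),
	}
	rf.becomeLeader()
	last := rf.log[len(rf.log)-1]
	_, isNoOp := last.Command.(NoOp)
	fmt.Println(len(rf.log)-1, last.Term, isNoOp) // 2 3 true
}
```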
What you've described is a common issue. The usual solution is for the application using Raft to keep submitting log entries periodically, so that every new leader eventually gets an entry from its own term and Raft keeps making progress.