Simulating node failure for testing purposes


I am developing fault-tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure to simulate is a node crash: I want the application to exit completely, with an error, in a controlled manner. I want to choose which node fails and when it does (as much as possible).

The nodes of the application communicate with each other peer-to-peer. Each node runs two threads, and ideally both would be terminated.

In my testing environment, each node runs on a thread (which in turn creates a second one) on my laptop, with a network port assigned to each node.

A preliminary idea is to exit a thread at random with a given probability. This does not give me the control I need to crash only one node, at the exact point in the application where I want to test my fault-tolerance mechanisms. Also, as far as I know, it would leave the node's second thread running.

I am looking for a way to simulate a node crash that I can control, so that I can reproduce the same crash whenever I need to.
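
To make the requirement concrete, here is a minimal sketch of the kind of control meant, not a definitive implementation. It assumes each node is a thread that spawns a second one, as in the setup described, and uses only the standard library. The names `CrashSwitch` and `run_node` are hypothetical, and the `sleep` calls stand in for the real networking and application work. Each node gets its own switch, both of its threads poll it, and the test harness flips the switch of the chosen node at the chosen moment.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical per-node kill switch, shared by both threads of one node.
#[derive(Clone, Default)]
struct CrashSwitch(Arc<AtomicBool>);

impl CrashSwitch {
    /// Called by the test harness to "crash" the node that owns this switch.
    fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// Polled by the node's threads on every loop iteration.
    fn is_crashed(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Runs one simulated node: a main thread that spawns a second thread.
/// The node returns an error once its switch has been triggered.
fn run_node(id: usize, switch: CrashSwitch) -> thread::JoinHandle<Result<(), String>> {
    thread::spawn(move || {
        let worker_switch = switch.clone();
        let worker = thread::spawn(move || {
            while !worker_switch.is_crashed() {
                thread::sleep(Duration::from_millis(5)); // placeholder for real work
            }
        });

        while !switch.is_crashed() {
            thread::sleep(Duration::from_millis(5)); // placeholder for real work
        }

        // Wait for the second thread so that both stop together.
        worker
            .join()
            .map_err(|_| format!("node {id}: worker panicked"))?;
        Err(format!("node {id}: simulated crash"))
    })
}

fn main() {
    // Three nodes, each with its own independent switch.
    let switches: Vec<CrashSwitch> = (0..3).map(|_| CrashSwitch::default()).collect();
    let mut handles: Vec<_> = switches
        .iter()
        .enumerate()
        .map(|(id, s)| run_node(id, s.clone()))
        .collect();

    thread::sleep(Duration::from_millis(50)); // let the nodes run for a while
    switches[1].trigger(); // crash node 1 only, at a moment the test chooses
    let crashed = handles.remove(1).join().expect("node thread panicked");
    println!("{crashed:?}");

    // Stop the surviving nodes so that the demo terminates.
    for s in &switches {
        s.trigger();
    }
    for h in handles {
        let _ = h.join();
    }
}
```

Because the trigger is an explicit call rather than a random draw, the same node can be crashed at the same point on every run. If the node's sockets are owned by these threads, they are closed when the threads return, so peers should see the connection drop. This is still a cooperative stop rather than an abrupt one: a thread blocked in a long call will not notice the switch until it next polls. If an abrupt stop is needed, one alternative (again an assumption about what would fit) is to run each node as a separate OS process and kill it with `std::process::Child::kill`.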
