I have an Azure webrole project which involves a long startup task of installing 3rd party software on the instance; Occasionally, I've seen instances that don't respond, so I'm implementing a probe, for the load balancer to take note of this and not direct traffic to bad instances. This of course isn't enough - what I'd want is for Azure (Fabric?) to then reboot the instance, and if that doesn't help (that is, make the instance reply properly to the probe) - reimage the instance. Is that the behavior, and if so, where is that documented? I searched for quite a while but didn't find anything useful.
Thanks
Using the management API you should be able to externally monitor your role instances. Then, if one is taking to long you should be able to force it to be re-imaged.