C# Application Eventual Crash in Windows CE 5

I have written a pretty basic application in C# (.NET Compact Framework 2.0) using UDP sockets.

The program works fine for a while (up to a couple of weeks at a time), but always fails eventually. Besides my clients no longer being able to reconnect, the failure seems to kill all activity on the associated NIC. Once this happens, I can no longer remote into the device (using CE Remote Display), which is my only means of getting additional feedback for debugging. So at this point, I am not 100% certain whether the application itself crashes or whether my socket code is breaking something within the operating system.

I have implemented an unhandled exception event that never gets raised. I also have a number of try/catch blocks that would output the exception message to a text file. I am not seeing any exceptions being thrown.

/// Removed old TCP code.

The clients themselves are simple little gateway devices that are configured as UDP servers. This is a remote system that I can access only occasionally, and although I have a test controller and gateway unit, the conditions are not identical and I have not yet been able to reproduce the issue on my end.

TIA for any feedback.

Edit:

I've been running with my test bench demo and periodically checking netstat on the server per some comment suggestions. In CE5 netstat does not take the -a flag so I've been using -n (not sure if this is going to tell me what I need...). I have been disconnecting and reconnecting my clients several times, forcing half-opens by unplugging Ethernet, etc. and the netstat table is only showing one connection per client (at the appropriate ports).

Edit 2:

Due to the sparse nature of the messaging during production, I changed the application over to connectionless UDP messaging, but I am still experiencing the same behavior (with about the same amount of time to failure). On my test hardware, the application runs successfully indefinitely with a high rate of messages (once every few seconds). However, in production where messages would be a lot less frequent, the program fails after running for about 10 days. I wouldn't think inactivity would matter, but perhaps I've got that wrong? Looking for any suggestions I can get.

New Send/Receive code:

    public void Send(string Message)
    {
        Socket udpClient = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        EndPoint ep = new IPEndPoint(IPAddress.Parse(_ipAddress), _port);

        udpClient.Connect(ep);

        byte[] data = Encoding.ASCII.GetBytes(Message);
        // async send, sync receive
        udpClient.BeginSendTo(data, 0, data.Length, SocketFlags.None, ep, (ar) =>
        {
            try
            {
                udpClient.EndSendTo(ar);
                _lastSent = Message;

                string msg = this.ReceiveSync(udpClient, 3);
                if (!string.IsNullOrEmpty(msg))
                {
                    _lastReceived = msg;
                    DataReceived(new ReceiveDataEvent(_lastReceived));
                }
            }
            catch { } // note: any exception raised here is swallowed without being logged
            finally
            {
                udpClient.Close();
            }

        }, null);
    }

    private string ReceiveSync(Socket UdpClient, int TimeoutSec)
    {
        string msg = "";
        byte[] recBuffer = new byte[256];

        int elapsed = 0;
        bool terminate = false;
        do
        {
            // check for available data every 500 ms until TimeoutSec has elapsed
            if (UdpClient.Available > 0)
            {
                int bytesRead = UdpClient.Receive(recBuffer, 0, recBuffer.Length, SocketFlags.None);
                // decode only the bytes actually received, not the whole 256-byte buffer
                msg = Encoding.ASCII.GetString(recBuffer, 0, bytesRead);
                terminate = true;
            }
            else
            {
                if ((elapsed / 2) == TimeoutSec) 
                    terminate = true;
                else
                {
                    elapsed++;
                    System.Threading.Thread.Sleep(500);
                }
            }
        } while (!terminate);

        return msg;
    }

Answer by josef:

You are probably running out of sockets on the server (Windows CE 5, a 32-bit OS). See the similar question "Is there a limit on number of tcp/ip connections between machines on linux?": "...Once a TCP socket is closed (by default) the port remains occupied in TIMED_WAIT status for 2 minutes..."
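One way to test the socket-exhaustion idea would be to stop creating a new socket for every message and keep a single one per gateway. The following is only a minimal sketch under that assumption, not a drop-in replacement for the posted Send method: the class name UdpRequester and its members are hypothetical, and it sends synchronously, so it would need to run on a worker thread. It also replaces the Sleep-based polling loop with Socket.Poll.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper: one UDP socket per gateway, created once and reused.
public class UdpRequester : IDisposable
{
    private readonly Socket _socket;
    private readonly byte[] _buffer = new byte[256];

    public UdpRequester(string ipAddress, int port)
    {
        EndPoint remote = new IPEndPoint(IPAddress.Parse(ipAddress), port);
        _socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        // For UDP, Connect only fixes the default peer; nothing is sent yet.
        _socket.Connect(remote);
    }

    // Sends one datagram and waits up to timeoutSec for a reply.
    // Returns "" on timeout. May throw SocketException, which the caller should log.
    public string Request(string message, int timeoutSec)
    {
        byte[] data = Encoding.ASCII.GetBytes(message);
        _socket.Send(data, 0, data.Length, SocketFlags.None);

        // Poll takes microseconds and blocks until the socket is readable
        // or the timeout expires (keep timeoutSec small to avoid int overflow).
        if (!_socket.Poll(timeoutSec * 1000000, SelectMode.SelectRead))
            return "";

        int bytesRead = _socket.Receive(_buffer, 0, _buffer.Length, SocketFlags.None);
        return Encoding.ASCII.GetString(_buffer, 0, bytesRead);
    }

    public void Dispose()
    {
        _socket.Close();
    }
}
```

With this shape, the number of open sockets stays constant no matter how many messages are exchanged, so if the failure still appears after the usual interval, per-message socket churn is unlikely to be the cause.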

I am missing information on how many clients create and close connections per unit of time. You probably have to think about the socket option SO_REUSEADDR (https://learn.microsoft.com/en-us/previous-versions/windows/embedded/ms884940%28v%3dmsdn.10%29).
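For reference, this is roughly how SO_REUSEADDR is set from managed code. It is a sketch only: SocketFactory and CreateReusableUdpSocket are hypothetical names, and the option only matters when the application binds a fixed local port, which the Send code in the question does not do.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper name, for illustration only.
public static class SocketFactory
{
    // Creates a UDP socket bound to localPort with SO_REUSEADDR enabled.
    public static Socket CreateReusableUdpSocket(int localPort)
    {
        Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        // The option must be set before Bind to have any effect.
        s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, 1);
        s.Bind(new IPEndPoint(IPAddress.Any, localPort));
        return s;
    }
}
```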

You could run a circular network trace in the server's subnet (30 minutes or so; how long depends on how quickly you can stop the trace after the 'crash') to see what happens just before the 'crash'.

Another thought is to reboot the server periodically (once a night), as Windows Mobile/CE devices do not run well 24/7.

Our customers use a lot of Windows Embedded Handheld 6.5 (CE5 based) devices. Even if they do not use the network much, these devices are most stable during the day if they are rebooted every night. A periodic reboot would also reveal a faulty NIC driver on the CE5 server (who knows, some companies are not doing well with platform software). Or try another vendor's NIC.
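The nightly reboot mentioned above could be driven from the application itself. This is a rough sketch with assumptions: NightlyReboot and its members are hypothetical names, and the reboot call relies on KernelIoControl with IOCTL_HAL_REBOOT from coredll.dll, which an OEM image may restrict or not implement, so it needs checking on the actual device.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class NightlyReboot
{
    // CTL_CODE(FILE_DEVICE_HAL, 15, METHOD_BUFFERED, FILE_ANY_ACCESS)
    private const uint IOCTL_HAL_REBOOT = 0x0101003C;

    // Only resolvable on Windows CE; must not be called on the desktop.
    [DllImport("coredll.dll", SetLastError = true)]
    private static extern bool KernelIoControl(uint dwIoControlCode,
        IntPtr lpInBuf, uint nInBufSize,
        IntPtr lpOutBuf, uint nOutBufSize,
        ref uint lpBytesReturned);

    // Returns the next occurrence of hour:00 strictly after 'now'.
    public static DateTime NextRebootTime(DateTime now, int hour)
    {
        DateTime candidate = now.Date.AddHours(hour);
        return candidate > now ? candidate : candidate.AddDays(1);
    }

    // Asks the HAL to restart the device (assumes the OEM supports this IOCTL).
    public static void RebootNow()
    {
        uint bytesReturned = 0;
        KernelIoControl(IOCTL_HAL_REBOOT, IntPtr.Zero, 0, IntPtr.Zero, 0, ref bytesReturned);
    }
}
```

A timer in the application could compute NextRebootTime(DateTime.Now, 3) once at startup and call RebootNow() when the clock passes that value.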

BTW: I have written my own netstat for Windows Mobile: http://www.hjgode.de/wp/2013/09/24/mobile-development-netstat-know-your-devices-open-ports/. I did not test it on Windows CE5, but it should work or can be made to work on CE5 too.