Have some problem with connecting to simulator

Nov 15, 2010 at 11:13 AM

Hi Nick,

 

When performing a large number of commands appears a strange situation with a connection to the simulator.

From Thales Simulator application:

Client from 127.0.0.1:3398 is connected
Client: 127.0.0.1:3398
Request: 0000A00002U
Parsing header and code of message 0000A00002U...
Searching for implementor of A0...
Found implementor ThalesSim.Core.HostCommands.BuildIn.GenerateKey_A0, instantiating...
Calling AcceptMessage()...
Calling ConstructResponse()...
Calling ConstructResponseAfterOperationComplete()...
Attaching header/response code to response...
Sending: 0000A100U6C1A2536BA8E393AF6E12AE0965D4C64E9D616
Calling Terminate()...
Implementor to Nothing
Client disconnected.
Client from 127.0.0.1:3399 is connected
...timeout here... Client disconnected. Client from 127.0.0.1:3402 is connected
...timeout here... Client disconnected.

From my application I get a log:

MasterHSM: tcpipsim: send
00 2A 30 30 30 30 42 55 30 32 31 55 36 43 31 41 .*0000BU021U6C1A
32 35 33 36 42 41 38 45 33 39 33 41 46 36 45 31 2536BA8E393AF6E1
32 41 45 30 39 36 35 44 34 43 36 34             2AE0965D4C64
MasterHSM: tcpipsim: answer length. recv error 10060
MasterHSM: disconnect: connection closed

My application connects and sends commands to ThalesSimulator, but the simulator does not receive the command.

After the expiration timeout my application closes the connection.


If I add after a connection delay (sleep 50ms), then everything works fine.

Do you have any idea how to fix it?

Coordinator
Nov 15, 2010 at 2:15 PM

The point where you get correct results if you introduce a 50ms connection delay leads me to believe that this is not a simulator error. It would be impossible for me to indicate what the problem is unless you can share some of your code (specifically, your socket implementation) and hope that it's a language I have experience with.

Nov 16, 2010 at 9:08 AM

OK, this is my С++ code. It's very easy and based on standard socket API.

#pragma pack(push, 1)
struct buffer_s {
unsigned short size;
char buffer[510];
};
#pragma pack(pop)

typedef struct buffer_s buffer_t;


int vexchange(const char *cmd, size_t cmdsize, char *out, size_t outsize, char *outprn, size_t outprnsize) {
buffer_t buffer;
buffer_t answer;
SOCKET sock;
sockaddr_in remote_addr;
unsigned short port = 9998;
int res;

sock = socket(AF_INET, SOCK_STREAM, 0);
if(sock == INVALID_SOCKET)
return -1;

int timeout = 10000; //in ms
ULONG reuseaddr = 1;
setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout, sizeof(int));

remote_addr.sin_family = AF_INET;
remote_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
remote_addr.sin_port = htons(port);

res = connect(sock, (SOCKADDR*)&remote_addr, sizeof(remote_addr));
if( res == SOCKET_ERROR ) {
syslog(0, "tcpipsim: can't connect to simulator");
return -2;
}
Sleep(50); // a time delay for the successful execution of the exchange

buffer.size = htons(cmdsize);
memcpy(buffer.buffer, cmd, cmdsize);

syslog(0, "tcpipsim: send");
syslogdump(0, (const char *)&buffer, cmdsize + 2);

res = send(sock, (const char *)&buffer, cmdsize + 2, 0);
if( res == SOCKET_ERROR ) {
syslog(0, "tcpipsim: send error %d", WSAGetLastError());
disconnect(sock);
closesocket(sock);
return -3;
}

memset(&answer, 0, sizeof(answer));
res = recv(sock, (char *)&answer.size, 2, 0);
if( res == SOCKET_ERROR || res == 0 ) {
syslog(0, "tcpipsim: answer length. recv error %d", WSAGetLastError());
disconnect(sock);
closesocket(sock);
return -4;
}

syslog(0, "tcpipsim: answer length read");
syslogdump(0, (char *)&answer.size, 2);

answer.size = ntohs(answer.size);
res = recv(sock, (char *)answer.buffer, answer.size, 0);
if( res == SOCKET_ERROR || res == 0 ) {
syslog(0, "tcpipsim: answer body. recv error %d", WSAGetLastError());
disconnect(sock);
closesocket(sock);
return -5;
}

syslog(0, "tcpipsim: answer body read");
syslogdump(0, answer.buffer, answer.size);

memcpy(out, answer.buffer, answer.size);
out[answer.size] = 0;

if( outprn ) { //if the command involves printing on a printer
buffer_t answer_prn;
memset(&answer_prn, 0, sizeof(answer_prn));
res = recv(sock, (char *)&answer_prn.size, 2, 0);
if( res == SOCKET_ERROR || res == 0 ) {
syslog(0, "tcpipsim: printer answer length. recv error %d", WSAGetLastError());
disconnect(sock);
closesocket(sock);
return -6;
}

syslog(0, "tcpipsim: printer answer length read");
syslogdump(0, (char *)&answer.size, 2);

answer_prn.size = ntohs(answer_prn.size);
res = recv(sock, (char *)answer_prn.buffer, answer_prn.size, 0);
if( res == SOCKET_ERROR || res == 0 ) {
syslog(0, "tcpipsim: printer answer body. recv error %d", WSAGetLastError());
disconnect(sock);
closesocket(sock);
return -7;
}

syslog(0, "tcpipsim: printer answer body read");
syslogdump(0, answer.buffer, answer.size);

memcpy(outprn, answer_prn.buffer, answer_prn.size);
outprn[answer_prn.size] = 0;
}

disconnect(sock);
closesocket(sock);
return 0;
}

int disconnect(SOCKET socket) {
int rc;
char buffer[512];

// shutdown the connection since no more data will be sent
rc = shutdown(socket, SD_SEND);
if( rc == SOCKET_ERROR ) {
syslog(0, "disconnect: shutdown failed with error: %d\n", WSAGetLastError());
closesocket(socket);
return -1;
}

// Receive until the peer closes the connection
do {
rc = recv(socket, buffer, sizeof(buffer), 0);
if( rc > 0 )
syslog(0, "disconnect: bytes received: %d\n", rc);
else if ( rc == 0 )
syslog(0, "disconnect: connection closed\n");
else
syslog(0, "disconnect: recv failed with error: %d\n", WSAGetLastError());

} while( rc > 0 );
return 0;
}

Coordinator
Nov 16, 2010 at 9:26 AM

Down to the weeds eh? It's been quite a while since I've dealt with sockets at this level. Although I can follow the code, I don't feel too confident in saying that I would spot a problem, if one is there.

Come to think of it, Thales sim attaches event handlers to each new socket client after a connection has been established - perhaps this might lead it to miss the first message. If you want to see whether that is the problem, you can try changing the order of a couple of statements in the code, specifically in ThalesMain.vb method ListenerThread there is the following:

 

Dim wClient As New TCP.WorkerClient(SL.AcceptTcpClient())
wClient.InitOps()

AddHandler wClient.Disconnected, AddressOf WCDisconnected
AddHandler wClient.MessageArrived, AddressOf WCMessageArrived

 

You can try changing it to:

Dim wClient As New TCP.WorkerClient(SL.AcceptTcpClient())

AddHandler wClient.Disconnected, AddressOf WCDisconnected
AddHandler wClient.MessageArrived, AddressOf WCMessageArrived

wClient.InitOps()

 

Nov 16, 2010 at 10:57 AM

Hi Nick,

I change the code as you say and I have got this:

 

Client disconnected.
Exception on listening thread (Access to the object can not be liquidated.
Object name: "System.Net.Sockets.Socket".)

 

Is it helpfull?

Remark: code work much better and exception occurs much later.


 

Nov 16, 2010 at 11:02 AM

Stack trace. maybe it help:

in System.Net.Sockets.Socket.get_RemoteEndPoint()    
in ThalesSim.Core.TCP.WorkerClient.ClientIP()
in ...\thales-sim\ThalesCore\TCP\WorkerClient.vb:line 118
in ThalesSim.Core.ThalesMain.ListenerThread()
in ...\thales-sim\ThalesCore\ThalesMain.vb:line 563"

 

 

Coordinator
Nov 16, 2010 at 11:12 AM

In the altered code, is the Logger.MajorInfo statement after the wClient.InitOps() statement?

Nov 16, 2010 at 11:23 AM

Yes.

    'Thread for the host listening socket.
    Private Sub ListenerThread()

        ReDim WC(-1)

        Try

            SL = New TcpListener(New System.Net.IPEndPoint(0, port))
            SL.Start()

            While True
                Dim wClient As New TCP.WorkerClient(SL.AcceptTcpClient())

                AddHandler wClient.Disconnected, AddressOf WCDisconnected
                AddHandler wClient.MessageArrived, AddressOf WCMessageArrived

                wClient.InitOps()

                Logger.MajorInfo("Client from " + wClient.ClientIP + " is connected")

                curCons += 1

                Dim slotedIt As Boolean = False

                For i As Integer = 0 To WC.GetUpperBound(0)
                    If WC(i) Is Nothing OrElse WC(i).IsConnected = False Then
                        WC(i) = wClient
                        slotedIt = True
                        Exit For
                    End If
                Next

                If slotedIt = False Then
                    ReDim Preserve WC(WC.GetLength(0))
                    WC(WC.GetUpperBound(0)) = wClient
                End If

                While curCons >= maxCons
                    Threading.Thread.Sleep(50)
                End While

            End While

        Catch ex As Exception
            Logger.MajorInfo("Exception on listening thread (" + ex.Message + ")")
            If Not SL Is Nothing Then
                SL.Stop()
                SL = Nothing
            End If
        End Try

    End Sub

Coordinator
Nov 16, 2010 at 11:53 AM

Geez, I have no clue where the problem comes from. Please let me have a look at it again tonight.

Nov 16, 2010 at 11:57 AM

Ok, It not a hot problem. I just added Sleep(50) in my code.

But I think that there is a bug in the simulator and need help to fix it.
I can write a test application which reproduces the error, if it helps the cause.

Coordinator
Nov 16, 2010 at 12:35 PM

If you can mail a small stand-alone executable that connects to 127.0.0.1, default port, and sends a command it would be a big step in helping me reproduce the problem. If you can do that, send me a private message so I can contact you via email.

Nov 16, 2010 at 4:40 PM

Howto send to you private message?

Coordinator
Nov 16, 2010 at 5:11 PM

Got your message and replied.

Nov 16, 2010 at 7:38 PM

I sent a letter with the test application. Check your mailbox.

Coordinator
Nov 16, 2010 at 7:46 PM

Got it, checking it now.

Coordinator
Nov 16, 2010 at 8:21 PM

On my PC the problem indeed occurred. When I moved the wClient.InitOps() statement after attaching the event handlers the problem went away...but when I tried to do a lot of things on the simulator window (like moving the window or changing the detail event) the problem sometimes re-appeared. When I used the simulator with Info or less detail level, the problem never appeared. With full detail the host messages were processed at a rate of 20 per second. Without details, the host messages were processed at a rate of 500 per second.

The fact is that the event handlers that receive event information simply do a lot of displaying on the output window synchronously. One thing that could change in the simulator is to dispatch the work done by the event handlers on a different thread or to queue the event notifications and peel off the queue on the GUI thread. This is something that I will look into when I get the chance. To be honest, it's going to get a very low priority because the simulator isn't really intended for any heavy use such as this.

So my initial guess turned out to be incorrect, since your code works just fine. One thing you may want to change, though, is to re-use the same socket as much as possible. Currently, when you're sending a command you open a new socket every time. This might get you into trouble in a production environment for various reasons so I think it's best to do a minor rework so you can establish a socket and send messages through it as long as it is alive.

wClient.InitOps()
Nov 17, 2010 at 7:45 AM
Edited Nov 17, 2010 at 7:48 AM

Exactly the same behavior of the program was for me...

Ok. it's not a big problem for me. 50ms on one connect solve the problem.

Somehow I thought that one command - one connection. I am redesign my apllication  and I have got incredible speed without crashes.

Thank you very much for the idea of a single connection.

 

P.S. but I still believe that somewhere there is a small mistake - should not lose messages TCP/IP

Nov 17, 2010 at 8:07 AM

Sorry, but I found one little bug in user interface: in exception handler in ListenerThread (ThalesMain.vb:591) you are stopped SL socket (TcpListener) but in interface all looks like normal state. 

Pressing button "Stop simulator" and then "Start simulator" rearm the simulator.

Coordinator
Nov 17, 2010 at 11:45 AM

You're right. I know that you got an exception there but I never got the exception mentioned, despite the fact that messages appeared to be lost.