
Peculiarities of how START launches processes

Hello :-)

This is a bit of an odd request. I am on a bit of a detective case right now, and the problem I am trying to figure out isn't with TCC itself, but the evidence suggests that TCC, or something TCC is doing, plays a key role. Let me explain. :-)

I am a software developer and one of my projects involves a local Named Pipe server to which multiple clients can connect. Recently, I discovered a bizarre problem where, if many clients are started in quick succession, some of them can't connect to the server at all (Windows behaves as if the server doesn't exist). Naturally, being a TCC user, the way I launched my barrage of clients was:

for /L %i in (1,1,50) do start Client.exe

When this is run, 50 windows pop up, but a significant subset of them (typically 10 or more) randomly get an error from the OS saying that no such named pipe server exists, even though the processes launched immediately before and after might succeed in connecting.

This immediately made me wonder whether my pipe server was implemented improperly, but after a good deal of piece-by-piece deconstruction and consultation with documentation and reference implementations, I can find nothing wrong with my server. Microsoft's documentation also specifically states that a client attempting to connect to a pipe will wait for the pipe server to come into existence if necessary.

So, I set about creating a minimal reproduction. As part of this, I created a project to launch Client.exe in a controlled manner (and a manner repeatable by people who are not blessed to know the wonder of TCC :-). My project launches Client.exe in the most straightforward manner possible. I'm using .NET, so the OS API is thinly wrapped, but I am in essence doing nothing other than a straightforward CreateProcessEx (Process.Start). I could not reproduce the problem. Even when my driver application launches many hundreds of client instances as fast as it can, every single one gets its own connection to the server.

I gradually reintroduced pieces so that my "minimal" reproduction became closer and closer to the real server, but nothing would reproduce the problem. Finally, tearing my hair out trying to figure out how it was different, I happened to use a FOR loop in TCC to start my minimal reproduction's Client.exe using the START built-in. Immediately, the problem recurred. I pared my test app back down, and the problem continues to occur, but only when I use TCC's START to launch the client processes.
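For reference, the driver side of that minimal reproduction is essentially just this shape (an illustrative sketch, not the exact code; the file name is a placeholder):

Code:
// Illustrative sketch of the driver loop, not the exact ClientRunner code.
// Process.Start with UseShellExecute = false is a thin wrapper over CreateProcess.
using System.Diagnostics;

class DriverSketch
{
    static void Main()
    {
        for (int i = 0; i < 50; i++)
        {
            var startInfo = new ProcessStartInfo("Client.exe")
            {
                UseShellExecute = false // go through CreateProcess, not ShellExecuteEx
            };
            Process.Start(startInfo);
        }
    }
}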

Therefore, I am wondering whether anyone from JP Software might be able to shed some light on what exactly TCC's START is doing beyond just calling CreateProcessEx to launch a child process. It seems that it must be doing something, and whatever it is is causing some bizarre interference with Windows' named pipe infrastructure. Could it be related to jobs? I'm trying to figure this out to see whether there's anything I need to change in my server to improve its reliability. I currently only know how to reproduce this problem by using TCC to START the client processes, but without knowing why it is happening, I can't help but worry that whatever the underlying cause is, it could end up being accidentally triggered some other way and cause issues for our software down the road.

If necessary, I can probably share my minimal reproduction project (three .NET console applications, Server, Client and ClientRunner) if there's some chance that it'll help isolate the cause.

Hoping you can help me :-)

Thanks very much,

Jonathan Gilbert
 
START does not use CreateProcess, it uses ShellExecute(Ex).
Did you strace your clients to see how and why they are failing?
 
I haven't done a trace on them, but the .NET wrapper around named pipes is relatively thin. Based on a review of the reference source, the exception that arises indicates that the code is getting system error 121 (ERROR_SEM_TIMEOUT) from either WaitNamedPipe or CreateFile. I'm not entirely sure how to catch exactly which call it is, since Windows doesn't have a straight-up API monitor (I am aware of Rohitab's tool by exactly that name, but it doesn't seem to work on my Windows 10 64-bit installation), and in order to reproduce this problem, I am constrained to launching processes in bulk outside of the debugger. I'll see if I can't figure something out, though.
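One option would be to bypass the .NET wrapper and call WaitNamedPipe directly via P/Invoke, so the failing call and its Win32 error code are unambiguous. A rough sketch (the pipe name and timeout are placeholders):

Code:
// Rough sketch: probe the pipe with WaitNamedPipe directly via P/Invoke
// so the failing call and its Win32 error code are unambiguous.
using System;
using System.Runtime.InteropServices;

class PipeProbe
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool WaitNamedPipe(string name, uint timeoutMilliseconds);

    const int ERROR_SEM_TIMEOUT = 121;

    static void Main()
    {
        // Placeholder pipe name and timeout.
        if (!WaitNamedPipe(@"\\.\pipe\TestPipe", 2000))
        {
            int error = Marshal.GetLastWin32Error();
            Console.WriteLine("WaitNamedPipe failed with error " + error +
                (error == ERROR_SEM_TIMEOUT ? " (ERROR_SEM_TIMEOUT)" : ""));
        }
        else
        {
            Console.WriteLine("A pipe instance became available.");
        }
    }
}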
 
START does not use CreateProcess, it uses ShellExecute(Ex).
Did you strace your clients to see how and why they are failing?
That's not, in general, true. Using WinDBG with breakpoints on CreateProcess, ShellExecute, and ShellExecuteEx ...
Code:
START
uses CreateProcess (and not SE or SEEx) to start a new instance of TCC.
Code:
START notepad
uses CreateProcess (and not SE or SEEx) to start notepad.
Code:
START https://www.google.com
uses both SE and SEEx (with SEEx ultimately calling CreateProcess if a new instance of the browser is needed).
Code:
START c:\
uses SE and SEEx
Code:
START file.pdf
uses CreateProcess (and not SE or SEEx, a bit of a surprise).
 
Okay, I duplicated the reference source for connecting to a named pipe into my project, and I can confirm that the ERROR_SEM_TIMEOUT error is coming back from WaitNamedPipe. This is in fact documented as the expected result if the timeout period elapses, which suggests to me that this is essentially a bug in the .NET Framework: the code clearly tries to detect timeouts and turn them into TimeoutException objects, but this obvious case, where WaitNamedPipe's timeout elapses, isn't handled properly and a generic IOException is thrown instead. :-P

So, this suggests that my pipe server isn't keeping up with the requests coming into it -- but only when TCC is doing a FOR loop. The next thing I'm going to check, then, is whether TCC is burning 100% CPU during the execution of the loop -- the hypothesis being that my server is being starved for CPU and isn't able to accept connections fast enough. For reference, remember that a minimal application that just calls CreateProcess in a tight loop can create hundreds of processes without triggering this problem, but for /L %i in (1,1,50) do start Client.exe reliably produces several handfuls of failed connections every single time.
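For context, the server's accept loop has the usual shape: wait for a connection, hand the client off to a worker, and immediately create the next pipe instance to listen on. A generic sketch (not the actual Server code; the pipe name is a placeholder):

Code:
// Generic named-pipe accept loop (not the actual Server code). Each connection
// is handed off to a worker so the loop can immediately create a new pipe
// instance and wait for the next client.
using System.IO.Pipes;
using System.Threading.Tasks;

class ServerSketch
{
    static void Main()
    {
        while (true)
        {
            var pipe = new NamedPipeServerStream(
                "TestPipe",                                    // placeholder name
                PipeDirection.InOut,
                NamedPipeServerStream.MaxAllowedServerInstances,
                PipeTransmissionMode.Byte,
                PipeOptions.Asynchronous);

            pipe.WaitForConnection();                          // blocks until a client connects

            Task.Run(() =>
            {
                using (pipe)
                {
                    // ... per-client protocol goes here ...
                }
            });
        }
    }
}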
 
Well, there goes that theory. This graph shows a for loop creating not 50 but 100 child processes:

[attached graph]
 
I also did a check earlier using ShellExecute instead of CreateProcess from my driver, and it did not trigger the problem. So, whether START is using CreateProcess or ShellExecute does not seem to be a determining factor.
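(For clarity, the switch between the two launch paths in my driver is just the UseShellExecute flag on ProcessStartInfo. An illustrative sketch, with the file name as a placeholder:)

Code:
// Illustrative: the same driver launch, toggled between the two launch paths.
using System.Diagnostics;

class LaunchComparison
{
    static void Launch(bool viaShellExecute)
    {
        var startInfo = new ProcessStartInfo("Client.exe")
        {
            // false -> CreateProcess; true -> the ShellExecuteEx path
            UseShellExecute = viaShellExecute
        };
        Process.Start(startInfo);
    }
}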
 
TCC doesn't do anything after the START (CreateProcess, not CreateProcessEx), unless you're running TCC inside a Take Command tab window (which requires a little communication between TCC & TCMD), or you've passed additional options to START to tell it to do something like wait for the START'd process to exit.

Is your client app a console or GUI app?
 
I have created a Git repository with a solution containing three projects that reproduce the issue on my end.

Git repository: logiclrd/TestPipeConnection

Projects:
- Server: Sets up a named pipe server, listens for connections, performs a basic protocol that does some unique communication with each client. Press Enter to exit.
- Client: Connects to an instance of Server, processes the data sent to it, and then sets its exit code based on whether it successfully connected and the data it received followed the expected protocol (0 == success).
- ClientRunner: Runs 100 instances of Client as quickly as possible and monitors their exit codes, reporting on the status of the batch to its console output.

To reproduce the findings so far in the thread:
1. Run Server.exe in one console window.
2. Run ClientRunner.exe in another console window. This will spam your system with 100 console windows, but they should all pretty much immediately say "Connected", and then they should disappear one by one as they receive the expected data from the server. If all clients exit with the same code, ClientRunner reports this; when I run it, the final line of output says:

All check results are "succeeded"

So far so good; this is what we want to see. But now, instead, launch the Client instances directly using TCC's START command in a FOR loop:

[C:\code\TestPipeConnection\Client\bin\Debug]for /L %i in (1,1,50) do start Client.exe keepopen

The keepopen command-line option causes Client.exe to wait until Enter is pressed if an exception occurs. (Otherwise the console window would disappear immediately and you wouldn't be able to see the error.)
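The Client's connection step is essentially the standard NamedPipeClientStream pattern. In simplified form (not the exact repository code; the pipe name and timeout are placeholders):

Code:
// Simplified sketch of the Client's connection step (not the exact repo code).
using System;
using System.IO.Pipes;

class ClientSketch
{
    static int Main(string[] args)
    {
        bool keepOpen = Array.IndexOf(args, "keepopen") >= 0;

        try
        {
            using (var pipe = new NamedPipeClientStream(".", "TestPipe", PipeDirection.InOut))
            {
                pipe.Connect(10000); // failures surface as IOException / TimeoutException
                Console.WriteLine("Connected");
                // ... read and validate the protocol data here ...
            }
            return 0;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            if (keepOpen)
                Console.ReadLine(); // wait for Enter so the window stays visible
            return 1;
        }
    }
}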

I have two different systems where running this produces some significant fraction of windows with the error message:

System.IO.IOException: The semaphore timeout period has expired.

at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.Pipes.NamedPipeClientStream.Connect(Int32 timeout)
at Client.Program.Main(String[] args) in C:\code\TestPipeConnection\Client\Program.cs:line 25


(Technically, one of the systems is running 4NT 7.01.370, not TCC. Same behaviour.)
 
I am running TCC 19.10.51 x64. I notice a much newer version available for download. How does licensing intersect with that? Does the license purchased for TCC 19 carry forward, or is some sort of upgrade purchase required to run a newer version? I'd like to test with the latest version in addition to the 4NT 7.01.370 and TCC 19.10.51 I have tried so far, but I don't want to risk messing up my current properly-licensed TCC :-)
 
Your v19 licence will not work for v24, but there is a 30-day grace period. If you want to test v24, be sure to install it to a different location from your existing copy. (The installer should do this by default, but it never hurts to check.)
 
This seems to be a timing issue - TCC is creating the new processes faster than your app (or perhaps Windows?) can handle them. On my fairly fast system, almost all of the clients display that timeout error.

However, if I slow things down a bit:

Code:
for /L %i in (1,1,50) do (start Client.exe keepopen & delay /m 200)

then it works perfectly for all of them.
 
But surely my ClientRunner application runs them even faster -- it has no scripting overhead at all and simply calls Process.Start (CreateProcess) in a tight loop as fast as possible. I can set it to 200 child instances, and all the windows appear within about 5 seconds, and not a single one fails! TCC must be doing something different, but I can't begin to imagine what it might be.
 
