Why You Should Avoid Threads With A Passion
My first design for BRU-Pro's tape server was a multi-threaded
monolithic server. After all, that's the fastest/most standard
method of organizing a server, right? That's what all the 'Softies
say anyhow, and you know Microsoft is never wrong, correct?
My initial prototype, however, proved that this approach was wrong-headed
and evil. Here are my observations:
- Resource management sucks. Resource management in multi-threaded
applications is a pain in the #$%!@. Using semaphores to protect every global
variable in particular is a PITA. Now I hear you say, "well, you shouldn't
have global variables!". But for some things, it's the most sensible way of
doing things. For example, if I instantiate a cryptrand object (which opens
a filehandle, does all sorts of magical things, and otherwise has a very hefty
startup time), I want to do it *ONCE* and then grab random numbers from it
as needed, without paying that startup cost every time I need random numbers.
- Killing threads sucks. This is part and parcel of the
resource management problem. Leaving a semaphore hanging around on a shared
structure will cause the whole program to lock solid the next time someone
needs that resource. I direct you to this text from the msdn.microsoft.com
web site:
    TerminateThread: TerminateThread can result in the following
    problems:
    - If the target thread owns a critical section, the critical section
      will not be released.
    - If the target thread is executing certain kernel32 calls when it is
      terminated, the kernel32 state for the thread's process could be
      inconsistent.
    - If the target thread is manipulating the global state of a shared DLL,
      the state of the DLL could be destroyed, affecting other users of the
      DLL.
What this basically means is that you can't kill a thread. You can ask
a thread nicely to kill itself, but you can't otherwise control its execution
once it's started running. The Unix process model, on the other hand, has
a mechanism to send a signal to a running process and have it terminate after
it tidies itself up. If, after a couple of seconds, it still hasn't terminated,
you can terminate it with extreme prejudice and *ALL* resources allocated by
that process are freed, even the semaphores.
- It encourages monolithic programs. Monolithic programs are
Evil(*). Componentized programs are the Real Deal.
- Threaded programs are hard to debug. Any thread can stomp on
any other thread's memory, you have the problem of hanging locks, etc.
With processes, you have no stomping, and locks can be auto-released
by the system when the process exits.
- It ain't Unix! Unix is based upon the principle of "many small tools
chained together". Unix was the original component-based operating
system, up until the day that the X window system came around and
ruined it.
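The "ask nicely, then terminate with extreme prejudice" sequence from the
"Killing threads sucks" item can be sketched in a few lines of Python. This is
only an illustration of the Unix process model, not code from BRU-Pro: the
helper name stop_process and the grace period are assumptions made for the
example.

```python
import signal
import subprocess
import sys


def stop_process(proc, grace_seconds=2.0):
    """Ask a child process to exit; escalate if it does not comply.

    Hypothetical helper for illustration; returns the child's return code.
    """
    proc.terminate()                       # SIGTERM: the child may catch it and tidy up
    try:
        proc.wait(timeout=grace_seconds)   # give it a couple of seconds
    except subprocess.TimeoutExpired:
        proc.kill()                        # SIGKILL: cannot be caught or ignored
        proc.wait()                        # reap the child so no zombie remains
    return proc.returncode


if __name__ == "__main__":
    # A deliberately stubborn child that ignores SIGTERM.
    child_code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", child_code],
                             stdout=subprocess.PIPE)
    child.stdout.readline()                # wait until the handler is installed
    rc = stop_process(child, grace_seconds=1.0)
    child.stdout.close()
    print("killed by SIGKILL:", rc == -signal.SIGKILL)
```

On Unix, a negative return code from subprocess is the number of the signal
that ended the child. There is no equivalent second step for a thread that
refuses to cooperate.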
Okay, so threads suck. So what IS the best way to organize a Unix program?
Here's a hint: fork runs at the same speed as spawn on Linux.
BRU-Pro was a classic three-box design:
- Client: This is the part the user interacts with. We had three
different clients -- web client, CLI client, and GUI (GTK+) client.
All clients were written in Python, and called programs on the tape
server to do the actual work.
- Tape server: This consisted of a
secure execution server similar to an augmented 'ssh' (we used a
Kerberos-style ticket system and would only execute programs within
the 'sandbox', and only after stripping all magic shell characters
from the command line to be executed), and a bunch of programs that
the client called to do the actual work. The programs ranged in size
from tiny 50 line applications that did nothing but set a flag in a
database record, to several-thousand-line backup and restore programs
that forked off a half-dozen processes. Every program executed
directly was a Python program, though "C" modules or external "C"
programs were called for everything that was CPU intensive or required
low-level access that could not be done via Python.
- Backup agent: This lived on the machines to be backed up (or
restored). This was yet another secure execution server, though much
simpler than the tape server's because it did not have to do ticketing
(the public key encryption used verified that only the tape server
could contact it, and any extensive security checking was done on the
tape server side, not on the backup agent side). Again, the tape
server did its work by calling programs on the backup agent via
'bprsh' (which worked similarly to 'rsh' but was actually secure, unlike
'rsh'). This was solely "C" and /bin/sh because it had to actually fit
on a recovery floppy. Thus the emphasis upon simplicity -- "C" is a pain
to program large programs in.
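To make the tape server's "strip the magic shell characters, then only run
things inside the sandbox" rule concrete, here is a rough Python sketch. None
of it is the original code: the sandbox path, the exact set of characters
treated as magic, and the function names are all assumptions made for
illustration.

```python
import os

# Hypothetical sandbox directory; the real BRU-Pro path is not given in the text.
SANDBOX = "/opt/example-sandbox/bin"

# Characters a shell treats specially. The exact set is an assumption.
SHELL_MAGIC = frozenset(";&|<>`$\\\"'*?[](){}!~#\n")


def strip_magic(text):
    """Drop every shell metacharacter from the requested command line."""
    return "".join(ch for ch in text if ch not in SHELL_MAGIC)


def resolve_in_sandbox(program, sandbox=SANDBOX):
    """Return the full path of program, refusing anything outside the sandbox."""
    root = os.path.realpath(sandbox)
    candidate = os.path.realpath(os.path.join(root, program))
    # realpath() collapses "..", symlinks and absolute names before the check.
    if candidate == root or os.path.commonpath([root, candidate]) != root:
        raise PermissionError("%r is outside the sandbox" % program)
    return candidate


def build_argv(command_line, sandbox=SANDBOX):
    """Turn a client's command line into an argv list that is safe to exec."""
    words = strip_magic(command_line).split()
    if not words:
        raise ValueError("empty command")
    return [resolve_in_sandbox(words[0], sandbox)] + words[1:]
```

The resulting list would be handed to an exec-style call directly rather than
to a shell, so whatever survives the stripping is passed along as plain
arguments.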
So basically, BRU-Pro was a set of programs being run via 'ssh' commands.
Now let's look at the common arguments against that:
- Critic: Forking all those processes is slow.
Fact: On Linux, fork() takes the same amount of time as spawn().
It is exec() that is the time consumer. For the tape server, what we
did was pre-load all commonly used modules into the CTSP (Client-Tape
Server Protocol) server. When one of those commonly-used modules was
executed via the 'bsh' command from the client, the CTSP server
forked, but did not exec -- it instead called the main() entry point
of the pre-loaded module.
- Critic: Response time is more sluggish. Fact: On any
three-box architecture, response time is going to be sluggish. If you
wish to query the files on a particular machine, for example, first
the client must send the request to the tape server, then the tape
server must send the request to that particular machine, and then the
results get passed back via the same two hops. Some backup systems,
such as NDMP, bypass that by allowing clients to talk directly to end
user machines. In my opinion that is either a security nightmare or a
programming nightmare -- backup agents should be small enough to fit
on a recovery disk, they should not have everything and the kitchen
sink thrown into them.
I benchmarked the servlets on a Celeron 300
(not even a 300a), and we managed over 8 rsh operations per
second. On a Celeron 300. You aren't going to click your mouse 8 times per
second in normal use (and each operation did a big chunk of work; a mouse
click was basically one operation).
The fact of the matter was that any sluggishness in BRU-Pro was either
a) because of inefficiencies in the way we did things, or b) inherent in
the three-box architecture. We knew about the inefficiencies but did not
have time to fix them for release 1.0 of the product, where we were more
concerned about being right than being fast; 1.1 fixed most of them.
The organization of the program had nothing to do with it.
- Critic: Startup time of those big programs is slow!
Fact: Those big programs were pre-loaded. We wasted no time loading
them off of disk. Their startup time would have
been the same whether they were forked or spawned.
- Critic: All this forking increases the memory footprint!
*FINALLY* a well-warranted claim! This one was true. While all modern
Unix variants employ copy-on-write semantics for their fork() call, meaning
that memory is shared until such time as it is written to, the fact is
that the memory footprint IS larger than in a threaded design. And
256 megabytes of memory now sells for under $100. The point? I'd rather
have reliability than small footprint in today's world. The days when I
hand-optimized 6502 assembly code to fit into a 4k area of memory are
long gone.
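The "forked, but did not exec" trick from the first Critic item above is easy
to sketch in Python. This is a simplified illustration rather than the actual
CTSP server: the PRELOADED table, the two toy entry points and run_command are
hypothetical names, and each entry point is assumed to return its exit status
instead of calling sys.exit().

```python
import os
import sys


# Hypothetical stand-ins for modules the server imported once at startup.
def list_tapes(argv):
    print("tapes:", " ".join(argv[1:]))
    return 0


def set_flag(argv):
    print("flag set on record", argv[1])
    return 0


PRELOADED = {"list_tapes": list_tapes, "set_flag": set_flag}


def run_command(argv):
    """Run argv in a child process and return its exit status."""
    sys.stdout.flush()                 # don't let the child inherit buffered output
    pid = os.fork()
    if pid == 0:                       # child: a copy-on-write copy of the server
        status = 127
        try:
            entry = PRELOADED.get(argv[0])
            if entry is None:
                os.execvp(argv[0], argv)   # not preloaded: pay the exec cost
            status = entry(argv) or 0      # preloaded: call its main() directly
            sys.stdout.flush()
        except BaseException:
            status = 127               # any failure, including a failed exec
        finally:
            os._exit(status)           # never fall back into the server loop
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -os.WTERMSIG(wait_status)
```

A crash or a stomped-on variable in one command dies with that child; the
parent server and its preloaded modules are untouched and ready for the next
request.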
So:
- Threads are evil.
- Secure remote execution servlets are good.
- A true Unix program is "many small programs chained together".
So what are you waiting for? Burn your threads and fork it!
-- Eric
(*) Evil(tm) is a trademark of Microsoft Corporation and is used without permission.