Thursday, January 10, 2008

Multi-threading in .NET: Introduction and suggestions

One of the greatest understatements I've heard in a newsgroup was made by Patricia Shanahan, in a Java newsgroup in 2001: "Multi-threaded programming needs a little care." Multi-threading is probably one of the worst understood aspects of programming, and these days almost all application programmers need to understand it to some extent. This article acts as an introduction to multi-threading and gives some hints and tips for how to do it safely. Warning: I'm not an expert on the subject, and when the real experts start discussing it in detail, my head starts to spin somewhat. However, I've tried to pay attention to those who know what they're doing, and hopefully the contents of this article form at least part of a multi-threading "best practice".

This article uses the C# type shorthands throughout - int for Int32 etc. I hope this makes it easier for C# developers to read, and won't impede any other developers too much. It also only talks about the C# ways of declaring variables to be volatile and locking monitors. Developers using other languages can find the equivalents in their own preferred environment, I'm sure.

Introduction: What is multi-threading?
The fact that you're reading this article in the first place means you probably have at least some idea of what multi-threading is about: it's basically trying to do more than one thing at a time within a process.

So, what is a thread? A thread (or "thread of execution") is a sort of context in which code is running. Any one thread follows program flow for wherever it is in the code, in the obvious way. Before multi-threading, effectively there was always one thread running for each process in an operating system (and in many systems, there was only one process running anyway). If you think of processes running in parallel in an operating system (e.g. a browser downloading a file and a word processor allowing you to type, both "at the same time"), then apply the same kind of thinking within a single process, that's a reasonable way to visualise threading.

Multi-threading can occur in a "real" sense, in that a multi-processor box may have more than one processor executing instructions for a particular process at a time, or it may be effectively "simulated" by multiple threads taking turns on a single processor: first some code for thread 1 is executed, then some code for thread 2, then back to thread 1 etc. In the simulated case, if both thread 1 and thread 2 are "compute bound" (all they're doing is computation, without waiting for any input from the network, or file system, or user etc) then using two threads won't actually speed things up at all - in fact, it'll slow things down, as the operating system has to switch between threads and the memory cache probably won't be as effective. However, much of today's computing involves waiting for something to happen, and during that time the processor can be doing something else. Intel's "Hyper-Threading" technology, which is on some of its more recent chips (bearing in mind that this article was written in early 2004!), is a sort of hybrid between this "real" and "simulated" threading - for more information, see Intel's web page on the subject.

How does multi-threading work in .NET?
.NET has been designed from the start to support multi-threaded operation. There are two main ways of multi-threading which .NET encourages: starting your own threads with ThreadStart delegates, and using the ThreadPool class either directly (using ThreadPool.QueueUserWorkItem) or indirectly using asynchronous methods (such as Stream.BeginRead, or calling BeginInvoke on any delegate).

In general, you should create a new thread "manually" for long-running tasks, and use the thread pool only for brief jobs. The thread pool can only run so many jobs at once, and some framework classes use it internally, so you don't want to tie it up with a lot of tasks which spend most of their time blocked, waiting for other things. On the other hand, for short-running tasks, particularly those created often, the thread pool is an excellent choice. The examples in this article mostly use manual thread creation.
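For comparison with manual thread creation, here is a minimal sketch of the thread pool route using ThreadPool.QueueUserWorkItem. The class name PoolTest, the ShortJob method and the Done event are my own illustrative names, not part of the framework. The ManualResetEvent is just one assumed way of making Main wait: thread pool threads are background threads, so the process could otherwise exit before the job runs.

```csharp
using System;
using System.Threading;

public class PoolTest
{
    // Signalled by the work item when it finishes, so that Main
    // doesn't return (and end the process) before the job has run.
    public static readonly ManualResetEvent Done = new ManualResetEvent(false);

    static void Main()
    {
        // Queue a brief job; the second argument is handed to the
        // callback as its "state" parameter.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ShortJob), "pool job");

        Console.WriteLine("Main thread: job queued");
        Done.WaitOne();
        Console.WriteLine("Main thread: job finished");
    }

    // Matches the WaitCallback delegate: void method taking one object.
    public static void ShortJob(object state)
    {
        Console.WriteLine("Running {0}; on a pool thread? {1}",
                          state, Thread.CurrentThread.IsThreadPoolThread);
        Done.Set();
    }
}
```

The "job queued" line and the line printed by ShortJob can appear in either order, but "job finished" always comes last because Main waits on the event first.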

Multi-threaded "Hello, world"
Here is virtually the simplest threading example which actually shows something happening:

using System;
using System.Threading;

public class Test
{
    static void Main()
    {
        ThreadStart job = new ThreadStart(ThreadJob);
        Thread thread = new Thread(job);
        thread.Start();

        for (int i=0; i < 5; i++)
        {
            Console.WriteLine ("Main thread: {0}", i);
            Thread.Sleep(1000);
        }
    }

    static void ThreadJob()
    {
        for (int i=0; i < 10; i++)
        {
            Console.WriteLine ("Other thread: {0}", i);
            Thread.Sleep(500);
        }
    }
}



The code creates a new thread which runs the ThreadJob method, and starts it. That thread counts from 0 to 9 fairly fast (about twice a second) while the main thread counts from 0 to 4 fairly slowly (about once a second). They count at different speeds because each loop includes a call to Thread.Sleep, which just makes the current thread sleep (do nothing) for the specified period of time. After each count the main thread sleeps for 1000ms, and the other thread sleeps for 500ms. Here are the results from one test run on my machine:

Main thread: 0
Other thread: 0
Other thread: 1
Main thread: 1
Other thread: 2
Other thread: 3
Main thread: 2
Other thread: 4
Other thread: 5
Main thread: 3
Other thread: 6
Other thread: 7
Main thread: 4
Other thread: 8
Other thread: 9



One important thing to note here is that although the above is very regular, that's by chance. There's nothing to stop the first "Other thread" line coming first, or the pattern being slightly off - Thread.Sleep is always going to be somewhat approximate, and there's no guarantee that the sleeping thread will immediately start running as soon as the sleep finishes. (It will become able to run, but another thread may be currently running, and on a single processor machine that means the thread which has just "woken up" will have to wait until the thread scheduler decides to give it some processor time before it next does anything.)
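To see how approximate Thread.Sleep is on your own machine, something like the following rough sketch may help. It times each sleep with a Stopwatch (available from .NET 2.0). SleepTiming and MeasureSleep are hypothetical names I've made up for the illustration, and the numbers printed will differ from run to run and from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public class SleepTiming
{
    // Hypothetical helper: returns how long a call to
    // Thread.Sleep(requestedMs) actually took, in milliseconds.
    public static double MeasureSleep(int requestedMs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Thread.Sleep(requestedMs);
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        // Each measured time includes any delay before the scheduler
        // gave this thread the processor again after the sleep ended.
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine("Requested 500ms, slept {0:F1}ms",
                              MeasureSleep(500));
        }
    }
}
```

On a busy machine the measured times tend to drift further from 500ms, which is exactly why the neat interleaving in the output above shouldn't be relied on.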

As with all delegates, there's nothing to restrict you to static methods, or methods within the class that the delegate is used from. You need to have access to the method, of course, and if you want to specify an instance method, you have to use a particular instance. Here's another version of the program above, using an instance method in a different class. If the Count method had been static, the value of the job variable would have been new ThreadStart(Counter.Count). Most examples given in this article use methods within the same class, but that's just for brevity and simplicity.

using System;
using System.Threading;

public class Test
{
    static void Main()
    {
        Counter foo = new Counter();
        ThreadStart job = new ThreadStart(foo.Count);
        Thread thread = new Thread(job);
        thread.Start();

        for (int i=0; i < 5; i++)
        {
            Console.WriteLine ("Main thread: {0}", i);
            Thread.Sleep(1000);
        }
    }
}

public class Counter
{
    public void Count()
    {
        for (int i=0; i < 10; i++)
        {
            Console.WriteLine ("Other thread: {0}", i);
            Thread.Sleep(500);
        }
    }
}
