Thursday, January 10, 2008

Constructors in C#

In the C# newsgroup, there was recently (at the time of writing) a discussion about various aspects of constructors. This page covers most of the important points about how I believe constructors work in C#, with references to the language specification to back them up. If you disagree with anything on this page, please follow the language specification link to the appropriate section. If you then believe I'm interpreting the spec incorrectly, please mail me with details. The ECMA spec and the MS spec are identical as far as I've seen, apart from how they number their sections. I'll give links to both so you can use whichever you feel more comfortable with.

A little terminology...
The C# language spec refers to two types of constructors: instance constructors and static constructors. This page only deals with instance constructors, and I'll just call them constructors for short. That's what most people understand by the term "constructor" anyway. Static constructors are the equivalent of static initializers in Java - they give the code which is executed when a class is first used.

What is a constructor?
(ECMA) (MS).
To quote the spec, "An instance constructor is a member that implements the actions required to initialize an instance of a class." That's actually a pretty good description on its own. A constructor is invoked when you use the "new" operator, or use the various methods of reflection to create an instance of a class.
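
As a rough illustration of the two routes mentioned above, here is a minimal sketch showing the same constructor being reached through the new operator and through reflection. The Greeter class and its Message field are hypothetical names made up for this example; Activator.CreateInstance and Type.GetConstructor are the standard reflection entry points.

```csharp
using System;
using System.Reflection;

// Hypothetical class used only for this illustration.
public class Greeter
{
    public readonly string Message;

    public Greeter(string message)
    {
        Message = message;
    }
}

public static class ReflectionDemo
{
    public static void Main()
    {
        // Direct construction with the new operator.
        Greeter direct = new Greeter("created with new");

        // Reflection: Activator picks the constructor matching the arguments.
        Greeter viaActivator =
            (Greeter)Activator.CreateInstance(typeof(Greeter), "created with Activator");

        // Reflection: look up the constructor explicitly, then invoke it.
        ConstructorInfo ctor = typeof(Greeter).GetConstructor(new Type[] { typeof(string) });
        Greeter viaCtorInfo =
            (Greeter)ctor.Invoke(new object[] { "created with ConstructorInfo" });

        Console.WriteLine(direct.Message);
        Console.WriteLine(viaActivator.Message);
        Console.WriteLine(viaCtorInfo.Message);
    }
}
```

In all three cases the same constructor body runs; only the way it is reached differs.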

What do constructors look like?
A constructor looks very much like a method, but with no return type and a name which is the same as the name of the class. (The modifiers you can use with a constructor are slightly different, however - see the spec for more information about that.) Here's a short class to use as an example:

public class MySimpleClass
{
    public MySimpleClass (int x)
    {
        Console.WriteLine (x);
    }
}



This class has a single constructor, which takes an int as a parameter, and prints that int's value to the console.

Constructor Initializers
Now, in fact, there's something else going on here, which is hidden by the compiler. The above is actually equivalent to:

public class MySimpleClass
{
    public MySimpleClass (int x) : base()
    {
        Console.WriteLine (x);
    }
}



(ECMA) (MS).
Note the base() bit. That's saying which other constructor should be invoked before any of the rest of the code in the constructor runs. Every constructor of every class other than plain object does this, either explicitly or implicitly. There are two forms of constructor initializer - one which calls a base class constructor (as above) and one which calls another constructor from this class, using the this (...) syntax. There must always be a "chain" of constructors which runs constructors all the way up the class hierarchy. Every class in the hierarchy will have a constructor invoked, although some of those constructors may not explicitly appear in the code. (See the section on default constructors, later.) The parameters (if any) within the brackets of base(...) or this(...) are passed as the parameters to the invoked constructors. They can be the parameters given in the constructor declaration, but don't have to be. Here's an example:

public class MyBaseClass
{
    public MyBaseClass (int x) : base() // Invoke the parameterless constructor in object
    {
        Console.WriteLine ("In the base class constructor taking an int, which is "+x);
    }
}

public class MyDerivedClass : MyBaseClass
{
    public MyDerivedClass () : this (5) // Invoke the MyDerivedClass constructor taking an int
    {
        Console.WriteLine ("In the derived class parameterless constructor.");
    }

    public MyDerivedClass (int y) : base (y+1) // Invoke the MyBaseClass constructor taking an int
    {
        Console.WriteLine ("In the derived class constructor taking an int parameter.");
    }

    public MyDerivedClass (string x) : base (10) // Invoke the MyBaseClass constructor taking an int
    {
        Console.WriteLine ("In the derived class constructor taking a string parameter.");
    }
}



With the above code, the expression new MyDerivedClass() would invoke the MyDerivedClass parameterless constructor, which would in turn invoke the MyDerivedClass constructor which takes an int parameter (with 5 as that parameter value), which would in turn invoke the MyBaseClass constructor which takes an int parameter (with 6 as that parameter value). Note that the constructor named in an initializer runs before the body of the constructor which names it, so the result of new MyDerivedClass(); would be the following output on the console:

In the base class constructor taking an int, which is 6
In the derived class constructor taking an int parameter.
In the derived class parameterless constructor.



Not all constructors in the hierarchy need to be invoked (as demonstrated above - the constructor taking a string parameter is not invoked at all when you do new MyDerivedClass();) but as I said earlier, there must be at least one constructor invoked in each class in the hierarchy.

Default constructor initializers
(ECMA) (MS), second paragraph.
Any constructor which doesn't have a constructor initializer has one provided for it by the compiler. The initializer is of the form base() - in other words, a call to the base class's parameterless constructor. You'll get a compile-time error if you don't provide a constructor initializer and the base class doesn't have an accessible parameterless constructor. (Note, however, that it may have one without you putting one in yourself - see the next section.)
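
To make the compile-time error concrete, here is a minimal sketch. Shape and Circle are hypothetical names invented for this illustration. The only Shape constructor takes a parameter, so any Circle constructor must name a base constructor explicitly.

```csharp
using System;

// Hypothetical base class: its only constructor takes a parameter,
// so it has no parameterless constructor for derived classes to call.
public class Shape
{
    public readonly string Name;

    public Shape(string name)
    {
        Name = name;
    }
}

public class Circle : Shape
{
    // If ": base("circle")" were removed, the compiler would supply ": base()"
    // and compilation would fail, because Shape has no accessible
    // parameterless constructor.
    public Circle() : base("circle")
    {
    }
}

public static class InitializerDemo
{
    public static void Main()
    {
        Console.WriteLine(new Circle().Name); // prints "circle"
    }
}
```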

Default constructors
(ECMA) (MS)
If you don't specify any constructors at all, a default constructor is provided by the compiler. This default constructor is a parameterless constructor with an empty body, which calls the parameterless constructor of the base class. In other words:

public class MySimpleClass
{
    int someMemberVariable;
}



... is exactly equivalent to:

public class MySimpleClass
{
    int someMemberVariable;

    public MySimpleClass() : base()
    {
    }
}



Following from the previous section, this means that if the base class has no accessible parameterless constructor (including a default one), you get a compile-time error if the derived class doesn't have any constructors - because the default constructor will implicitly try to call a parameterless constructor from the base type.

If the class is abstract, the default constructor provided has protected accessibility; otherwise it has public accessibility.
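
One way to check the accessibility rule for yourself is to ask reflection about the compiler-generated constructor. This is only a sketch. AbstractThing, ConcreteThing and the FindDefaultCtor helper are hypothetical names for this example; IsFamily is how reflection reports protected accessibility.

```csharp
using System;
using System.Reflection;

// Neither class declares a constructor, so the compiler provides one for each.
public abstract class AbstractThing { }
public class ConcreteThing { }

public static class DefaultCtorDemo
{
    // Hypothetical helper: finds the parameterless instance constructor,
    // whatever its accessibility.
    public static ConstructorInfo FindDefaultCtor(Type type)
    {
        return type.GetConstructor(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
            null, Type.EmptyTypes, null);
    }

    public static void Main()
    {
        // IsFamily means "protected"; IsPublic means "public".
        Console.WriteLine(FindDefaultCtor(typeof(AbstractThing)).IsFamily); // True
        Console.WriteLine(FindDefaultCtor(typeof(ConcreteThing)).IsPublic); // True
    }
}
```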

Instance variable initializers
(ECMA) (MS)
When a member variable declaration also has an assignment, that's called a variable initializer. All the variable initializers for a class are implicitly run directly before the invocation of whichever base class constructor is invoked. (Note that this is a change from a previous version of the page, where I believed that they were invoked after the base constructor, as they are in Java.) Here's a simple example of instance variable initializers:

public class MySimpleClass
{
    int someMemberVariable = 10;

    public MySimpleClass()
    {
        Console.WriteLine ("someMemberVariable={0}", someMemberVariable);
    }
}



The above prints someMemberVariable=10 when a new instance is created, whereas without the instance variable initializer it would print someMemberVariable=0 (0 being the default value for instance variables of type int). Demonstrating the difference between Java and C# in terms of when the instance variable initializers are run requires calling an overridden method from the base class constructor. This is a really bad idea - wherever possible, only call non-virtual methods from constructors, and if you absolutely must call a virtual method, document very carefully that it is called from the constructor, so that people wishing to override it are aware that their object may not be in a consistent state when it is called (as their own constructor body won't have run yet). Here's an example:

public class MyBaseClass
{
    public MyBaseClass ()
    {
        // Virtual call from a constructor: runs the derived override
        Console.WriteLine (this.ToString());
    }
}

public class MyDerivedClass : MyBaseClass
{
    string name = "hello";

    public MyDerivedClass () : base()
    {
        Console.WriteLine (this.ToString());
    }

    public override string ToString()
    {
        return name;
    }
}



When a new instance of MyDerivedClass is created in C#, the output is:

hello
hello



The first line is hello because the instance variable initializer for the name variable has run directly before the base class constructor. The equivalent code in Java syntax would output:

null
hello



Here the first line is null because the instance variable initializer for the name variable only runs directly after the base class constructor has returned (but before ToString is called for the second time).
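
The C# ordering can also be observed without any virtual calls. The following is a minimal sketch in which each field initializer and constructor body records a message. Log, Base and Derived are hypothetical names for this illustration, and Record returns a dummy int only so that it can be used as a field initializer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper which records the order in which things run.
public static class Log
{
    public static readonly List<string> Entries = new List<string>();

    public static int Record(string entry)
    {
        Entries.Add(entry);
        Console.WriteLine(entry);
        return 0; // dummy value so Record can initialize an int field
    }
}

public class Base
{
    int baseField = Log.Record("Base field initializer");

    public Base()
    {
        Log.Record("Base constructor body");
    }
}

public class Derived : Base
{
    int derivedField = Log.Record("Derived field initializer");

    public Derived()
    {
        Log.Record("Derived constructor body");
    }
}

public static class OrderDemo
{
    public static void Main()
    {
        new Derived();
        // Prints:
        // Derived field initializer
        // Base field initializer
        // Base constructor body
        // Derived constructor body
    }
}
```

The derived class's initializers run first, then control passes up the chain, and the constructor bodies then run from the base class downwards.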

Constructors are not inherited
(ECMA) (MS), second paragraph from the bottom.
Constructors are not inherited. In other words, just because a base class has a constructor taking a certain list of parameters doesn't mean that a derived class has a constructor taking that list of parameters. (It can, by providing one itself, but it doesn't inherit it from the base class.) To demonstrate this, here's an example which doesn't compile:

public class MyBaseClass
{
    public MyBaseClass (int x)
    {
    }
}

public class MyDerivedClass : MyBaseClass
{
    // This constructor itself is okay - it invokes an
    // appropriate base class constructor
    public MyDerivedClass () : base (5)
    {
    }

    public static void Main()
    {
        new MyDerivedClass (10);
    }
}



Here, we try to invoke a constructor for MyDerivedClass which takes an int parameter. There isn't one, however, as constructors aren't inherited. The MyBaseClass constructor which takes an int parameter can be invoked by a constructor in MyDerivedClass (as is shown by the parameterless MyDerivedClass constructor) but isn't actually inherited. Removing the "10" from the above code would make it compile and run with no problems - the parameterless MyDerivedClass constructor would be invoked, and that would in turn invoke the MyBaseClass constructor taking an int parameter, with 5 as that parameter value.
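
If you do want new MyDerivedClass (10) to compile, the derived class has to declare a matching constructor itself and pass the value on. Here is a minimal sketch of that fix. The Value field is a hypothetical addition, there only so that the example has something observable to print.

```csharp
using System;

public class MyBaseClass
{
    // Hypothetical field, added so the chosen constructor is observable.
    public readonly int Value;

    public MyBaseClass(int x)
    {
        Value = x;
    }
}

public class MyDerivedClass : MyBaseClass
{
    public MyDerivedClass() : base(5)
    {
    }

    // Declared explicitly: this is not inherited, it simply forwards
    // its argument to the base class constructor.
    public MyDerivedClass(int x) : base(x)
    {
    }
}

public static class ForwardingDemo
{
    public static void Main()
    {
        Console.WriteLine(new MyDerivedClass(10).Value); // prints 10
        Console.WriteLine(new MyDerivedClass().Value);   // prints 5
    }
}
```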

Some people have said that they would rather constructors were inherited, making the language act as if every derived class had constructors with all the parameter lists of the base class's constructors, each simply invoking the corresponding base constructor with the parameters provided. I believe this would be a very bad idea. Take, for instance, the FileInfo class. You must logically provide a filename when constructing a FileInfo instance, as otherwise it won't know what it's meant to be providing information on. However, as object has a parameterless constructor, constructors being inherited would then mean that FileInfo had a parameterless constructor. Some have suggested that this could be fixed by allowing you to "override" the constructors you didn't want exposed, making them private. This goes against the idea that you should never be able to override anything to give it more restrictive access, and it also means that class developers would have to change their code every time a new constructor was added to a base class.
