Friday, January 11, 2008

The Test/Code Cycle in XP: Part 1, Model

This paper demonstrates the development of a small bibliographic system using Extreme Programming techniques. Part 1 shows development of the model; Part 2 (as of December 2000, available only in Java) shows development of the associated user interface.

Specifically, it shows how unit tests and simple design work together while programming. Observe how the coding process occurs in small steps, just enough implementation to make each test run. There's a rhythm to it, like the pendulum of a clock: test a little, code a little, test a little, code a little. (To bring this out, we'll align test code to the left of the page, and application code to the right.)

For this example, suppose we have bibliographic data with author, title, and year of publication. Our goal is to write a system that can search that information for values we specify. We have in mind an interface something like this:

[Screenshot of the search form: a query field, a search button, and a list of matching documents.]

We'll divide our work into two parts: the model and the user interface.

For our model, we have a collection of Documents, which know their metadata. A Searcher knows how to find Documents: given a Query, it will return a Result (a set of Documents). We'll create our unit tests (and our classes) bottom-up: Document, Result, Query, and Searcher.

Document, Result, and Query
Document
A Document needs to know its author, title, and year. We'll start with a "data bag" class, beginning with its test:

public void TestDocument() {
Document d = new Document("a", "t", "y");
AssertEquals("a", d.Author);
AssertEquals("t", d.Title);
AssertEquals("y", d.Year);
}


This test doesn't compile (as we haven't created the Document class yet). Create the class with stubbed-out methods.
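
One way to stub it out, just enough to compile (the null bodies are deliberate placeholders; the a-/an- parameter naming follows the convention used later in the article):

```csharp
public class Document
{
    public Document(string anAuthor, string aTitle, string aYear) {}

    public string Author { get { return null; } }
    public string Title  { get { return null; } }
    public string Year   { get { return null; } }
}
```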

Run the test again to make sure it fails. This may seem funny - don't we want the tests to pass? Yes, we do. But by seeing them fail first, we get some assurance that the test is valid. And once in a while, a test passes unexpectedly: "that's interesting!"

Fill in the constructor and the methods to make the test pass.
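
A plausible filled-in version, using the my- field prefix the rest of the article follows (the myAuthor, myTitle, and myYear fields reappear later when we implement Matches()):

```csharp
public class Document
{
    string myAuthor;
    string myTitle;
    string myYear;

    public Document(string anAuthor, string aTitle, string aYear) {
        myAuthor = anAuthor;
        myTitle = aTitle;
        myYear = aYear;
    }

    public string Author { get { return myAuthor; } }
    public string Title  { get { return myTitle; } }
    public string Year   { get { return myYear; } }
}
```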

Let's highlight this mini-process:

The Test/Code Cycle in XP
Write one test.
Compile the test. It should fail, as you haven't implemented anything yet.
Implement just enough to compile. (Refactor first if necessary.)
Run the test and see it fail.
Implement just enough to make the test pass.
Run the test and see it pass.
Refactor for clarity and "once and only once".
Repeat from the top.



This process ensures that you've seen the test both fail and pass, which gives you assurance that the test did test something, that your change made a difference, and that you've added valuable functionality.

Some people will advocate not bothering to test simple properties. ("They can't possibly be wrong.", "It's a pain to write a bunch of property tests.") I tend to write the tests anyway:

It doesn't take that much time to write the test, and it's certainly not hard, but it gives you that extra edge. ("You thought you were sure it's ok; now you have a test that demonstrates it.")
A test will often have a longer lifetime than the code it's testing. The test is there so when you add caching, or don't create objects until required, or add logging, etc., you still have assurance that the original function will work.
Boring tests full of setters and getters are often trying to tell you something: the class may not be pulling its weight. When a class is almost a "struct", it's often a sign that the responsibilities aren't distributed right between classes.
Result
A Result needs to know two things: the total number of items, and the list of Documents it contains. First we'll test that an empty result has no items.

public void TestEmptyResult() {
Result r = new Result();
Assert ("count=0 for empty result", r.Count == 0);
}


Create the Result class and stub out its Count property. See it fail until you add "return 0;" as its implementation.
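
The stubbed class might look like this (a sketch; the real implementation comes a few steps later):

```csharp
public class Result
{
    public int Count {
        get { return 0; }   // just enough to make TestEmptyResult() pass
    }
}
```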

Next test a result with two documents.

public void TestResultWithTwoDocuments() {
Document d1 = new Document("a1", "t1", "y1");
Document d2 = new Document("a2", "t2", "y2");
Result r = new Result(new Document[]{d1, d2});
Assert (r.Count == 2);
Assert (r[0] == d1);
Assert (r[1] == d2);
}
Add the indexer declaration (returning null) and watch the test fail. (I'm going to stop mentioning that, but keep doing it. It takes a few seconds, but gives that extra bit of reassurance.) Implementing a simple version of Result will give:


public class Result
{
Document[] myDocuments = new Document[0];

public Result() {}

public Result(Document[] aDocumentArray) {
myDocuments = aDocumentArray;
}

public Document this [int anIndex] {
get { return myDocuments[anIndex]; }
}

public int Count {
get { return myDocuments.Length; }
}
}
The test runs, so we're done with this class.

Query
We'll represent the Query as just its query string.

public void TestSimpleQuery()
{
Query q = new Query("test");
AssertEquals("test", q.Value);
}

Create the Query class with a constructor, so that it remembers its query string and reports it via Value.
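
A minimal version, again following the article's my- field convention:

```csharp
public class Query
{
    string myValue;

    public Query(string aValue) {
        myValue = aValue;
    }

    public string Value {
        get { return myValue; }
    }
}
```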

Searcher
The Searcher is the most interesting class. The easy case is first: we should get nothing back from an empty collection of Documents.

public void TestEmptyCollection() {
Searcher searcher = new Searcher();
Result r = searcher.Find(new Query("any"));
Assert(r.Count == 0);
}

This test doesn't compile, so stub out the Searcher class.

public class Searcher {
public Searcher() {}
public Result Find(Query aQuery) { return null; }
}

The test compiles, but fails to run correctly (because Find() returns null). We can fix this with this change:

public Result Find(Query aQuery) {return new Result();}
Things get more interesting when we try real searches. Then we face the issue of where the Searcher gets its documents. We'll begin by passing an array of Documents to the Searcher's constructor. But first, a test.

public void TestOneElementCollection()
{
Document d = new Document("a", "a word here", "y");
Searcher searcher = new Searcher(new Document[]{d});

Query q1 = new Query("word");
Result r1 = searcher.Find(q1);
Assert(r1.Count == 1);

Query q2 = new Query("notThere");
Result r2 = searcher.Find(q2);
Assert (r2.Count == 0);
}

This test shows us that we have to find what is there, and not find what's not there.

To implement this, we have to provide the new constructor that makes the test compile (though it still fails). Then we have to get serious about implementation.

First, we can see that a Searcher has to retain knowledge of its collection between calls to Find(), so we'll add a member variable to keep track, and have the constructor remember its argument:

Document[] myDocuments = new Document[0];

public Searcher(Document[] aDocumentArray) {
myDocuments = aDocumentArray;
}

Now, the simplest version of Find() can iterate through its documents, adding each one that matches the query to a Result:


public Result Find(Query aQuery) {
Result result = new Result();

foreach (Document doc in myDocuments) {
if (doc.Matches(aQuery)) {
result.Add(doc);
}
}

return result;
}

This looks good, except for two problems: Document has no Matches() method, and Result has no Add() method.

Let's add a test: we'll check that each field can be matched, and that a document doesn't match queries it shouldn't:

public void TestDocumentMatchingQuery() {
Document d = new Document("1a", "t2t", "y3");
Assert(d.Matches(new Query("1")));
Assert(d.Matches(new Query("2")));
Assert(d.Matches(new Query("3")));
Assert(!d.Matches(new Query("4")));
}

There are three situations for queries that we should deal with eventually: empty queries, partial matches, and case sensitivity. For now, we'll assume empty strings and partial matches should match, and the search is case-sensitive. In the future we might change our mind.

This is enough information to let us implement Matches():

public bool Matches(Query aQuery) {
String query = aQuery.Value;

return myAuthor.IndexOf(query) != -1
|| myTitle.IndexOf(query) != -1
|| myYear.IndexOf(query) != -1;
}

This will enable TestDocumentMatchingQuery() to work, but TestOneElementCollection() will still fail because Result has no Add() method yet. So, add a test for the method Result.Add():

public void TestAddingToResult()
{
Document d1 = new Document("a1", "t1", "y1");
Document d2 = new Document("a2", "t2", "y2");

Result r = new Result();
r.Add(d1);
r.Add(d2);

Assert ("2 items in result", r.Count == 2);
Assert ("First item", r[0] == d1);
Assert ("Second item", r[1] == d2);
}

This test fails. Result already remembers its list by using an array, but that is not the best choice for a structure that needs to change its size. We'll change to use an ArrayList:

ArrayList myDocuments = new ArrayList();

public Result(Document[] aDocumentArray) {
foreach (Document doc in aDocumentArray) {
myDocuments.Add(doc);
}
}

public Document this [int anIndex] {
get { return (Document)myDocuments[anIndex]; }
}

public int Count {
get { return myDocuments.Count; }
}

Make sure the old unit tests TestEmptyResult() and TestResultWithTwoDocuments() still pass. Add the new method:

public void Add(Document aDocument)
{
myDocuments.Add(aDocument);
}

Let's consider the new Result(Document[]) constructor. It was introduced to support the TestResultWithTwoDocuments() test, because it was the only way we could create Results containing documents. Later, we introduced Result.Add(), which is what Searcher needs. The array constructor is no longer needed. So, we'll put on a testing hat and revise that test. Instead of Result r = new Result(new Document[]{d1,d2}), we'll use:

Result r = new Result();
r.Add(d1);
r.Add(d2);

We verify that all tests still pass, so it is now safe to remove the array-based constructor. We also see that TestAddingToResult() is now essentially a duplicate of TestResultWithTwoDocuments(), so we'll remove the latter.

Finally, all our tests pass for Document, Result, Query, and Searcher.

Initialization
Loading Documents
Where does a searcher get its documents? Currently, you'd call its constructor from the main routine, passing in an array of documents. Instead, we want the searcher to own the process of loading its documents.

We begin with a test. We'll pass in a Reader, and be prepared for exceptions. We've also postulated a Count property, used only by tests to verify that something was loaded. An advantage of having the tests in the same assembly as the class under test is that you can provide non-public members that let tests view an object's internal state.

public void TestLoadingSearcher() {
try {
String docs = "a1\tt1\ty1\na2\tt2\ty2"; // \t=field, \n=row
StringReader reader = new StringReader(docs);
Searcher searcher = new Searcher();
searcher.Load(reader);
Assert("Loaded", searcher.Count == 2);
}
catch (IOException e) {
Fail("Loading exception: " + e);
}
}

Notice that Searcher still uses an array (the simplest choice at the time). As we did for Result, we'll refactor it to use an ArrayList instead of an array:

// Searcher:
ArrayList myDocuments = new ArrayList();

public Searcher() {}

public Searcher(Document[] aDocumentArray) {
foreach (Document doc in aDocumentArray) {
myDocuments.Add(doc);
}
}

public Result Find(Query aQuery) {
Result result = new Result();

foreach (Document doc in myDocuments) {
if (doc.Matches(aQuery)) {
result.Add(doc);
}
}

return result;
}

(Verify that the old tests pass.) Now we're in a position to do the loading:


// Searcher:
public Query MakeQuery(string aQueryString) {
return new Query(aQueryString);
}

public void Load(TextReader aReader) {
try {
String line = aReader.ReadLine();
while (line != null) {
myDocuments.Add(new Document(line));
line = aReader.ReadLine();
}
}
finally {
try { aReader.Close(); }
catch (Exception) { /*ignore*/ }
}
}

internal int Count {
get { return myDocuments.Count; }
}


// Document:
public Document(String line) {
string[] tokens = line.Split(new char[] {'\t'});

if (tokens.Length != 3) {
throw new ArgumentException("Author, " +
"Title and Year not provided.");
}

myAuthor = tokens[0];
myTitle = tokens[1];
myYear = tokens[2];
}

Searcher's array-based constructor is no longer needed. We'll adjust the test and delete the constructor:

public void TestOneElementCollection() {
Searcher searcher = new Searcher();

try {
StringReader reader = new StringReader("a\ta word here\ty");
searcher.Load(reader);
}
catch (Exception ex) {
Fail ("Couldn't load Searcher: " + ex);
}

Query q1 = searcher.MakeQuery("word");
Result r1 = searcher.Find(q1);
Assert(r1.Count == 1);

Query q2 = searcher.MakeQuery("notThere");
Result r2 = searcher.Find(q2);
Assert (r2.Count == 0);
}

SearcherFactory
Where does a Searcher come from? Currently, that's left up to whoever calls its constructor. Instead of letting clients depend on the constructor, we'd like to introduce a factory method responsible for locating the Searcher. (For the test, we'll put a file "test.dat" in the directory for testing. If we wanted to be less lazy, we'd have the test create and delete the file as well.)

public void TestSearcherFactory() {
try {
Searcher s = SearcherFactory.GetSearcher ("test.dat");
Assert (s != null);
}
catch (Exception ex) {
Fail ("SearcherFactory can't load: " + ex);
}
}

We can implement:

public class SearcherFactory {
public static Searcher GetSearcher(String filename) {
Searcher s = new Searcher();
s.Load(new StreamReader(
new FileStream(filename, FileMode.Open)));
return s;
}
}

Now, a client obtains a Searcher by asking a SearcherFactory to give it one.
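
Client code now reads something like this (the data file name is hypothetical):

```csharp
// "bib.dat" is a hypothetical data file in the same tab/newline format as test.dat.
Searcher searcher = SearcherFactory.GetSearcher("bib.dat");
Result r = searcher.Find(searcher.MakeQuery("word"));
Console.WriteLine(r.Count + " matching documents");
```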

Looking Back
I'd like to put a design hat on, and look at the methods we've developed, from two perspectives: the search client and the Searcher class. Who uses each public method and property?

Search Client       Searcher class
Document.Author     new Document()
Document.Title      Document.Matches()
Document.Year       Query.Value
new Query()         new Result()
Result.Count        Result.Add()
Result[index]
Searcher.Find()
Looking at the Document and Query classes, I still have twinges that say they may not be doing enough (being not much more than a "data bag"). But both seem like good, meaningful "near-domain" classes, so we'll hold off on any impulse to change them. The Result and Searcher classes feel like they have the right balance.

What about the development process? It seemed to generate some blind alleys. For example, we had to change data structures from arrays to array lists (twice!). Is this a flaw in our process? No, it's not. The array was an adequate structure when it was introduced, and it was changed when necessary. We don't mind blind alleys, as long as they're never one-way dead ends. We're not omniscient, so there will be times we need to change our minds; the key is making sure we never get stuck with a bad or over-complex design.

Moving Forward: Interfaces
The implementation we've derived above is a good starting point, but is not in final form. In real systems, the bibliographic information is often kept elsewhere, perhaps in a database, XML file, on another network, etc. We don't want our clients to know which alternative they're using.

The methods in the "Search Client" column of the table above show the interfaces required by clients. "Query" is probably OK as a class (since clients have to be able to construct them), but we would like to introduce interfaces for Searcher, Result, and Document. We'll apply the "Extract Interface" refactoring (from Fowler's book).

Unfortunately, the names we'd like for our interfaces are the same as the ones we already use for the classes. Since we'd like things to be better from the client point of view, and the classes so far are based on strings, we'll rename Searcher to StringSearcher, etc. and reserve the shorter names for the interfaces.

So, rename Searcher.cs to StringSearcher.cs (and the Searcher class to StringSearcher). Fix every call site and reference. Run the tests to verify that we've renamed correctly.

Introduce the interface:

public interface Searcher {
Result Find(Query q);
}

(Run the tests.) Make StringSearcher implement the interface. (Run the tests.) Now, the only place that must reference StringSearcher by name is the SearcherFactory class. (We could remove that dependency, and perhaps also put the String* classes in a different namespace, but we won't do that here for reasons of space.)
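
The change is a one-line edit to the class declaration; the body stays exactly as it was (sketched here with just Find(); Load() and Count are unchanged):

```csharp
public class StringSearcher : Searcher   // now declares the interface
{
    ArrayList myDocuments = new ArrayList();

    public Result Find(Query aQuery) {
        Result result = new Result();

        foreach (Document doc in myDocuments) {
            if (doc.Matches(aQuery)) {
                result.Add(doc);
            }
        }

        return result;
    }
}
```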

Apply the same process to Result, renaming the old Result to StringResult, and introducing the interface:

public interface Result {
Document this [int anIndex] { get; }
int Count { get; }
}

The StringSearcher class should still construct a StringResult object, but its return type should remain Result. (We don't mind if the String* classes depend on each other, but we don't want to make clients aware of that fact.)
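
Assuming Document and Result were renamed to StringDocument and StringResult in the same way, Find() might now read:

```csharp
// Inside StringSearcher. StringDocument and StringResult are the renamed
// concrete classes; only the Result interface appears in the signature.
public Result Find(Query aQuery) {
    StringResult result = new StringResult();

    foreach (StringDocument doc in myDocuments) {
        if (doc.Matches(aQuery)) {
            result.Add(doc);   // Add() is on StringResult, not on the Result interface
        }
    }

    return result;             // callers see only the Result interface
}
```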

Finally, introduce the interface for Document:

public interface Document {
string Author { get; }
string Title { get; }
string Year { get; }
}

We're left with two concrete classes that clients will depend on: SearcherFactory and Query. Clients depend on the interfaces for Searcher, Result, and Document, but not directly on the classes implementing those interfaces.

Conclusion
We've developed the bibliographic system's model in a typical Extreme Programming style, emphasizing simple design and a process of alternately testing and coding. The unit tests supported us in designing, coding, and refactoring. The resulting system could have either a simple command line or a graphical user interface attached to it.

Resources
Source code (zip file 4KB).
Dot Net Dan's .NET Discussion
The Test/Code Cycle in XP: Part 1, Model, in Java by William Wake
The Test/Code Cycle in XP: Part 2, GUI, in Java by William Wake.
William Wake's XPlorations
Extreme Programming Explained, Kent Beck.
Refactoring, Martin Fowler.
NUnit home
JUnit home
[Written by William Wake 1-25-2000; re-titled and revised 2-3-2000; added search.zip 7-2-00; Ported to C# by Dan Green 12-23-2000.]
