Sunday, November 25, 2007

Coding Peeves

This'll probably be the first in a series of pet peeves in API design. The common thread is that there is value in consistency. This is true throughout application design. Consistency means predictability. Your end users will thank you for sparing them lots of training and headache learning your app. This is even more true for developer users of your APIs. If they have to guess what your black-box methods do, they are wasting lots of very expensive time.

"A foolish inconsistency is the hobgoblin of mediocre programmers" -cdude.

So here is an example of what not to do.

Your API returns an array if there are any appropriate elements, and null if not.

So I have to write this everywhere:

    Widget[] widgets = Utility.GetWidgets();
    if (widgets != null)
    {
        foreach (Widget widget in widgets)
        {
            ...
        }
    }

ARGGH. Here's what the same code looks like if the API returns an empty array:

    foreach (Widget widget in Utility.GetWidgets())
    {
        ...
    }

Am I being lazy? Yes, but that's no excuse for a bad API. Probably a third of all consumers of this API will forget to check for null, even if they are adamant that they should. (I can support this with statistics on projects I have worked on). Your QA department may even forget to try a use case with no appropriate elements. But your users won't forget to try it. And you will be publishing a patch.

JScript Arrays

Have you ever tried to use a .NET array in JScript? For example, suppose you have a class library with a method that returns an array of a custom type (MyObject[]). You would think you could enumerate the members using the syntax

    for (var x in GetMyObjects())

After all, an array implements IEnumerable. Well, it does, but it does it wrong, at least in JScript. The values of x will be the indices of the array. Yeah, that's right: 1, 2, 3...

The for-in construct in JScript is not quite the same as foreach in C#. It has to accomodate expando properties as well. Somehow it gets confused with .NET arrays.

Here's a workaround. Wrap your array in an IEnumerable. In C#:

    using System;
    using System.Collections.Generic;
    using System.Collections;

    public class ProtectedArray<T> : IEnumerable<T>, IEnumerable
    {
        private T[] _innerArray;

        public ProtectedArray(T[] innerArray)
        {
            _innerArray = innerArray;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return ((IEnumerable)_innerArray).GetEnumerator();
        }

        public Int32 Length
        {
            get    {return _innerArray.Length;}
        }

        public T this[Int32 index]
        {
            get {return _innerArray[index];}
            set    {_innerArray[index] = value;}
        }

        public T[] ToArray()
        {
            return _innerArray;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return (IEnumerator)this.GetEnumerator();
        }
    }

(My apologies if <T> disappeared anywhere in there. The editor thinks it is an unknown HTML tag. I tried to find them all.)

Now, if GetMyObjects() returns a ProtectedArray, you can enumerate in JScript using the above syntax. You can also try the following if you don't want to change your API, but you may need to modify the ProtectedArray class because JScript cannot handle the generic declaration.

    for (var x in new ProtectedArray(GetMyObjects()))

RCW Refcounts

You're accessing Microsoft Excel from .NET managed code using the Primary Interop Assembly (PIA). Your program ends, but the Excel process is still alive in task manager (a bad thing). It can happen in anytime you use a Runtime Callable Wrapper (RCW), but the problem is most acute in Office PIAs because of the peculiar VBA APIs.

The VBA syntax encourages stringing properties along, something like this (in C#):

    using Microsoft.Office.Interop.Excel;
    ...
    Application app = new Application();
    Workbook workbook = app.Workbooks.Open(...);
    Range range = workbook.Worksheets[1].UsedRange;
    ...

The RCW increments the reference count for each wrapped object, so you have to release the reference. For example, I might call "Release(workbook)" where Release is given by:

    private static void Release(Object o)
    {
        try
        {
            Int32 i = 1;
            while (o != null && i > 0)
            {
                i = Marshal.ReleaseComObject(o);
            }
        }
        catch
        {}
    }

It is my belief that the refcount will never be more than 1 as adding another managed reference is only copying the pointer to the wrapper, which holds the real interop reference. So the Release() method can be simplified a bit.

The tricky part is, in the VBA style syntax you don't declare references for the intermediate steps. Now you might think there is no refcount if you don't have a reference, but you'd be wrong. The wrapper still needs to be created at each step to call the method of the next property in the chain, and that adds the refcount. So if you don't keep a reference to release later, you're hosed.

Meanwhile, any time you throw an exception you will miss your Release(), so put the Release() calls in 'finally' blocks. I am fairly paranoid, so I release things in exactly the same order I referenced them. If I have to do things between releases that might throw an exception, the try-finally blocks can nest pretty badly, but that's the way it goes.