Sunday, November 25, 2007

Coding Peeves

This'll probably be the first in a series of pet peeves about API design. The common thread is that there is value in consistency, and that's true throughout application design. Consistency means predictability. Your end users will thank you for sparing them lots of training and headaches learning your app. This is even more true for the developers who consume your APIs: if they have to guess what your black-box methods do, they waste a lot of very expensive time.

"A foolish inconsistency is the hobgoblin of mediocre programmers" -cdude.

So here is an example of what not to do.

Your API returns an array if there are any appropriate elements, and null if not.

So I have to write this everywhere:

    Widget[] widgets = Utility.GetWidgets();
    if (widgets != null)
    {
        foreach (Widget widget in widgets)
        {
            ...
        }
    }

ARGGH. Here's what the same code looks like if the API returns an empty array:

    foreach (Widget widget in Utility.GetWidgets())
    {
        ...
    }

Am I being lazy? Yes, but that's no excuse for a bad API. Probably a third of all consumers of this API will forget to check for null, even when they are adamant that they should. (I can support this with statistics from projects I have worked on.) Your QA department may even forget to test the use case with no appropriate elements. But your users won't forget to try it. And you will be publishing a patch.
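The fix belongs inside the API, not in every caller. Here is a minimal sketch of the friendlier contract, with a hypothetical Utility class: normalize null to an empty array before returning, so callers can always enumerate.

```csharp
using System;

public class Widget { }

public static class Utility
{
    // Hypothetical backing store; a query could legitimately
    // leave this null.
    private static Widget[] _widgets;

    public static Widget[] GetWidgets()
    {
        // Normalize: callers always get an array, possibly empty,
        // so they never need a null check.
        return _widgets ?? new Widget[0];
    }
}
```

With this contract, `foreach (Widget widget in Utility.GetWidgets())` is always safe.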

JScript Arrays

Have you ever tried to use a .NET array in JScript? For example, suppose you have a class library with a method that returns an array of a custom type (MyObject[]). You would think you could enumerate the members using the syntax

    for (var x in GetMyObjects())

After all, an array implements IEnumerable. Well, it does, but it does it wrong, at least in JScript. The values of x will be the indices of the array. Yeah, that's right: 0, 1, 2...

The for-in construct in JScript is not quite the same as foreach in C#. It has to accommodate expando properties as well. Somehow it gets confused with .NET arrays.

Here's a workaround. Wrap your array in an IEnumerable. In C#:

    using System;
    using System.Collections.Generic;
    using System.Collections;

    public class ProtectedArray<T> : IEnumerable<T>, IEnumerable
    {
        private T[] _innerArray;

        public ProtectedArray(T[] innerArray)
        {
            _innerArray = innerArray;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return ((IEnumerable<T>)_innerArray).GetEnumerator();
        }

        public Int32 Length
        {
            get { return _innerArray.Length; }
        }

        public T this[Int32 index]
        {
            get { return _innerArray[index]; }
            set { _innerArray[index] = value; }
        }

        public T[] ToArray()
        {
            return _innerArray;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }

(My apologies if <T> disappeared anywhere in there. The editor thinks it is an unknown HTML tag. I tried to find them all.)

Now, if GetMyObjects() returns a ProtectedArray, you can enumerate in JScript using the above syntax. You can also try the following if you don't want to change your API, but you may need to modify the ProtectedArray class because JScript cannot handle the generic declaration.

    for (var x in new ProtectedArray(GetMyObjects()))
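If you go that route, here is a minimal non-generic sketch of what that modification might look like (my variant, mirroring the generic class above; JScript can instantiate this one directly):

```csharp
using System;
using System.Collections;

// Non-generic variant for JScript callers, which cannot use the
// generic ProtectedArray<T> declaration.
public class ProtectedArray : IEnumerable
{
    private Array _innerArray;

    public ProtectedArray(Array innerArray)
    {
        _innerArray = innerArray;
    }

    public Int32 Length
    {
        get { return _innerArray.Length; }
    }

    public Object this[Int32 index]
    {
        get { return _innerArray.GetValue(index); }
        set { _innerArray.SetValue(value, index); }
    }

    public IEnumerator GetEnumerator()
    {
        // Delegate to the array's enumerator, which yields the
        // elements themselves, not the indices.
        return _innerArray.GetEnumerator();
    }
}
```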

RCW Refcounts

You're accessing Microsoft Excel from .NET managed code using the Primary Interop Assembly (PIA). Your program ends, but the Excel process is still alive in Task Manager (a bad thing). It can happen any time you use a Runtime Callable Wrapper (RCW), but the problem is most acute in the Office PIAs because of the peculiar VBA-style APIs.

The VBA syntax encourages stringing properties along, something like this (in C#):

    using Microsoft.Office.Interop.Excel;
    ...
    Application app = new Application();
    Workbook workbook = app.Workbooks.Open(...);
    Range range = workbook.Worksheets[1].UsedRange;
    ...

The RCW increments the reference count for each wrapped object, so you have to release the reference. For example, I might call "Release(workbook)" where Release is given by:

    // Requires using System.Runtime.InteropServices for Marshal.
    private static void Release(Object o)
    {
        try
        {
            Int32 i = 1;
            while (o != null && i > 0)
            {
                i = Marshal.ReleaseComObject(o);
            }
        }
        catch
        {}
    }

It is my belief that the refcount will never be more than 1, since adding another managed reference only copies the pointer to the wrapper, and the wrapper holds the real interop reference. So the Release() method can be simplified a bit.
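If that belief holds, the simplified version is just a null-guarded single release. A sketch, with a hypothetical ComHelper class name:

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComHelper
{
    // Assumes the RCW refcount is at most 1, per the reasoning
    // above, so one ReleaseComObject call is enough.
    public static void Release(Object o)
    {
        if (o != null)
        {
            Marshal.ReleaseComObject(o);
        }
    }
}
```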

The tricky part is that, in the VBA-style syntax, you don't declare references for the intermediate steps. Now you might think there is no refcount if you don't hold a reference, but you'd be wrong. The wrapper still has to be created at each step to call the method of the next property in the chain, and that bumps the refcount. So if you don't keep a reference to release later, you're hosed.

Meanwhile, any time you throw an exception you will miss your Release(), so put the Release() calls in 'finally' blocks. I am fairly paranoid, so I release things in exactly the same order I referenced them. If I have to do things between releases that might throw an exception, the try-finally blocks can nest pretty badly, but that's the way it goes.
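Putting both points together, a sketch of the pattern (the workbook path is hypothetical, and this obviously won't run without Excel installed): break the property chain into named references, then release them in a finally block, in the same order they were acquired.

```csharp
using System;
using Microsoft.Office.Interop.Excel;

// Each intermediate COM object gets its own name so it can be
// released; chaining app.Workbooks.Open(...) would leak the
// Workbooks wrapper.
Application app = new Application();
Workbooks workbooks = null;
Workbook workbook = null;
try
{
    workbooks = app.Workbooks;
    workbook = workbooks.Open(@"C:\temp\book.xlsx");  // hypothetical path
    // ... work with the workbook ...
}
finally
{
    // Release in the same order the references were acquired,
    // using the Release() helper defined above.
    Release(app);
    Release(workbooks);
    Release(workbook);
}
```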

Monday, September 3, 2007

The end of Moore's law

Briefly, Moore's law is the observation, valid for the last four decades or so, that the number of transistors you can pack on a chip has been doubling every two years. This translates to computing power mostly because you can flip a smaller switch faster.

Eventually this must come to an end when the circuits reach the atomic level, around 2020. Some futurists (Vernor Vinge, Ray Kurzweil) think some other unspecified technology will allow the law to continue beyond this. It's fun to speculate about a technological singularity, where an AI becomes smarter than humans, and can advance the field so fast that humans can no longer participate meaningfully. Vinge predicts the end of the human era before 2030.

But let's be a little more sober for a sec. First of all, Moore's law is very nearly at a standstill already. The problem is, clock speeds hit a hard limit around 4GHz; after that the chips run too hot. Combine that with the fact that processors execute on the order of one instruction per clock tick, and you have a problem. That's why in recent years manufacturers have been making multi-core processors and developing architectures that can execute two or even three instructions per clock tick. That will help for a while, but what happens when you have as many cores as processes? (Note: someone recently put 1000 cores on one wafer.)

Second, there is much more to AI than computing power. A computer can win at chess, by brute force computation. But the best go programs in the world cannot compete with even casual club players. And it's not for lack of trying. It's just that computers are not good at judgement. Correct me if I'm wrong, but no robot can run on two legs, because they can't balance well enough. Surely they already have the necessary computing power for that task. And visual perception is primitive. And is it really all that hard to figure out what letters are scribbled in those little boxes to foil spammers? Even the tiny brain of a mosquito exceeds our best AI efforts.

So, sorry Vernor. I don't know what's in store for 2100, but your singularity just ain't happening before 2030.