csammisrun

A rare situation

Finding the phantom ArgumentException

without comments

Last night I knocked out a bug in shaim that had been annoying me for months. It had to do with the OscarLib driver, and changing the order of your contact list from another client. I’d finally gotten it fixed to the point where moving a contact from one group to another would work sometimes, seemingly depending on the order of the groups that were involved in the move, but the inconsistency was driving me insane.

A little background: Groups and contacts are represented in OscarLib by fairly simple objects:

class Group
{
  public string Name;
  public ushort ID;
  public ushort[] childIDs;
}

class Contact
{
  public string Name;
  public ushort ID;
  public ushort parentID;
}

When the AIM protocol sends out notice of a remote list update, it does so in several steps:

  1. Alert the client that the list is about to change
  2. Send a “contact removed from original group” notification
  3. Send a “contact added to new group” notification
  4. Tell the client that the original group has had its children changed
  5. Tell the client that the new group has had its children changed
  6. Send an “All done” message

Whew, that’s a lot of messages! The OscarLib driver needs to remember the order of events so that, when the “new group children changed” message comes in, it can add the contact to the new group in the appropriate order (people complain a lot when their list gets rearranged).

To effect this memory, the OscarLib driver maintains a list of groups and the contacts in those groups that have been updated.

Dictionary<Group, List<Contact>> outstandingGroupUpdates;

It keeps a list of contacts to update because it’s possible for the server to send multiple list updates before it sends the final group update, so they’re all performed at the same time when the server says it’s done. Let’s look at the message order again, and see what the driver does.

  1. Alert the client that the list is about to change
  2. Send a “contact removed from original group” notification. Remove item from shaim’s list
  3. Send a “contact added to new group” notification. Add the contact to outstandingGroupUpdates
  4. Tell the client that the original group has had its children changed
  5. Tell the client that the new group has had its children changed. If the group has a list of contacts to update in outstandingGroupUpdates, perform the list updates
  6. Send an “All done” message

That’s all the necessary background. The problem was happening in the second-to-last step, where the driver attempted to look in the Dictionary and see if there were any updates to perform:

void GroupChildrenChanged(Group group)
{
  if(outstandingGroupRequests.ContainsKey(group))
  {
    foreach(Contact contact in outstandingGroupRequests[group])
    {
      ProcessUpdate(group, contact);
    }
  }
}

The only clue I had was that when this code executed, and the driver was inclined to fail (random), the console would print

A first-chance exception of System.ArgumentException was thrown in mscorlib

…and the method would just stop. The exception didn’t ever make it up to user code, it just lived and apparently died in the Framework itself. But where the hell was it coming from? Putting in a lot of breakpoints and console prints, I found out that it was the Dictionary.ContainsKey method causing the exception. Well crap, it’s not supposed to throw ArgumentException!

The usual suspects - a null group being passed to the GroupChildrenChanged handler, the dictionary being in an invalid state, the group not successfully being added to the dictionary in the first place - all turned up bupkiss. I broke out Reflector and peered into mscorlib’s implementation of Dictionary to see if it was throwing an ArgumentException at any point and just handling it itself (though badly) - nothing. In fact, the only thing that could throw was the IEqualityComparer used by the Dictionary class to compare two objects to determine whether an object was, in fact, contained by the dictionary.

The IEqualityComparer was a bit of an enigma and a good candidate for Problem Child. The Framework contains four or five (I forget) implementations of an equality comparer that were chosen by the type of the generic objects it was comparing. It could be strings, bytes, comparable objects, or miscellaneous, and no way to tell which was being chosen at runtime*. The objects getting compared were my plain old Group objects, so I guessed that the miscellaneous comparer was being used, and Reflected into that. Surprise, no explicitly thrown ArgumentExceptions there either, but I wanted to rule out the miscellaneous comparer as the culprit entirely. I went through my Group and Contact objects and implemented IComparable on them, so the IEqualityComparer would choose its “comparable objects” implementation…and it started working, no other modifications necessary.

The moral of the story would seem to be to make sure that the keys for a Dictionary are IComparable, but that’s pretty weird. Every object has a default implementation of Equals, and since I didn’t need comparison above and beyond reference equality, that should have been sufficient. And I’m still not entirely sure where the ArgumentException was coming from…was it the default implementation of object.Equals in use by the Group object? Was it the generic IEqualityComparer being used by the Dictionary? Why was the exception not bubbling up to the user code? Madness!

* While I was investigating this, I read an MSDN blog post that indicated that the next version of Visual Studio would, in fact, be able to debug down into the Framework source. I could have stepped through live code until I found the problem. Oh well.

Written by Chris

October 4th, 2007 at 3:48 pm

Posted in Development, General

Leave a Reply