A state aware generic list

Dec 18, 2007

For so many years I’ve written code that needed to know whether or not a collection of objects has been changed or not. Now I’ve finally written the code that tells me just that.

The scenario is typical. You have a class called Order with a property called OrderLines which is a generic List containing OrderLine objects. You retrieve the Order class from the database including all the OrderLines, change some properties and then you save the Order object back to the database. Now the question is if you need to save the OrderLines as well. To know that, you need to know if the OrderLines collection has been changed. It would be a waste of resource to persist all the order lines each time you make a modification to the Orders object. To know if it has changed, we need a list that is aware of its own state.

The StateList<T>

This class is a subclass of the List<T> generic class but with a few twists. It does exactly what a normal List<T> does, but has two new members – a MarkUnChanged method and an IsChanged property. It also overrides the GetHashCode method. That’s all it does.

The MarkUnChanged method tells the StateList what the original list contains. You call this method after you’ve retrieved all the OrderLines and populated the list. That let’s the StateList know what the comparison is for reference. Behind the scenes, the MarkUnChanged method stores the hash code of the items in the list in a private integer variable.

The IsChanged property returns a Boolean value that indicates whether or not the items in the list produce the same hash code as the one stored in the private variable. If it doesn’t, then the list is changed. This is the class.

[System.Serializable]

public class StateList<T> : System.Collections.Generic.List<T>

{

  public override int GetHashCode()

  {

    long hash = 0;

    foreach (T item in this)

    {

      hash += item.GetHashCode();

    }

 

    return hash.GetHashCode();

  }

 

  private int _HashCode = 0;

 

  /// <summary>

  /// Gets if this list's data has been changed.

  /// </summary>

  public virtual bool IsChanged

  {

    get

    {

      return this.GetHashCode() != _HashCode;

    }

  }

 

  /// <summary>

  /// Marks the object as being clean,

  /// which means not changed.

  /// </summary>

  public virtual void MarkUnChanged()

  {

    _HashCode = this.GetHashCode();

    base.TrimExcess();

  }

}

Example

Here is a little taste of how it works:

StateList<int> list = new StateList<int>();

list.Add(3);

list.Add(4);

list.Add(5);

// list.IsChanged = true

 

list.MarkUnChanged();

// list.IsChanged = false

 

list.Add(6);

// list.IsChanged = true

 

list.Clear();

list.Add(3);

list.Add(4);

list.Add(5);

// list.IsChanged = false

The order of the items

The code above produces the same hash code no matter in what order the individual items are located in the list. So if the same items are located in the list but at different locations, the IsChanged property is false. To support location awareness, you can just switch the overridden GetHashCode method with this one:

public override int GetHashCode()

{

  long hash = 0;

  for (int i = 0; i < this.Count; i++)

  {

    hash += this[i].GetHashCode() ^ i;

  }

 

  return hash.GetHashCode();

}

* Only $4.95/month ASP.NET & Windows 2008 + IIS 7 Hosting! FREE SQL Included

Comments (10) -

Jesper Blad Jensen
Jesper Blad Jensen Denmark
12/19/2007 5:48:21 AM #

Cool idea to use the hashcode. I have never though of that. If I should build it it, would be some crazy thing with an interface and that it would be the orderlines responsibility to say that its changed - but this is way cooler.

oVan
oVan Belgium
12/19/2007 8:11:54 AM #

Any thoughts on the performance for big lists? I imagine the IsChanged function can take a while if the list contains quite a few entries?

Paul
Paul United Kingdom
12/19/2007 10:09:19 AM #

So how do you go about making sure the sum of hash codes (over which you have no control) doesn't match?

ie.
string s0 = "black";
string s1 = "white";
string s2 = "red";
string s2 = "blue";

StateList<string> list = new StateList<string>();
list.Add(s0);
list.Add(s1);

// list.IsChanged = true
list.Clear();

// list.IsChanged = true
list.Add(s2);
list.Add(s3);

// it turns out that by an unfortunately turn of fate
//     s0.GetHashCode() + s1.GetHashCode() == s2.GetHashCode() + s3.GetHashCode()
// list.IsChanged = FALSE!!!

or am I missing something?

Scott Walker
Scott Walker United States
12/19/2007 12:12:25 PM #

Yes I would have to agree with Paul that something seems amiss here.

I believe by definition of a hash function that:

a==b implies a.GetHashCode() == b.GetHashCode()
a.GetHashCode() = b.GetHashCode() does not imply that a == b

Hence the problem of hash collisions etc...

mcgurk
mcgurk United States
12/19/2007 12:54:23 PM #

I concur that ur hash code blows.  Adding child hash codes together?  That's... bizzare.  And you cannot at all guarantee that one hash won't equal another hash.  Hash codes in .NET are kinda crappy.

I'd suggest that, when overriding hashes in the future, you XOR together child object hash codes and throw a little salt in there for shits and giggles.  Then, after you have your nice hash function all set up, you never use it.

For this object, what I would do is:
1) seal it
2) new-over the base functions and properties that callers use to change state (this[], add(), remove(), etc)
3) Examine the requested state change to determine if it actually changes the collection's state (i.e., me[key] = me[key] should not change your state to Dirty)
4) Perform the state change and mark yourself as changed or not changed
5) Realize that objects you store may be changed behind your back and you cannot possibly guarantee that you will be able to detect this has happened and drop the whole idea completely.

Mads Kristensen
Mads Kristensen Denmark
12/19/2007 1:04:04 PM #

You're probably all right about the hash code not being good enough for standard CLR types. I use this collection for objects that I build my self and control the hash code of. In such a scenario it works perfectly.

Nicholas DL
Nicholas DL United States
12/19/2007 2:15:25 PM #

Many stated that the GetHashCode is insufficient; however, no one supplied the solution? Considering his hash algorithm, what is the probability that duplication among instances might occur (Reflector reveals that the BCL often implements hashes similarly)?

You might implement the INotifyPropertyChanged interface.

Dave Transom
Dave Transom New Zealand
12/19/2007 8:50:09 PM #

Well, it could always use the hash code of the items in the list to generate a unique key for the state of the collection, but not override the GetHashCode method. Instead introduce a new method: GetStateKey()

Scott Walker
Scott Walker United States
12/20/2007 12:18:09 AM #

>>Many stated that the GetHashCode is insufficient; however, no one supplied the solution? Considering his hash algorithm, what is >>the probability that duplication among instances might occur (Reflector reveals that the BCL often implements hashes similarly)?

>>You might implement the INotifyPropertyChanged interface.


There probably isn't a good 100% working implementation that relies on the hash code of the internal objects. At some point (granted it may be a huge number) you're guaranteed a hash collision. And the hash algorithm in this class really has nothing to do with the potential of such a collision as all it does is really rely on the hash codes of the stored objects. You would have to look at the hashcode implementation in those objects and figure out how likely those hashes are to collide.

If the objects Mads is storing are tightly controlled as he says and he is comfortable with the potential failure rate then that is all well and good. One should not consider this a end-all be-all solution to this very common problem however for the reaons we've all explained above.

I too would recommened considering the INotifyPropertyChanged interface.  You could look at Rocky's code from CSLA (http://www.lhotka.net) to see how he solves a similar problem.

Vijay Santhanam
Vijay Santhanam Australia
1/10/2008 7:14:38 AM #

I'd prefer to use ObservableCollection<T>. It has events for monitoring changes.

Pingbacks and trackbacks (2)+

Comments are closed

About the author

Mads Kristensen

Mads Kristensen
Program Manager at the Microsoft Web Platform team and founder of BlogEngine.NET.

More...

Month List

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.