A state aware generic list

by Mads Kristensen 19. December 2007 04:50

For so many years I’ve written code that needed to know whether or not a collection of objects has been changed or not. Now I’ve finally written the code that tells me just that.

The scenario is typical. You have a class called Order with a property called OrderLines which is a generic List containing OrderLine objects. You retrieve the Order class from the database including all the OrderLines, change some properties and then you save the Order object back to the database. Now the question is if you need to save the OrderLines as well. To know that, you need to know if the OrderLines collection has been changed. It would be a waste of resource to persist all the order lines each time you make a modification to the Orders object. To know if it has changed, we need a list that is aware of its own state.

The StateList<T>

This class is a subclass of the List<T> generic class but with a few twists. It does exactly what a normal List<T> does, but has two new members – a MarkUnChanged method and an IsChanged property. It also overrides the GetHashCode method. That’s all it does.

The MarkUnChanged method tells the StateList what the original list contains. You call this method after you’ve retrieved all the OrderLines and populated the list. That let’s the StateList know what the comparison is for reference. Behind the scenes, the MarkUnChanged method stores the hash code of the items in the list in a private integer variable.

The IsChanged property returns a Boolean value that indicates whether or not the items in the list produce the same hash code as the one stored in the private variable. If it doesn’t, then the list is changed. This is the class.

[System.Serializable]

public class StateList<T> : System.Collections.Generic.List<T>

{

  public override int GetHashCode()

  {

    long hash = 0;

    foreach (T item in this)

    {

      hash += item.GetHashCode();

    }

 

    return hash.GetHashCode();

  }

 

  private int _HashCode = 0;

 

  /// <summary>

  /// Gets if this list's data has been changed.

  /// </summary>

  public virtual bool IsChanged

  {

    get

    {

      return this.GetHashCode() != _HashCode;

    }

  }

 

  /// <summary>

  /// Marks the object as being clean,

  /// which means not changed.

  /// </summary>

  public virtual void MarkUnChanged()

  {

    _HashCode = this.GetHashCode();

    base.TrimExcess();

  }

}

Example

Here is a little taste of how it works:

StateList<int> list = new StateList<int>();

list.Add(3);

list.Add(4);

list.Add(5);

// list.IsChanged = true

 

list.MarkUnChanged();

// list.IsChanged = false

 

list.Add(6);

// list.IsChanged = true

 

list.Clear();

list.Add(3);

list.Add(4);

list.Add(5);

// list.IsChanged = false

The order of the items

The code above produces the same hash code no matter in what order the individual items are located in the list. So if the same items are located in the list but at different locations, the IsChanged property is false. To support location awareness, you can just switch the overridden GetHashCode method with this one:

public override int GetHashCode()

{

  long hash = 0;

  for (int i = 0; i < this.Count; i++)

  {

    hash += this[i].GetHashCode() ^ i;

  }

 

  return hash.GetHashCode();

}

* Only $4.95/month ASP.NET & Windows 2008 + IIS 7 Hosting! FREE SQL Included

Tags:

Server-side

Comments

12/19/2007 2:48:21 PM #

Jesper Blad Jensen

Cool idea to use the hashcode. I have never though of that. If I should build it it, would be some crazy thing with an interface and that it would be the orderlines responsibility to say that its changed - but this is way cooler.

Jesper Blad Jensen Denmark |

12/19/2007 5:11:54 PM #

oVan

Any thoughts on the performance for big lists? I imagine the IsChanged function can take a while if the list contains quite a few entries?

oVan Belgium |

12/19/2007 7:09:19 PM #

Paul

So how do you go about making sure the sum of hash codes (over which you have no control) doesn't match?

ie.
string s0 = "black";
string s1 = "white";
string s2 = "red";
string s2 = "blue";

StateList<string> list = new StateList<string>();
list.Add(s0);
list.Add(s1);

// list.IsChanged = true
list.Clear();

// list.IsChanged = true
list.Add(s2);
list.Add(s3);

// it turns out that by an unfortunately turn of fate
//     s0.GetHashCode() + s1.GetHashCode() == s2.GetHashCode() + s3.GetHashCode()
// list.IsChanged = FALSE!!!

or am I missing something?

Paul United Kingdom |

12/19/2007 9:12:25 PM #

Scott Walker

Yes I would have to agree with Paul that something seems amiss here.

I believe by definition of a hash function that:

a==b implies a.GetHashCode() == b.GetHashCode()
a.GetHashCode() = b.GetHashCode() does not imply that a == b

Hence the problem of hash collisions etc...

Scott Walker United States |

12/19/2007 9:39:32 PM #

trackback

Trackback from DotNetKicks.com

A state aware generic list

DotNetKicks.com |

12/19/2007 9:54:23 PM #

mcgurk

I concur that ur hash code blows.  Adding child hash codes together?  That's... bizzare.  And you cannot at all guarantee that one hash won't equal another hash.  Hash codes in .NET are kinda crappy.

I'd suggest that, when overriding hashes in the future, you XOR together child object hash codes and throw a little salt in there for shits and giggles.  Then, after you have your nice hash function all set up, you never use it.

For this object, what I would do is:
1) seal it
2) new-over the base functions and properties that callers use to change state (this[], add(), remove(), etc)
3) Examine the requested state change to determine if it actually changes the collection's state (i.e., me[key] = me[key] should not change your state to Dirty)
4) Perform the state change and mark yourself as changed or not changed
5) Realize that objects you store may be changed behind your back and you cannot possibly guarantee that you will be able to detect this has happened and drop the whole idea completely.

mcgurk United States |

12/19/2007 10:04:04 PM #

Mads Kristensen

You're probably all right about the hash code not being good enough for standard CLR types. I use this collection for objects that I build my self and control the hash code of. In such a scenario it works perfectly.

Mads Kristensen Denmark |

12/19/2007 11:15:25 PM #

Nicholas DL

Many stated that the GetHashCode is insufficient; however, no one supplied the solution? Considering his hash algorithm, what is the probability that duplication among instances might occur (Reflector reveals that the BCL often implements hashes similarly)?

You might implement the INotifyPropertyChanged interface.

Nicholas DL United States |

12/20/2007 5:50:09 AM #

Dave Transom

Well, it could always use the hash code of the items in the list to generate a unique key for the state of the collection, but not override the GetHashCode method. Instead introduce a new method: GetStateKey()

Dave Transom New Zealand |

12/20/2007 9:18:09 AM #

Scott Walker

>>Many stated that the GetHashCode is insufficient; however, no one supplied the solution? Considering his hash algorithm, what is >>the probability that duplication among instances might occur (Reflector reveals that the BCL often implements hashes similarly)?

>>You might implement the INotifyPropertyChanged interface.


There probably isn't a good 100% working implementation that relies on the hash code of the internal objects. At some point (granted it may be a huge number) you're guaranteed a hash collision. And the hash algorithm in this class really has nothing to do with the potential of such a collision as all it does is really rely on the hash codes of the stored objects. You would have to look at the hashcode implementation in those objects and figure out how likely those hashes are to collide.

If the objects Mads is storing are tightly controlled as he says and he is comfortable with the potential failure rate then that is all well and good. One should not consider this a end-all be-all solution to this very common problem however for the reaons we've all explained above.

I too would recommened considering the INotifyPropertyChanged interface.  You could look at Rocky's code from CSLA (http://www.lhotka.net) to see how he solves a similar problem.

Scott Walker United States |

12/20/2007 8:51:21 PM #

pingback

Pingback from alvinashcraft.com

» Daily Bits - December 20, 2007 Alvin Ashcraft’s Daily Geek Bits: Daily links plus random ramblings about development, gadgets and raising rugrats.

alvinashcraft.com |

1/10/2008 4:14:38 PM #

Vijay Santhanam

I'd prefer to use ObservableCollection<T>. It has events for monitoring changes.

Vijay Santhanam Australia |

Comments are closed

About the slave

Mads Kristensen Mads Kristensen
Web developer at ZYB and founder of BlogEngine.NET. More...

LinkedIn ZYB Facebook Last.fm Twitter View Mads Kristensen's profile on Technorati

The Lounge

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008