20 October 2010

How do I – Filter values in two lists of custom classes using Linq

I originally titled this post “Linq – So wonderful and oh so damn frustrating”, but then decided that the post probably warrants its current name since someone might actually be trying to do the same thing and search for examples of how to achieve that.  There are tons of examples of just how cool Linq is and I’m not saying that it isn’t, but man it can be so frustrating to work with at times.  Now I’m not going to pretend that I’m a Linq expert.  I’m not.  I know just enough to be dangerous.  Linq has been useful to me in the past, but this week I stumbled across an issue that just drove me batty!
<rant>
Just this weekend I was telling Jess (my wonderful SciFi, geeky wife) that 80% of a developer’s time is spent figuring out why code (methods & APIs) does NOT work the way it’s supposed to.  Not how we THINK it’s supposed to work, but how it was PUBLISHED and advertised to work. 
</rant>
OK, off my soapbox and back to the Linq issue…
I have a class defined as SystemFile.  I’m basically trying to compare a folder, with all its files and subfolders, to another folder with its files and subfolders.  In the process I have two lists of SystemFile containing the info about the two folders I wish to compare.  To compare the lists with Linq, we can define a comparer class that implements IEqualityComparer<T> to do the comparison of our custom class and assist in the filtering.  My comparer class is defined thus:

public class SystemFileGACComparor : IEqualityComparer<SystemFile>
{
    // Anything under the GAC "temp" or "tmp" folders is treated as a match;
    // otherwise two entries match when their full paths are identical.
    public bool Equals(SystemFile source, SystemFile target)
    {
        string path = source.FullPath.ToLower();
        if (path.Contains(@"c:\windows\assembly\temp") ||
            path.Contains(@"c:\windows\assembly\tmp"))
        {
            return true;
        }

        return source.FullPath == target.FullPath;
    }

    // The hash code is based on the full path only.
    public int GetHashCode(SystemFile source)
    {
        return source.FullPath.GetHashCode();
    }
}

A note about my logic.  The simplest implementation of the Equals() method would have been
   return (source.FullPath == target.FullPath);
but since I am actually comparing GAC folders in this case, there are the “Temp” and “Tmp” folders to consider (and exclude) in my comparison.  Since “Tmp” is used during installation and “Temp” during uninstallation of assemblies, they will have temporary, generated values (such as 30ROE94YXW) in their path names which will always be different between systems.  As a result, we have to exclude anything from these folders in our comparison.  For that reason, I added the checks for these folders in the path of the source being checked.
With the comparer class in place, let’s use it.  The code is straightforward:

SystemFileGACComparor compare = new SystemFileGACComparor();
IEnumerable<SystemFile> sfSource = LoadFileListFromXML("1WEB.xml");
IEnumerable<SystemFile> sfTarget = LoadFileListFromXML("1APP.xml");
List<SystemFile> lstOnSourceNotTarget = sfSource.Except(sfTarget, compare).ToList();
First we define an instance of the comparer class for use.
Then we load the XML dump of our two lists into an IEnumerable<SystemFile> structure so that Linq can work on them.
Now simply ask Linq to compare the two lists and produce a list with the differences.  Per the published MSDN documentation, the Except() method will take our sfSource and remove any records that it finds in sfTarget that match, finally returning what remains.
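To make that set-difference behavior concrete, here is a minimal sketch that uses plain strings instead of SystemFile.  The file names and the ExceptDemo class are made up for illustration.  One detail worth noticing: because Except() treats both inputs as sets, duplicates in the first list are collapsed as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns the items that are in source but not in target.
    public static List<string> OnSourceNotTarget(
        IEnumerable<string> source, IEnumerable<string> target)
    {
        return source.Except(target).ToList();
    }

    public static void Main()
    {
        var source = new[] { "a.dll", "b.dll", "b.dll", "c.dll" };
        var target = new[] { "a.dll", "d.dll" };

        // Prints "b.dll, c.dll": "a.dll" is removed and the duplicate "b.dll" appears once.
        Console.WriteLine(string.Join(", ", OnSourceNotTarget(source, target)));
    }
}
```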
For the record, here is what I’m comparing:
1WEB.xml
<?xml version="1.0" encoding="utf-8"?>
<GAC>
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\adodb.dll" FileName="adodb.dll" Size="110592" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:27:58.119-04:00" LastAccessedAt="2010-07-07T14:27:58.103-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="196" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:29:40.671-04:00" LastModifiedAt="2010-07-07T14:29:40.703-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" FileName="7.0.3300.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\ADODB" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:29:40.671-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB" FileName="ADODB" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-07T14:29:40.703-04:00" LastModifiedAt="2010-07-07T14:29:40.718-04:00" LastAccessedAt="2010-07-07T14:29:40.718-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a\envdte.dll" FileName="envdte.dll" Size="245760" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:42:59.968-04:00" LastAccessedAt="2010-07-13T11:42:59.953-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="194" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:43:16.625-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" FileName="8.0.0.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE" FileName="EnvDTE" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-13T11:43:16.64-04:00" LastModifiedAt="2010-07-13T11:43:16.64-04:00" LastAccessedAt="2010-07-13T11:43:16.64-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\30ROE94YXW\One.MasterPages.dll" FileName="One.MasterPages.dll" Size="6144" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\30ROE94YXW" CreatedAt="2010-10-11T10:43:21.996-04:00" LastModifiedAt="2010-10-11T10:43:21.996-04:00" LastAccessedAt="2010-10-11T10:43:21.996-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\30ROE94YXW" FileName="30ROE94YXW" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T15:27:36.922-04:00" LastModifiedAt="2010-10-11T15:27:36.922-04:00" LastAccessedAt="2010-10-11T15:27:36.922-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\5ZY4HSUIYK\One.EVMS.Dashboard.dll" FileName="One.EVMS.Dashboard.dll" Size="837632" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\5ZY4HSUIYK" CreatedAt="2010-10-04T08:56:07.183-04:00" LastModifiedAt="2010-10-04T08:56:07.277-04:00" LastAccessedAt="2010-10-04T08:56:07.183-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\5ZY4HSUIYK" FileName="5ZY4HSUIYK" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T10:51:38.965-04:00" LastModifiedAt="2010-10-11T10:51:38.965-04:00" LastAccessedAt="2010-10-11T10:51:38.965-04:00" />
  <File FullPath="C:\Windows\Assembly\temp" FileName="temp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2009-07-14T00:58:28.892-04:00" LastModifiedAt="2010-10-11T17:33:53.016-04:00" LastAccessedAt="2010-10-11T17:33:53.016-04:00" />
  <File FullPath="C:\Windows\Assembly\tmp" FileName="tmp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2010-07-07T14:18:45.924-04:00" LastModifiedAt="2010-10-11T17:35:50.969-04:00" LastAccessedAt="2010-10-11T17:35:50.953-04:00" />
</GAC>
1APP.xml
<?xml version="1.0" encoding="utf-8"?>
<GAC>
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\adodb.dll" FileName="adodb.dll" Size="110599" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:27:58.119-04:00" LastAccessedAt="2010-07-07T14:27:58.103-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="196" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" CreatedAt="2010-07-07T14:29:40.671-04:00" LastModifiedAt="2010-07-07T14:29:40.703-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB\7.0.3300.0__b03f5f7f11d50a3a" FileName="7.0.3300.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\ADODB" CreatedAt="2010-07-07T14:27:58.103-04:00" LastModifiedAt="2010-07-07T14:29:40.671-04:00" LastAccessedAt="2010-07-07T14:29:40.671-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\ADODB" FileName="ADODB" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-07T14:29:40.703-04:00" LastModifiedAt="2010-07-07T14:29:40.718-04:00" LastAccessedAt="2010-07-07T14:29:40.718-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a\envdte.dll" FileName="envdte.dll" Size="245760" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:42:59.968-04:00" LastAccessedAt="2010-07-13T11:42:59.953-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a\__AssemblyInfo__.ini" FileName="__AssemblyInfo__.ini" Size="194" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0__b03f5f7f11d50a3a" CreatedAt="2010-07-13T11:43:16.625-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE\8.0.0.1__b03f5f7f11d50a3a" FileName="8.0.0.0__b03f5f7f11d50a3a" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC\EnvDTE" CreatedAt="2010-07-13T11:42:59.953-04:00" LastModifiedAt="2010-07-13T11:43:16.625-04:00" LastAccessedAt="2010-07-13T11:43:16.625-04:00" />
  <File FullPath="C:\Windows\Assembly\GAC\EnvDTE" FileName="EnvDTE" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\GAC" CreatedAt="2010-07-13T11:43:16.64-04:00" LastModifiedAt="2010-07-13T11:43:16.64-04:00" LastAccessedAt="2010-07-13T11:43:16.64-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\33ROE94YXW\One.MasterPages.dll" FileName="One.MasterPages.dll" Size="6144" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\30ROE94YXW" CreatedAt="2010-10-11T10:43:21.996-04:00" LastModifiedAt="2010-10-11T10:43:21.996-04:00" LastAccessedAt="2010-10-11T10:43:21.996-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\33ROE94YXW" FileName="30ROE94YXW" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T15:27:36.922-04:00" LastModifiedAt="2010-10-11T15:27:36.922-04:00" LastAccessedAt="2010-10-11T15:27:36.922-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\53Y4HSUIYK\One.EVMS.Dashboard.dll" FileName="One.EVMS.Dashboard.dll" Size="837632" IsReadOnly="false" IsFolder="false" DirectoryName="C:\Windows\Assembly\temp\5ZY4HSUIYK" CreatedAt="2010-10-04T08:56:07.183-04:00" LastModifiedAt="2010-10-04T08:56:07.277-04:00" LastAccessedAt="2010-10-04T08:56:07.183-04:00" />
  <File FullPath="C:\Windows\Assembly\temp\53Y4HSUIYK" FileName="5ZY4HSUIYK" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly\temp" CreatedAt="2010-10-11T10:51:38.965-04:00" LastModifiedAt="2010-10-11T10:51:38.965-04:00" LastAccessedAt="2010-10-11T10:51:38.965-04:00" />
  <File FullPath="C:\Windows\Assembly\temp" FileName="temp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2009-07-14T00:58:28.892-04:00" LastModifiedAt="2010-10-11T17:33:53.016-04:00" LastAccessedAt="2010-10-11T17:33:53.016-04:00" />
  <File FullPath="C:\Windows\Assembly\tmp" FileName="tmp" Size="0" IsReadOnly="false" IsFolder="true" DirectoryName="C:\Windows\Assembly" CreatedAt="2010-07-07T14:18:45.924-04:00" LastModifiedAt="2010-10-11T17:35:50.969-04:00" LastAccessedAt="2010-10-11T17:35:50.953-04:00" />
</GAC>
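For anyone who wants to reproduce this, a loader along the lines of LoadFileListFromXML might look like the sketch below.  The real implementation is not shown in this post, so the SystemFile shape here is a guess that reads only four of the attributes, and FileListLoader and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical shape for SystemFile; the real class is not shown in the post.
public class SystemFile
{
    public string FullPath { get; set; }
    public string FileName { get; set; }
    public long Size { get; set; }
    public bool IsFolder { get; set; }
}

public static class FileListLoader
{
    // Reads every <File> element directly under the root <GAC> element.
    public static List<SystemFile> Parse(XDocument doc)
    {
        return doc.Root
                  .Elements("File")
                  .Select(e => new SystemFile
                  {
                      FullPath = (string)e.Attribute("FullPath"),
                      FileName = (string)e.Attribute("FileName"),
                      Size = (long)e.Attribute("Size"),
                      IsFolder = (bool)e.Attribute("IsFolder"),
                  })
                  .ToList();
    }

    // Convenience wrapper for a file on disk, e.g. "1WEB.xml".
    public static List<SystemFile> LoadFileListFromXML(string path)
    {
        return Parse(XDocument.Load(path));
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<GAC><File FullPath=\"C:\\Windows\\Assembly\\tmp\" FileName=\"tmp\" " +
            "Size=\"0\" IsFolder=\"true\" /></GAC>");

        foreach (SystemFile f in Parse(doc))
        {
            Console.WriteLine(f.FullPath + " (folder: " + f.IsFolder + ")");
        }
    }
}
```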
When we run through the code and break after the Except() method, this is what we see for sfSource and sfTarget:

[Screenshot: sfSource and sfTarget in the debugger]
As we expected, we see all the files in the sfSource.  Now let’s look at the value of the results list lstOnSourceNotTarget:

[Screenshot: contents of lstOnSourceNotTarget]
Hmm… that’s curious… I expected the three EnvDTE records to be there, but the \temp\ files SHOULD have been filtered out by our comparer class’ Equals() method, right?
Confused, I set a break inside our comparer class and reran the code to see what is actually being compared and filtered.  This is what we see:
[Screenshots: the breakpoint location in the comparer class, then the first through seventh breaks, then the final result]
Hmm… Seven breaks for 14 files and NONE of them were the 4 that show up at the end with \temp\ in the name.  WTF???!!!
Are you confused?  I sure am!!!
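The closest thing to a “rationalization” I could make for myself at first was that it has something to do with Linq’s LAZY nature, so the items don’t get checked unless I iterate over them.  That can’t be the whole story, though, because ToList() forces Except() to run to completion.  The more likely explanation is that Except() is hash-based: it loads sfTarget into a set keyed by the comparer’s GetHashCode(), and it only calls Equals() on two items whose hash codes already match.  My GetHashCode() hashes the full path, so Equals() only ever sees pairs with identical paths.  That accounts for the seven breaks (exactly seven records have the same FullPath in both files), and it means the \temp\ checks in Equals() are never reached for records whose paths differ.  In other words, my comparer breaks the rule that two objects which are “equal” must also return the same hash code.
OK, enough time spent on something that DOESN’T WORK THE WAY I ASSUMED!!!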
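One way to see what is going on is to count how often each comparer method is called.  The sketch below uses a hypothetical Item class and CountingComparer (neither is from my original code) with the same shape as my comparer: Equals() has a special case that GetHashCode() knows nothing about.  In the .NET implementations I’m aware of, Except() stores the second sequence in a hash-based set, so Equals() only runs for pairs whose hash codes already match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SystemFile, just for this demo.
public class Item
{
    public string FullPath { get; set; }
}

// Same shape as the comparer in the post, plus call counters.
public class CountingComparer : IEqualityComparer<Item>
{
    public int EqualsCalls;
    public int HashCalls;

    public bool Equals(Item x, Item y)
    {
        EqualsCalls++;
        // Special case that GetHashCode() does not account for.
        return x.FullPath.Contains(@"\temp\") || x.FullPath == y.FullPath;
    }

    public int GetHashCode(Item obj)
    {
        HashCalls++;
        return obj.FullPath.GetHashCode();
    }
}

public static class HashFirstDemo
{
    // Runs Except() over two tiny lists and returns what is left of the source.
    public static List<Item> Run(CountingComparer comparer)
    {
        var source = new List<Item>
        {
            new Item { FullPath = @"c:\gac\a.dll" },
            new Item { FullPath = @"c:\temp\AAA\x.dll" },
        };
        var target = new List<Item>
        {
            new Item { FullPath = @"c:\gac\a.dll" },
            new Item { FullPath = @"c:\temp\BBB\x.dll" },
        };

        return source.Except(target, comparer).ToList();
    }

    public static void Main()
    {
        var comparer = new CountingComparer();

        // The temp entry survives: its hash matches nothing in target,
        // so Equals() is never asked about it.
        foreach (Item item in Run(comparer))
        {
            Console.WriteLine(item.FullPath);
        }

        Console.WriteLine("GetHashCode calls: " + comparer.HashCalls);
        Console.WriteLine("Equals calls: " + comparer.EqualsCalls);
    }
}
```

The counters should show GetHashCode() being called for every item and Equals() far less often, which is the same pattern as the seven breaks for 14 files.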
Let’s get a workaround in place…
We can use the Where() method in Linq to get the subset of records that actually contain the string we’re trying to filter out.  Then, reversing our logic, we can pass that set of records to the Except() method to exclude them from the original list.  Our new code looks like this:

SystemFileGACComparor compare = new SystemFileGACComparor();
IEnumerable<SystemFile> sfSource = LoadFileListFromXML("1WEB.xml");
sfSource = sfSource.Except(
    sfSource.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\temp")), compare).Except(
    sfSource.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\tmp")), compare);
IEnumerable<SystemFile> sfTarget = LoadFileListFromXML("1APP.xml");
sfTarget = sfTarget.Except(
    sfTarget.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\temp")), compare).Except(
    sfTarget.Where(filter => filter.FullPath.ToLower().Contains(@"c:\windows\assembly\tmp")), compare);
List<SystemFile> lstOnSourceNotTarget = sfSource.Except(sfTarget, compare).ToList();

We start by taking the list and applying the Where() method against it.  Inside the Where() method expression, we use the .FullPath property and convert its value to all lower case using the ToLower() method.  Once in all lower case, we use the Contains() method to check for the “temp” reference.  This will produce a list of only the records that actually contain the “temp” values.  Passing that off to the Except() method leaves us with the original list MINUS the records containing “temp”.
Wash, rinse, repeat…
We simply drop in a second Except() with the same code and a reference to “tmp” instead and tada!  We have a list that doesn’t contain either “temp” or “tmp”.
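A possibly simpler variation on the same idea, sketched here rather than taken from my original code: drop the “temp” and “tmp” records with a negated Where() first, then let Except() compare on path alone.  The SystemFile stand-in, the IsGacTemp helper, and PathComparer are hypothetical names for illustration.  The key assumption is that the comparer’s Equals() and GetHashCode() look at exactly the same data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the post's SystemFile class (only FullPath is needed here).
public class SystemFile
{
    public string FullPath { get; set; }
}

// Equality and hash code are both based on the full path and nothing else.
public class PathComparer : IEqualityComparer<SystemFile>
{
    public bool Equals(SystemFile x, SystemFile y)
    {
        return x.FullPath == y.FullPath;
    }

    public int GetHashCode(SystemFile obj)
    {
        return obj.FullPath.GetHashCode();
    }
}

public static class GacDiff
{
    // True for anything under the GAC temp or tmp folders.
    public static bool IsGacTemp(SystemFile file)
    {
        string path = file.FullPath.ToLower();
        return path.Contains(@"c:\windows\assembly\temp")
            || path.Contains(@"c:\windows\assembly\tmp");
    }

    // Source records that are not temp/tmp and have no path match in target.
    public static List<SystemFile> OnSourceNotTarget(
        IEnumerable<SystemFile> source, IEnumerable<SystemFile> target)
    {
        return source.Where(f => !IsGacTemp(f))
                     .Except(target, new PathComparer())
                     .ToList();
    }

    public static void Main()
    {
        var source = new List<SystemFile>
        {
            new SystemFile { FullPath = @"C:\Windows\Assembly\GAC\ADODB" },
            new SystemFile { FullPath = @"C:\Windows\Assembly\GAC\EnvDTE\8.0.0.0" },
            new SystemFile { FullPath = @"C:\Windows\Assembly\temp\30ROE94YXW" },
        };
        var target = new List<SystemFile>
        {
            new SystemFile { FullPath = @"C:\Windows\Assembly\GAC\ADODB" },
            new SystemFile { FullPath = @"C:\Windows\Assembly\temp\33ROE94YXW" },
        };

        // Only the EnvDTE record remains.
        foreach (SystemFile f in OnSourceNotTarget(source, target))
        {
            Console.WriteLine(f.FullPath);
        }
    }
}
```

Only the source list needs the Where() filter in this version, since a “temp” record in the target can never match a non-temp path in the source.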
Finally, we can move on to the next problem that doesn’t work as published…



Cheers
C
