How can I efficiently compare the properties of two large lists of objects in C#?

Question

I have a dataset of two lists of objects, which has an ID that will be consistent in both lists but other properties that may or may not be different. How can I most efficiently retrieve the ones that are different based on one or more properties?

My usual approach has been something along the lines of this. My object is set up like:

   public class Person
        {
            public int ID { get; set; }
            public string Name { get; set; }
            public int Age { get; set; }

            public bool IsEqual(Person other)
            {
                if (Name != other.Name)
                {
                    return false;
                }
                if (Age != other.Age)
                {
                    return false;
                }
                return true;
            }
        } 

Where the IsEqual comparator is used to compare it to some equivalent object.

And then my method for finding modified people is like:

  public static List<Person> FindModifiedPeople(List<Person> listA, List<Person> listB)
        {
            var modifiedPeople = new List<Person>();
            foreach (var personA in listA)
            {
                var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
                if (matchingPerson == null)
                {
                    continue;
                }

                if (!personA.IsEqual(matchingPerson))
                {
                    modifiedPeople.Add(personA);
                }
            }
            return modifiedPeople;
        }

In my dataset, I don't care about people that are in listB but not listA, so I don't need to loop through both lists. I only need to check listA for the element in listB (that may or may not be there) and return a list of people that have been modified (with the elements from listA).

This approach worked fine for reasonably small lists, but now I have two lists with about 160,000 people and this approach takes several minutes. Is there any way to make this method more efficient while still returning what I need it do?


Show source
| c#   2017-11-29 18:11 2 Answers

Answers to How can I efficiently compare the properties of two large lists of objects in C#? ( 2 )

  1. 2017-11-29 18:11

    If you can change your lists to be a Dictionary<int, Person> with the person's ID as the key they this would work for you. This will run in O(n) as opposed to your O(n^2).

    public static List<Person> FindModifiedPeople(Dictionary<int, Person> dictA, Dictionary<int, Person> dictB)
    {
        var modifiedPeople = new List<Person>();
        foreach (var personA in dictA)
        {
            Person matchingPerson;
            if(dictB.TryGetValue(personA.Key, out matchingPerson))
            {
                if (!personA.Value.IsEqual(matchingPerson))
                {
                    modifiedPeople.Add(personA.Value);
                }
            }
        }
        return modifiedPeople;
    }
    

    You could also change the return type from List to another Dictionary as well depending on what you need it for.

    EDIT

    As @maccettura pointed out in his comment, you really should override the built in equals method. That would make your code look something like this.

    public override bool Equals(Object obj)
    {
        if (obj == null || GetType() != obj.GetType())
            return false;
    
        var otherPerson = (Person)obj;
    
        if (Name != otherPerson.Name)
        {
            return false;
        }
        if (Age != otherPerson.Age)
        {
            return false;
        }
        return true;
    }
    

    This will allow your code to work with any stuff that is expecting to use the default Equals method as opposed to your custom one.

  2. 2017-11-29 18:11

    Are you sure that the comparison is the bottleneck? I think that the problem comes form the search you do in this line:

    var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
    

    There, you are doing a search with a logartihmic complexity of O(n), which coupled with the foreach loop gives a total complexity of O(n^2). Instead, you could create a dictionary upfront, which takes some time, but in which lookups are much faster. The dictionary should have the ID as keys, and can be easily created like this BEFORE THE foreach LOOP:

    var dictB = listB.ToDictionary(p => p.ID);
    

    After that, your lookup would be much faster, like this:

        Person matchingPerson;
        if (dictB.TryGetValue(personA.ID, out matchingPerson))
        {
            if (!personA.IsEqual(matchingPerson))
            {
                modifiedPeople.Add(personA);
            }
        }
    

Leave a reply to - How can I efficiently compare the properties of two large lists of objects in C#?

◀ Go back