Ruby - Determine Big O comparing two arrays

Question

My algorithm skills are lackluster. I created a method to see if two arrays contain the same elements (duplicates don't matter):

one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]

def same_elements?(array_one, array_two)
  return true if ( (array_one - array_two).empty? && (array_two - array_one).empty? )
  return false
end

same_elements?(one, two)

This returns true (which is correct). The problem is I am not sure what the efficiency of this algorithm is? My first guess is O(n^2) since you have to check both a-b and b-a. I know that O(n^2) is pretty terrible. Is there a more efficient way to do this?

Thanks


Show source
| ruby   | algorithm   | big-o   2016-12-27 21:12 2 Answers

Answers to Ruby - Determine Big O comparing two arrays ( 2 )

  1. 2016-12-27 22:12

    Let the size of first and second array be m and n respectively. Looking at rb_ary_diff's source code (see Joel Cornett's comment above), there's a for loop which runs O(m) times. Inside the loop there's a call that seems to search the hash. This operation in general takes O(n) time. Thus, assuming that all other calls are asymptotically faster than O(mn), then the overall difference function complexity is O(mn). Calling this function twice followed by an emptyness check results in your algorithm being O(mn).

    Edit: On average hash search is constant, i.e., O(1), which means that in this case your algorithm performs in O(n). Still, hash search worst case complexity is O(n) meaning your algorithm is O(mn). It's a good exercise to find an example which demonstrates this.

  2. 2016-12-27 23:12

    Short answer

    O(n+m) on average

    The worst case is O(nm), but it only happens if you really want to make it happen (see last paragraph).

    If array_one is chosen to be bigger than array_two, O(m+n) is just O(n), so this algorithm runs in linear time on average.

    Alternative

    Another shorter way to check would be :

    one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
    two = [:piece, 2, 5, 4, 1, "taco", 3]
    
    puts Set[*one] == Set[*two] #=> true
    # or
    puts one.to_set == two.to_set #=> true
    

    Small refactoring

    return true if x
    return false
    

    is equivalent to just

    x
    

    So your code can be written :

    def same_elements?(array_one, array_two)
      (array_one - array_two).empty? && (array_two - array_one).empty?
    end
    

    Benchmark

    I created one array with 1E6 elements, half of which are random numbers between 0 and 199999 (for collisions), the other half being plain Ruby objects.

    The other array is just the first one, shuffled randomly.

    N = 1_000_000
    
    one = (1..N).map{rand < 0.5 ? rand(N/5) : Object.new}
    two = one.sort_by{rand}
    

    It takes 1 min to compare the sets, and fruity reports that set comparison is about 20% faster than the OP's method.

    For smaller arrays of integers, the OP's method was a bit faster.

    Note: code proposed by @engineersmnky in comments was reported to have similar speed than the other methods.

    Time complexity

    Your code is surely not O(nm) when used with usual Arrays.

    Approximate times are :

    • 1s for 1E4
    • 8s for 1E5
    • 160s for 1E6

    Looking at rb_ary_diff in array.c, it's no wonder all the methods described above run in the same time : they basically work the same.

    rb_ary_diff creates a hash table for array_two ( in O(m)), and iterates over each element of array_one (in O(n)), looking for the value in the hash table (O(1) on average). The whole operation would be O(n+m) on average.

    This blog post analyses set intersection, which is implemented in a very similar way.

    Doing it twice doesn't change anything, so the overall time complexity stays O(n+m).

    Looking for O(mn)

    One way to make this algorithm O(mn) is to completely disable the hash method. There's no reason to do so, except to prove that it's a very bad idea.

    With 10_000 KeyObjects :

    class KeyObject < Object
    end
    

    the set comparison takes less than 1 s.

    With 10_000 KeyObjects :

    class KeyObject < Object
      def hash
        1
      end
    end
    

    the set comparison takes more than 14 mins!

    The probability that 2 distinct, random Ruby objects have the same hash is about 1E-20. Strictly speaking, the worst case for this algorithm is O(mn), but it will never happen if you don't go looking for it. Finding a collision with 2 elements isn't trivial, finding a collision with 1E6 elements is not going to happen by chance.

Leave a reply to - Ruby - Determine Big O comparing two arrays

◀ Go back