## Ruby - Determine Big O comparing two arrays

Question

My algorithm skills are lackluster. I created a method to see if two arrays contain the same elements (duplicates don't matter):

``````one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]

def same_elements?(array_one, array_two)
return true if ( (array_one - array_two).empty? && (array_two - array_one).empty? )
return false
end

same_elements?(one, two)
``````

This returns true (which is correct). The problem is I am not sure what the efficiency of this algorithm is? My first guess is O(n^2) since you have to check both a-b and b-a. I know that O(n^2) is pretty terrible. Is there a more efficient way to do this?

Thanks

Show source

## Answers to Ruby - Determine Big O comparing two arrays ( 2 )

1. Let the size of first and second array be m and n respectively. Looking at `rb_ary_diff`'s source code (see Joel Cornett's comment above), there's a `for` loop which runs O(m) times. Inside the loop there's a call that seems to search the hash. This operation in general takes O(n) time. Thus, assuming that all other calls are asymptotically faster than O(mn), then the overall difference function complexity is O(mn). Calling this function twice followed by an emptyness check results in your algorithm being O(mn).

Edit: On average hash search is constant, i.e., O(1), which means that in this case your algorithm performs in O(n). Still, hash search worst case complexity is O(n) meaning your algorithm is O(mn). It's a good exercise to find an example which demonstrates this.

O(n+m) on average

The worst case is O(nm), but it only happens if you really want to make it happen (see last paragraph).

If `array_one` is chosen to be bigger than `array_two`, O(m+n) is just O(n), so this algorithm runs in linear time on average.

# Alternative

Another shorter way to check would be :

``````one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]

puts Set[*one] == Set[*two] #=> true
# or
puts one.to_set == two.to_set #=> true
``````

# Small refactoring

``````return true if x
return false
``````

is equivalent to just

``````x
``````

So your code can be written :

``````def same_elements?(array_one, array_two)
(array_one - array_two).empty? && (array_two - array_one).empty?
end
``````

# Benchmark

I created one array with 1E6 elements, half of which are random numbers between 0 and 199999 (for collisions), the other half being plain Ruby objects.

The other array is just the first one, shuffled randomly.

``````N = 1_000_000

one = (1..N).map{rand < 0.5 ? rand(N/5) : Object.new}
two = one.sort_by{rand}
``````

It takes 1 min to compare the sets, and fruity reports that set comparison is about 20% faster than the OP's method.

For smaller arrays of integers, the OP's method was a bit faster.

Note: code proposed by @engineersmnky in comments was reported to have similar speed than the other methods.

# Time complexity

Your code is surely not `O(nm)` when used with usual Arrays.

Approximate times are :

• 1s for 1E4
• 8s for 1E5
• 160s for 1E6

Looking at `rb_ary_diff` in `array.c`, it's no wonder all the methods described above run in the same time : they basically work the same.

`rb_ary_diff` creates a hash table for `array_two` ( in O(m)), and iterates over each element of `array_one` (in O(n)), looking for the value in the hash table (O(1) on average). The whole operation would be O(n+m) on average.

This blog post analyses set intersection, which is implemented in a very similar way.

Doing it twice doesn't change anything, so the overall time complexity stays O(n+m).

# Looking for O(mn)

One way to make this algorithm O(mn) is to completely disable the hash method. There's no reason to do so, except to prove that it's a very bad idea.

With 10_000 KeyObjects :

``````class KeyObject < Object
end
``````

the set comparison takes less than 1 s.

With 10_000 KeyObjects :

``````class KeyObject < Object
def hash
1
end
end
``````

the set comparison takes more than 14 mins!

The probability that 2 distinct, random Ruby objects have the same hash is about 1E-20. Strictly speaking, the worst case for this algorithm is O(mn), but it will never happen if you don't go looking for it. Finding a collision with 2 elements isn't trivial, finding a collision with 1E6 elements is not going to happen by chance.