Why is sum so much faster than inject(:+)?

Question

So I was running some benchmarks in Ruby 2.4.0 and realized that

(1...1000000000000000000000000000000).sum

calculates immediately whereas

(1...1000000000000000000000000000000).inject(:+)

takes so long that I just aborted the operation. I was under the impression that Range#sum was an alias for Range#inject(:+) but it seems like that is not true. So how does sum work, and why is it so much faster than inject(:+)?

N.B. The documentation for Enumerable#sum (which is implemented by Range) does not say anything about lazy evaluation or anything along those lines.


| ruby   2017-01-03 18:01 1 Answers

Answers to Why is sum so much faster than inject(:+)? ( 1 )

  1. 2017-01-03 19:01

    Short answer

    For an integer range :

    • Enumerable#sum returns (range.max-range.min+1)*(range.max+range.min)/2
    • Enumerable#inject(:+) iterates over every element.

    Theory

    The sum of integers between 1 and n is called a triangular number, and is equal to n*(n+1)/2.

    The sum of integers between n and m is the triangular number of m minus the triangular number of n-1, which is equal to m*(m+1)/2-n*(n-1)/2, and can be written (m-n+1)*(m+n)/2.
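    The identity above can be checked on a few small ranges. This is only a sketch: closed_form_sum is a hypothetical helper name, not part of Ruby, and it assumes n <= m.

```ruby
# Closed-form sum of the integers from n to m inclusive (assumes n <= m).
# closed_form_sum is a hypothetical helper, not a Ruby built-in.
def closed_form_sum(n, m)
  # (m - n + 1) is the number of terms, (m + n) / 2 is their average.
  # The product is always even, so the integer division is exact.
  (m - n + 1) * (m + n) / 2
end

# Compare the formula with a brute-force loop on a few small ranges.
[[1, 10], [5, 5], [-3, 7], [100, 1000]].each do |n, m|
  brute = (n..m).inject(:+)
  puts "#{n}..#{m}: closed form #{closed_form_sum(n, m)}, brute force #{brute}"
end
```

    Both columns should agree on every line, including the range that starts below zero.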

    Enumerable#sum in Ruby 2.4

    This property is used in Enumerable#sum for integer ranges :

    if (RTEST(rb_range_values(obj, &beg, &end, &excl))) {
        if (!memo.block_given && !memo.float_value &&
                (FIXNUM_P(beg) || RB_TYPE_P(beg, T_BIGNUM)) &&
                (FIXNUM_P(end) || RB_TYPE_P(end, T_BIGNUM))) { 
            return int_range_sum(beg, end, excl, memo.v);
        } 
    }
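    Reading the condition above, the shortcut is only taken when no block is given (memo.block_given is false). A small sketch of what that implies from the Ruby side: with a block, sum has to visit every element, which the counter below makes visible. The variable names are purely illustrative.

```ruby
# With a block, sum cannot use the closed formula and must call the
# block once per element. Count the calls to see it.
calls = 0
total = (1..5).sum { |x| calls += 1; x }

puts "total=#{total}, block calls=#{calls}"
```

    So the instant result only applies to the plain range.sum form on integer bounds.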
    

    int_range_sum looks like this :

    VALUE a;
    a = rb_int_plus(rb_int_minus(end, beg), LONG2FIX(1));
    a = rb_int_mul(a, rb_int_plus(end, beg));
    a = rb_int_idiv(a, LONG2FIX(2));
    return rb_int_plus(init, a);
    

    which is equivalent to:

    (range.max-range.min+1)*(range.max+range.min)/2
    

    the aforementioned equality!
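    For readers who do not read C, here is a rough Ruby transcription of those four lines. int_range_sum_sketch is a hypothetical name. The excerpt above only shows the arithmetic, so the handling of an exclusive end (the excl flag) and of an empty range is an assumption about what the surrounding C code does.

```ruby
# Hypothetical Ruby transcription of the C arithmetic shown above.
def int_range_sum_sketch(range, init = 0)
  first = range.begin
  last  = range.end
  last -= 1 if range.exclude_end?  # assumption: 1...n is treated as 1..(n-1)
  return init if last < first      # assumption: an empty range sums to init

  a = (last - first) + 1           # rb_int_plus(rb_int_minus(end, beg), 1)
  a *= (last + first)              # rb_int_mul(a, rb_int_plus(end, beg))
  a /= 2                           # rb_int_idiv(a, 2)
  init + a                         # rb_int_plus(init, a)
end

big = (1...1000000000000000000000000000000)
puts int_range_sum_sketch(big)
puts int_range_sum_sketch(big) == big.sum
```

    The second line printed should be true, and both calls return immediately because no iteration takes place.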

    Complexity

    Thanks a lot to @k_g and @Hynek-Pichi-Vychodil for this part!

    sum

    (1...1000000000000000000000000000000).sum requires three additions, a multiplication, a subtraction and a division.

    It's a constant number of operations, but multiplication is O((log n)²), so Enumerable#sum is O((log n)²) for an integer range.

    inject

    (1...1000000000000000000000000000000).inject(:+)

    requires 999999999999999999999999999998 additions!

    Addition is O(log n), so Enumerable#inject is O(n log n).

    With 1E30 as input, inject will never return. The sun will explode long before!
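    The difference in growth can be observed on sizes that inject can still finish. This is a minimal timing sketch using the standard Benchmark module. The sizes are arbitrary, and the timings depend on the machine and Ruby version, so none are quoted here.

```ruby
require 'benchmark'

# Time both methods on ranges of increasing size.
# sum should stay roughly flat while inject grows with n.
[10_000, 100_000, 1_000_000].each do |n|
  range = (1...n)
  sum_result = inject_result = nil

  t_sum    = Benchmark.realtime { sum_result = range.sum }
  t_inject = Benchmark.realtime { inject_result = range.inject(:+) }

  puts format("n=%-9d sum: %.6fs  inject: %.6fs  same result: %s",
              n, t_sum, t_inject, sum_result == inject_result)
end
```

    Multiplying n by ten should multiply the inject time by roughly ten, which is why a bound of 1E30 is out of reach.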

    Test

    It's easy to check if Ruby Integers are being added :

    module AdditionInspector
      def +(b)
        puts "Calculating #{self}+#{b}"
        super
      end
    end
    
    class Integer
      prepend AdditionInspector
    end
    
    puts (1..5).sum
    #=> 15
    
    puts (1..5).inject(:+)
    # Calculating 1+2
    # Calculating 3+3
    # Calculating 6+4
    # Calculating 10+5
    #=> 15
    

    Indeed, from enum.c comments :

    Enumerable#sum method may not respect method redefinition of "+" methods such as Integer#+.
