Count number of clusters of non-zero values in Python?

Question

My data looks something like this:

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

Essentially, there are runs of zeros before the non-zero numbers, and I want to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data, so the code should return 3.

  • Number of zeros between groups of non-zeros is variable

Are there any good ways to do this in Python? (I am also using pandas and NumPy to help parse the data.)


python | numpy | pandas | 2016-12-31 22:12 | 5 Answers

Answers to Count number of clusters of non-zero values in Python? ( 5 )

  1. 2016-12-31 23:12

    With a as the input array, we could have a vectorized solution -

    m = a!=0
    out = (m[1:] > m[:-1]).sum() + m[0]
    

    Alternatively, for better performance, we could use np.count_nonzero, which counts the True values of a boolean array very efficiently, as is the case here -

    out = np.count_nonzero(m[1:] > m[:-1]) + m[0] 
    

    Basically, we build a mask of the non-zero elements and count its rising edges (a False followed by a True). The first element has no predecessor, so a group that starts there produces no rising edge; we therefore check m[0] separately and add it to the total.

    Also, please note that if the input a is a list rather than a NumPy array, we need to use m = np.asarray(a)!=0 instead.
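As a minimal sketch of the rising-edge idea above, the helper below wraps the same mask comparison in a function that also accepts plain lists and empty input. The name `count_nonzero_groups` is hypothetical and not part of the original answer; `m[1:] & ~m[:-1]` is used as an equivalent spelling of `m[1:] > m[:-1]` for boolean arrays.

```python
import numpy as np

def count_nonzero_groups(values):
    """Count runs of consecutive non-zero values (hypothetical helper)."""
    m = np.asarray(values) != 0          # boolean mask of non-zero entries
    if m.size == 0:                      # no elements, so no groups
        return 0
    # A rising edge is a False -> True step between neighbouring elements.
    rising_edges = np.count_nonzero(m[1:] & ~m[:-1])
    # m[0] accounts for a group that starts at index 0 and has no rising edge.
    return int(rising_edges + m[0])

a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
print(count_nonzero_groups(a))  # 3
```

The explicit size check is a design choice made here: indexing `m[0]` on an empty array would otherwise raise an IndexError.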

    Sample runs for three cases -

    In [92]: a  # Case1 :Given sample
    Out[92]: 
    array([ 0,  0,  0,  0,  0,  0, 10, 15, 16, 12, 11,  9, 10,  0,  0,  0,  0,
            0,  6,  9,  3,  7,  5,  4,  0,  0,  0,  0,  0,  0,  4,  3,  9,  7,
            1])
    
    In [93]: m = a!=0
    
    In [94]: (m[1:] > m[:-1]).sum() + m[0]
    Out[94]: 3
    
    In [95]: a[0] = 7  # Case2 :Add a non-zero elem/group at the start
    
    In [96]: m = a!=0
    
    In [97]: (m[1:] > m[:-1]).sum() + m[0]
    Out[97]: 4
    
    In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end
    
    In [100]: m = a!=0
    
    In [101]: (m[1:] > m[:-1]).sum() + m[0]
    Out[101]: 5
    
  2. 2016-12-31 23:12

    A simple pure-Python solution: count the changes from zero to non-zero by keeping track of the previous value (rising edge detection):

    a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
    
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    
    print(count)  # 3
    
  3. 2016-12-31 23:12

    You can achieve this with itertools.groupby() and a list comprehension:

    >>> from itertools import groupby
    
    >>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
    3
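A possible variant of the groupby approach, sketched here under the assumption that only the count is needed: summing the group keys directly avoids building an intermediate list. The function name `count_groups` is made up for illustration.

```python
from itertools import groupby

def count_groups(values):
    """Count runs of non-zero values without building a list (hypothetical helper)."""
    # groupby yields one (key, run) pair per run of equal keys. The key is
    # True for a run of non-zero values, and True counts as 1 when summed.
    return sum(key for key, _ in groupby(values, key=lambda x: x != 0))

a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
print(count_groups(a))  # 3
```

Because groupby consumes any iterable lazily, this form should also work on generators and other one-pass inputs.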
    
  4. 2016-12-31 23:12
    # count each group start; the n == 0 check covers a group that begins at the first element
    sum(1 for n in range(len(a)) if a[n] and (n == 0 or not a[n - 1]))
    
  5. 2017-01-01 00:01
    • pad the array with a zero on both sides using np.concatenate
    • find where the padded array is zero with == 0
    • find the boundaries between zero and non-zero runs with np.diff
    • count the boundaries found with sum
    • divide by two, because every group contributes two boundaries (its start and its end)
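The steps above can be traced one at a time. The sketch below uses a short input and intermediate variable names chosen purely for illustration; it relies on np.diff comparing neighbouring elements with "not equal" when given a boolean array.

```python
import numpy as np

a = [0, 1, 2, 0, 1, 2]

padded = np.concatenate([[0], a, [0]])  # zero on both sides: [0 0 1 2 0 1 2 0]
is_zero = padded == 0                   # True where the value is zero
boundaries = np.diff(is_zero)           # True wherever zero/non-zero status flips
n_boundaries = int(boundaries.sum())    # every group has one start and one end
n_groups = n_boundaries // 2

print(n_boundaries, n_groups)  # 4 2
```

The padding guarantees that a group touching either end of the input still has both a start and an end boundary, so the boundary count is always even.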

    def nonzero_clusters(a):
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
    

    demonstration

    nonzero_clusters(
        [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
    )
    
    3
    

    nonzero_clusters([0, 1, 2, 0, 1, 2])
    
    2
    

    nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
    
    2
    

    nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
    
    3
    

    timing

    a = np.random.choice((0, 1), 100000)

    code

    import numpy as np
    from itertools import groupby
    
    def div(a):
        m = a != 0
        return (m[1:] > m[:-1]).sum() + m[0]
    
    def pir(a):
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
    
    def jean(a):
        previous = 0
        count = 0
        for c in a:
            if previous==0 and c!=0:
                count+=1
            previous = c
        return count
    
    def moin(a):
        return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
    
    def user(a):
        return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
    

