Return the subset of a NumPy array according to the first element of each row

Question

I am trying to get the subset x of a given NumPy array alist such that the first element of each row is in the list r.

>>> import numpy 
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r]  # pseudocode: this is the part I need to figure out
>>> x
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])

Is there an easy way to do this in Python without looping? I have a large dataset.


| python   | arrays   | numpy   | vectorization   2016-12-20 12:12 2 Answers

Answers ( 2 )

  1. 2016-12-20 12:12

    Slice the first column off the input array (that is, select the first element of each row), then pass it to np.in1d with r as the second argument to build a boolean mask of the valid rows. Finally, index the rows of the array with that mask to select the valid ones.

    Thus, the implementation would be like so -

    import numpy as np  # the question imports the module as `numpy`
    alist[np.in1d(alist[:,0],r)]
    

    Sample run -

    In [258]: alist   # Input array
    Out[258]: 
    array([[0, 2],
           [0, 4],
           [1, 3],
           [1, 4],
           [2, 1],
           [3, 1],
           [3, 2],
           [4, 1],
           [4, 3],
           [4, 2]])
    
    In [259]: r  # Input list to be searched for
    Out[259]: [1, 3]
    
    In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
    Out[260]: array([False, False,  True,  True, False,  True,  True,
                            False, False, False], dtype=bool)
    
    In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
    Out[261]: 
    array([[1, 3],
           [1, 4],
           [3, 1],
           [3, 2]])
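
As a self-contained variant of the steps above, here is a minimal sketch that swaps in np.isin for np.in1d. That swap is an assumption on my part rather than part of the original answer: the NumPy documentation recommends np.isin over np.in1d for new code, and for a 1-D input such as a single column the two should give the same mask.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

# Boolean mask: True where the value in the first column appears in r.
mask = np.isin(alist[:, 0], r)

# Boolean indexing keeps only the rows where the mask is True.
x = alist[mask]
print(x)
```

The mask has one entry per row, so indexing with it preserves the original row order.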
    
  2. 2016-12-20 13:12

    You can construct the boolean index for the valid rows with broadcasting: add an extra dimension to the first column and compare it for equality against every element of r:

    import numpy as np
    alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                      (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
    r = [1, 3]  # values allowed in the first column
    
    inds = (alist[:,0][:,None] == r).any(axis=-1)
    x = alist[inds,:] # the valid rows
    

    The trick is that we take the first column of alist, reshape it into an (N,1)-shaped array, and let array broadcasting in the comparison with r produce an (N, len(r))-shaped boolean array, which is (N,2) here. We then keep every row in which at least one of those comparisons is True. The resulting index array is exactly the same as the np.in1d mask in Divakar's answer.
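
To make the intermediate shapes visible, here is a small sketch that splits the one-liner into steps. The names first_col and matches are introduced here purely for illustration, and the final check uses np.isin as a stand-in for np.in1d.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

first_col = alist[:, 0][:, None]   # shape (N, 1)
matches = first_col == r           # broadcasts to shape (N, len(r))
inds = matches.any(axis=-1)        # shape (N,): True if the row matched any value in r

print(first_col.shape, matches.shape, inds.shape)

# The mask should agree with the membership-test approach.
assert np.array_equal(inds, np.isin(alist[:, 0], r))

x = alist[inds, :]
print(x)
```

One design consideration: the intermediate matches array holds N * len(r) booleans, so for a long r the membership-test approach may well be lighter on memory.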
