## Return the subset of NumPy array according to the first element of each row

Question

I am trying to get the subset `x` of a given NumPy array `alist`, containing only the rows whose first element is in the list `r`.

``````
>>> import numpy
>>> alist = numpy.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1), (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
>>> alist
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])
>>> r = [1,3]
>>> x = alist[where first element of each row is in r] #this i need to figure out.
>>> x
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])
``````

Is there an easy way to do this in Python without looping? I have a large dataset.

2016-12-20 12:12 2 Answers

## Answers to Return the subset of NumPy array according to the first element of each row (2)

1. Slice the first column off the input array (that is, select the first element of each row), then use `np.in1d` with `r` as the second argument to create a boolean mask of the valid rows. Finally, index into the rows of the array with that mask to select them.

The implementation would look like this:

``````
import numpy as np
alist[np.in1d(alist[:,0],r)]
``````

Sample run:

``````
In [258]: alist   # Input array
Out[258]:
array([[0, 2],
       [0, 4],
       [1, 3],
       [1, 4],
       [2, 1],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 3],
       [4, 2]])

In [259]: r  # Input list to be searched for
Out[259]: [1, 3]

In [260]: np.in1d(alist[:,0],r) # Mask of valid rows
Out[260]: array([False, False,  True,  True, False,  True,  True, False, False, False], dtype=bool)

In [261]: alist[np.in1d(alist[:,0],r)] # Index and select for final o/p
Out[261]:
array([[1, 3],
       [1, 4],
       [3, 1],
       [3, 2]])
``````
2. You can construct the boolean index array for the valid rows with a broadcasting trick: add an extra dimension to the first column and compare it for equality against every element of `r`:

``````
import numpy as np
alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]  # values allowed in the first column

inds = (alist[:,0][:,None] == r).any(axis=-1)  # boolean mask, one entry per row
x = alist[inds,:] # the valid rows
``````

The trick is to take the first column of `alist`, reshape it into an `(N,1)`-shaped array, and rely on array broadcasting in the comparison with `r` to produce an `(N,len(r))`-shaped boolean array (here `(10,2)`). If any value in a given row of that array is `True`, we keep that row. The resulting index array is exactly the same as the `np.in1d` mask in Divakar's answer.
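
To make the intermediate shapes concrete, here is a minimal sketch of the broadcasting approach, cross-checked against a membership test on the first column. It uses `np.isin`, the newer spelling of `np.in1d` (as far as I know, `np.in1d` is deprecated from NumPy 2.0 onward). The names `col`, `eq` and `mask` are illustrative only and do not come from either answer.

```python
import numpy as np

alist = np.array([(0, 2), (0, 4), (1, 3), (1, 4), (2, 1),
                  (3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])
r = [1, 3]

col = alist[:, 0][:, None]   # first column as an (N, 1) array
eq = (col == r)              # broadcast against r: (N, len(r)) booleans
mask = eq.any(axis=-1)       # True where the first element equals any value in r

print(col.shape, eq.shape, mask.shape)

# The membership-test mask should agree with the broadcasting mask.
same = np.array_equal(mask, np.isin(alist[:, 0], r))
print(same)

x = alist[mask]              # rows whose first element is in r
print(x)
```

One thing to keep in mind: the broadcasting version materializes an `N × len(r)` boolean array, so for a long `r` the `np.isin` / `np.in1d` route is probably the lighter option.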