Remove row of numpy array based on values from two columns

Question

I have a numpy array with four columns and many rows:

```
>>> dat
array([['4/5/2004', '17', 0.0, 0.0],
       ['4/5/2004', '7', 0.0, 0.0],
       ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
       ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
       ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]], dtype=object)
```

I would like to remove all rows where the value in column 3 or column 4 equals 0, so the result would be:

```
([['4/5/2004', '19:48:20', 58.432488, -135.9202205],
  ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]])
```

I can do this one column at a time with:

```
a = dat[~(dat[:,2]==0), :]
```

which returns the rows where the value in column 3 does not equal 0. I could apply this iteratively for each column, but it would be convenient to do it all in one command.
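The iterative, one-column-at-a-time approach could be sketched like this (a minimal example with made-up sample rows in the spirit of the question's data):

```python
import numpy as np

# Small sample modeled on the question's data (values hypothetical).
dat = np.array([['4/5/2004', '17', 0.0, 0.0],
                ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
                ['4/5/2004', '19:48:32', 58.4325243, 0.0]], dtype=object)

# Filter one column at a time, applied iteratively:
a = dat[~(dat[:, 2] == 0), :]   # drop rows where column 3 is 0
a = a[~(a[:, 3] == 0), :]       # then drop rows where column 4 is 0
print(a)
```

This works but repeats the same indexing pattern once per column, which is what the answers below avoid.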

I thought something like the following two examples would work (but they do not):

```
a = dat[~(dat[:,2]==0), :] or dat[~(dat[:,3]==0), :]
a = dat[~(dat[:,2&3]==0), :]
```

Hopefully there's some simple syntax I'm missing and can't find in the numpy help.


3 Answers

1. What about using `&`:

```
>>> dat[(dat[:,2] != 0) & (dat[:,3] != 0), :]
array([['4/5/2004', '19:48:20', 58.432488, -135.9202205],
       ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]], dtype=object)
```

which yields the element-wise "and".

I've changed the comparison to `!= 0` and combined with `&`, which avoids the extra inversions with `~`.
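For contrast, combining the per-column tests with Python's `or` keyword (as in the question) fails outright, because NumPy refuses to collapse a multi-element boolean array to a single truth value. A minimal sketch with made-up sample rows:

```python
import numpy as np

dat = np.array([['4/5/2004', '17', 0.0, 0.0],
                ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
                ['4/5/2004', '19:48:32', 58.4325243, 0.0]], dtype=object)

# Element-wise & works:
filtered = dat[(dat[:, 2] != 0) & (dat[:, 3] != 0), :]

# The Python keyword raises ValueError instead, because `or` needs a
# single True/False and a boolean array has one value per row:
try:
    bad = dat[(dat[:, 2] != 0) or (dat[:, 3] != 0), :]
except ValueError as e:
    print(e)  # "The truth value of an array with more than one element is ambiguous..."
```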

2. Assuming the data array is `2D`, we could slice and look for the valid ones -

```
dat[~(dat[:,2:4]==0).any(1)]
```

Alternatively, we can use `np.all` on the `!=0` ones -

```
dat[(dat[:,2:4]!=0).all(1)]
```

When the columns of interest are not contiguous, we can index them by their column IDs and apply the same technique. Say the column IDs to be examined are stored in an array or list named `colID`; the two approaches then become -

```
dat[~(dat[:,colID]==0).any(1)]
dat[(dat[:,colID]!=0).all(1)]
```

Thus, for the stated case of columns 3 and 4, we would have `colID = [2, 3]`.
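A quick runnable check of the `colID` variant (sample rows made up in the spirit of the question's data):

```python
import numpy as np

dat = np.array([['4/5/2004', '17', 0.0, 0.0],
                ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
                ['4/5/2004', '19:48:32', 58.4325243, 0.0]], dtype=object)

colID = [2, 3]  # zero-based indices of columns 3 and 4

# "Drop rows where any selected column is 0" vs.
# "keep rows where all selected columns are nonzero":
kept_any = dat[~(dat[:, colID] == 0).any(1)]
kept_all = dat[(dat[:, colID] != 0).all(1)]
# Both forms select the same rows.
```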

You got the idea of using `or` conceptually right. The main difference is that NumPy needs the element-wise operators: `|` for or and `&` for and (just as you are already using `~` for not).

This works because an operation like `dat[:,3] == 0` creates an array of booleans of the same size as a column of `dat`. When this array is used as an index, `numpy` interprets it as a mask. Splitting off the mask array to highlight this concept:

```
mask = (dat[:, 2] != 0) & (dat[:, 3] != 0)
```

An equivalent formulation:

```
mask = np.logical_and.reduce(dat[:, 2:] != 0, axis=1)
```

`np.logical_and.reduce` shrinks the input array across the columns (`axis=1`) by applying `np.logical_and` (the function behind the `&` operator) to each row, so you get `True` where all the elements of the selected portion of a row are `True`.
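To see that the two mask constructions agree, a short runnable sketch (sample rows made up from the question):

```python
import numpy as np

dat = np.array([['4/5/2004', '17', 0.0, 0.0],
                ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
                ['4/5/2004', '19:48:32', 58.4325243, 0.0]], dtype=object)

# Mask built with the element-wise & operator:
mask_ops = (dat[:, 2] != 0) & (dat[:, 3] != 0)

# Same mask built by reducing across columns 3 onward:
mask_reduce = np.logical_and.reduce(dat[:, 2:] != 0, axis=1)

print(dat[mask_reduce])
```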