Remove row of numpy array based on values from two columns

Question

I have a numpy array with four columns and many rows:

>>> dat
array([['4/5/2004', '17', 0.0, 0.0],
       ['4/5/2004', '7', 0.0, 0.0],
       ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
       ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
       ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]], dtype=object)

I would like to remove all rows where the value in column 3 or column 4 equals 0, so the result would be:

array([['4/5/2004', '19:48:20', 58.432488, -135.9202205],
       ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]], dtype=object)

I can do this one column at a time with:

a = dat[~(dat[:,2]==0), :]  

This returns the rows where the value in column 3 does not equal 0. I could repeat this for each column, but it would be more convenient to do it all in one command.
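A minimal sketch of that iterative approach, assuming the example array is rebuilt by hand from the values above. The loop over the 0-based column indices 2 and 3 is an illustration, not code from the original post:

```python
import numpy as np

# The question's example array, rebuilt by hand (mixed types, so dtype=object).
dat = np.array([
    ['4/5/2004', '17', 0.0, 0.0],
    ['4/5/2004', '7', 0.0, 0.0],
    ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
    ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
    ['4/5/2004', '19:48:36', 58.4325365, -150.9202813],
], dtype=object)

# Apply the one-column filter once per column of interest,
# each time on the already-filtered array.
a = dat
for col in (2, 3):
    a = a[~(a[:, col] == 0), :]

print(a[:, 1].tolist())  # ['19:48:20', '19:48:36']
```

This works, but it builds an intermediate array per column, which is what the single-mask answers below avoid.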

I thought something like the following two examples would work (but they do not):

a = dat[~(dat[:,2]==0), :] or dat[~(dat[:,3]==0), :] 
a = dat[~(dat[:,2&3]==0), :]

Hopefully there's some simple syntax I'm missing and can't find in the numpy help.


Tags: python, arrays, numpy (2016-12-30 22:12)

Answers (3)

  1. 2016-12-30 22:12

    What about using &:

    >>> dat[(dat[:,2] != 0) & (dat[:,3] != 0), :]
    array([['4/5/2004', '19:48:20', 58.432488, -135.9202205],
           ['4/5/2004', '19:48:36', 58.4325365, -150.9202813]], dtype=object)
    

    which yields the element-wise "and".

    I've flipped the comparisons to != 0 and combined them with &, which avoids the extra inversion with ~.
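A self-contained sketch of this answer, assuming the array is rebuilt by hand from the values shown in the question (hence dtype=object). The comment notes why each comparison sits in its own parentheses:

```python
import numpy as np

# The question's example array, rebuilt by hand (mixed types, so dtype=object).
dat = np.array([
    ['4/5/2004', '17', 0.0, 0.0],
    ['4/5/2004', '7', 0.0, 0.0],
    ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
    ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
    ['4/5/2004', '19:48:36', 58.4325365, -150.9202813],
], dtype=object)

# Each comparison gives one boolean per row; & combines them element-wise.
# The parentheses are required because & binds more tightly than != in Python.
keep = (dat[:, 2] != 0) & (dat[:, 3] != 0)
result = dat[keep, :]

print(result[:, 1].tolist())  # ['19:48:20', '19:48:36']
```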

  2. 2016-12-30 22:12

    Assuming the data array is 2D, we could slice out the columns of interest and keep only the rows with no zeros in them -

    dat[~(dat[:,2:4]==0).any(1)]
    

    Alternatively, we can check that all of the selected entries are != 0 -

    dat[(dat[:,2:4]!=0).all(1)]
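As a quick sanity check (a sketch, again rebuilding the question's array by hand), the any and all forms select the same rows: "no selected entry is zero" and "every selected entry is non-zero" are the same condition by De Morgan's law. Here axis=1 is spelled out, whereas .any(1) and .all(1) pass it positionally:

```python
import numpy as np

# The question's example array, rebuilt by hand (mixed types, so dtype=object).
dat = np.array([
    ['4/5/2004', '17', 0.0, 0.0],
    ['4/5/2004', '7', 0.0, 0.0],
    ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
    ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
    ['4/5/2004', '19:48:36', 58.4325365, -150.9202813],
], dtype=object)

coords = dat[:, 2:4]  # 0-based columns 2 and 3

# Drop rows where any selected entry is zero...
via_any = dat[~(coords == 0).any(axis=1)]
# ...or keep rows where every selected entry is non-zero.
via_all = dat[(coords != 0).all(axis=1)]

print(np.array_equal(via_any, via_all))  # True
print(via_all[:, 1].tolist())  # ['19:48:20', '19:48:36']
```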
    

    When the columns of interest are not contiguous, we need to select them by their column IDs and apply the same technique. So, if the column IDs to be examined are stored in an array or list named colID, the approaches become -

    dat[~(dat[:,colID]==0).any(1)]
    dat[(dat[:,colID]!=0).all(1)]
    

    Thus, for the stated case of columns 3 and 4 (0-based indices 2 and 3), we would have colID = [2,3].
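To illustrate the non-contiguous case, here is a sketch on a small, hypothetical all-numeric array (the values of arr are made up for illustration) where only columns 0 and 2 are checked:

```python
import numpy as np

# Hypothetical numeric data; zeros in the unchecked columns 1 and 3 are ignored.
arr = np.array([
    [1.0, 0.0, 5.0, 2.0],
    [0.0, 3.0, 4.0, 1.0],
    [2.0, 0.0, 0.0, 7.0],
    [6.0, 1.0, 8.0, 0.0],
])
colID = [0, 2]  # non-contiguous column indices (integer-array indexing)

# arr[:, colID] has shape (4, 2); a row survives only if both checked entries are non-zero.
kept = arr[(arr[:, colID] != 0).all(1)]

print(kept.tolist())  # [[1.0, 0.0, 5.0, 2.0], [6.0, 1.0, 8.0, 0.0]]
```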

  3. 2016-12-30 22:12

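    Conceptually, you had the right idea with or. The difference is that you need the element-wise logical or (|) or logical and (&), just as you are already using the element-wise logical not (~). Python's or keyword instead tries to reduce each whole array to a single True or False, which numpy refuses to do for arrays with more than one element.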

    This works because an operation like dat[:,3] == 0 creates an array of booleans with one entry per row of dat. When such an array is used as an index, numpy interprets it as a mask. Pulling the mask out into its own variable highlights this:

    mask = (dat[:, 2] != 0) & (dat[:, 3] != 0)
    dat = dat[mask, :]
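A sketch contrasting the two attempts from the question with the element-wise operators, assuming the question's array is rebuilt by hand. It builds the rows to drop with | and inverts them with ~, which selects the same rows as a mask built with & and != 0:

```python
import numpy as np

# The question's example array, rebuilt by hand (mixed types, so dtype=object).
dat = np.array([
    ['4/5/2004', '17', 0.0, 0.0],
    ['4/5/2004', '7', 0.0, 0.0],
    ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
    ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
    ['4/5/2004', '19:48:36', 58.4325365, -150.9202813],
], dtype=object)

zero2 = dat[:, 2] == 0  # one boolean per row
zero3 = dat[:, 3] == 0

# "or" asks for one truth value for the whole array, which raises ValueError.
try:
    zero2 or zero3
except ValueError as err:
    print("or failed:", err)

# The other attempt, dat[:,2&3], indexes a single column: 2 & 3 is just 2.
print(2 & 3)  # 2

# | is element-wise: True where either column is zero, i.e. the rows to drop.
drop = zero2 | zero3
kept = dat[~drop, :]
print(kept[:, 1].tolist())  # ['19:48:20', '19:48:36']
```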
    

    Another way to compute the mask would be as follows:

    mask = np.logical_and.reduce(dat[:, 2:] != 0, axis=1)
    

    np.logical_and.reduce collapses the input array along the columns (axis=1) by applying np.logical_and across each row, so you get True wherever all the elements of the selected portion of that row are True. (For boolean arrays, np.logical_and gives the same result as the & operator, which itself calls np.bitwise_and.)
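A sketch confirming that the reduction matches ndarray.all(axis=1), using the question's array rebuilt by hand. Note that dat[:, 2:] means "column 2 to the end", which here is columns 2 and 3:

```python
import numpy as np

# The question's example array, rebuilt by hand (mixed types, so dtype=object).
dat = np.array([
    ['4/5/2004', '17', 0.0, 0.0],
    ['4/5/2004', '7', 0.0, 0.0],
    ['4/5/2004', '19:48:20', 58.432488, -135.9202205],
    ['4/5/2004', '19:48:32', 58.432524300000004, 0.0],
    ['4/5/2004', '19:48:36', 58.4325365, -150.9202813],
], dtype=object)

nonzero = dat[:, 2:] != 0  # shape (5, 2): one row of booleans per data row

# Fold logical_and across each row (axis=1).
mask = np.logical_and.reduce(nonzero, axis=1)

print(mask.tolist())  # [False, False, True, False, True]
print(np.array_equal(mask, nonzero.all(axis=1)))  # True
print(dat[mask, :][:, 1].tolist())  # ['19:48:20', '19:48:36']
```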
