## What is the most efficient way to find the position of the first np.nan value?

Question

Consider the array `a`:

```
a = np.array([3, 3, np.nan, 3, 3, np.nan])
```

I could do

```
np.isnan(a).argmax()
```

But this requires finding all `np.nan` values just to find the first.

Is there a more efficient way?

I've been trying to figure out if I can pass a parameter to `np.argpartition` such that `np.nan` gets sorted first as opposed to last.

EDIT regarding [dup]:

There are several reasons this question is different.

- That question and its answers addressed equality of values; this question is about `isnan`.
- Those answers all suffer from the same issue my answer faces. Note that I provided a perfectly valid answer but highlighted its inefficiency. I'm looking to fix the inefficiency.

EDIT regarding second [dup]:

Still addressing equality, and that question and its answers are old and very possibly outdated.


## Answers (4)

When looking for the first match, we could iterate through the array and exit as soon as the match is found, rather than processing the entire array. So, we would have an approach using Python's `next` function.
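The answer's original code and sample runs are not shown in this copy; a minimal sketch of the `next`-based idea, assuming the question's small test array, might look like:

```python
import numpy as np

a = np.array([3, 3, np.nan, 3, 3, np.nan])

# A generator expression fed to next() stops scanning at the first NaN;
# the default -1 is returned when the array contains no NaN at all.
first_nan_idx = next((i for i, x in enumerate(a) if np.isnan(x)), -1)
print(first_nan_idx)  # 2
```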

Here is a Pythonic approach using `itertools.takewhile()`.
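The answer's code is not shown in this copy; a hedged sketch of the `takewhile` idea could be:

```python
import numpy as np
from itertools import takewhile

a = np.array([3, 3, np.nan, 3, 3, np.nan])

# Count the leading non-NaN items; takewhile stops at the first NaN, so the
# count equals the index of the first NaN (or len(a) if there is none).
first_nan_idx = sum(1 for _ in takewhile(lambda x: not np.isnan(x), a))
print(first_nan_idx)  # 2
```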

Benchmarked against the generator-expression-within-`next` approach,^1 it is still (by far) slower than the numpy approach.

^1 The problem with that approach is its use of the `enumerate` function, which first returns an enumerate object (an iterator-like object) from the numpy array; calling the generator function and the iterator's `next` method takes time.

It might also be worth looking into `numba.jit`; without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but after compiling the code, the ordinary search will take the lead, at least in my testing.

Edit: As pointed out by @hpaulj in their answer, `numpy` actually ships with an optimized short-circuited search whose performance is comparable with the JITted search above.

I'll nominate a plain `a.argmax()`.
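The benchmark code from this exchange is not shown; the kind of pure-Python loop that `numba.jit` would compile could be sketched as below (`find_first_nan` is an illustrative name, not from the answer, and `numba` is treated as optional):

```python
import numpy as np

def find_first_nan(a):
    # Straightforward search that exits at the first NaN; compiled with
    # numba, this can beat a vectorized scan of the whole array.
    for i in range(a.size):
        if np.isnan(a[i]):
            return i
    return -1

try:
    from numba import njit  # optional dependency
    find_first_nan_fast = njit(find_first_nan)
except ImportError:
    find_first_nan_fast = find_first_nan

a = np.array([3, 3, np.nan, 3, 3, np.nan])
print(find_first_nan_fast(a))  # 2
```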

With @fuglede's test array: I don't have `numba` installed, so I can't compare that, but my speedup relative to `short` is greater than @fuglede's 6x.

I'm testing in Py3, which accepts `<np.nan`, while Py2 raises a runtime warning. But the code search suggests this isn't dependent on that comparison.

In `/numpy/core/src/multiarray/calculation.c`, `PyArray_ArgMax` plays with axes (moving the one of interest to the end) and delegates the action to `arg_func = PyArray_DESCR(ap)->f->argmax`, a function that depends on the dtype.

In `numpy/core/src/multiarray/arraytypes.c.src` it looks like `BOOL_argmax` short-circuits, returning as soon as it encounters a `True`. And `@fname@_argmax` also short-circuits on a maximal `nan`. `np.nan` is 'maximal' in `argmin` as well.

Comments from experienced C coders are welcomed, but it appears to me that, at least for `np.nan`, a plain `argmax` will be as fast as we can get. Playing with the `9999` in generating `a` shows that the `a.argmax` time depends on that value, consistent with short-circuiting.
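As a quick check on the short-circuiting claim, here is a sketch (the exact construction of `a` with the `9999` is not shown above, so this array is an assumption):

```python
import numpy as np

# Float array whose first NaN sits at index 9999; NaN is treated as maximal,
# so argmax can stop there, consistent with the short-circuit in the C source.
a = np.ones(100000)
a[9999] = np.nan

print(a.argmax())            # plain argmax lands on the first NaN
print(np.isnan(a).argmax())  # the boolean route gives the same index
```

Moving the NaN earlier or later in `a` should change the `a.argmax()` timing accordingly.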