## Matching IDs Between Pandas DataFrames and Applying a Function

Question

I have two data frames that look like the following:

df_A:

```
ID x y
a 0 0
c 3 2
b 2 5
```

df_B:

```
ID x y
a 2 1
c 3 5
b 1 2
```

I want to add a column to df_B that holds the Euclidean distance between the x, y coordinates in df_B and those in df_A for each identifier. The desired result would be:

```
ID x y dist
a 2 1 2.236
c 3 5 3
b 1 2 3.162
```

The identifiers are not necessarily going to be in the same order. I know how to do this by looping through the rows of df_A and finding the matching ID in df_B, but I was hoping to avoid using a for loop since this will be used on data with tens of millions of rows. Is there some way to use apply but condition it on matching IDs?


## Answers ( 3 )

If `ID` isn't the index, make it so. Since index and columns are already aligned, simply doing the math should just work.
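A minimal sketch of this index-alignment idea, assuming the IDs are unique and every ID in `df_B` also appears in `df_A`. The frames reuse the question's values, with `df_A`'s rows reordered on purpose to show that matching happens by label rather than by position.

```python
import numpy as np
import pandas as pd

# Question's data; df_A's row order is shuffled here for illustration.
df_A = pd.DataFrame({"ID": ["b", "a", "c"], "x": [2, 0, 3], "y": [5, 0, 2]})
df_B = pd.DataFrame({"ID": ["a", "c", "b"], "x": [2, 3, 1], "y": [1, 5, 2]})

A = df_A.set_index("ID")
B = df_B.set_index("ID")

# Arithmetic between DataFrames aligns on index labels and column names.
diff = B[["x", "y"]] - A[["x", "y"]]

# Assigning a Series also aligns on the index, so each distance lands on its ID.
B["dist"] = np.sqrt((diff ** 2).sum(axis=1))

df_B = B.reset_index()
print(df_B.round(3))  # dist: a -> 2.236, c -> 3.0, b -> 3.162
```

An ID that is missing from one of the frames would produce `NaN` in `dist` rather than an error, so it may be worth checking for nulls afterwards.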

A solution that uses the `sklearn.metrics.pairwise.paired_distances` function:
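One way this could look, as a sketch rather than a definitive implementation. It assumes scikit-learn is installed, IDs are unique, and every ID in `df_B` exists in `df_A`. `paired_distances` compares row i of one array with row i of the other, so the rows of `df_A` are first reordered to follow `df_B`'s ID order.

```python
import pandas as pd
from sklearn.metrics.pairwise import paired_distances

# Question's data; df_A's row order is shuffled here for illustration.
df_A = pd.DataFrame({"ID": ["b", "a", "c"], "x": [2, 0, 3], "y": [5, 0, 2]})
df_B = pd.DataFrame({"ID": ["a", "c", "b"], "x": [2, 3, 1], "y": [1, 5, 2]})

cols = ["x", "y"]

# Reorder df_A's rows so that row i matches the ID in row i of df_B.
A_aligned = df_A.set_index("ID").loc[df_B["ID"], cols]

# Row-wise Euclidean distance (the default metric) between the paired rows.
df_B["dist"] = paired_distances(df_B[cols].to_numpy(), A_aligned.to_numpy())
print(df_B.round(3))  # dist: a -> 2.236, c -> 3.0, b -> 3.162
```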

For performance, you might want to work with NumPy arrays. For Euclidean distance computations between corresponding rows, `np.einsum` does the job pretty efficiently. Incorporating the fixing of rows to make them aligned, here's an implementation -
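A sketch of how such an implementation might look. The helper name `paired_euclidean` and its parameters are hypothetical. It assumes IDs are unique in the reference frame and that every ID in the target frame is present there; a missing ID would silently match the wrong row or raise an index error.

```python
import numpy as np
import pandas as pd

def paired_euclidean(df_ref, df_target, id_col="ID", cols=("x", "y")):
    """Hypothetical helper: distance from each df_target row to its ID match in df_ref."""
    cols = list(cols)
    ref_ids = df_ref[id_col].to_numpy()
    target_ids = df_target[id_col].to_numpy()

    # For each target ID, find the position of the same ID in df_ref.
    order = np.argsort(ref_ids)
    pos = order[np.searchsorted(ref_ids, target_ids, sorter=order)]

    ref_xy = df_ref[cols].to_numpy(dtype=float)[pos]
    target_xy = df_target[cols].to_numpy(dtype=float)
    diff = target_xy - ref_xy

    # "ij,ij->i" multiplies elementwise and sums over columns: squared row norms.
    return np.sqrt(np.einsum("ij,ij->i", diff, diff))

# Question's data; df_A's row order is shuffled here for illustration.
df_A = pd.DataFrame({"ID": ["b", "a", "c"], "x": [2, 0, 3], "y": [5, 0, 2]})
df_B = pd.DataFrame({"ID": ["a", "c", "b"], "x": [2, 3, 1], "y": [1, 5, 2]})

df_B["dist"] = paired_euclidean(df_A, df_B)
print(df_B.round(3))  # dist: a -> 2.236, c -> 3.0, b -> 3.162
```

Sorting the reference IDs once and looking up positions with `np.searchsorted` keeps the matching step vectorized, which is the point of avoiding a Python-level loop on tens of millions of rows.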

Sample input, output -

Here's a runtime test comparing `einsum` against a few other counterparts.
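A timing harness along these lines can be used to run such a comparison locally. This is a sketch on randomly generated, already-aligned arrays; the array size and the set of candidates are assumptions, and no timings are quoted because they depend on hardware and library versions.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100_000, 2))  # stand-in for df_A's aligned x, y values
b = rng.random((100_000, 2))  # stand-in for df_B's x, y values

def via_einsum(a, b):
    d = b - a
    return np.sqrt(np.einsum("ij,ij->i", d, d))

def via_norm(a, b):
    return np.linalg.norm(b - a, axis=1)

def via_sum_of_squares(a, b):
    return np.sqrt(((b - a) ** 2).sum(axis=1))

candidates = {
    "einsum": via_einsum,
    "linalg.norm": via_norm,
    "sum of squares": via_sum_of_squares,
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, each running the function 10 times.
    timings[name] = min(timeit.repeat(lambda fn=fn: fn(a, b), number=10, repeat=3))
    print(f"{name:>15}: {timings[name]:.4f} s per 10 calls")
```

All three functions compute the same distances, so any difference in the printed numbers reflects only how each one carries out the reduction.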