Combining "by" and "on" to join and create summary column for data.table

Question

I have two datasets, one a detailed dataset of weight's and another that's supposed to be a summary dataset. I am trying to create the summary dataset by joining the detail dataset and aggregating, but it isn't working as expected.

Here's a sample code.

mytesta <- data.table(cola = c("a","b"), groupa = c(1,2))  # summary
mytestb <- data.table(groupa = c(1,1,1,1,2,2,2), weighta = c(10,20,30,25,15,30,10))  #detail

And this is my desired output.

   cola groupa weighta
1:    a      1      85
2:    b      2      55

What I tried to do is,

mytesta[mytestb, on = "groupa", weight_summary := sum(i.weighta), by = "groupa"]

The problem is that when by is used, the columns of the inner data.table disappear (for instance, mytesta[mytestb, on = "groupa", .SD, by = "groupa"]). Is there a way around this?


Show source
| r   | data.table   2017-03-13 23:03 2 Answers

Answers to Combining &quot;by&quot; and &quot;on&quot; to join and create summary column for data.table ( 2 )

  1. 2017-03-14 01:03

    Here is a solution where I first merge your two data.tables, then summarize.

    tab = merge(mytesta, mytestb, by="groupa")
    tab
    #    groupa cola weighta
    # 1:      1    a      10
    # 2:      1    a      20
    # 3:      1    a      30
    # 4:      1    a      25
    # 5:      2    b      15
    # 6:      2    b      30
    # 7:      2    b      10
    
    res = tab[, list(weighta=sum(weighta)), by=list(cola, groupa)]
    res
    #    cola groupa weighta
    # 1:    a      1      85
    # 2:    b      2      55
    
  2. 2017-03-14 18:03

    I would do

    mytesta[, v := mytestb[.SD, on=.(groupa), sum(weighta), by=.EACHI]$V1 ]
    

    In a X[Y] join, we're looking up each row of Y in X.

    So if the end goal is to create a new column in Y computed per row, we'll need a join Y[, v := X[Y, ...]] even though Y[X, v := ...] might seem more intuitive at first.

Leave a reply to - Combining "by" and "on" to join and create summary column for data.table

◀ Go back