Monday, 27 June 2016

Subtract two rows based on condition in Python Pandas

I'm working with a data set containing time and the concentration of several different species of microorganism, with replicates; for the sake of this question it's just a time column and a bunch of numbers. I took measurements every two hours, and sometimes I took two measurements consecutively, so those rows have timestamps very close to each other. For those near-identical timestamps, I would like to average the two rows across all columns and place the averaged row in a new data frame at the position the two original rows occupied.

Here is what the dataframe looks like. The timestamps have been converted into numerical values because the relative time/date is irrelevant. You can see an example of what I'm talking about at index 9 and 10, where the two times are very similar:

        Time     A1     A2      A3
 0  0.000069  118.0  108.0    70.0
 1  0.087049  189.0   54.0    89.0
 2  0.156551  154.0  122.0   107.0
 3  0.721516  129.0  148.0   148.0
 4  0.789329  143.0  162.0   212.0
 5  0.882743  227.0  229.0   149.0
 6  0.964907  208.0  255.0   241.0
 7  1.041424  200.0  241.0   222.0
 8  1.731806  733.0  838.0   825.0
 9  1.794340  804.0  996.0   954.0
10  1.794769  861.0  987.0  1138.0
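
To make the example reproducible, the table above can be rebuilt as a DataFrame. This is only a sketch: the name `df` matches the loop further down, while `gaps` is a helper name introduced here to show the spacing between consecutive timestamps.

```python
import pandas as pd

# Values copied from the table above.
df = pd.DataFrame({
    "Time": [0.000069, 0.087049, 0.156551, 0.721516, 0.789329, 0.882743,
             0.964907, 1.041424, 1.731806, 1.794340, 1.794769],
    "A1": [118.0, 189.0, 154.0, 129.0, 143.0, 227.0,
           208.0, 200.0, 733.0, 804.0, 861.0],
    "A2": [108.0, 54.0, 122.0, 148.0, 162.0, 229.0,
           255.0, 241.0, 838.0, 996.0, 987.0],
    "A3": [70.0, 89.0, 107.0, 148.0, 212.0, 149.0,
           241.0, 222.0, 825.0, 954.0, 1138.0],
})

# Gap between each row's Time and the previous row's Time.
# The first row has no predecessor, so its gap is NaN.
gaps = df["Time"].diff()
print(gaps.round(6))
```

The gap between index 9 and 10 is far smaller than any other gap in the column, which is what marks that pair as a "duplicate" measurement.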

The obvious approach is to round the Time column to a sensible precision, use groupby() on the rounded values, and average the "duplicate" rows. Instead, I've gone down a new philosophical road: I would like to use the pandas iterrows() function to go through the rows one by one, compare each pair of consecutive rows, and apply a condition to them to achieve the same result. I've arrived at something like this, which raises no error but doesn't seem to do anything.

for i, row in df.iterrows():
    row2 = row + 1  # I feel like this line is the crux of the problem
    if row2.Time - row.Time >= 0.1:
        row = (row2 + row)/2
    else:
        row = row
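
Three things keep the loop above from having any effect: `row + 1` adds 1 to every value in the row rather than fetching the next row, assigning to `row` only rebinds a local name and never writes back to `df`, and the `>= 0.1` test selects rows that are far apart rather than close together. A minimal sketch of a loop that does average consecutive close rows might look like the following. The function name `average_close_rows` and the cutoff `THRESHOLD = 0.01` are assumptions chosen for illustration, and the data is an excerpt of the last four rows of the table.

```python
import pandas as pd

# Excerpt: the last four rows of the table in the question.
df = pd.DataFrame({
    "Time": [1.041424, 1.731806, 1.794340, 1.794769],
    "A1": [200.0, 733.0, 804.0, 861.0],
    "A2": [241.0, 838.0, 996.0, 987.0],
    "A3": [222.0, 825.0, 954.0, 1138.0],
})

THRESHOLD = 0.01  # assumed cutoff for "very similar" timestamps


def average_close_rows(frame, threshold):
    """Average each pair of consecutive rows whose Time gap is below threshold."""
    rows = []
    i = 0
    while i < len(frame):
        current = frame.iloc[i]
        has_next = i + 1 < len(frame)
        if has_next and frame.iloc[i + 1]["Time"] - current["Time"] < threshold:
            # Close pair: keep one averaged row and skip past both originals.
            rows.append(((current + frame.iloc[i + 1]) / 2).to_numpy())
            i += 2
        else:
            # Isolated row: keep it unchanged.
            rows.append(current.to_numpy())
            i += 1
    return pd.DataFrame(rows, columns=frame.columns)


result = average_close_rows(df, THRESHOLD)
print(result)
```

This version builds a new data frame instead of modifying `df` in place, and it only merges rows in pairs, so three or more close timestamps in a row would not collapse into a single average.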

I'd also be curious to know which is faster: the groupby-and-average way or the for-loop-and-average way. Maybe there's a nifty lambda function way to do this as well? I've searched extensively for this type of thing and I would love to see what you all can come up with.
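
For comparison, here is a sketch of a groupby variant that avoids rounding: it starts a new group whenever the gap to the previous row reaches a cutoff, then averages within each group. The cutoff `THRESHOLD = 0.01` and the names `group_id` and `result` are assumptions for illustration, and the data is again an excerpt of the last four rows of the table.

```python
import pandas as pd

# Excerpt: the last four rows of the table in the question.
df = pd.DataFrame({
    "Time": [1.041424, 1.731806, 1.794340, 1.794769],
    "A1": [200.0, 733.0, 804.0, 861.0],
    "A2": [241.0, 838.0, 996.0, 987.0],
    "A3": [222.0, 825.0, 954.0, 1138.0],
})

THRESHOLD = 0.01  # assumed cutoff for "very similar" timestamps

# True where a row is far enough from the previous one to start a new group.
# The first row's gap is NaN, which compares as False, so it lands in group 0.
starts_new_group = df["Time"].diff() >= THRESHOLD

# The running total of the flags gives every row a group label.
group_id = starts_new_group.cumsum().rename("group")

# Average all columns (including Time) within each group.
result = df.groupby(group_id).mean().reset_index(drop=True)
print(result)
```

On large frames a groupby like this is typically much faster than iterrows(), because the work runs in vectorized code rather than a Python-level loop; timing both with timeit on the real data would confirm it. Unlike a pairwise loop, a run of several close timestamps collapses into one averaged row here.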

Cheers
