Temporally aligning data with pandas



Postby tnknepp » Fri May 24, 2013 12:12 pm

I've recently switched to pandas for my data analysis. One nice tool it offers is .align, which aligns two data sets by their respective indexes. I work with time series data, so a tool that quickly aligns data sets in time is quite useful. However, there is a snag.

I have sets of data saved as pandas.DataFrame types:
>>> a[:2]
date                  a_water
2010-06-24 14:03:52    3.0102
2010-06-24 14:05:19    3.0218

>>> b[:2]
date                   b_water
2010-06-24 14:01:00     17.840
2010-06-24 14:06:00     18.369

>>> c[:2]
date                   c_water                 
2010-06-24 12:25:02     41.50
2010-06-27 14:07:19     41.21
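A minimal sketch of the snag, with the first rows of "a" and "b" above rebuilt by hand (current pandas syntax, not the 0.11 release listed in the signature): because no timestamps coincide, an outer `.align` on the row index pads every row with a NaN in one of the two frames, and an inner alignment leaves nothing at all.

```python
import pandas as pd

# Rebuild the sample rows of "a" and "b" shown above
a = pd.DataFrame(
    {"a_water": [3.0102, 3.0218]},
    index=pd.to_datetime(["2010-06-24 14:03:52", "2010-06-24 14:05:19"]),
)
b = pd.DataFrame(
    {"b_water": [17.840, 18.369]},
    index=pd.to_datetime(["2010-06-24 14:01:00", "2010-06-24 14:06:00"]),
)

# Align on the row index only (axis=0): the union has 4 timestamps
a_outer, b_outer = a.align(b, join="outer", axis=0)
print(a_outer.join(b_outer))  # each row has a NaN in one column

# An inner alignment keeps only shared timestamps: there are none
a_inner, b_inner = a.align(b, join="inner", axis=0)
print(len(a_inner))  # 0
```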

When these data frames are aligned, they have no shared indexes (i.e. datetime values). The problem is that some of my data is recorded every minute, on the minute (e.g. "b" above; great, that's easy to deal with!), some is recorded every 2-5 minutes and not necessarily on the minute (e.g. "a" above), and some is recorded only once per week (e.g. "c" above). Rounding to the nearest minute partially mitigates the alignment issue, but when I want to compare data like "a" and "c" I run into problems. Is there a way to align data from "a" with "c" such that any data point within a specified cutoff (say, 5 minutes) counts as a valid match? Ideally, this would allow some flexibility in my alignment procedures without requiring me to do some nasty filling of the data sets.
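For readers on recent pandas releases (well after the 0.11 listed below), one way to express "any point within 5 minutes is a valid match" is `pd.merge_asof` with `direction="nearest"` and a `tolerance`. A minimal sketch; the 12:22:40 reading in "a" is invented so that one row of "c" has a partner:

```python
import pandas as pd

# Hypothetical readings: rows from the samples above, plus an invented
# 12:22:40 value in "a"
a = pd.DataFrame(
    {"a_water": [2.98, 3.0102, 3.0218]},
    index=pd.to_datetime(
        ["2010-06-24 12:22:40", "2010-06-24 14:03:52", "2010-06-24 14:05:19"]
    ),
)
c = pd.DataFrame(
    {"c_water": [41.50, 41.21]},
    index=pd.to_datetime(["2010-06-24 12:25:02", "2010-06-27 14:07:19"]),
)

# For each row of c, take the nearest row of a, but only if it lies within
# 5 minutes; both indexes must already be sorted
matched = pd.merge_asof(
    c,
    a,
    left_index=True,
    right_index=True,
    direction="nearest",
    tolerance=pd.Timedelta("5min"),
)
print(matched)
```

Rows of "c" with no "a" reading inside the window come back as NaN instead of being paired with a distant value, so no filling of either data set is needed.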
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython

Re: Temporally aligning data with pandas

Postby setrofim » Fri May 24, 2013 2:50 pm

You can resample the time series so that they are at the same resolution (e.g. 5 mins) before aligning them.
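A minimal sketch of that suggestion in current pandas syntax (not the 0.11 API), with the sample rows of "a" and "b" rebuilt by hand. Both frames are averaged into 5-minute bins, each labeled by its start time, and then joined on the shared bin labels:

```python
import pandas as pd

# Sample rows of "a" and "b" from the first post
a = pd.DataFrame(
    {"a_water": [3.0102, 3.0218]},
    index=pd.to_datetime(["2010-06-24 14:03:52", "2010-06-24 14:05:19"]),
)
b = pd.DataFrame(
    {"b_water": [17.840, 18.369]},
    index=pd.to_datetime(["2010-06-24 14:01:00", "2010-06-24 14:06:00"]),
)

# Average each frame into common 5-minute bins (labeled by bin start)
a5 = a.resample("5min").mean()
b5 = b.resample("5min").mean()

# Inner join keeps only the bins where both sources have data
aligned = a5.join(b5, how="inner")
print(aligned)
```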

Re: Temporally aligning data with pandas

Postby tnknepp » Fri May 24, 2013 4:24 pm

Yes, though I was trying to avoid blindly resampling the data, since this can cause problems. For example, a 30-minute resampling gives two data points every hour (e.g. averages over 11:45 - 12:15 and 12:15 - 12:45, time-stamped 12:00 and 12:30 respectively). However, if data from another source was recorded at 12:15, neither the 12:00 nor the 12:30 value really represents that time. What I need is the average from 12:00 - 12:30 (i.e. centered on the secondary measurement's time).

I've been looking at .resample, and I think it can do what I want, but I can't find a clear explanation. If I have two DataFrames, both indexed by datetime, e.g.
>>> a[:1]
date                  a_water
2010-06-24 14:03:52    3.0102

>>> b[:1]
date                   b_water
2010-06-24 14:01:00     17.840


can I tell .resample to compute a 30-minute average of "a" centered on each of "b"'s index values? That is what I really want to do. I could always use a for loop, though that is not nearly as pretty.
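One vectorized alternative to a for loop, sketched under the assumption that "centered" means the closed window [t - 15 min, t + 15 min] around each of "b"'s timestamps: sort "a", use `searchsorted` to find each window's bounds, and use cumulative sums to get each window's total. The helper name `centered_mean` and the sample values are hypothetical; the first center reproduces the 12:15 case described earlier in the thread.

```python
import numpy as np
import pandas as pd


def centered_mean(values, centers, half_width):
    """Hypothetical helper: mean of `values` over the closed window
    [t - half_width, t + half_width] for each t in `centers`;
    NaN where the window holds no readings."""
    # Drop missing readings and sort by time so searchsorted is valid
    values = values.dropna().sort_index()
    times = values.index
    # First reading >= window start, and one past the last reading <= window end
    lo = times.searchsorted(centers - half_width, side="left")
    hi = times.searchsorted(centers + half_width, side="right")
    # Prefix sums: the total over positions [lo, hi) is csum[hi] - csum[lo]
    csum = np.concatenate([[0.0], np.cumsum(values.to_numpy(dtype=float))])
    counts = hi - lo
    sums = csum[hi] - csum[lo]
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return pd.Series(means, index=centers, name=values.name)


# Hypothetical readings of "a" and measurement times of "b"
a = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2010-06-24 12:00", "2010-06-24 12:10",
                          "2010-06-24 12:20", "2010-06-24 12:40"]),
    name="a_water",
)
b_times = pd.to_datetime(["2010-06-24 12:15", "2010-06-24 12:30",
                          "2010-06-24 13:30"])

result = centered_mean(a, b_times, pd.Timedelta("15min"))
print(result)
```

With the DataFrames from the thread, the call would look like `centered_mean(a["a_water"], b.index, pd.Timedelta("15min"))`. Sorting plus two binary searches keeps the cost near O((n + m) log n) rather than scanning "a" once per timestamp in "b".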

I suppose that, since I am working with time series data, I should really switch from DataFrames to TimeSeries...
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython

Re: Temporally aligning data with pandas

Postby setrofim » Fri May 24, 2013 5:37 pm

tnknepp wrote: can I specify, in .resample, to do a 30-minute average on "a" that is centered about "b"s index values?

Erm, I'm not sure what you mean by "center", but you can apply an offset to "a" after you resample it.
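One reading of this suggestion, sketched with the `offset=` argument that newer pandas (1.1 and later, not 0.11) accepts on `.resample`: shift the 30-minute bin edges to :15 and :45, then move each label forward by 15 minutes so it sits at the bin's midpoint. This yields the 12:00 and 12:30 labels described earlier in the thread; the 1-minute sample data is made up.

```python
import pandas as pd

# Hypothetical 1-minute readings from 11:45 to 12:44
idx = pd.date_range("2010-06-24 11:45", periods=60, freq="1min")
a = pd.Series(range(60), index=idx, dtype=float, name="a_water")

# Bins [11:45, 12:15) and [12:15, 12:45): edges offset 15 min from midnight
binned = a.resample("30min", offset="15min").mean()

# Relabel each bin by its midpoint instead of its left edge
binned.index = binned.index + pd.Timedelta("15min")
print(binned)
```

This only moves the fixed grid; it still cannot center a window on an arbitrary timestamp such as 12:15 unless that timestamp happens to fall on the shifted grid.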

