Setting -999 to NaN in DataFrame fails, unless transposed


Postby tnknepp » Fri Aug 23, 2013 5:06 pm

Interesting problem here. I load a DataFrame that has a mixed index (two rows of spectrometer data followed by a series of datetime values), such as:

index
count_wl...
error_wl...
<Timestamp: 2011-03-25 11:09:00>...
<Timestamp: 2011-03-25 11:10:00>...

All columns consist of either floats or integers, so the only "mixed" part is in the index. Within my dataframe I have invalid points flagged as -999. I want these removed before I start doing my stats work, but this fails:

Code:
from numpy import *
import pandas as pd
tmp = pd.load(filename)
tmp[tmp == -999] = NaN


Though this works:
Code:
tmp = tmp.T
tmp[tmp == -999] = NaN


I get the following error when I do not transpose:
ValueError: Cannot do boolean setting on mixed-type frame

I guess this is from the mixed index, but I can't imagine why I would get flagged for this.

I have a second problem. Normally, .describe() returns summary statistics such as:
Code:
>>> tmp = pd.load(filename)
>>> tmp.pixel_0.describe()
count        859.000000
mean        3388.121071
std        99045.175315
min         -999.000000
25%            1.000000
50%           12.000000
75%           21.000000
max      2902894.000000
dtype: float64


But, after replacing my -999 values:
Code:
>>> tmp = pd.load(filename).T
>>> tmp[tmp == -999] = NaN
>>> tmp = tmp.T

>>> tmp.pixel_0.describe()
count     858
unique    119
top         9
freq       32
dtype: int64


Again, this is indicative of a mixed type, but I cannot imagine what the "mixed" portion is. Everything within the pixel_0 column is of the same type. Any light anyone can shed on this would be very helpful.
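
A minimal sketch of one plausible cause of the changed .describe() output, assuming the frame contains at least one string column (the small frame and its values below are made up for illustration): transposing such a frame forces every value into the object dtype, and transposing back does not restore the numeric dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: one numeric and one string column.
df = pd.DataFrame({
    "pixel_0": [1, -999, 12, 21],
    "routine": ["a", "b", "c", "d"],
})

# A transpose needs one dtype that can hold every value, which is object here,
# so the round trip leaves pixel_0 as an object column.
round_trip = df.T.T
col = round_trip["pixel_0"]
print(col.dtype)
print(list(col.describe().index))      # count/unique/top/freq style summary

# Converting back to a numeric dtype restores the numeric summary.
numeric = pd.to_numeric(col).replace(-999, np.nan)
print(list(numeric.describe().index))  # count/mean/std/... style summary
```

Calling pd.to_numeric on the affected columns is only one way to recover a numeric dtype; avoiding the transpose altogether sidesteps the issue.
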
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby stranac » Fri Aug 23, 2013 10:40 pm

You need to tell pandas that the first column is the index.
Something like this should work:
Code:
tmp.set_index(0, inplace=True)
tmp[tmp == -999] = NaN


Also, load() is deprecated; you should use read_pickle() instead.
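
A short sketch of the pickle round trip, assuming a pandas version that provides read_pickle() and DataFrame.to_pickle(); the file name and the data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"pixel_0": [1, -999, 12]})

# Hypothetical file name in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")

df.to_pickle(path)               # write the frame to disk
restored = pd.read_pickle(path)  # used here in place of pd.load(path)
print(restored.equals(df))
```
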
Friendship is magic!

R.I.P. Tracy M. You will be missed.

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby tnknepp » Mon Aug 26, 2013 11:20 am

Thanks stranac. I had no idea about read_pickle.

***EDIT***
I'm using Pandas 0.11.0, and I'm not finding read_pickle in this version, nor in 0.12. Are you sure this is a Pandas command?

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby tnknepp » Mon Aug 26, 2013 4:08 pm

Thanks stranac, but this is not helping with my replacement issue. After loading my data frame I continue to get the error posted below. I followed the traceback but am unable to understand what is causing this error. It is true that my data frame's columns are a mix of integers and floats, and my index is a combination of datetime values and strings, but this should not be a problem. I am able to construct similarly formatted dataframes and perform replacements on them in the command line. There is something I am missing here, but I do not understand enough of what is going on behind the scenes to troubleshoot further.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2035, in __setitem__
    self._setitem_frame(key, value)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2070, in _setitem_frame
    raise ValueError('Cannot do boolean setting on mixed-type frame')
ValueError: Cannot do boolean setting on mixed-type frame

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby stranac » Mon Aug 26, 2013 4:37 pm

Yeah, pandas still thinks that you have a mixed-type dataframe.
Hard to guess what's going on without seeing the actual data.

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby tnknepp » Mon Aug 26, 2013 4:52 pm

stranac wrote:Yeah, pandas still thinks that you have a mixed-type dataframe.
Hard to guess what's going on without seeing the actual data.


What is a mixed-type frame anyway? I admit I have integer/float columns, but this shouldn't matter (as far as I know).
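
As a rough illustration of the question, and only a sketch: the frame below is made up, and the assumption (not confirmed against the pandas 0.11 source) is that a frame counts as mixed-type when its columns do not share one kind of dtype, with an integer/float mix tolerated and a string column not. The public dtypes attribute is enough to see which columns differ.

```python
import pandas as pd

# Hypothetical frame with the three kinds of columns discussed in this thread.
df = pd.DataFrame({
    "pixel_0": [1, -999, 12],      # integers
    "nan8": [0.5, -999.0, 2.0],    # floats
    "routine": ["a", "b", "c"],    # strings
})

# One dtype per column; anything that is not int/float deserves a closer look.
print(df.dtypes)

# List the columns that are not numeric.
non_numeric = [c for c in df.columns
               if not pd.api.types.is_numeric_dtype(df[c])]
print(non_numeric)
```
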

I've attached a file (the smallest dataset I have, extension changed to .txt for upload...though this doesn't really matter) for your perusal, though I am unsure of how much this will help.
Attachment: test.txt (230.76 KiB)

Re: Setting -999 to NaN in DataFrame fails, unless transpose

Postby tnknepp » Mon Aug 26, 2013 5:06 pm

SOLVED!

I forgot that I had inserted the <routine> column, which is type=string. Hopefully this helps someone else, since I feel foolish.
I figured this out via:

Code:
>>> import pandas as pd
>>> from numpy import unique
>>> data = pd.load('file.df')
>>> unique(data.dtypes)
pixel_538      int32
nan8         float64
routine       object
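
With the string column identified, one possible way to flag the invalid points without touching it is to restrict the replacement to the numeric columns. This is a sketch under assumptions rather than what was actually run in this thread: the frame is invented, and select_dtypes() and replace() come from later pandas releases than 0.11.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the dtypes listed above.
df = pd.DataFrame({
    "pixel_538": [1, -999, 12],
    "nan8": [0.5, -999.0, 2.0],
    "routine": ["a", "b", "c"],
})

# Pick out only the integer and float columns.
numeric_cols = df.select_dtypes(include="number").columns

# Replace the -999 flag with NaN in those columns; the string column is left alone.
cleaned = df.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].replace(-999, np.nan)

print(cleaned)
```

Integer columns that contain the flag come back as floats, because NaN cannot be stored in a plain integer column.
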

