Postby pythonista » Fri Aug 01, 2014 10:07 pm

Hello all, I'm not new to Python, but I am new to Pandas and to this forum. I hope I am in the right place for this help request.

I need to visualize some data, but first I need to perform some calculations that seem too cumbersome in Tableau (am I hated if I say Tableau sucks?).

My data set is a number of fields describing usage of an application, keyed by user id. There are potentially multiple entries for each user id, and each entry (record) has columns such as the time they began using the app, the end time, the price they paid, whether they were on wifi, and other attributes (dimensions).

I have one year of data and want to do things like calculate the average/total duration and price paid in the app for each user, over each month and over the full year (remember each user will appear multiple times, once for each time they sign in).

I know some basics, like appending a column that subtracts the start time from the end time to get time spent. My Python is fully functional, but my data capabilities are amateur.
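For reference, a minimal sketch of that time-spent column. The column names (`user_id`, `start_time`, `end_time`) and the sample rows are made up for illustration, since the real ones are not given here:

```python
import pandas as pd

# Hypothetical column names and sample rows; the real data set is not shown.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "start_time": pd.to_datetime(
        ["2014-01-05 10:00", "2014-02-01 09:30", "2014-01-07 20:00"]
    ),
    "end_time": pd.to_datetime(
        ["2014-01-05 11:30", "2014-02-01 10:00", "2014-01-07 22:15"]
    ),
})

# Subtracting two datetime columns gives a timedelta column.
df["duration"] = df["end_time"] - df["start_time"]

# A plain float (minutes) is usually easier to aggregate and plot.
df["duration_min"] = df["duration"].dt.total_seconds() / 60
```

If the times are read from a CSV as strings, they would need `pd.to_datetime` first, as in the sample frame.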

My question is this: say I want the following attributes (measures) calculated, all per user id: average price, total price, max/min price, median price, average duration, total duration, max/min duration, median duration, and number of times logged in (so the number of instances of the id), all on a per-month and per-year basis. I know that I could calculate each of these things, but what is the best way to store them for use in a visualization?
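One possible way to get all of those measures at once is a `groupby` followed by `agg`, which yields one summary table per granularity (one monthly, one yearly) rather than many separate calculations. This is only a sketch: the column names (`user_id`, `start_time`, `price`, `duration_min`) and the sample values are assumptions, not the actual data.

```python
import pandas as pd

# Hypothetical column names and values, for illustration only.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "start_time": pd.to_datetime(
        ["2014-01-05", "2014-01-20", "2014-02-01", "2014-01-07", "2014-03-11"]
    ),
    "price": [5.0, 10.0, 12.0, 8.0, 9.0],
    "duration_min": [60.0, 120.0, 30.0, 200.0, 45.0],
})

# Label each record with its calendar month (e.g. 2014-01).
df["month"] = df["start_time"].dt.to_period("M")

stats = ["mean", "sum", "min", "max", "median"]

# One row per (user, month), one column per measure.
monthly = df.groupby(["user_id", "month"]).agg(
    {"price": stats, "duration_min": stats}
)
# Flatten the two-level column labels into names like "price_mean".
monthly.columns = ["_".join(col) for col in monthly.columns]
# Number of logins = number of records in each group.
monthly["n_sessions"] = df.groupby(["user_id", "month"]).size()

# Same measures over the full year: one row per user.
yearly = df.groupby("user_id").agg({"price": stats, "duration_min": stats})
yearly.columns = ["_".join(col) for col in yearly.columns]
yearly["n_sessions"] = df.groupby("user_id").size()
```

Both results are ordinary DataFrames indexed by user id (and month), so they can be saved to CSV or joined back to the original records for plotting.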

For context, I may want to visualize the group of users who paid more than $8 on average and were in the app for a total of more than 3 hours, in terms of what shows they watched and whether they were on wifi (other attributes in the data set), and I may want to see it broken down monthly. Would it then be best to create a yearly table and a table for each month, for a total of 13 tables, each of which contains the user ids over that time period with all the original information, and then append a column for each calculation (if the calc is an average, then I enter the same value for each instance of an id)?
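An alternative to 13 separate tables, sketched under assumed column names (`user_id`, `month`, `price`, `duration_min`, `wifi`, `show`) and made-up values: `groupby(...).transform(...)` writes the per-user value onto every one of that user's rows in a single table, which is the "same value for each instance of an id" idea. The filter and the monthly breakdown then work on that one table.

```python
import pandas as pd

# Hypothetical column names and values, for illustration only.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "month": pd.PeriodIndex(
        ["2014-01", "2014-02", "2014-01", "2014-01", "2014-03"], freq="M"
    ),
    "price": [9.0, 10.0, 4.0, 5.0, 12.0],
    "duration_min": [120.0, 90.0, 200.0, 100.0, 60.0],
    "wifi": [True, False, True, True, False],
    "show": ["A", "B", "A", "C", "B"],
})

# transform() returns one value per original row, repeated within each user.
df["avg_price"] = df.groupby("user_id")["price"].transform("mean")
df["total_min"] = df.groupby("user_id")["duration_min"].transform("sum")

# Users averaging more than $8 with more than 3 hours (180 minutes) in total.
selected = df[(df["avg_price"] > 8) & (df["total_min"] > 180)]

# Monthly breakdown of the selected records by show and wifi status.
breakdown = selected.groupby(["month", "show", "wifi"]).size()
```

Here the thresholds apply to yearly per-user figures; grouping by `["user_id", "month"]` in the `transform` calls instead would apply them month by month.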

Any help is much appreciated. I'm so hoping it makes sense to do this in Python as opposed to Tableau... please help :)

Re: Help with Pandas

Postby pythonista » Sun Aug 03, 2014 6:11 pm

I realize this is a long post; I was trying to thoroughly describe my problem... can anyone comment on why they can't comment :) Is it a tough problem? Am I in the wrong place generally for data help? Any lead would be great!

Re: Help with Pandas

Postby micseydel » Sun Aug 03, 2014 8:31 pm

Pandas gets brought up on this forum on occasion, but I don't think that any of our members use it. You can try using the search feature to see if there's any precedent for your question. We get enough questions here that it wouldn't surprise me if there exists somewhere an entire forum devoted to Pandas.
