Help with Pandas


Postby pythonista » Fri Aug 01, 2014 10:07 pm

Hello all, I'm not new to Python, but I am new to Pandas and to this forum. I hope this is the right place for this help request.

I need to visualize some data, but first I need to perform some calculations that seem too cumbersome in Tableau (will I be hated if I say Tableau sucks?).

My data set is a number of fields describing usage of an application, keyed by user ID. There are potentially multiple entries for each user ID, and each entry (record) has columns such as the time the user began using the app, the end time, the price they paid, whether they were on wifi, and other attributes (dimensions).

I have one year of data and want to do things like calculate the average and total of duration and price paid in the app for each user, over each month and over the full year (remember that each user appears multiple times, once for each time they sign in).

I know some basics, like appending a column that subtracts start time from end time to get time spent. My Python is fully functional, but my data capabilities are amateur.
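For reference, the duration column described above could look roughly like the sketch below. The column names (`user_id`, `start_time`, `end_time`) and the sample rows are made up for illustration; the real data set will differ.

```python
import pandas as pd

# Hypothetical sample records; column names and values are assumptions.
usage = pd.DataFrame({
    "user_id": [1, 1, 2],
    "start_time": pd.to_datetime(
        ["2014-01-05 10:00", "2014-02-10 20:00", "2014-01-07 09:30"]
    ),
    "end_time": pd.to_datetime(
        ["2014-01-05 11:30", "2014-02-10 20:45", "2014-01-07 10:00"]
    ),
})

# Subtracting two datetime columns yields a timedelta column.
usage["duration"] = usage["end_time"] - usage["start_time"]

# A plain float (minutes) is usually easier to average and plot.
usage["duration_min"] = usage["duration"].dt.total_seconds() / 60
```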

My question is this. Say I want the following measures calculated per user ID: average price, total price, max/min price, median price, average duration, total duration, max/min duration, median duration, and number of times logged in (that is, the number of records for that ID), all on a per-month and per-year basis. I know that I could calculate each of these things, but what is the best way to store them for use in a visualization?
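One possible way to compute and store these measures is a pair of summary tables built with `groupby` and `agg`: one row per user per month, and one row per user for the year. This is only a sketch; the column names (`user_id`, `start_time`, `price`, `duration_min`) and the sample values are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical sample records; column names and values are assumptions.
usage = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "start_time": pd.to_datetime([
        "2014-01-05 10:00", "2014-01-20 18:00", "2014-02-10 20:00",
        "2014-01-07 09:30", "2014-03-01 12:00",
    ]),
    "price": [5.0, 9.0, 12.0, 3.0, 7.0],
    "duration_min": [90.0, 30.0, 45.0, 30.0, 60.0],
})

# Label each record with its calendar month (e.g. 2014-01).
usage["month"] = usage["start_time"].dt.to_period("M")

# Measures to compute per column; counting start_time gives the login count.
measures = {
    "price": ["mean", "sum", "min", "max", "median"],
    "duration_min": ["mean", "sum", "min", "max", "median"],
    "start_time": ["count"],
}

# One row per (user, month) and one row per user for the whole year.
monthly = usage.groupby(["user_id", "month"]).agg(measures)
yearly = usage.groupby("user_id").agg(measures)
```

Both results have two-level column labels such as `("price", "mean")`, so a single measure can be read with, for example, `yearly[("price", "mean")]`.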

For context, I may want to visualize the group of users who paid more than $8 on average and spent more than 3 hours in the app in total, in terms of what shows they watched and whether they were on wifi (other attributes in the data set), and I may want to see it broken down monthly. Would it then be best to create one yearly table and one table for each month, for a total of 13 tables, each containing the user IDs active over that time period with all the original information, and then append a column for each calculation (if the calculation is an average, I would enter the same value for every record of an ID)?
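As an alternative to maintaining 13 separate tables, the per-user values can be repeated on every record of a single table with `groupby(...).transform(...)`, which is the "same value for every record of an ID" idea described above. The sketch below assumes hypothetical columns (`user_id`, `price`, `duration_min`, `on_wifi`) and invented values, and uses the $8 and 3-hour thresholds from the example.

```python
import pandas as pd

# Hypothetical sample records; column names and values are assumptions.
usage = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "price": [10.0, 8.0, 3.0, 5.0, 20.0],
    "duration_min": [120.0, 90.0, 200.0, 30.0, 60.0],
    "on_wifi": [True, False, True, True, False],
})

# transform() returns one value per original row, so each record carries
# its user's average price and total duration.
usage["avg_price"] = usage.groupby("user_id")["price"].transform("mean")
usage["total_duration_min"] = (
    usage.groupby("user_id")["duration_min"].transform("sum")
)

# Keep the original records of users averaging more than $8 who spent
# more than 3 hours (180 minutes) in the app in total.
selected = usage[
    (usage["avg_price"] > 8) & (usage["total_duration_min"] > 180)
]
```

For a monthly breakdown, the same approach should work by grouping on both the user ID and a month column instead of the user ID alone.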

Any help is much appreciated. I'm really hoping it makes sense to do this in Python as opposed to Tableau... please help :)

Last edited by micseydel on Fri Aug 01, 2014 11:31 pm, edited 1 time in total.
Reason: First post lock.
Posts: 2
Joined: Fri Aug 01, 2014 9:40 pm

Re: Help with Pandas

Postby pythonista » Sun Aug 03, 2014 6:11 pm

I realize this is a long post; I was trying to describe my problem thoroughly. Can anyone comment on why they can't comment :) Is it a tough problem? Am I in the wrong place generally for data help? Any lead would be great!

Re: Help with Pandas

Postby micseydel » Sun Aug 03, 2014 8:31 pm

Pandas gets brought up on this forum on occasion, but I don't think that any of our members use it. You can try using the search feature to see if there's any precedent for your question. We get enough questions here that it wouldn't surprise me if there exists somewhere an entire forum devoted to Pandas.
Posts: 3000
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA
