best format for data for python analysis


Postby portia » Sat Jun 15, 2013 1:27 pm

Hi there,

I am starting to gather some lexical data that I'd like to analyse with Python later on. The analysis will mostly have to do with:
- occurrence of words
- whether another word appeared within 2-3 lines of the matched word
- if a word is found, whether there is another word in a neighbouring entry
- finding unique values
- some simple calculations (on dates)
That's for next year. First I'm trying to figure out the best way of storing the data for later analysis. Sometimes there will be a few entries per day, and sometimes there won't be any entries for a couple of days, so the amount of data will not be massive. I'm going to start analysing it after a year, so I want to get the first steps right.
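
For illustration, here is a rough sketch of how the analysis tasks listed above might look once the entries are loaded into memory. The sample entries, the function names, and the choice of whitespace splitting for "words" are all assumptions made up for this example, not part of the original plan.

```python
from collections import Counter
from datetime import date

# Hypothetical sample entries: (date, text) pairs, invented for illustration.
entries = [
    (date(2013, 6, 15), "the cat sat on the mat"),
    (date(2013, 6, 15), "a dog barked at the cat"),
    (date(2013, 6, 18), "rain all day"),
]

def word_counts(entries):
    """Count how often each word occurs across all entries."""
    counts = Counter()
    for _, text in entries:
        counts.update(text.lower().split())
    return counts

def neighbours_containing(entries, word, other, window=1):
    """Indexes of entries containing `word` where `other` appears in a
    different entry at most `window` positions away."""
    hits = []
    for i, (_, text) in enumerate(entries):
        if word not in text.lower().split():
            continue
        lo, hi = max(0, i - window), min(len(entries), i + window + 1)
        for j in range(lo, hi):
            if j != i and other in entries[j][1].lower().split():
                hits.append(i)
                break
    return hits

counts = word_counts(entries)                        # occurrence of words
unique_words = set(counts)                           # unique values
near = neighbours_containing(entries, "cat", "dog")  # neighbouring entries
days_between = (entries[-1][0] - entries[0][0]).days # date arithmetic
```

None of this depends on the storage format, which is one reason a simple format is probably enough for data of this size.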

Visually, I think each entry will consist of the following fields. Apart from the date, they are all strings, not numbers:

Code:
date,   string,    short_dictionary,     list_A,     list_B,     long_string


What's the best way of organising/storing this data in Python? E.g., using classes (OOP) and/or sqlite, JSON, XML?
portia
 
Posts: 17
Joined: Sun Apr 14, 2013 10:03 pm

Re: best format for data for python analysis

Postby ochichinyezaboombwa » Sat Jun 15, 2013 3:35 pm

Stay away from OOP, JSON, and especially XML, unless you know very clearly why you need them. sqlite would make sense if you needed to query your data in SQL, but you are planning to analyze it, and I doubt that SQL is sufficient for that. It looks like you are going to have a few hundred entries after a year, so I'd recommend staying with plain tab-separated text.
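
A minimal sketch of what plain tab-separated storage could look like with the standard `csv` module. The column names come from the first post; the way the dictionary and lists are packed into a single cell (`|` between items, `=` between key and value) is an assumption made up for this example, and it only works if the values never contain those characters.

```python
import csv
import io

FIELDS = ["date", "string", "short_dictionary", "list_A", "list_B", "long_string"]

def encode_list(items):
    return "|".join(items)

def decode_list(cell):
    return cell.split("|") if cell else []

def encode_dict(d):
    return "|".join(f"{k}={v}" for k, v in d.items())

def decode_dict(cell):
    return dict(pair.split("=", 1) for pair in cell.split("|")) if cell else {}

def write_entries(fh, entries):
    """Write a header row plus one tab-separated line per entry."""
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for e in entries:
        writer.writerow([
            e["date"], e["string"], encode_dict(e["short_dictionary"]),
            encode_list(e["list_A"]), encode_list(e["list_B"]), e["long_string"],
        ])

def read_entries(fh):
    """Read entries back, unpacking the dictionary and list cells."""
    entries = []
    for row in csv.DictReader(fh, delimiter="\t"):
        row["short_dictionary"] = decode_dict(row["short_dictionary"])
        row["list_A"] = decode_list(row["list_A"])
        row["list_B"] = decode_list(row["list_B"])
        entries.append(dict(row))
    return entries

# Round trip through an in-memory file; a real file opened with
# open(path, newline="") would work the same way.
sample = [{
    "date": "2013-06-15", "string": "greeting",
    "short_dictionary": {"mood": "good"},
    "list_A": ["a", "b"], "list_B": [],
    "long_string": "hello world",
}]
buf = io.StringIO()
write_entries(buf, sample)
buf.seek(0)
restored = read_entries(buf)
```

The file stays readable in any text editor or spreadsheet, which is the main appeal of this approach for a few hundred entries.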

Also: I am not sure what you are going to do, but you might like this great book.
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: best format for data for python analysis

Postby micseydel » Sat Jun 15, 2013 6:44 pm

OOP isn't for data storage.

JSON rocks, but...

TSV does seem like a really straightforward way to store this data. sqlite seems like overkill to me for something like this. I'd rather keep it portable with TSV or JSON.
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 923
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: best format for data for python analysis

Postby portia » Sat Jun 15, 2013 10:41 pm

Thanks for your replies. Based on your recommendations, I'll go with TSV. I know OOP is not for storing data, but wouldn't it make sense to organise my program so that an entry is treated as a class, and then have, e.g.:
entry.date
entry.list1, etc.?

Just asking. I'm neither a Python nor an OOP expert. I just thought OOP would be suitable in this case.
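
As a sketch of the idea in this post, an entry could be modelled as a small class so that fields are reachable as `entry.date`, `entry.list_A`, and so on. The field names follow the layout from the first post; the use of `dataclasses` (Python 3.7+) and the `mentions` helper are assumptions added for illustration.

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class Entry:
    date: datetime.date
    string: str
    short_dictionary: dict = field(default_factory=dict)
    list_A: list = field(default_factory=list)
    list_B: list = field(default_factory=list)
    long_string: str = ""

    def mentions(self, word):
        """True if `word` occurs in long_string as a whole word,
        ignoring case (hypothetical helper)."""
        return word.lower() in self.long_string.lower().split()

entry = Entry(datetime.date(2013, 6, 15), "title", long_string="The cat sat")
found = entry.mentions("cat")
year = entry.date.year
```

The class only describes how the program handles an entry in memory; how the entries are stored on disk is a separate decision.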
portia
 
Posts: 17
Joined: Sun Apr 14, 2013 10:03 pm

Re: best format for data for python analysis

Postby micseydel » Sun Jun 16, 2013 8:04 am

OOP is a way of programming. You can represent OOP objects as lines in a TSV file, as JSON objects, or (I believe) as rows in an SQL table. Even XML works, although now that JSON is popular, XML is less useful (though certainly not useless).
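
To make that concrete, here is a small sketch that takes one object and represents it three ways: as a TSV line, as a JSON object, and as a row in an SQLite table. The trimmed-down `Entry` class, the `|` list separator, and the table layout are all assumptions invented for this example.

```python
import json
import sqlite3
from dataclasses import dataclass, asdict

@dataclass
class Entry:
    date: str      # ISO date string, e.g. "2013-06-15"
    string: str
    list_A: list

entry = Entry("2013-06-15", "example", ["alpha", "beta"])

# 1. One TSV line (list items joined with "|", a made-up convention).
tsv_line = "\t".join([entry.date, entry.string, "|".join(entry.list_A)])

# 2. A JSON object.
json_text = json.dumps(asdict(entry))

# 3. A row in an in-memory SQLite table (the list is stored as JSON text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (date TEXT, string TEXT, list_A TEXT)")
conn.execute(
    "INSERT INTO entries VALUES (?, ?, ?)",
    (entry.date, entry.string, json.dumps(entry.list_A)),
)
row = conn.execute("SELECT date, string, list_A FROM entries").fetchone()
restored = Entry(row[0], row[1], json.loads(row[2]))
conn.close()
```

In each case the program works with the same `Entry` object; only the step that reads or writes it changes.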
micseydel
 
Posts: 923
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

