I am starting to gather some lexical data that I'd like to analyse with Python later on. The analysis will mostly have to do with:
- occurrence of words
- if another word appeared within 2-3 lines of the matched word
- if a word is found, is there another word in a neighbouring entry.
- finding unique values
- and some simple calculations (on dates)
That's for next year. First I'm trying to figure out the best way of storing the data for further analysis. Sometimes there will be a few entries per day, and sometimes there won't be any entries for a couple of days, so the amount of data will not be massive. After a year, I'm going to start analysing it, so I want to get the first steps right.
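To make the planned checks more concrete, here is a rough sketch of what I have in mind, assuming the text ends up as a plain list of lines. The function name `find_near`, the `window` parameter and the sample lines are all made up for illustration, and I chose not to count a match on the same line as "nearby".

```python
from datetime import date

def find_near(lines, word, other, window=3):
    """Return indices of lines containing `word` that have `other`
    on a different line at most `window` lines away (hypothetical helper)."""
    hits = []
    for i, line in enumerate(lines):
        if word in line.lower().split():
            lo = max(0, i - window)
            hi = min(len(lines), i + window + 1)
            # Look at neighbouring lines only, skipping the matched line itself.
            if any(other in lines[j].lower().split()
                   for j in range(lo, hi) if j != i):
                hits.append(i)
    return hits

# Invented sample data, only to exercise the sketch.
lines = [
    "the cat sat on the mat",
    "nothing to see here",
    "a dog barked loudly",
    "quiet again",
    "quiet still",
    "quiet once more",
    "the cat returned",
]

near = find_near(lines, "cat", "dog", window=2)

# Unique values: a set comprehension over all words.
unique_words = sorted({w for line in lines for w in line.lower().split()})

# Simple date calculation: subtracting two dates gives a timedelta.
days_between = (date(2024, 3, 10) - date(2024, 3, 1)).days
```

Splitting on whitespace is a simplification; punctuation attached to words would need stripping first.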
I think "visually" each entry will consist of the following fields. Apart from the date, they all hold strings, not numbers:
date, string, short_dictionary, list_A, list_B, long_string
What's the best way of organising/storing this data in Python? E.g. using classes (OOP) and/or SQLite, JSON, XML?
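For concreteness, one of the options could look roughly like this: a small class per entry, serialised as one JSON string per entry. This is only a sketch under my own assumptions, not a settled design. The field names follow the layout above (with `list_a`/`list_b` lowercased), the class name `Entry` and the sample values are invented, and the date is stored as an ISO string because JSON has no date type.

```python
import datetime
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Entry:
    # Field names mirror the layout in the question; the types are assumptions.
    date: datetime.date
    string: str
    short_dictionary: dict = field(default_factory=dict)
    list_a: list = field(default_factory=list)
    list_b: list = field(default_factory=list)
    long_string: str = ""

    def to_json(self):
        """Serialise the entry to a JSON string, with the date in ISO format."""
        data = asdict(self)
        data["date"] = self.date.isoformat()
        return json.dumps(data)

    @classmethod
    def from_json(cls, text):
        """Rebuild an entry from a string produced by to_json."""
        data = json.loads(text)
        data["date"] = datetime.date.fromisoformat(data["date"])
        return cls(**data)

# Invented sample entry, only to show the round trip.
entry = Entry(datetime.date(2024, 3, 1), "example", {"k": "v"},
              ["a"], ["b"], "longer text")
line = entry.to_json()
restored = Entry.from_json(line)
```

Writing one such string per line to a text file would keep the data readable by hand, and the same fields could later be loaded into SQLite if querying gets more involved.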