Post by älgkött » Fri Mar 15, 2013 6:21 pm


In the program I'm working on, I'm supposed to first check that the file exists and then check, or validate, that the data in the text file is reasonable. To check that the file exists I use os.path, and that seems to work, but does anyone have ideas on what I should validate in the data and how?

I guess my question is whether there are any "standard things" one should check when importing data from a file.

My text file includes both numbers and text and looks like this:



and so on...

import os
def read(filename):
    if os.path.exists(filename):
        ...  # reading and validating the data go here

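For reference, a slightly stricter version of that existence check could look like the sketch below. The name `read_text` is hypothetical, and the UTF-8 encoding is an assumption about the file. It uses `os.path.isfile` rather than `os.path.exists`, because `exists` is also true for a directory, which `open()` cannot read.

```python
import os


def read_text(filename):
    """Return the whole file as a string, or raise if it is not a regular file."""
    # isfile() is False both for missing paths and for directories.
    if not os.path.isfile(filename):
        raise FileNotFoundError(f"no such file: {filename}")
    # Assumption: the data file is UTF-8 encoded.
    with open(filename, encoding="utf-8") as f:
        return f.read()
```

Note that the file can still disappear between the check and the `open()` call, so wrapping `open()` in `try`/`except FileNotFoundError` is the more robust variant.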
Post by setrofim » Fri Mar 15, 2013 6:48 pm

älgkött wrote: I guess my question is whether there are any "standard things" one should check when importing data from a file.

Not really, since what's "valid" depends entirely on the application. Judging from your example format, your records seem to be delimited by blank lines. So you split the data you read in on blank lines and then, for each record, check that:
  • There are five lines in a record.
  • The first line is text. You don't really need to validate that, as it will be read as text to begin with, but you might want to check that it has certain properties (see below).
  • The subsequent four lines are integers.
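A minimal sketch of those three checks, assuming each record is a text label followed by four integers, one per line, with blank lines between records. The function name `parse_records` and the sample data in the usage line are made up for illustration, since the original example file isn't shown. It raises `ValueError` naming the first bad record.

```python
def parse_records(text):
    """Split text into blank-line-separated records and validate each one.

    Assumes five lines per record: a text label, then four integers.
    Returns a list of (label, [int, int, int, int]) tuples.
    """
    # Group non-blank lines into blocks; any blank (or whitespace-only)
    # line ends the current block. Surrounding whitespace is stripped.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    records = []
    for number, lines in enumerate(blocks, start=1):
        if len(lines) != 5:
            raise ValueError(f"record {number}: expected 5 lines, got {len(lines)}")
        label, *fields = lines
        try:
            values = [int(field) for field in fields]
        except ValueError:
            raise ValueError(f"record {number}: lines 2-5 must be integers") from None
        records.append((label, values))
    return records
```

For example, `parse_records("Alice\n1\n2\n3\n4\n\nBob\n5\n6\n7\n8\n")` returns `[('Alice', [1, 2, 3, 4]), ('Bob', [5, 6, 7, 8])]`. Walking the lines instead of calling `text.split("\n\n")` means several blank lines in a row, or a "blank" line containing stray spaces, still count as a single separator.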
There are other things you might want to check, depending on how you want to use the data and what the constraints are, e.g.
  • Does the text need to be in a certain format (e.g. are spaces allowed)?
  • Does it have a maximum length?
  • Is it ASCII-only, or can it contain, e.g., accented characters?
  • Do the integers have to be within a certain range (e.g. are negative values allowed)?
  • Are they always integers, or can they be decimal numbers?
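Checks like these sit on top of the structural ones. Below is a minimal sketch, assuming a record has already been split into a text label and a list of integers. The limits (`MAX_LABEL_LENGTH`, `MIN_VALUE`, `MAX_VALUE`), the no-spaces rule and the name `check_record` are all hypothetical placeholders for whatever your application actually requires. `str.isascii()` needs Python 3.7 or newer.

```python
MAX_LABEL_LENGTH = 30          # hypothetical limit on the text line
MIN_VALUE, MAX_VALUE = 0, 1000  # hypothetical allowed range for the integers


def check_record(label, values):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    if not label:
        problems.append("label is empty")
    if len(label) > MAX_LABEL_LENGTH:
        problems.append(f"label longer than {MAX_LABEL_LENGTH} characters")
    if " " in label:  # hypothetical rule: no spaces allowed
        problems.append("label contains spaces")
    if not label.isascii():
        problems.append("label contains non-ASCII characters")
    for value in values:
        if not MIN_VALUE <= value <= MAX_VALUE:
            problems.append(f"value {value} outside {MIN_VALUE}..{MAX_VALUE}")
    return problems
```

For example, `check_record("älgkött", [1, 2, -5, 3])` returns `['label contains non-ASCII characters', 'value -5 outside 0..1000']`. Collecting problems in a list instead of raising on the first one lets you report everything wrong with a record in one go.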
The list goes on. Basically, you need to think carefully about what assumptions your application makes about the input data and then check that those assumptions are not violated.
