UCI format


Postby chliu52912 » Mon Sep 30, 2013 10:43 am

I tried the following code to parse a UCI dataset (http://www.cs.gsu.edu/~zding/research/i ... 11data.txt) into a DataFrame for later manipulation. However, when I split each line on whitespace, the number of columns differs from line to line, as the output below shows. What is the correct way to parse the UCI format? Is there a tool that can load a UCI-format dataset into an ndarray or a DataFrame?

Thanks

Code:
... # how many columns per line
xxxx len(ary): 95
xxxx len(ary): 100
xxxx len(ary): 98
xxxx len(ary): 94
xxxx len(ary): 98
xxxx len(ary): 93
xxxx len(ary): 98
xxxx len(ary): 101
xxxx len(ary): 98
...




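Code:
def uci_to_files(file_name):
    rows = []
    with open(file_name) as f:  # the file is closed automatically when done
        for line in f:
            # split() with no argument splits on any run of whitespace;
            # split(' ') would produce empty strings for double spaces
            ary = line.split()
            print("xxxx len(ary): %d" % len(ary))
            row = []
            for col in ary:
                column_value = col.split(':')
                if len(column_value) == 2:
                    # "index:value" token; an empty value such as "71:" becomes 0.0.
                    # The index itself is dropped here, so a line with missing
                    # fields yields a shorter row than a complete line.
                    v = float(column_value[1]) if column_value[1] else 0.0
                elif len(column_value) == 1:
                    # the class label; float() accepts "+1" and "-1" directly
                    v = float(column_value[0])
                else:
                    raise ValueError("unexpected token: %r" % col)
                row.append(v)
            rows.append(row)
    return rows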
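The lines appear to use a sparse `label index:value ...` layout (the one LIBSVM and SVMlight use), where each value carries its own column number. Assuming that, a minimal sketch that keeps the index instead of discarding it parses each line into a label plus a dict keyed by column number, so lines of different lengths no longer cause misaligned columns. The name `parse_sparse_line` is hypothetical, and treating an empty value as 0.0 is an assumption.

```python
def parse_sparse_line(line):
    """Split one 'label index:value ...' line into (label, {index: value}).

    Assumes the first token is the class label (e.g. "+1" or "-1") and every
    other token is "index:value" with an integer index.
    """
    tokens = line.split()  # any run of whitespace, so double spaces are harmless
    label = float(tokens[0])  # float() accepts "+1" and "-1" as-is
    features = {}
    for token in tokens[1:]:
        index, _, value = token.partition(":")
        # an empty value such as "71:" is treated as 0.0 (an assumption)
        features[int(index)] = float(value) if value else 0.0
    return label, features


label, features = parse_sparse_line("+1 1:0.5 3:2 4:-1")
print(label, features)  # 1.0 {1: 0.5, 3: 2.0, 4: -1.0}
```

Because each value stays attached to its index, a missing column simply has no key in the dict rather than shifting every later value one position to the left.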

Re: UCI format

Postby micseydel » Mon Sep 30, 2013 4:46 pm

I don't have the time at the moment to take a close look, but the csv module may solve this problem for you.

Re: UCI format

Postby ochichinyezaboombwa » Tue Oct 01, 2013 5:30 pm

The column counts differ simply because not all fields are present on every line. For example, in line 2 of your file the fields
Code:
50:
71:
79:
90:
91:
are not there.
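To check which fields a particular line omits, a short sketch can compare the indices present against the full range. It assumes 1-based indices and a leading label token, as in the sample data; `missing_indices` and its `n_features` parameter (the total number of columns) are hypothetical names for illustration.

```python
def missing_indices(line, n_features):
    """Return the feature indices in 1..n_features that a sparse line omits.

    Assumes 1-based indices and that the first token is the label.
    """
    # collect the index part of every "index:value" token after the label
    present = {int(token.split(":", 1)[0]) for token in line.split()[1:]}
    return [i for i in range(1, n_features + 1) if i not in present]


print(missing_indices("+1 1:0.2 2:0.4 4:0.1", 5))  # [3, 5]
```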

That is just the nature of the data you have to deal with, but I don't see any real problem with it. Depending on your needs, you can simply ignore the gaps (each value carries its own column number before the colon) or fill in the missing columns with some placeholder value, like:
Code:
71:*
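Filling in the gaps can be done on the text itself, in the spirit of the `71:*` example. A minimal sketch, assuming 1-based indices and a known total column count; `complete_line`, `n_features` and `placeholder` are hypothetical names chosen for illustration.

```python
def complete_line(line, n_features, placeholder="*"):
    """Rewrite a sparse line so every index 1..n_features appears exactly once.

    Columns absent from the input get `placeholder` as their value.
    """
    tokens = line.split()
    values = {}
    for token in tokens[1:]:  # tokens[0] is the label, kept unchanged
        index, _, value = token.partition(":")
        values[int(index)] = value
    filled = ["%d:%s" % (i, values.get(i, placeholder)) for i in range(1, n_features + 1)]
    return " ".join([tokens[0]] + filled)


print(complete_line("+1 1:0.5 3:0.7", 4))  # +1 1:0.5 2:* 3:0.7 4:*
```

Passing `placeholder="0"` instead of `"*"` keeps the completed lines numeric, which is usually more convenient if the file will later be parsed into floats.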

Re: UCI format

Postby chliu52912 » Fri Oct 04, 2013 9:50 am

You are right. By filling in the missing fields I can now create and read the data correctly.

Thank you very much.
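For the original goal of getting an ndarray or DataFrame, the missing fields can also be filled while loading rather than by rewriting the file. A stdlib-only sketch, assuming 1-based indices and that the widest row defines the number of columns; `load_dense` and its `fill` parameter are hypothetical names.

```python
import io


def load_dense(lines, fill=0.0):
    """Read sparse 'label index:value' lines into (labels, dense_rows).

    The width is the largest index seen, so every row ends up the same
    length; missing columns (and empty values like "71:") get `fill`.
    """
    labels, sparse_rows = [], []
    for line in lines:
        tokens = line.split()
        if not tokens:  # skip blank lines
            continue
        labels.append(float(tokens[0]))
        row = {}
        for token in tokens[1:]:
            index, _, value = token.partition(":")
            row[int(index)] = float(value) if value else fill
        sparse_rows.append(row)
    # max(r) on a dict gives its largest key, i.e. the highest index used
    width = max((max(r) for r in sparse_rows if r), default=0)
    dense = [[r.get(i, fill) for i in range(1, width + 1)] for r in sparse_rows]
    return labels, dense


sample = io.StringIO("+1 1:0.5 3:2\n-1 2:1\n")
labels, rows = load_dense(sample)
print(labels)  # [1.0, -1.0]
print(rows)    # [[0.5, 0.0, 2.0], [0.0, 1.0, 0.0]]
```

With a real file, `with open(path) as f: labels, rows = load_dense(f)` works the same way, and if NumPy or pandas is installed, `numpy.array(rows)` or `pandas.DataFrame(rows)` turns the equal-length rows into the requested structure. If the file really follows the SVMlight/LIBSVM layout and scikit-learn is available, `sklearn.datasets.load_svmlight_file` reads it directly into a SciPy sparse matrix and a label array.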
