Converting row file of various length to column file

Postby abhis1 » Tue Jun 25, 2013 5:26 am

I have a space-separated file in the following format:

Code: Select all
001 1234 A_Spend B_Spend C_Spend D_Spend
002 2345 A_Spend E_Spend
003 4567 B_Spend C_Spend D_Spend E_Spend
004 7896 D_Spend E_Spend F_Spend G_Spend H_Spend A_Spend


The first two columns are IDs and the remaining fields are the industries in which there was spend. I want to create a file whose output looks like the following:
Code: Select all
ID1 ID2 A_Spend B_Spend C_Spend D_Spend E_Spend F_Spend G_Spend H_Spend
001 1234    1   1   1   1   0   0   0   0
002 2345    1   0   0   0   1   0   0   0
003 4567    0   1   1   1   1   0   0   0
004 7896    1   0   0   1   1   1   1   1


I am new to Python, so please help.
Thanks
Last edited by Yoriz on Tue Jun 25, 2013 5:31 am, edited 1 time in total.
Reason: Added code tags to make input and required output stand out
abhis1
 
Posts: 5
Joined: Tue Jun 25, 2013 5:17 am

Re: Converting row file of various length to column file

Postby Yoriz » Tue Jun 25, 2013 5:34 am

You could use the csv module for help with reading and writing to files.
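As a rough illustration of the csv suggestion, here is a minimal sketch of reading the space-separated lines with `csv.reader`. The `SAMPLE` string and the `read_rows` helper are made up for this example and stand in for the real file; the sketch only covers reading, not building the 0/1 columns.

```python
import csv
import io

# Hypothetical sample standing in for the poster's input file.
SAMPLE = """001 1234 A_Spend B_Spend C_Spend D_Spend
002 2345 A_Spend E_Spend
"""


def read_rows(handle):
    """Yield (id1, id2, spends) for each non-empty space-separated line."""
    reader = csv.reader(handle, delimiter=' ', skipinitialspace=True)
    for fields in reader:
        # Drop empty fields left behind by trailing spaces.
        fields = [field for field in fields if field]
        if not fields:
            continue
        yield fields[0], fields[1], fields[2:]


rows = list(read_rows(io.StringIO(SAMPLE)))
for id1, id2, spends in rows:
    print(id1, id2, spends)
```

With a real file, `open('chk1.txt', newline='')` would take the place of `io.StringIO(SAMPLE)`.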
New Users, Read This
Join the #python-forum IRC channel on irc.freenode.net!
Spam topic disapproval technician
Windows7, Python 2.7.4., WxPython 2.9.5.0., some Python 3.3
Yoriz
 
Posts: 871
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

Re: Converting row file of various length to column file

Postby abhis1 » Tue Jun 25, 2013 5:40 am

Yoriz wrote:You could use the csv module for help with reading and writing to files.


I can use that for reading the file, but I am not sure how to use it to create the columns and flag each one.

Re: Converting row file of various length to column file

Postby micseydel » Tue Jun 25, 2013 6:00 am

Have you written any code to attempt this yourself?
Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1390
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Converting row file of various length to column file

Postby abhis1 » Tue Jun 25, 2013 7:38 am

micseydel wrote:Have you written any code to attempt this yourself?


Below is the code I am using. It is a rather brute-force approach, and I am looking for a more elegant way of doing this.
Code: Select all
d = {}
columns = set()
with open('chk1.txt') as f:
    for line in f:
        row = line.split()
        key = row.pop(0)
        key1 = row.pop(0)
        newk=key+' '+key1
        row= row[2:]
        d[newk] = set(row)
        columns.update(row)

columns = sorted(columns)


print('ID1  ID2 {0}'.format(' '.join(columns)))

for newk in sorted(d):
    values = d[newk]
    line = newk.ljust(10)
    line += ''.join(('1' if col in values else '0').ljust(4) for col in columns)
    print(line)
Last edited by micseydel on Tue Jun 25, 2013 7:42 am, edited 1 time in total.
Reason: Added code tags.

Re: Converting row file of various length to column file

Postby micseydel » Tue Jun 25, 2013 7:46 am

You've made two posts without code tags, which indicates you haven't read this (I added them to your most recent post). Be sure to read it before posting again. This is also very useful for asking questions, although it isn't specific to this forum.

Re: Converting row file of various length to column file

Postby abhis1 » Tue Jun 25, 2013 10:38 am

abhis1 wrote:
micseydel wrote:Have you written any code to attempt this yourself?


Below is the code I am using. It is a rather brute-force approach, and I am looking for a more elegant way of doing this. The variable newk varies in length, and I want its column width to be set dynamically rather than fixed.
Please help.
Code: Select all
d = {}
columns = set()
with open('chk1.txt') as f:
    for line in f:
        row = line.split()
        key = row.pop(0)
        key1 = row.pop(0)
        newk=key+' '+key1
        row= row[2:]
        d[newk] = set(row)
        columns.update(row)

columns = sorted(columns)


print('ID1  ID2 {0}'.format(' '.join(columns)))

for newk in sorted(d):
    values = d[newk]
    line = newk.ljust(10)
    line += ''.join(('1' if col in values else '0').ljust(4) for col in columns)
    print(line)
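Two observations on the code above, offered as a reading of it rather than a confirmed diagnosis: after the two `pop(0)` calls, `row` already holds only the spend names, so `row = row[2:]` appears to drop the first two spends on every line; and the fixed `ljust(10)` could instead be derived from the longest ID. The sketch below assumes the sample data from the first post, held in a made-up `SAMPLE` string instead of `chk1.txt`, and `build_table` is a hypothetical helper name.

```python
SAMPLE = """001 1234 A_Spend B_Spend C_Spend D_Spend
002 2345 A_Spend E_Spend
003 4567 B_Spend C_Spend D_Spend E_Spend
004 7896 D_Spend E_Spend F_Spend G_Spend H_Spend A_Spend
"""


def build_table(lines):
    """Return the header plus one 0/1 row per input line, as strings."""
    d = {}
    columns = set()
    for line in lines:
        row = line.split()
        if not row:
            continue
        key = (row[0], row[1])
        spends = row[2:]  # slice once; no pop() followed by a second slice
        d[key] = set(spends)
        columns.update(spends)
    columns = sorted(columns)

    # Width of each ID column comes from the longest value, not a constant.
    w1 = max(len('ID1'), max((len(k[0]) for k in d), default=0))
    w2 = max(len('ID2'), max((len(k[1]) for k in d), default=0))

    out = [' '.join(['ID1'.ljust(w1), 'ID2'.ljust(w2)] + columns)]
    for key in sorted(d):
        flags = ['1' if col in d[key] else '0' for col in columns]
        # Pad each flag to the width of its column name so it sits under it.
        cells = [flag.ljust(len(col)) for flag, col in zip(flags, columns)]
        line = ' '.join([key[0].ljust(w1), key[1].ljust(w2)] + cells)
        out.append(line.rstrip())
    return out


table = build_table(SAMPLE.splitlines())
print('\n'.join(table))
```

To run it on the real file, pass the open file object to `build_table` in place of `SAMPLE.splitlines()`.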

Re: Converting row file of various length to column file

Postby micseydel » Tue Jun 25, 2013 11:53 am

I thought it was in the link I provided, but was not (and is now): include your current program's output (which you have not done), the output you want (you have done this) as well as your explanation of how the two differ. Your goal is to make it take as little work as possible for us to help you. You don't want us to need to create two files, run your code, and then interpret the results if we don't need to.

Converting Single row file to multiple row file

Postby abhis1 » Tue Jun 25, 2013 1:06 pm

Hi,

I have a data set in the following format, where the second field may hold several comma-separated IDs:
Code: Select all
001 1234,67569 Spend_A Spend_B Spend_C Spend_D
002 2345 Spend_A Spend_E
003 4567,9089873,9815 Spend_B Spend_C Spend_D Spend_E
004 7896 Spend_D Spend_E Spend_F Spend_G Spend_H Spend_A

I want the output in the following format:
Code: Select all
ID1 ID2 Spend_A Spend_B Spend_C Spend_D Spend_E Spend_F Spend_G Spend_H
001 1234 1 1 1 1 0 0 0 0
001 67569 1 1 1 1 0 0 0 0
002 2345 1 0 0 0 1 0 0 0
003 4567 0 1 1 1 1 0 0 0
003 9089873 0 1 1 1 1 0 0 0
003 9815 0 1 1 1 1 0 0 0
004 7896 1 0 0 1 1 1 1 1


This is the code that I am using:
Code: Select all
d = {}
columns = set()
with open('chk1.txt') as f:
    for line in f:
        row = line.split()
        key1 = row.pop(0)
        key2 = row.pop(0)
        row1=key2.split(',')
        print ( row1)
        lc=key2.count(',',0,len(key2))
        for i in range(0,lc):
             key3=row1.pop(i)
             d[key1,key3] = set(row)
        columns.update(row)

columns = sorted(columns)
print('ID1 ID2  {0}'.format(' '.join(columns)))
for key in sorted(d):
   lc=key2.count(',',0,len(key2))
   lc=lc+1
   for i in range(0,lc):
     key1,key3 = key
     values = d[key]
     line = '{0} {1} '.format(key1, key3)
     line += ' '.join(('1' if col in values else '0').ljust(4) for col in columns)
     print(line)

The output I am getting is:
Code: Select all
ID1 ID2  Spend_A Spend_B Spend_C Spend_D Spend_E Spend_F Spend_G Spend_H
001 1234 1    1    1    1    0    0    0    0
003 4567 0    1    1    1    1    0    0    0
003 9815 0    1    1    1    1    0    0    0


Please let me know where I am going wrong and how I can fix it.
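A likely cause, going only by the code and output above: `key2.count(',')` is one less than the number of IDs, so `range(0, lc)` never runs for a line with a single ID2 and stops one short otherwise; `row1.pop(i)` also shrinks the list while `i` advances, which would explain why 9089873 is skipped; and the printing loop reuses whatever `key2` was left over from the last input line. Iterating over the split list directly sidesteps all three. A minimal sketch, with the sample input in a made-up `SAMPLE` string and a hypothetical `expand` helper:

```python
SAMPLE = """001 1234,67569 Spend_A Spend_B Spend_C Spend_D
002 2345 Spend_A Spend_E
003 4567,9089873,9815 Spend_B Spend_C Spend_D Spend_E
004 7896 Spend_D Spend_E Spend_F Spend_G Spend_H Spend_A
"""


def expand(lines):
    """Return a header line plus one output line per (ID1, ID2) pair."""
    records = []
    columns = set()
    for line in lines:
        row = line.split()
        if not row:
            continue
        key1, key2, spends = row[0], row[1], set(row[2:])
        # Loop over the IDs themselves; no counting of commas, no pop().
        for key3 in key2.split(','):
            records.append((key1, key3, spends))
        columns.update(spends)
    columns = sorted(columns)

    out = ['ID1 ID2 ' + ' '.join(columns)]
    for key1, key3, spends in records:  # input order is preserved
        flags = ' '.join('1' if col in spends else '0' for col in columns)
        out.append('{0} {1} {2}'.format(key1, key3, flags))
    return out


result = expand(SAMPLE.splitlines())
print('\n'.join(result))
```

Keeping the records in a list rather than a dict keyed on the ID pair also means the rows come out in the order they were read.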
Last edited by abhis1 on Tue Jun 25, 2013 3:33 pm, edited 2 times in total.

Re: Converting Single row file to multiple row file

Postby Mekire » Tue Jun 25, 2013 1:21 pm

People really do want to help you; but as has been stated you need to take a hack at it first.

Show your code; show your result (if there is an error show your traceback); explain how your result is different from what you desired.

-Mek
Mekire
 
Posts: 988
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: Converting Single row file to multiple row file

Postby ochichinyezaboombwa » Tue Jun 25, 2013 5:45 pm

What I think you needed to explain to us is this:
the presence of "Spend_A" should correspond to a "1" in the 3rd column, its absence to a "0";
the presence of "Spend_B" should correspond to a "1" in the 4th column, its absence to a "0";
and so on.

Your problem (I think) is that you never attempt to use this requirement in your code.

The following should (I think) help you, or at least give you some ideas:
Code: Select all
for ln in open("chk1.txt"):
    cols = ln.rstrip().split()
    id1 = cols[0]
    ids = cols[1].split(",")

    spend = cols[2:]
    assert all( s.startswith("Spend_") for s in spend)
    spend = [x[len("Spend_"):] for x in spend]

    zeros_or_ones = [int(l in spend) for l in "ABCDEFGH"]

    for id2 in ids:
        print id1, id2, zeros_or_ones

It is in Python 2 but you should be able to make corresponding changes. It produces the correct (I think) output except for formatting: you should be able to format it in Python 3.
Code: Select all
001 1234 [1, 1, 1, 1, 0, 0, 0, 0]
001 67569 [1, 1, 1, 1, 0, 0, 0, 0]
002 2345 [1, 0, 0, 0, 1, 0, 0, 0]
003 4567 [0, 1, 1, 1, 1, 0, 0, 0]
003 9089873 [0, 1, 1, 1, 1, 0, 0, 0]
003 9815 [0, 1, 1, 1, 1, 0, 0, 0]
004 7896 [1, 0, 0, 1, 1, 1, 1, 1]

PS: I repeat "I think" all the time because I had to read your mind to answer your question. I think I read it correctly.
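For readers on Python 3, the same idea might look like the sketch below. It keeps the assumptions of the snippet above (every spend starts with "Spend_" and the letters run from A to H) and joins the flags with spaces instead of printing a list. The `to_rows` name and the inline sample lines are made up for this example.

```python
def to_rows(lines, letters="ABCDEFGH"):
    """Return one 'ID1 ID2 flags' string per comma-separated ID2."""
    rows = []
    for ln in lines:
        cols = ln.split()
        if not cols:
            continue
        id1 = cols[0]
        ids = cols[1].split(",")

        spend = cols[2:]
        assert all(s.startswith("Spend_") for s in spend)
        spend = {s[len("Spend_"):] for s in spend}

        # "1" where the letter is present on this line, "0" otherwise.
        flags = " ".join(str(int(letter in spend)) for letter in letters)

        for id2 in ids:
            rows.append("{} {} {}".format(id1, id2, flags))
    return rows


sample = [
    "001 1234,67569 Spend_A Spend_B Spend_C Spend_D",
    "002 2345 Spend_A Spend_E",
]
for row in to_rows(sample):
    print(row)
```

Passing an open file object as `lines` works the same way, since `split()` discards the trailing newline.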
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: Converting Single row file to multiple row file

Postby Yoriz » Tue Jun 25, 2013 9:06 pm

Practicing test-driven programming :P
Here are some tested functions you could put to use (I think).
Code: Select all
import unittest

spendsFields = ('Spend_A', 'Spend_B', 'Spend_C', 'Spend_D', 'Spend_E',
                'Spend_F', 'Spend_G', 'Spend_H')


def spendsToValue(inputList):
    result = ['1' if spend in inputList else '0' for spend in spendsFields]
    return ' '.join(result)


def formatLine(lineInput):
    id1, id2, spends = lineInput.split(' ', 2)
    spendsValues = spendsToValue(spends.split(' '))
    result = ''
    for id2Item in id2.split(','):
        result = '{}{} {} {}\n'.format(result, id1, id2Item, spendsValues)
    return result


class UnitTests(unittest.TestCase):

    def test_spendsToValue(self):
        inputList = 'Spend_A Spend_B Spend_C Spend_D'.split(' ')
        lineOutput = '1 1 1 1 0 0 0 0'
        self.assertEqual(spendsToValue(inputList), lineOutput)
       
        inputList = 'Spend_A Spend_E'.split(' ')
        lineOutput = '1 0 0 0 1 0 0 0'
        self.assertEqual(spendsToValue(inputList), lineOutput)
       
        inputList = 'Spend_B Spend_C Spend_D Spend_E'.split(' ')
        lineOutput = '0 1 1 1 1 0 0 0'
        self.assertEqual(spendsToValue(inputList), lineOutput)
       
        inputList = 'Spend_D Spend_E Spend_F Spend_G Spend_H Spend_A'.split(' ')
        lineOutput = '1 0 0 1 1 1 1 1'
        self.assertEqual(spendsToValue(inputList), lineOutput)

    def test_formatLine(self):
        lineInput = '001 1234,67569 Spend_A Spend_B Spend_C Spend_D'
        lineOutput = ('001 1234 1 1 1 1 0 0 0 0\n'
                    '001 67569 1 1 1 1 0 0 0 0\n')
        self.assertEqual(formatLine(lineInput), lineOutput)
       
        lineInput = '002 2345 Spend_A Spend_E'
        lineOutput = '002 2345 1 0 0 0 1 0 0 0\n'
        self.assertEqual(formatLine(lineInput), lineOutput)
       
        lineInput = '003 4567,9089873,9815 Spend_B Spend_C Spend_D Spend_E'
        lineOutput = ('003 4567 0 1 1 1 1 0 0 0\n'
                      '003 9089873 0 1 1 1 1 0 0 0\n'
                      '003 9815 0 1 1 1 1 0 0 0\n')
        self.assertEqual(formatLine(lineInput), lineOutput)
       
        lineInput = '004 7896 Spend_D Spend_E Spend_F Spend_G Spend_H Spend_A'
        lineOutput = '004 7896 1 0 0 1 1 1 1 1\n'
        self.assertEqual(formatLine(lineInput), lineOutput)


if __name__ == "__main__":
    unittest.main()

Code: Select all
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
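One thing to watch when feeding a real file through `formatLine`: lines read from a file keep their trailing newline, so the last spend on each line would arrive as something like 'Spend_D\n' and never match a field name. A small sketch of a wrapper that strips each line first; `convert` is a hypothetical helper, and the two functions are re-declared here in shortened form only so that the block runs on its own.

```python
import io

spendsFields = tuple('Spend_' + letter for letter in 'ABCDEFGH')


def spendsToValue(inputList):
    return ' '.join('1' if spend in inputList else '0'
                    for spend in spendsFields)


def formatLine(lineInput):
    id1, id2, spends = lineInput.split(' ', 2)
    spendsValues = spendsToValue(spends.split(' '))
    return ''.join('{} {} {}\n'.format(id1, id2Item, spendsValues)
                   for id2Item in id2.split(','))


def convert(handle):
    """Format every non-blank line of an open file-like object."""
    # rstrip() removes the newline before the line reaches formatLine.
    return ''.join(formatLine(line.rstrip())
                   for line in handle if line.strip())


# io.StringIO stands in for open('chk1.txt') in this example.
source = io.StringIO('001 1234,67569 Spend_A Spend_B Spend_C Spend_D\n'
                     '002 2345 Spend_A Spend_E\n')
result = convert(source)
print(result, end='')
```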