Retrieving sequence data from other file

Postby hexfigaro » Tue Oct 01, 2013 8:25 pm

Hello experts,

I am new to programming and need your help. I have two very large files in the following format:

FILE1:
>MLP1019 PL4
>MLP7456 PL3
>MLP9268 PL9
>MLP6245 PL1

FILE2:
>MLP1019
STNAPLQTSNTWVSYQPSMMMSLQ
>MLP7456
PPYWYWNSAVMIFYVQPLSLLAVLLA
>MLP9268
WNANWLSPQUVSTQYWFFWFQALN
>MLP6245
TTANPLQYAVWWVSLIFIFPPALQMIF

Does anyone know how I can produce output like the example below? Each record needs the ">MLP____" ID, the "PL_" label, and the corresponding sequence.

OUTPUT:
>MLP1019 PL4
STNAPLQTSNTWVSYQPSMMMSLQ
>MLP7456 PL3
PPYWYWNSAVMIFYVQPLSLLAVLLA
>MLP9268 PL9
WNANWLSPQUVSTQYWFFWFQALN
>MLP6245 PL1
TTANPLQYAVWWVSLIFIFPPALQMIF

Thanks so much in advance!

Re: Retrieving sequence data from other file

Postby micseydel » Tue Oct 01, 2013 9:52 pm

Are the two files always in the same order? What have you tried? How much Python do you know? Are you looking to learn how to do this, or do you want someone to do it for you? We have a jobs forum if you want it done for you.

Re: Retrieving sequence data from other file

Postby hexfigaro » Tue Oct 01, 2013 10:19 pm

Yes, the two files are always in the same order.

I tried the following, but it doesn't seem to work:

Code:
from Bio import SeqIO
import csv
dict = SeqIO.index("FILE2.fasta", "fasta")
csvReader = csv.reader(open('FILE1.txt', 'rb'), delimiter="\t")
csvWriter = csv.writer(open('OUTPUT.fasta', 'wb'), delimiter="\n")
error_file = open("errors.txt", 'a')
for line in csvReader:
   try:
      hits = str(dict[line[0]].seq)
      csvWriter.writerow([line[0], line[1]], hits)
   except KeyError:
      message = "Error with %s, not found in hits file\n" % line[0]
      message = "%s\n" % line[2]
      error_file.write(message)

Re: Retrieving sequence data from other file

Postby micseydel » Tue Oct 01, 2013 10:33 pm

I think you're overcomplicating it. Since the files are in the same order, you can read them in lockstep:
Code:
with open('FILE1') as file1, open('FILE2') as file2, open('output', 'w') as output:
    pairs = zip(*[file2, file2]) # take two lines of file2 at a time
    for pl_line, (ignored_line, sequence) in zip(file1, pairs):
        output.write(pl_line)
        output.write(sequence)

I think this should work. If the files are very large and you're using Python 2, add this line at the top so that zip is lazy instead of building full lists in memory:
Code:
from itertools import izip as zip
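
The `zip(*[file2, file2])` line works because both arguments are the same iterator, so every tuple that zip builds consumes two consecutive lines. Here is a minimal sketch of that idiom, using an in-memory list in place of the file. The helper name `pair_up` is made up for illustration, and the sample lines are copied from FILE2 above.

```python
def pair_up(lines):
    """Group consecutive items of an iterable into (first, second) tuples."""
    it = iter(lines)    # one shared iterator
    return zip(it, it)  # zip pulls from it twice for each tuple


sample = [
    ">MLP1019\n",
    "STNAPLQTSNTWVSYQPSMMMSLQ\n",
    ">MLP7456\n",
    "PPYWYWNSAVMIFYVQPLSLLAVLLA\n",
]
pairs = list(pair_up(sample))
```

Note that zip stops at the shortest input, so an unpaired trailing line would be dropped silently.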

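This does assume a great deal of uniformity between the files, though: they must list the records in the same order, and every record in FILE2 must be exactly two lines (a header followed by a single-line sequence).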
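
If those assumptions might not hold, one alternative is to look sequences up by ID instead of by position. The following is only a rough sketch, not what was posted in this thread: the function names `read_fasta` and `merge` are hypothetical, the sample data is adapted from the question (with the order shuffled and one sequence wrapped across two lines), and it assumes FILE2 fits in memory as a dictionary.

```python
def read_fasta(lines):
    """Map each record ID (without '>') to its sequence, joining wrapped lines."""
    parts = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else None
            if current_id is not None:
                parts[current_id] = []
        elif current_id is not None:
            parts[current_id].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def merge(header_lines, sequences):
    """Pair each FILE1 header with its sequence; also return IDs not found."""
    output, missing = [], []
    for line in header_lines:
        header = line.strip()
        fields = header.lstrip(">").split()
        if not fields:
            continue  # skip blank or malformed header lines
        seq_id = fields[0]
        if seq_id in sequences:
            output.append(header + "\n" + sequences[seq_id] + "\n")
        else:
            missing.append(seq_id)
    return output, missing


# Sample data: FILE2 is in a different order and one sequence is wrapped.
file1 = [">MLP1019 PL4\n", ">MLP7456 PL3\n", ">MLP0000 PL7\n"]
file2 = [
    ">MLP7456\n", "PPYWYWNSAVMIF\n", "YVQPLSLLAVLLA\n",
    ">MLP1019\n", "STNAPLQTSNTWVSYQPSMMMSLQ\n",
]
records, missing = merge(file1, read_fasta(file2))
```

With real files, you would pass the open file objects in place of the lists and write each item of `records` to the output file. The ID `MLP0000` is invented here to show how an unmatched header ends up in `missing`.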

Re: Retrieving sequence data from other file

Postby hexfigaro » Wed Oct 02, 2013 4:48 pm

Thank you so much! It works.