How to split up a string into a list, 5 characters per chunk


Postby johnick013 » Mon Feb 25, 2013 3:42 pm

Hi, I'm doing an exercise for bioinformatics. In the exercise I have to split a gene sequence, which is in the form of a string, into base groups of 5. So for example:
Code: Select all
s='GTAGTACGAATTTGAGCAAA'

and then I want my output to be in the form of a list:
Code: Select all
l=['GTAGT','ACGAA','TTTGA','GCAAA']

But I have absolutely no idea how to do this. Please help! :D
Last edited by Yoriz on Thu Feb 28, 2013 7:03 pm, edited 2 times in total.
Reason: Added code tags, Changed title
johnick013
 
Posts: 1
Joined: Mon Feb 25, 2013 3:36 pm

Re: How to split up a string

Postby zeycus » Mon Feb 25, 2013 4:58 pm

Admins will probably tell you to read this:
http://www.python-forum.org/viewtopic.php?f=6&t=145
You should use code tags and, most importantly, show your attempts to solve the problem.

Live long and prosper.
Spock
zeycus
 
Posts: 23
Joined: Sun Feb 17, 2013 10:30 am
Location: Madrid

Re: How to split up a string

Postby Yoriz » Mon Feb 25, 2013 6:02 pm

Here is a recursive solution. It will take a string of any length; when fewer than 5 characters are left for a group, it uses whatever is left as the last list item, which may or may not be what you want to happen.
Code: Select all
string = 'GTAGTACGAATTTGAGCAAA'


def chunk_five(string):
    return [string[:5]] + chunk_five(string[5:]) if string else []

print chunk_five(string)

['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
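
For anyone reading along on Python 3, here is a rough sketch of the same recursive idea, run on a sequence whose length is not a multiple of 5 so the leftover behaviour is visible. The name `chunk_recursive` and the `size` parameter are made up for this illustration and are not from the post above.

```python
def chunk_recursive(seq, size=5):
    # Take the first `size` characters, then recurse on the rest;
    # an empty string ends the recursion.
    if not seq:
        return []
    return [seq[:size]] + chunk_recursive(seq[size:], size)


print(chunk_recursive('GTAGTACGAATTTGAGCAAAGT'))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```

Each call handles one group, so a very long sequence can exceed Python's recursion limit (1000 by default).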
New Users, Read This
Join the #python-forum IRC channel on irc.freenode.net!
Yoriz
 
Posts: 1161
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

Re: How to split up a string

Postby micseydel » Mon Feb 25, 2013 8:32 pm

While recursion is neat, it's not efficient, and I'm not sure that list concatenation is either. Below I have an iterator solution which will work for strings longer than about 5000 characters (where the recursive version, at one call per group of 5, runs into Python's default recursion limit of 1000), and which is significantly less likely to get you a MemoryError too.
Code: Select all
>>> from itertools import izip
>>> def chunk_five(iterable):
...     my_it = iter(iterable)
...     return izip(*[my_it]*5)
...
>>> chunk_five('GTAGTACGAATTTGAGCAAA')
<itertools.izip object at 0x7f4390034248>
>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
[('G', 'T', 'A', 'G', 'T'), ('A', 'C', 'G', 'A', 'A'), ('T', 'T', 'T', 'G', 'A'), ('G', 'C', 'A', 'A', 'A')]
>>>
>>> def chunk_five(iterable):
...     my_it = iter(iterable)
...     # if getting back strings instead of tuples is important
...     return (''.join(five) for five in izip(*[my_it]*5))
...
>>> list(chunk_five('GTAGTACGAATTTGAGCAAA'))
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
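
A sketch for Python 3 readers, where `itertools.izip` no longer exists and the built-in `zip` is already lazy. It also shows something worth checking before using this trick on real data: `zip` stops at the shortest input, so a trailing group of fewer than 5 bases is silently dropped. The `islice`-based variant, `chunk_keep_tail`, is a hypothetical alternative (not from the post above) that keeps the short last chunk.

```python
from itertools import islice


def chunk_five(iterable):
    # Five references to the same iterator: zip pulls one item from
    # each in turn, which groups consecutive items in fives.
    my_it = iter(iterable)
    return (''.join(five) for five in zip(*[my_it] * 5))


def chunk_keep_tail(iterable, size=5):
    # Pull up to `size` items at a time until the iterator is exhausted.
    my_it = iter(iterable)
    while True:
        chunk = ''.join(islice(my_it, size))
        if not chunk:
            return
        yield chunk


seq = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases, so 2 are left over
print(list(chunk_five(seq)))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
print(list(chunk_keep_tail(seq)))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```

Both versions stay lazy, so nothing beyond the current group is held in memory until you call `list()` on the result.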

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1491
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: How to split up a string

Postby ichabod801 » Mon Feb 25, 2013 8:51 pm

While recursion and iterators are nice, aren't they a bit high level? Why not just use slicing?

Code: Select all
genes = 'GTAGTACGAATTTGAGCAAA'
fives = [genes[start:(start + 5)] for start in range(0, len(genes), 5)]


Even list comprehensions might be above beginner level, so I might even put it in a loop:

Code: Select all
genes = 'GTAGTACGAATTTGAGCAAA'
fives = []
for start in range(0, len(genes), 5):
    fives.append(genes[start:(start + 5)])
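
If the group size needs to vary, the same slicing idea wraps up into a small function. This is only a sketch in Python 3 syntax; `chunks` and `size` are names made up here. Slicing past the end of a string is safe in Python, so a short final group is simply kept.

```python
def chunks(seq, size=5):
    # range(0, len(seq), size) gives the start index of every group.
    return [seq[start:start + size] for start in range(0, len(seq), size)]


genes = 'GTAGTACGAATTTGAGCAAA'
print(chunks(genes))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
print(chunks(genes, 3))
# ['GTA', 'GTA', 'CGA', 'ATT', 'TGA', 'GCA', 'AA']
```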
Craig "Ichabod" O'Brien
Minimalist, buddhist, theist, and programmer
Current languages: Python, SAS, and C++
Previous serious languages: R, Java, VBA, Lisp, HyperTalk, BASIC
ichabod801
 
Posts: 96
Joined: Sat Feb 09, 2013 12:54 pm
Location: Outside Washington DC

Re: How to split up a string

Postby Yoriz » Mon Feb 25, 2013 8:56 pm

Here's another go.
Code: Select all
string = 'GTAGTACGAATTTGAGCAAA'


def yield_chunk_five(string):
    while string:
        yield string[:5]
        string = string[5:]

print list(yield_chunk_five(string))

['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']

Re: How to split up a string

Postby micseydel » Mon Feb 25, 2013 9:04 pm

What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

Yoriz: that solution makes new, potentially big strings every iteration of the loop.
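
One way to keep the generator shape without re-copying the remainder is to walk the start indices and slice out only the 5 characters needed each time. A sketch in Python 3 syntax; `yield_chunks` is a made-up name, not code from this thread.

```python
def yield_chunks(seq, size=5):
    # Only a `size`-character slice is created per step; the original
    # string is never copied or shortened.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


for chunk in yield_chunks('GTAGTACGAATTTGAGCAAA'):
    print(chunk)
```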

Re: How to split up a string

Postby ichabod801 » Mon Feb 25, 2013 9:05 pm

Is this just going to turn into how many ways can we split the string into lengths of five?

Code: Select all
[''.join(word) for word in zip(*[genes[start::5] for start in range(5)])]

Re: How to split up a string

Postby Yoriz » Mon Feb 25, 2013 9:09 pm

Oh bugger, I thought it was just chopping 5 off the string each time, but I think I see now that it's creating a new string that's 5 shorter than the last. Back to the drawing board. :(

Re: How to split up a string

Postby ichabod801 » Mon Feb 25, 2013 9:15 pm

micseydel wrote:What's wrong with high level? The iterator works well for very large samples, which are common with DNA. Also, this person likely isn't someone who needs to learn general Python; they're just someone trying to do bioinformatics, so they need to know how to do this one thing.

When teaching I stick to simple. I don't know who this guy is or what the context of his bioinformatics exercise is, so I would aim for something simple that he is more likely to understand.

Re: How to split up a string

Postby Yoriz » Mon Feb 25, 2013 9:48 pm

And I'm just a hobbyist Python coder who makes up crappy solutions that might help for the time being, until someone who knows what they're doing comes along.

Re: How to split up a string

Postby snippsat » Mon Feb 25, 2013 11:24 pm

ichabod801 wrote:Is this just going to turn into how many ways can we split the string into lengths of five?

Why not ;)
Code: Select all
>>> import re
>>> s = 'GTAGTACGAATTTGAGCAAA'
>>> re.findall(r'.'*5, s)
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
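
A small variation, offered as a sketch in Python 3 syntax: a pattern of exactly five dots only matches complete groups, so leftover bases at the end are dropped. The quantifier `.{1,5}` keeps a short final group. Note that `.` does not match newlines unless `re.DOTALL` is passed, which matters if the sequence was read from a multi-line file.

```python
import re

s = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases, 2 left over

# Exactly five characters per match: the trailing 'GT' is lost.
print(re.findall(r'.{5}', s))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']

# One to five characters per match (greedy): the tail is kept.
print(re.findall(r'.{1,5}', s))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```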

Code: Select all
>>> map(None, *([iter(s)] * 5))
[('G', 'T', 'A', 'G', 'T'),
 ('A', 'C', 'G', 'A', 'A'),
 ('T', 'T', 'T', 'G', 'A'),
 ('G', 'C', 'A', 'A', 'A')]
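
For anyone on Python 3: `map(None, ...)` is Python 2 only. The closest replacement I know of is `itertools.zip_longest`, which likewise pads a short final group with a fill value instead of dropping it. A sketch, using an empty string as the fill value (my choice here) so the groups can be joined back into strings:

```python
from itertools import zip_longest

s = 'GTAGTACGAATTTGAGCAAAGT'  # 22 bases, 2 left over

# Five references to one iterator, padded with '' when it runs out.
groups = zip_longest(*[iter(s)] * 5, fillvalue='')
print([''.join(group) for group in groups])
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```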
snippsat
 
Posts: 293
Joined: Thu Feb 21, 2013 12:04 am

Re: How to split up a string

Postby Yoriz » Tue Feb 26, 2013 12:50 pm

ichabod801 wrote:Is this just going to turn into how many ways can we split the string into lengths of five?

It's already been done to death.
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks

Re: How to split up a string

Postby Jaro » Tue Feb 26, 2013 6:51 pm

ichabod801 wrote:Is this just going to turn into how many ways can we split the string into lengths of five?

If so, let me drop a few lines:

Code: Select all
>>> import textwrap
>>> split_seq=textwrap.TextWrapper(width=5).wrap
>>> split_seq('GTAGTACGAATTTGAGCAAA')
['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA']
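
A caveat worth a quick sketch: `textwrap` is built for prose, so it treats whitespace as word boundaries and only chops up "words" longer than the width. That is fine for an unbroken sequence, but a sequence read from a file may contain spaces or newlines. The example below, in Python 3 syntax, strips whitespace first and then uses `textwrap.wrap(text, width)`, the module-level shorthand; the variable names are made up for illustration.

```python
import textwrap

# Hypothetical raw input with a newline and a space inside the sequence.
raw = 'GTAGTACGAA\nTTTGAGCAAA GT'

# str.split() with no arguments splits on any whitespace run,
# so joining the pieces gives one unbroken sequence.
clean = ''.join(raw.split())

print(textwrap.wrap(clean, 5))
# ['GTAGT', 'ACGAA', 'TTTGA', 'GCAAA', 'GT']
```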
Jaro
 
Posts: 8
Joined: Sat Feb 23, 2013 6:16 pm

