Strings


Postby metulburr » Tue Feb 12, 2013 2:14 pm

String Quotes

Single and Double Quotes

String literals can be enclosed in either a pair of single quotes or a pair of double quotes. Having both lets you put one type of quote inside the other without escaping it. You could use an escape sequence to do the same thing, but this option is usually easier to read.
Code: Select all
>>> "that's"
"that's"
>>> 'that"s'
'that"s'


Concatenation
concatenate = to put two objects end to end
Adjacent string literals are concatenated automatically. You can also put a "+" between strings, or wrap adjacent literals in parentheses so that they can span multiple lines.
Code: Select all
>>> s = "hello" 'there' "what"
>>> s
'hellotherewhat'
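
As a rough sketch of the other two options mentioned above (the "+" operator and parentheses), something like the following should work. The variable names are made up for this example.

```python
# Joining with +, which also works with variables, not just literals
greeting = "hello"
plus_joined = greeting + " " + "there"

# Adjacent literals inside parentheses can span several lines
wrapped = ("hello "
           "there "
           "what")

print(plus_joined)  # hello there
print(wrapped)      # hello there what
```

Note that automatic concatenation only applies to literals written next to each other; two variables side by side is a syntax error, so use "+" for those.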


Triple Quotes

A triple-quoted string is a block of text that can span as many lines as you need. It can use either double quotes """ """ or single quotes ''' '''. Single or double quotes inside the triple quotes do not have to be escaped, but can be. Triple quotes are useful for error messages, HTML code as strings, documentation, and temporarily disabling part of your code.

Code: Select all
>>> triple = '''This is
... some gibberish that
... I have came up with.'''
>>> triple
'This is\nsome gibberish that\nI have came up with.'



Escape Sequences

Backslashes are used to introduce special characters: escape sequences. The backslash "\" and the character(s) that follow it are replaced by a single character whose value is specified by those character(s).

Code: Select all
>>> s = '1\n2\t3\n\\'
>>> s
'1\n2\t3\n\\'
>>> print(s)
1
2   3
\
>>> len(s)
7


The string prints this way because of the escape sequences in it. The 2 is on a new line because of \n, the 3 is tabbed over because of \t, and the backslash is also on a new line and is shown because we escaped the backslash itself with \\. The built-in function len() returns the length of the string, which is 7 and not 11, because each escape sequence counts as one character.

Raw Strings

An r in front of the opening quote makes the string a raw string. In a raw string, backslashes are kept as they are typed instead of starting escape sequences. Without the r, the path in the code below would contain a newline (\n) and a tab (\t).
Code: Select all
filer = open(r'C:\new\text.txt','w')
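
To see the difference without touching any files, here is a small sketch that compares the same text written as a normal string and as a raw string. The path is only illustrative and nothing is opened.

```python
# Illustrative path only; no file is opened or written here
normal = 'C:\new\text.txt'   # \n and \t become a newline and a tab
raw = r'C:\new\text.txt'     # backslashes are kept as typed

print(len(normal))  # 13
print(len(raw))     # 15

# A raw string is the same as doubling every backslash by hand
print(raw == 'C:\\new\\text.txt')  # True
```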



Indexing and slicing

Indexing and slicing are among the most used features of Python. They let you access the characters of a string by position. The same operations also work on other sequences, such as a list of elements.

Code: Select all
>>> s = 'index'
>>> s
'index'
>>> s[0]
'i'
>>> s[-1]
'x'


The index s[0] gets the item at offset 0 from the left, while s[-1] gets the item at offset 1 from the end, which is the last item.

Slicing returns an entire section instead of one single item.
Code: Select all
>>> s = 'slice'
>>> s
'slice'
>>> s[1:]
'lice'
>>> s[1:3]
'li'
>>> s[:-1]
'slic'
>>> s[::-1]
'ecils'

You could think of it like this: s[START:END]. The result begins at the START index and stops just before the END index, so START is included and END is not. If there is no START, the slice begins at the beginning of the string; if there is no END, it runs all the way to the end. Slicing will get easier the more you use it.

As if that were not confusing enough, there is also a third value in the slice: s[START:END:STEP]. STEP lets you skip through the sliced string STEP positions at a time. For example, a step of 2 will give you every other character between START and END.
Code: Select all
>>> s = 'abcdefghijklmnopqrstuvwxyz'
>>> s
'abcdefghijklmnopqrstuvwxyz'
>>> s[2:20:2]
'cegikmoqs'

This slice is saying we want index 2 through 19 at intervals of 2.
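
One way to convince yourself of this is to compare the slice with a loop over the same indexes using range(), which takes the same START, END, and STEP values. This is just a sketch to illustrate the idea.

```python
s = 'abcdefghijklmnopqrstuvwxyz'

# s[2:20:2] picks the same characters as walking range(2, 20, 2)
by_slice = s[2:20:2]
by_range = ''.join(s[i] for i in range(2, 20, 2))
print(by_slice)              # cegikmoqs
print(by_slice == by_range)  # True

# Slice bounds past the end are clipped instead of raising an error
tail = s[20:100]
print(tail)  # uvwxyz
```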

By the same token, we can use a negative step to reverse the order of the string.
Code: Select all
>>> s = '123456789'
>>> s
'123456789'
>>> s[::-1]
'987654321'



String Conversion
str(), int(), and repr()
Code: Select all
>>> '1' + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly


Were you expecting 2? Were you expecting '2'? Were you expecting '11'? Python refuses to guess whether you want a string or an integer, so it raises a TypeError (the exact wording of the message varies between Python versions). To get around this, convert the values so that both have the same type: either convert the '1' to the int 1, or convert the int 1 to the string '1'. With both as strings you get the string '11'; with both as integers you get 2. The built-in function str() converts to a string and int() converts to an integer.

Code: Select all
>>> int('1') + 1
2
>>> str(1) + '1'
'11'


The built-in function repr() converts a value to a string that shows it the way you would write it in code, so for a string the quotes are included.

Code: Select all
>>> print((str('stringer'), repr('stringer')))
('stringer', "'stringer'")
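
Here is a short sketch of why the "as code" form is handy: for simple values such as strings and numbers, the text from repr() can be read back into an equal value. It uses ast.literal_eval from the standard library to do the reading back, which is my choice for the example, not something the tutorial above relies on.

```python
import ast

text = "that's"
as_str = str(text)
as_repr = repr(text)

print(as_str)   # that's
print(as_repr)  # "that's"

# For simple literals, reading the repr back gives an equal value
round_trip = ast.literal_eval(as_repr)
print(round_trip == text)  # True
```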


Character Code Conversions
ord() and chr()
Every character has an integer code: its Unicode code point, which is the same as its ASCII code for the first 128 characters. The built-in function ord() converts a single character to its integer code, while chr() does the opposite and converts an integer code to its single character.
Code: Select all
>>> ord('a')
97
>>> chr(97)
'a'

So if you want to move on to the next character by its code...
Code: Select all
>>> a = 'a'
>>> num = ord(a) + 1
>>> chr(num)
'b'
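
A small sketch that wraps this idea in a function and also shows that ord() is not limited to ASCII. The helper name next_char is made up for this example.

```python
def next_char(ch):
    """Return the character whose code is one higher (hypothetical helper)."""
    return chr(ord(ch) + 1)

print(next_char('a'))  # b
print(next_char('z'))  # {   (there is no wrap-around back to 'a')

# Characters outside ASCII have codes too
e_acute = ord('\u00e9')   # the letter e with an acute accent
print(e_acute)            # 233
print(chr(233) == '\u00e9')  # True
```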


String Changing
Because strings are immutable sequences, we cannot change a character in a string by assigning a new value to its index, like so:
Code: Select all
>>> s = 'stringer'
>>> s[0] = 'S'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment


Instead, we can build a new string from a slice of the old one and assign it back to the same variable.
Code: Select all
>>> s = 'stringer'
>>> s = 'S' + s[1:]
>>> s
'Stringer'
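
If you need to change several characters, another option (a sketch, not the only way) is to convert the string to a list, change the list, and join it back into a string. Lists are mutable, so item assignment works on them.

```python
s = 'stringer'

chars = list(s)      # a list of single characters, which can be changed
chars[0] = 'S'
chars[-1] = 'R'
changed = ''.join(chars)

print(changed)  # StringeR
print(s)        # stringer   (the original string is untouched)
```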


String Methods
str.replace()
As the name suggests, the string replace method replaces one character or a set of characters with another.
Code: Select all
>>> s = 'light'
>>> s.replace('l','f')
'fight'
>>> s.replace('l','br')
'bright'
>>> s
'light'
>>> s = s.replace('li','bou')
>>> s
'bought'

The above code shows how to replace one character or many characters. Note that replace() returns a new string; the original variable does not change until the result is assigned back to it.

There are times when you want to replace only the first few occurrences, not all of them. The optional third argument is the maximum number of replacements to make.
Code: Select all
>>> s = 'XXyyXXyyxxYYxxYY'
>>> s
'XXyyXXyyxxYYxxYY'
>>> s.replace('xx','oo')
'XXyyXXyyooYYooYY'
>>> s
'XXyyXXyyxxYYxxYY'
>>> s.replace('xx','oo', 1)
'XXyyXXyyooYYxxYY'
>>> s.replace('x','o', 3)
'XXyyXXyyooYYoxYY'


str.find()
The string method find() returns the index of the first occurrence of what is searched for. Here the first 'd' is at index 7 of the string.
Code: Select all
>>> s = 'somerandomstring'
>>> indexer = s.find('d')
>>> indexer
7
>>> s[:indexer]
'someran'
>>> s[indexer:]
'domstring'


If, however, what is searched for is not found, -1 is returned.
Code: Select all
>>> s.find('X')
-1


So if you use the result in a slice without first testing whether the search failed, you are in effect putting -1 into the slice, which silently cuts off the last character.
Code: Select all
>>> s[:s.find('X')]
'somerandomstrin'
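
A minimal sketch of testing for the -1 before slicing. The helper name before is made up for this example, and returning None for "not found" is just one possible choice.

```python
def before(text, marker):
    """Return the part of text before marker, or None if marker is absent
    (hypothetical helper)."""
    position = text.find(marker)
    if position == -1:
        return None
    return text[:position]

s = 'somerandomstring'
print(before(s, 'd'))  # someran
print(before(s, 'X'))  # None

# If you only need to know whether it is there, use the in operator
print('X' in s)        # False
```

There is also str.index(), which works like find() but raises a ValueError instead of returning -1.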


str.capitalize(), str.title(), str.lower(), str.upper(), str.swapcase()
All of these methods affect the capitalization of the characters in a string. The method capitalize() capitalizes the first character of the string and makes the rest lowercase. The method title() capitalizes the first character of every word in the string. The method lower() makes all characters in the string lowercase. The method upper() makes all characters in the string uppercase. The method swapcase() swaps uppercase characters for lowercase characters and vice versa.

Code: Select all
>>> s = 'the great beyond of the stars'
>>> s.capitalize()
'The great beyond of the stars'
>>> s.title()
'The Great Beyond Of The Stars'
>>> s2 = s.title()
>>> s2
'The Great Beyond Of The Stars'
>>> s2.swapcase()
'tHE gREAT bEYOND oF tHE sTARS'
>>> s.upper()
'THE GREAT BEYOND OF THE STARS'
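
The lower() method is not shown above, so here is a short sketch of it, along with a common use: comparing text without caring about case. The variable names are made up for this example.

```python
s = 'The Great BEYOND'
lowered = s.lower()
print(lowered)  # the great beyond
print(s)        # The Great BEYOND   (the original is unchanged)

# Comparing user input without caring about capitalization
answer = 'YES'
matches = answer.lower() == 'yes'
print(matches)  # True
```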


str.split()
The method split() chops the string into substrings and returns them in a list.
Code: Select all
>>> s = 'Another random string created.'
>>> s
'Another random string created.'
>>> s.split()
['Another', 'random', 'string', 'created.']

At that point you can slice and index the list.
Code: Select all
>>> s.split()[0]
'Another'
>>> s.split()[1:]
['random', 'string', 'created.']
>>> s.split('random')
['Another ', ' string created.']

You can also split on a specific separator, which by default is whitespace. Here we split the string on commas instead.
Code: Select all
>>> s = 'one,word,123'
>>> s.split()
['one,word,123']
>>> s.split(',')
['one', 'word', '123']
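
Two details that tend to surprise people, shown as a small sketch: split() with no argument treats any run of whitespace as one separator, while split(' ') splits on every single space, and an optional second argument limits the number of splits.

```python
line = 'one  two   three'   # two spaces, then three spaces

default_split = line.split()
space_split = line.split(' ')
print(default_split)  # ['one', 'two', 'three']
print(space_split)    # ['one', '', 'two', '', '', 'three']

# The second argument is the maximum number of splits to make
limited = 'a,b,c,d'.split(',', 1)
print(limited)        # ['a', 'b,c,d']
```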


str.join()
The method join() joins the items of a sequence, for example a list of strings, into one string. The string you call it on is placed between the items.
Code: Select all
>>> s = ['Another', 'random', 'string', 'created.']
>>> s
['Another', 'random', 'string', 'created.']
>>> ''.join(s)
'Anotherrandomstringcreated.'
>>> ' '.join(s)
'Another random string created.'
>>> 'XX'.join(s)
'AnotherXXrandomXXstringXXcreated.'
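
One thing to watch for: join() only accepts strings, so a list of numbers has to be converted first. A quick sketch:

```python
numbers = [1, 2, 3]

# Joining non-strings raises a TypeError
try:
    ', '.join(numbers)
    failed = False
except TypeError as error:
    failed = True
    print('join failed:', error)

# Convert each item to a string first
joined = ', '.join(str(n) for n in numbers)
print(joined)  # 1, 2, 3
```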

There are many more string methods available. Run help(str) in the Python interpreter to view them. There is also a very common str method that I gave its own section because of its complexity: str.format(). It is covered below under Format Expression and Format Method.

Format Expression and Format Method
There are two ways to format strings. The first is the format expression, which uses the % operator, and the second is the newer technique: the format method str.format().

Format Expressions
The % operator, when applied to a string, formats the string according to the format definitions inside it. It is applied like so:
Code: Select all
>>> s = 'There are %i %s to format a string' % (2, 'ways')
>>> s
'There are 2 ways to format a string'

When you plug more than one value into the string, the values need to be in a tuple, that is, within parentheses. The %i gets replaced by the integer 2 and the %s gets replaced by the string 'ways'. If the definition does not match the type of the corresponding value, you will get an error. The code below uses %i for both values even though one of them is a string.
Code: Select all
>>> 'There are %i %i to format a string' % (2, 'ways')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %i format: a number is required, not str
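
A few more format expressions as a sketch: a single value without a tuple, a float with two decimal places, and values looked up by name from a dictionary. The variable names and dictionary keys are made up for this example.

```python
# A single value does not need a tuple
single = 'There is %i way' % 1
print(single)   # There is 1 way

# %.2f formats a float with two digits after the decimal point
fixed = '%.2f' % 3.14159
print(fixed)    # 3.14

# %(name)s style definitions pull values from a dictionary by key
mapped = '%(count)i %(noun)s' % {'count': 2, 'noun': 'ways'}
print(mapped)   # 2 ways
```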


Format Method
Unlike format expressions, the format method lets you plug any data into a string without specifying its type.
Code: Select all
>>> 'some data: {} {} {}'.format(1.4, 5, 'stringer')
'some data: 1.4 5 stringer'

As you can see, the type of the data does not matter. Here, each {} marks a spot that is filled by one of format()'s arguments. By default, empty curly brackets {} are filled with format()'s arguments in the order they are given. You can also change this order by putting position numbers inside the brackets.
Code: Select all
>>> 'some data: {2} {1} {0}'.format(1.4, 5, 'stringer')
'some data: stringer 5 1.4'

You can also plug the values in by keyword.
Code: Select all
>>> temp = '{header} {body}, and good{footer}'
>>> temp.format(header='hello', body='Metul Burr', footer='bye metulburr')
'hello Metul Burr, and goodbye metulburr'

Or you can do them both by position and keyword.
Code: Select all
>>> site = '{w3}{0}{tld}'
>>> site.format('metulburr',w3='www.',tld='.com')
'www.metulburr.com'


Specific Formatting
Yes, it can get more complicated. The format for {} inside a string is: {fieldname!conversionflag:formatspec}
  • Fieldname is a number or keyword naming an argument, optionally followed by ".name" attribute or "[index]" component references.
  • Conversionflag can be r, s, or a to call the repr, str, or ascii built-in functions on the value.
  • Formatspec specifies how the value should be presented (width, alignment, padding, decimal precision, etc.)
Within the formatspec, the layout [[fill]align][sign][#][0][width][,][.precision][typecode] gives the details of how the value should be presented. Align can be <, >, =, or ^ for left alignment, right alignment, padding after a sign character, or centered alignment. The formatspec can also contain nested {} fields (field name only), so that, for example, a width can be taken from another argument.
Code: Select all
>>> num = 3.141592653589793
>>> '{:.2f}'.format(num)
'3.14'
>>> '{:.3f}'.format(num)
'3.142'
>>> '{:010.3f}'.format(num)
'000003.142'
>>> '{:10.2f}'.format(num)
'      3.14'
>>> '{:30}'.format(num)
'             3.141592653589793'
>>> '{:<30}'.format(num)
'3.141592653589793             '
>>> '{:,}'.format(123456789)
'123,456,789'
>>> '{0:b}'.format(255)
'11111111'
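
The examples above do not cover the conversion flag, fill characters, centering, nested fields, or index and attribute references, so here is a short sketch of each. The keyword names width, prec, and point are made up for this example.

```python
centered = '{:^11}'.format('mid')     # centered in 11 characters
filled = '{:*>8}'.format('pad')       # right aligned, padded with *
quoted = '{!r}'.format('quoted')      # !r calls repr() on the value

# Nested {} fields supply the width and precision from other arguments
nested = '{:{width}.{prec}f}'.format(3.14159, width=8, prec=2)

# [index] and .name references inside the field name
parts = '{0[1]} {point.real}'.format(['a', 'b'], point=3+4j)

print(repr(centered))  # '    mid    '
print(filled)          # *****pad
print(quoted)          # 'quoted'
print(repr(nested))    # '    3.14'
print(parts)           # b 3.0
```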
Last edited by metulburr on Fri Oct 18, 2013 7:34 pm, edited 1 time in total.
Reason: modified wording