Displaying in different bases

Postby KHarvey » Fri Apr 26, 2013 6:13 pm

I am working on a "conversion" app to help me to translate measurements into different measurements. Just something semi-simple for my own personal use.
So far I have built around half of the app and things appear to be working out thanks to Yoriz.

The part of the app that I am attempting at this point is the base conversions. I know conversion isn't the proper term as I am just displaying the same information in different bases.
I've written this little test script that I have been playing around with, and I think it is accurate, but I'm not really sure. Some of the Base64 converters that I have used give me different info than what I get from my tests, so I am a little concerned. But all the other base conversions appear to be working...I think.
Code: Select all
base_names = {
        2:"Binary"
      , 3:"Ternary"
      , 4:"Quaternary"
      , 5:"Quinary"
      , 6:"Senary"
      , 7:"Septenary"
      , 8:"Octal"
      , 9:"Nonary"
      , 10:"Decimal"
      , 11:"Undenary"
      , 12:"Duodecimal"
      , 16:"Hexadecimal"
      , 17:"Septendecimal"
      , 19:"Decennoval"
      , 20:"Vigesimal"
      , 30:"Trigesimal"
      , 32:"Duotrigesimal"
      , 40:"Quadragesimal"
      , 50:"Quinquagesimal"
      , 60:"Sexagesimal"
      , 64:"Base 64"
}

for key, value in base_names.iteritems():
   num_test = 96
   basenum = key
   array = []
   while num_test != 0:
      base_remainder = num_test % basenum
      #If statements to determine proper place in chr table (11 != a)
      if base_remainder < 10:
         array.append(chr(48 + base_remainder))
      elif base_remainder < 36:
         array.append(chr(55 + base_remainder))
      elif base_remainder < 133:
         array.append(chr(61 + base_remainder))
      num_test = num_test // basenum
   print value, ''.join([str(var) for var in array])[::-1]

base_num = 64
test_val = "1W"
test_len = 0
array2 = []
for val in test_val[::-1]:
   if ord(val) > 96:
      array2.append((ord(val) - 61) * (base_num ** test_len))
   elif ord(val) > 64:
      array2.append((ord(val) - 55) * (base_num ** test_len))
   elif ord(val) > 47:
      array2.append((ord(val) - 48) * (base_num ** test_len))
   test_len = test_len + 1
print sum(array2)


I will turn this into a class, and make some variable name changes to it. I will also add on the prefixes and suffixes so that 0o is octal and 0b is binary and == is Base64 and so on. I just wanted to get something working.

I decided against using binascii as it would only be able to do binary, octal, decimal, and hexadecimal, and I definitely wanted to have duotrigesimal and base 64. Also, rather than using two different pieces of code to do the same conversions, I decided to just write one small math script to do all the conversions. This gives me the capability to do any base conversions (13, 43, etc.) if I so choose.

My questions would be: is this script accurate? Is there a better way to do this (like a function that I don't know about)?

Re: Displaying in different bases

Postby casevh » Sat Apr 27, 2013 6:16 am

You are implementing radix conversion - displaying a number using an arbitrary radix. Base64 encoding is not quite the same as displaying a number in radix-64 format.

Instead of using a sequence of if statements to convert base_remainder to a character, I would create a string containing all the characters and then index the string. Something like:

Code: Select all
# Make the script work with Python 2.6, 2.7, and 3.x.
from __future__ import print_function

# Only valid for radix 2 through 16.
char_table = "0123456789ABCDEF"

def num_to_str(value, radix):

    if radix > len(char_table):
        raise ValueError('radix is larger than the length of char_table')
    if radix < 2:
        raise ValueError('radix is less than 2')
    if value < 0:
        raise ValueError('value is negative')

    digits = []
    while True:
        value, digit = divmod(value, radix)
        digits.append(char_table[digit])
        if not value:
            break

    return ''.join(reversed(digits))

if __name__ == '__main__':
    # some tests
    fmt_str = '{0} in decimal is {2} in radix-{1}'
    print(fmt_str.format(0,2,num_to_str(0,2)))
    print(fmt_str.format(123,8,num_to_str(123,8)))
    print(fmt_str.format(123,16,num_to_str(123,16)))

Just a couple of comments about the script.

Many (not all) of the new features in Python 3 are available in Python 2.x by magically importing the new feature from the future. With a little care, many Python programs can easily run with both Python 2 and 3.

If you run the script from the command line, it recognizes that it is the main program and it runs a few simple tests. But you can also import it into another program.
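
For radix 2 through 36 there is also a built-in to check against: int() accepts the radix as its second argument and decodes the kind of string num_to_str() produces. The sketch below restates num_to_str() in compact form (argument checks omitted) so that it runs on its own; round_trip_ok is a made-up helper name, not something from the script above.

```python
# Compact restatement of num_to_str() from above (argument checks omitted).
CHAR_TABLE = "0123456789ABCDEF"

def num_to_str(value, radix):
    digits = []
    while True:
        value, digit = divmod(value, radix)
        digits.append(CHAR_TABLE[digit])
        if not value:
            break
    return ''.join(reversed(digits))

def round_trip_ok(value, radix):
    # Hypothetical helper: int(text, radix) is the built-in inverse
    # of num_to_str() for any radix from 2 to 36.
    return int(num_to_str(value, radix), radix) == value

if __name__ == '__main__':
    for radix in range(2, 17):
        assert all(round_trip_ok(n, radix) for n in range(1000))
    print(num_to_str(96, 2), num_to_str(96, 16))
```

int() stops at radix 36 because it only knows the digits 0-9 and the letters A-Z (in either case), so anything larger still needs a hand-written table.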

Re: Displaying in different bases

Postby KHarvey » Mon Apr 29, 2013 2:31 pm

casevh wrote: You are implementing radix conversion - displaying a number using an arbitrary radix. Base64 encoding is not quite the same as displaying a number in radix-64 format.

Thanks for the hint with the index of the string. That shortened my code quite a bit.

You said that there is a difference between radix-64 and Base64. I have done a bit of searching and I have not been able to find what that difference is. From what I can tell and have read, radix-64 and Base64 are synonymous. Can you give a little more info on what the difference is?

Here is the code I am using now for the encoding:
Code: Select all
base_names = {
        2:"Binary"
      , 3:"Ternary"
      , 4:"Quaternary"
      , 5:"Quinary"
      , 6:"Senary"
      , 7:"Septenary"
      , 8:"Octal"
      , 9:"Nonary"
      , 10:"Decimal"
      , 11:"Undenary"
      , 12:"Duodecimal"
      , 16:"Hexadecimal"
      , 17:"Septendecimal"
      , 19:"Decennoval"
      , 20:"Vigesimal"
      , 30:"Trigesimal"
      , 32:"Duotrigesimal"
      , 40:"Quadragesimal"
      , 50:"Quinquagesimal"
      , 60:"Sexagesimal"
      , 64:"Base 64"
}

char_table = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"

for key, value in base_names.iteritems():
   num_test = 96
   basenum = key
   array = []
   while num_test != 0:
      base_remainder, digit = divmod(num_test, basenum)
      array.append(char_table[digit])
      num_test = num_test // basenum
   print ''.join(reversed(array))


I will change over my decode section in a bit.

Re: Displaying in different bases

Postby casevh » Wed May 01, 2013 6:33 am

Sorry about the delay in responding. Sometimes work gets in the way....

In your code example, I would re-write this section:

Code: Select all
   while num_test != 0:
      base_remainder, digit = divmod(num_test, basenum)
      array.append(char_table[digit])
      num_test = num_test // basenum


as

Code: Select all
   while num_test != 0:
      num_test, digit = divmod(num_test, basenum)
      array.append(char_table[digit])

divmod() returns both the quotient and remainder in one call and will be more efficient, especially as the size of the numbers increases.
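
A small check of that equivalence, written for Python 3, with 96 and 7 picked only as sample values:

```python
num_test, basenum = 96, 7

# divmod() returns (quotient, remainder) in a single call ...
quotient, remainder = divmod(num_test, basenum)

# ... which matches the two separate operations it replaces.
assert (quotient, remainder) == (num_test // basenum, num_test % basenum)
assert quotient * basenum + remainder == num_test

print(quotient, remainder)  # 13 5
```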

In the following, I am using your char_table to map values between 0 and 63 to a displayable character. I am using ^ for exponentiation.

In the previous answer, I commented that Base64 encoding is not the same as displaying a number in radix-64 format. When you display a number N in radix-B format, you are actually writing the number as N = ... d3*B^3 + d2*B^2 + d1*B + d0. The digits are assigned a character for ease of display. In radix-64 format, the number 0 is displayed as "0", the number 63 is displayed as "/", the number 64 is displayed as "10", etc.
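
Here is a rough sketch of that positional view in both directions, using your char_table. The names to_radix and from_radix are made up for this example; the decoder evaluates the polynomial with Horner's rule instead of computing each power of B separately.

```python
CHAR_TABLE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"

def to_radix(value, radix=64):
    """Write a non-negative integer as the digit string d_k ... d1 d0."""
    if not 2 <= radix <= len(CHAR_TABLE):
        raise ValueError("radix out of range for CHAR_TABLE")
    if value < 0:
        raise ValueError("value is negative")
    digits = []
    while True:
        value, digit = divmod(value, radix)
        digits.append(CHAR_TABLE[digit])
        if not value:
            break
    return "".join(reversed(digits))

def from_radix(text, radix=64):
    """Evaluate ... d2*B^2 + d1*B + d0 one digit at a time (Horner's rule)."""
    value = 0
    for char in text:
        digit = CHAR_TABLE.index(char)  # raises ValueError for unknown characters
        if digit >= radix:
            raise ValueError("digit %r is not valid in radix %d" % (char, radix))
        value = value * radix + digit
    return value

for n in (0, 63, 64, 96):
    print(n, to_radix(n), from_radix(to_radix(n)))
```

For 96 this gives "1W", that is 1*64 + 32, which matches the test value in your decode script.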

Base64 encoding is defined for a sequence of bytes. Whether or not the bytes represent how a computer stores an integer, a string, or an MP3 file is irrelevant. The Base64 encoding maps three bytes (24 bits) into 4 bytes (32 bits). The original sequence of 24 bits is split into 4 groups of 6 bits, and then each grouping of 6 bits is converted into a displayable character. This is similar to radix-64 display of a number (6-bit values are converted to a displayable character) but there are some important differences.

The character mapping table does not begin with 0. A sequence of 6 0-bits is mapped to "A".

If the length of the input is not a multiple of 3, then one or two "=" symbols are added to the end of the Base64 encoded value to pad the length of the encoded value to a multiple of 4. If present, these special symbols need to be handled when decoding a value. In contrast, a radix-64 encoding of an integer is complete when all the bits in the integer are consumed.
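
To make the 24-bit grouping concrete, here is a minimal sketch that encodes exactly three bytes by hand and compares the result with the standard library. The function name encode_three_bytes is made up for illustration, and padding is deliberately left out.

```python
import base64

# Standard Base64 alphabet: the 6-bit value 0 maps to "A", not "0".
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_three_bytes(chunk):
    """Encode exactly three bytes as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles exactly 3 bytes")
    b0, b1, b2 = bytearray(chunk)
    bits = (b0 << 16) | (b1 << 8) | b2  # the 24 input bits as one integer
    # Split into four 6-bit groups, most significant group first.
    groups = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(B64_ALPHABET[g] for g in groups)

print(encode_three_bytes(b"Man"))  # TWFu
assert encode_three_bytes(b"Man") == base64.b64encode(b"Man").decode("ascii")
assert encode_three_bytes(b"\x00\x00\x00") == "AAAA"
```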

The following example illustrates the encoding of a sequence of one, two, or three 0-bytes:

Code: Select all
>>> import base64
>>> base64.b64encode(b'\00')
b'AA=='
>>> base64.b64encode(b'\00\00')
b'AAA='
>>> base64.b64encode(b'\00\00\00')
b'AAAA'
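
For comparison with your test value: 96 is "1W" in radix-64 with your char_table, but Base64 encodes the byte that stores 96 (0x60), not the number. A short Python 3 sketch:

```python
import base64

# The integer 96 fits in one byte (0x60). Base64 encodes that byte:
# two data characters followed by two "=" padding characters.
encoded = base64.b64encode(b"\x60")
print(encoded)  # b'YA=='

# Decoding consumes the padding and returns the original byte.
assert base64.b64decode(encoded) == b"\x60"

# The encoded length is always a multiple of 4; the amount of padding
# depends on len(data) % 3.
for size in (1, 2, 3, 4):
    out = base64.b64encode(b"\x00" * size)
    print(size, out, out.count(b"="))
```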



Note that "0b", "0x", etc., are not part of the radix-N encoding. They are just prepended to the beginning of the string so we can recognize the string. In Base64 encoding, the "=" symbols are part of the encoded value and are not used to identify the string as a Base64 encoding (because they aren't present if the length of the original data is a multiple of 3).
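
For the radixes that do have a conventional prefix, the built-ins already handle it. A short sketch (the printed output is for Python 3):

```python
n = 96

# Built-ins that add the conventional prefix.
print(bin(n), oct(n), hex(n))  # 0b1100000 0o140 0x60

# format() gives the bare digits, or the prefixed form with "#".
assert format(n, "b") == "1100000"
assert format(n, "#x") == "0x60"

# int() with base 0 reads the prefix to choose the radix.
assert int("0b1100000", 0) == int("0o140", 0) == int("0x60", 0) == n
```

Other radixes have no prefix that Python recognizes, so a class that supports them would need its own convention.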

Historical note

Why was Base64 encoding invented? Many years ago, before the Internet as we know it existed, modems and serial ports ruled the world. Data transmission wasn't very reliable, so a parity bit was used to provide rudimentary error checking. If even parity was specified, then the parity bit would be set to either 0 or 1 so that the number of 1-bits in the "character" field and parity bit would sum to an even number. 7 data bits plus a parity bit (either even or odd) was common. This made it difficult to send bytes where 8 data bits were required. Base64 encoding results in bytes of data that don't use the 8th bit, so that position could be used for parity.

Historical note #2

Some of the early word processors would set the 8th bit if the next character in the document was a space character. This would save 1 byte for every word in a document. Back then, those savings were valuable.