Compatibility issue: asksaveasfilename and Unicode

Postby rovf » Mon Apr 28, 2014 3:44 pm

This is a condensed version (i.e. code of no relevance to this question omitted) of a program which runs on Mac OSX under Python 2.3.5, 2.6.1 and 3.4.0:

```python
# -*- coding: utf-8 -*-
try:
    from tkFileDialog import asksaveasfilename
except ImportError:
    from tkinter.filedialog import asksaveasfilename
pathname=asksaveasfilename(initialdir='.')
if pathname:
    print('Wörterbuch:'+pathname) # This is Line X
    pathname_bak=pathname+'.bak'
    print('Alte Version gesichert auf:'+pathname_bak)
```


However, running it on Windows XP under Python 2.7 crashes at line X with the error message 'ascii' codec can't decode byte ..., and the culprit shown is the German umlaut ö in the string 'Wörterbuch:'. My understanding is that asksaveasfilename() returns a Unicode string on the Mac but some different encoding on Windows, and that to perform the concatenation Python somehow tries to convert the strings to ASCII, which fails because ö is not representable in ASCII.
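The error text says "decode", not "encode", which hints at what Python 2 actually does: when a byte string (str) is concatenated with a unicode object, it first decodes the byte string with the default 'ascii' codec. The sketch below performs that hidden step explicitly, so the same failure can be reproduced on Python 3. The function name python2_style_concat and the Windows path are hypothetical, chosen only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: Python 2 evaluated  str + unicode  by first decoding the byte
# string with the 'ascii' codec. Doing that step explicitly reproduces
# the failure on Python 3.

# 'Wörterbuch:' as the UTF-8 bytes a Python 2 plain literal would hold
label_bytes = 'Wörterbuch:'.encode('utf-8')
pathname = 'C:\\Dokumente\\liste.txt'  # hypothetical path, as text


def python2_style_concat(left_bytes, right_text):
    """Mimic Python 2's implicit coercion of a byte string to unicode."""
    return left_bytes.decode('ascii') + right_text


try:
    python2_style_concat(label_bytes, pathname)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the block

print(error)
# 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

# Decoding with the encoding the bytes were really written in works:
fixed = label_bytes.decode('utf-8') + pathname
```

Byte 0xc3 is the first of the two UTF-8 bytes of ö, which is why the literal, not the path, is reported as the culprit.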

Based on this guess, I enhanced the program by placing the following line just before line X:

```python
    pathname=pathname.encode('utf-8')
```


Indeed, the program now runs on XP with Python 2.7, and with Python 2.3 and 2.6 on Mac OS.

However, it no longer works with Python 3.4 on Mac OS. This time it crashes at line X with the error message: Can't convert 'bytes' object to str implicitly.

This confuses me. The right operand is UTF-8 encoded Unicode (because it results from an encode('utf-8')), and the left operand is the same (because it is a string literal, and I have placed a coding: utf-8 in my program).
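A small Python 3 sketch of what happens at line X after the workaround: str.encode() turns text into a bytes object, while the literal 'Wörterbuch:' is a str, and Python 3 never mixes the two implicitly. The path below is a hypothetical dialog result.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: encode() produces bytes; bytes and str do not mix.
pathname = '/Users/rovf/Wörterbuch.txt'  # hypothetical dialog result (str)
encoded = pathname.encode('utf-8')       # what the workaround produces

print(type(pathname).__name__, type(encoded).__name__)  # str bytes

try:
    result = 'Wörterbuch:' + encoded
    mix_error = None
except TypeError as exc:
    # Python 3.4 reports "Can't convert 'bytes' object to str implicitly";
    # newer versions word the message differently.
    mix_error = exc

# Keeping both operands as text works:
message = 'Wörterbuch:' + pathname
```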

Now my question: how do I write this piece of code so that it works on Python 2.3 up to 3.4, on both OS X and Windows?

Additional question: At what point(s) did I misunderstand the workings of unicode here?

Re: Compatibility issue: asksaveasfilename and Unicode

Postby Mekire » Mon Apr 28, 2014 4:10 pm

Try this:
```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    from tkFileDialog import asksaveasfilename
except ImportError:
    from tkinter.filedialog import asksaveasfilename
pathname=asksaveasfilename(initialdir='.')
if pathname:
    print('Wörterbuch:'+pathname) # This is Line X
    pathname_bak=pathname+'.bak'
    print('Alte Version gesichert auf:'+pathname_bak)
```

I have no idea whether this works on anything before 2.7. Maintaining compatibility between 2.7 and 3.x is as far as I'll try to stretch. Beyond that, you should probably consider writing version-specific variants if it is really necessary.
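Depending on the Python version and the path itself, the dialog result may arrive as a byte string or as a unicode object, so one defensive option is to normalise it to text before using it. The sketch below assumes Python 2.6/2.7 or 3.3+ (it relies on the bytes name and on b'' and u'' literals); to_text and backup_name are hypothetical helpers, not part of Tkinter, and falling back to sys.getfilesystemencoding() is an assumption about how the path bytes were encoded.

```python
# Hypothetical helpers (not part of Tkinter): normalise whatever
# asksaveasfilename() returned into a text string before using it.
import sys


def to_text(value, encoding=None):
    """Return value as text; decode it first if it is a byte string."""
    # On Python 2.6/2.7 `bytes` is an alias of `str`; on Python 3 it is bytes.
    if isinstance(value, bytes):
        return value.decode(encoding or sys.getfilesystemencoding())
    return value


def backup_name(pathname):
    """Build the '.bak' name from a dialog result."""
    # '.bak' is pure ASCII, so even as a Python 2 byte string it joins
    # cleanly with a unicode path.
    return to_text(pathname) + '.bak'


# Both kinds of input end up as the same text:
print(to_text(b'report.txt') == to_text(u'report.txt'))  # True
```

Combined with unicode_literals (or u'' prefixes), every operand of the concatenation is then text on both Python 2 and Python 3.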

-Mek

Re: Compatibility issue: asksaveasfilename and Unicode

Postby stranac » Mon Apr 28, 2014 4:20 pm

As an alternative to what Mekire suggested, you can also prepend all the strings you want to be unicode with u, e.g.:
```python
u'Wörterbuch:'
```

This will not work on Python 3.x prior to 3.3, but there's no good reason anyone should use one of those anyway...
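A quick check of what the u prefix means on Python 3.3+: it is accepted again (PEP 414) but changes nothing, because a plain literal is already text there. On Python 2, by contrast, u'Wörterbuch:' is a unicode object while 'Wörterbuch:' is a byte string holding the file's UTF-8 bytes.

```python
# -*- coding: utf-8 -*-
# The u prefix is accepted by Python 2.x and again by Python 3.3+ (PEP 414);
# on Python 3 it is a no-op, because every plain literal is already text.
label = u'Wörterbuch:'
plain = 'Wörterbuch:'

print(type(label).__name__, type(plain).__name__)  # on Python 3: str str
print(label == plain)                              # on Python 3: True
```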

rovf wrote: Additional question: At what point(s) did I misunderstand the workings of unicode here?

  1. Unicode is not an encoding.
  2. Strings are Unicode by default on Python 3, but byte strings on Python 2, so this is not correct:
    rovf wrote: the left operand is the same (because it is a string literal, and I have placed a coding: utf-8 in my program).
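To make point 1 concrete, here is a short Python 3 sketch: Unicode assigns ö a single code point, and each encoding turns that code point into a different byte sequence. The last lines mimic the Python 2 situation from point 2 by encoding the literal explicitly.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 3): Unicode assigns code points; an encoding maps them to bytes.
text = 'ö'
print(hex(ord(text)))  # 0xf6, i.e. code point U+00F6

# One code point, three different byte sequences:
for codec in ('utf-8', 'latin-1', 'utf-16-le'):
    print(codec, text.encode(codec))
# utf-8 b'\xc3\xb6'
# latin-1 b'\xf6'
# utf-16-le b'\xf6\x00'

# A plain Python 2 literal in a UTF-8 file held bytes like these,
# so its length counted bytes rather than characters:
py2_like = 'Wörterbuch:'.encode('utf-8')
print(len('Wörterbuch:'), len(py2_like))  # 11 12
```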

Re: Compatibility issue: asksaveasfilename and Unicode

Postby rovf » Mon May 05, 2014 11:03 am

stranac wrote: As an alternative to what Mekire suggested, you can also prepend all the strings you want to be unicode with u


Now I'm confused. I thought the purpose of having the # -*- coding: utf-8 -*- comment at the beginning of the file was to tell Python that all string literals are UTF-8 encoded Unicode strings. Or am I falling into the trap again of confusing the idea of an encoding with the concept of a Unicode string?

I am aware that Unicode just defines a set of code points, and that there are various encodings defined for it (UTF-8, UTF-16, etc.). But what, then, is the difference (in Python 2.7 and Python 3 respectively) between 'some string' and u'some string', assuming I have # -*- coding: utf-8 -*- enabled in my source file?
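One way to see what the coding declaration does is to compile the same small program saved in two different encodings. The Python 3 sketch below (the helper literal_from_source is hypothetical) shows that the declaration only tells the parser how to turn the file's bytes into text; in both cases the literal ends up as the same str. On Python 2 the parser reads the file the same way, but a plain '...' literal keeps the bytes of the declared encoding (type str), and only u'...' becomes a unicode object.

```python
# Sketch (Python 3): the coding declaration tells the parser how to decode
# the file's bytes. It does not make literals "UTF-8 encoded".


def literal_from_source(source_bytes):
    """Compile source given as bytes and return the value it assigns to s."""
    namespace = {}
    # compile() honours a coding declaration when the source is bytes
    exec(compile(source_bytes, '<demo>', 'exec'), namespace)
    return namespace['s']


# The same two-line program saved in two encodings, each declared correctly.
program = "# -*- coding: {0} -*-\ns = 'Wörterbuch'\n"
utf8_file = program.format('utf-8').encode('utf-8')
latin1_file = program.format('latin-1').encode('latin-1')

# The files contain different bytes for the umlaut...
print(b'\xc3\xb6' in utf8_file, b'\xc3\xb6' in latin1_file)  # True False

# ...but both produce the same text literal.
from_utf8 = literal_from_source(utf8_file)
from_latin1 = literal_from_source(latin1_file)
print(type(from_utf8).__name__, from_utf8 == from_latin1)  # str True
```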