Monday, 13 June 2016

Reading lines with .encode('utf8')

I read lines from a file, for example:

The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)

Die virtuelle Katastrophe: So führen Sie Teams über Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)

I encode each line with:

title = line.encode('utf8')

but the output is:

b'Die virtuelle Katastrophe: So f\xc3\xbchren Sie Teams \xc3\xbcber Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)'

b'The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)'

Why is the "b'" prefix always added? How do I read the file properly so that the umlauts are preserved?
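For context, here is a minimal sketch of where the prefix seems to come from, assuming Python 3. There, `str.encode()` returns a `bytes` object, and printing `bytes` shows its repr: a `b'...'` literal with every non-ASCII byte written as a `\x..` escape. The sample string is just a shortened piece of the title above.

```python
# Assumes Python 3, where str is Unicode text and bytes is raw data.
title = "So führen Sie Teams über Distanz"

# encode() turns text into bytes; "ü" becomes the two UTF-8 bytes C3 BC.
encoded = title.encode("utf8")

print(type(encoded).__name__)   # bytes
print(encoded)                  # b'So f\xc3\xbchren Sie Teams \xc3\xbcber Distanz'

# Decoding the bytes gives back the original text, umlauts included.
print(encoded.decode("utf8") == title)   # True
```

So the umlauts are not lost. They are only displayed in their byte form because the value is no longer a text string.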

Here is the complete relevant code snippet:

# Parse the clippings.txt file
lines = [line.strip() for line in codecs.open(config['CLIPPINGS_FILE'], 'r', 'utf-8-sig')]
for line in lines:
    line_count = line_count + 1
    if (line_count == 1 or is_title == 1):
        # ASSERT: this is a title line
        #title = line.encode('ascii', 'ignore')
        title = line.encode('utf8')
        prev_title = 1
        is_title = 0
        note_type_result = note_type = l = l_result = location = ""
        continue

Thanks.
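One possible way to keep the umlauts, sketched under the assumption that this runs on Python 3: a file opened with the `utf-8-sig` codec already yields decoded `str` lines, so the extra `.encode('utf8')` can probably just be dropped. The sketch uses the built-in `open()` in place of `codecs.open()`. The helper name `read_titles` and the temporary demo file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile


def read_titles(path):
    """Return the stripped lines of a UTF-8 file as str (hypothetical helper)."""
    # "utf-8-sig" decodes UTF-8 and silently drops a leading BOM if present.
    with open(path, "r", encoding="utf-8-sig") as handle:
        return [line.strip() for line in handle]


# Demo file standing in for clippings.txt: a BOM followed by one title line.
sample = ("Die virtuelle Katastrophe: So führen Sie Teams über Distanz "
          "zur Spitzenleistung (German Edition) (Thomas, Gary)\n")
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf" + sample.encode("utf-8"))
    demo_path = tmp.name

titles = read_titles(demo_path)
os.remove(demo_path)

title = titles[0]      # already a str, no .encode() needed
print(title)           # umlauts print as text if the console encoding supports them
```

Encoding would then only be needed at the point where bytes are actually required, for example when writing to a file opened in binary mode.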
