I read lines from a file like these:
The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)
Die virtuelle Katastrophe: So führen Sie Teams über Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)
I read and encode them with:
title = line.encode('utf8')
but the output is:
b'Die virtuelle Katastrophe: So f\xc3\xbchren Sie Teams \xc3\xbcber Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)'
b'The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)'
Why is the b' prefix always added? How do I read the file properly so that the umlauts are preserved?
Here is the complete relevant code snippet:
# Parse the clippings.txt file
lines = [line.strip() for line in codecs.open(config['CLIPPINGS_FILE'], 'r', 'utf-8-sig')]
for line in lines:
    line_count = line_count + 1
    if (line_count == 1 or is_title == 1):
        # ASSERT: this is a title line
        #title = line.encode('ascii', 'ignore')
        title = line.encode('utf8')
        prev_title = 1
        is_title = 0
        note_type_result = note_type = l = l_result = location = ""
        continue
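For reference, here is a minimal sketch of what seems to be going on, assuming Python 3. Reading with the 'utf-8-sig' codec already yields decoded str objects with the umlauts intact. Calling .encode('utf8') then turns each str back into a bytes object, and the repr of bytes is what shows the b' prefix and the \xc3\xbc escapes. The sample data and the read_titles helper are hypothetical stand-ins for the real clippings file, not part of the original script.

```python
import io

# Hypothetical stand-in for clippings.txt: UTF-8 bytes starting with a BOM,
# which is what the 'utf-8-sig' codec is designed to handle.
raw = "\ufeffDie virtuelle Katastrophe: So führen Sie Teams über Distanz\n".encode("utf-8")


def read_titles(stream):
    """Decode a binary stream as UTF-8 (dropping a leading BOM) and return stripped str lines."""
    text = io.TextIOWrapper(stream, encoding="utf-8-sig")
    return [line.strip() for line in text]


titles = read_titles(io.BytesIO(raw))
title = titles[0]                # already a str; no further encoding needed
encoded = title.encode("utf8")   # bytes; this step introduces the b'...' form

print(type(title).__name__, type(encoded).__name__)
print(encoded)                   # shows the b' prefix and \xc3\xbc escapes
```

With a real file, the same decoding can be had from the built-in open(path, encoding='utf-8-sig'). Under these assumptions, keeping title = line without the .encode('utf8') call leaves the umlauts as ordinary characters.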
Thanks.