Thursday, June 30, 2016

json.loads fails on invalid escape characters [duplicate]

I am getting a large data file from an external service, where each line is a JSON object. However, it contains multiple hex escapes such as \xef, \xa0 and \xa9, and some Unicode escapes such as \u2022. I am reading the file like this:

import json

with open(filename, 'r') as fh:
    for line in fh:
        attr = json.loads(line)

I tried passing the encodings utf-8 and latin-1 to open, but json.loads still fails. If I remove the invalid characters, json.loads works, but I don't want to lose any data. What's the recommended way to fix this?

repr(line) sample:

'{"product_type":"SHOES","recommended_browse_nodes":"361208011","item_name":["Citygate  960561 Ankle Boots Womens  Gray Grau (anthrazit 9) Size: 8 (42 EU)"],"product_description":[],"brand_name":"Citygate","manufacturer":"J H P\xf6lking GmbH & Co KG","bullet_point":[],"department_name":"Women\u2019s","size_name":"42 EU","material_composition":["Leather"]}\n'

json.loads fails at \xf6 in item_name with Invalid escape: line 1 column 105 (char 104).
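
For context, JSON only allows \uXXXX escapes inside strings, so a literal \xNN sequence is rejected by the parser. One possible workaround, sketched here and not taken from the question, is to rewrite each \xNN as \u00NN before parsing. It assumes the file contains a literal backslash followed by x and two hex digits, and that each \xNN stands for the code point U+00NN (a Latin-1 reading). The names fix_hex_escapes and load_json_lines are hypothetical.

```python
import json
import re

# A backslash-x escape such as \xf6, preceded by an even number of
# backslashes so that an already escaped backslash (\\x..) is left alone.
_HEX_ESCAPE = re.compile(r'(?<!\\)((?:\\\\)*)\\x([0-9a-fA-F]{2})')


def _to_unicode_escape(match):
    # Keep any leading escaped backslashes, then emit \u00NN in place of \xNN.
    return match.group(1) + '\\u00' + match.group(2)


def fix_hex_escapes(line):
    """Rewrite \\xNN escapes (invalid in JSON) as \\u00NN (valid in JSON)."""
    return _HEX_ESCAPE.sub(_to_unicode_escape, line)


def load_json_lines(path, encoding='utf-8'):
    """Yield one parsed object per non-empty line of a JSON-lines file."""
    with open(path, 'r', encoding=encoding) as fh:
        for line in fh:
            if line.strip():
                yield json.loads(fix_hex_escapes(line))
```

If the service actually emits the bytes of a UTF-8 sequence as consecutive \xNN escapes (\xef\xa0\xa9 could be one), this reading produces mojibake, and the decoded string would need to be re-encoded as Latin-1 and decoded as UTF-8 afterwards.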
