I have written a basic web scraper in Python using the lxml and JSON libraries. The snippet below shows how I currently write each row to CSV:
import csv

with open(filepath, "ab") as f:
    write = csv.writer(f)
    try:
        write.writerow(["allhomes",
                        statenum,
                        statesubnum,
                        suburbnum,
                        listingnum,
                        listingsurlstr,
                        '',  # fill this in! should be 'description'
                        node["state"],
                        node["suburb"],
                        node["postcode"],
                        node["propertyType"],
                        node["bathrooms"],
                        node["bedrooms"],
                        node["parking"],
                        pricenode,
                        node["photoCount"],
                        node2["pricemin"],
                        node2["pricemax"],
                        node2["pricerange"]])
    except KeyError:
        # retry with a blank in place of node["bathrooms"]
        try:
            write.writerow(["allhomes",
                            statenum,
                            statesubnum,
                            suburbnum,
                            listingnum,
                            listingsurlstr,
                            '',  # fill this in! should be 'description'
                            node["state"],
                            node["suburb"],
                            node["postcode"],
                            node["propertyType"],
                            '',  # bathrooms key missing
                            node["bedrooms"],
                            node["parking"],
                            pricenode,
                            node["photoCount"],
                            node2["pricemin"],
                            node2["pricemax"],
                            node2["pricerange"]])
        except KeyError as e:
            # some other key is also missing: give up and log an error row
            errorcount += 1
            write.writerow(["Error: invalid dictionary field key: %s" % e.args,
                            statenum,
                            statesubnum,
                            suburbnum,
                            listingnum,
                            listingsurlstr])
The problem is that if a certain node does not exist (most commonly the bathrooms node), I have to retry with a blank value in its place, or else give up the entire row of data. My current approach is to retry the write with the bathrooms node replaced by a blank, but this is messy, and it does not handle KeyErrors on any other node.
How can I skip over a single node that does not exist or contains no data, without sacrificing the whole row?
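I wonder whether something like dict.get with a blank default would collapse the nested try/except entirely. Here is a rough, untested sketch of what I have in mind, reusing my existing variables:

import csv

# Idea: dict.get falls back to '' whenever a key is absent,
# so a single writerow covers every case without nested try/except.
with open(filepath, "ab") as f:
    write = csv.writer(f)
    write.writerow(["allhomes",
                    statenum,
                    statesubnum,
                    suburbnum,
                    listingnum,
                    listingsurlstr,
                    '',  # description placeholder, as above
                    node.get("state", ''),
                    node.get("suburb", ''),
                    node.get("postcode", ''),
                    node.get("propertyType", ''),
                    node.get("bathrooms", ''),
                    node.get("bedrooms", ''),
                    node.get("parking", ''),
                    pricenode,
                    node.get("photoCount", ''),
                    node2.get("pricemin", ''),
                    node2.get("pricemax", ''),
                    node2.get("pricerange", '')])

Is this the right direction, or is there a cleaner way?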
Many thanks.