mercredi 13 juillet 2016

Is it possible to hook up a more robust HTML parser to Python mechanize?

I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect that it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say lxml or BeautifulSoup) and then feeding the prettified, cleaned-up output to the form parser. I need mechanize not only for submitting the form but also for maintaining sessions (I'm working this form from within a login session.)

I'm not sure how to go about doing this, if it is indeed possible.. I'm not that familiar with the various details of the HTTP protocol, how to get various parts to work together etc. Any pointers?

Aucun commentaire:

Enregistrer un commentaire