Monday, 13 June 2016

How to get the raw page source?

Consider a URL like https://groups.yahoo.com/api/v1/groups/concatenative/messages/300 . The server returns it as an application/json response.

I'd like to access the JSON from Selenium. (I'm using Selenium because I need to access private groups, and I didn't want to figure out how to log in via MechanicalSoup or something similar.) However, the page source gives me the browser's rendering of the JSON, not the JSON itself:

>>> self.br.driver.page_source
'<html xmlns="http://www.w3.org/1999/xhtml"><head><link title="Wrap Long Lines" href="resource://gre-resources/plaintext.css" type="text/css" rel="alternate stylesheet" /></head><body><pre>{"ygPerms":{"resourceCapabilityList":[{"resourceType":"GROUP","capabilities":[{"name":"READ"},{"name":"JOIN"}]},{"resourceType":"PHOTO","capabilities":[]},{"resourceType":"FILE","capabilities":[]},{"resource ...

Note that the JSON is wrapped in an HTML document, inside a <pre> element.

How can I get just the JSON, directly? It seems hacky to get the contents of the <pre> in the <body> since I don't know how the browser may choose to represent this JSON response in the future.
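
One possible direction, offered only as a sketch: instead of reading the rendered page, reuse the logged-in browser's cookies to request the URL outside the browser, so the body arrives as raw JSON. This assumes that self.br.driver is a Selenium WebDriver, that the browser is currently on a page of the same domain (get_cookies() only returns cookies visible to the current page), and that the server accepts the session cookies without further checks such as a matching User-Agent. The helper names cookie_header, build_request and fetch_json are hypothetical, not part of Selenium.

```python
import json
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(driver, url):
    """Build a urllib request that reuses the browser session's cookies.

    Assumes `driver` exposes Selenium's get_cookies(), which returns a
    list of dicts containing at least 'name' and 'value'.
    """
    headers = {
        "Cookie": cookie_header(driver.get_cookies()),
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(driver, url, timeout=30):
    """Fetch `url` outside the browser and parse the body as JSON."""
    request = build_request(driver, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under those assumptions, fetch_json(self.br.driver, url) would return the parsed JSON as Python objects, independent of how the browser chooses to display the response.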
