Advanced Usage of Requests

With Requests, calling get() or post() directly does send a web request, but each call is effectively its own session. It is like opening two pages in two different browsers that do not share cookies. Maintaining a session is like opening a new page in the same browser, so you don’t have to set cookies on every request. Requests provides the Session object for this.

import requests

# Two independent calls: the cookie set by the first is not sent with the second
requests.get('http://httpbin.org/cookies/set/num/123456')
r = requests.get('http://httpbin.org/cookies')
print(r.text)

{
  "cookies": {}
}

Here we requested a test site and set a cookie named num with the value 123456, then sent a second request to read the cookies back. The second response does not contain the cookie set by the first request.

Imagine a common scenario: after logging in to a website, I can click through to other pages without logging in again. Why? The login establishes a session between the browser and the server, and later requests from the same browser share that session. What if the requests come from code instead? As the example above shows, two separate requests do not share a session, so this scenario cannot be implemented that way. The Session object in Requests solves this.

Let’s look at another example.

session = requests.Session()
session.get('http://httpbin.org/cookies/set/num/123456')
res = session.get('http://httpbin.org/cookies')
print(res.text)

{
  "cookies": {
    "num": "123456"
  }
}

In this example, both requests go through the same Session object, so the cookie set by the first request is still available in the second. This shows that the two requests share one session.
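To see what the session actually attaches to a request, the following minimal sketch builds a request without sending it and inspects its Cookie header. The cookie is placed in the jar by hand here, which is an assumption made so the example runs offline; in practice a server response would normally set it.

```python
import requests

URL = "http://httpbin.org/cookies"

# Using the session as a context manager closes its connections on exit.
with requests.Session() as session:
    # Put a cookie in the session's cookie jar by hand. Normally a server
    # response would do this through a Set-Cookie header.
    session.cookies.set("num", "123456")

    # Build the request the session would send, without sending it.
    prepared = session.prepare_request(requests.Request("GET", URL))

# The session merged its stored cookie into the outgoing request.
cookie_header = prepared.headers["Cookie"]
print(cookie_header)
```

Every request sent through the same session gets this header automatically, which is why the second request in the example above still sees the cookie.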

Authentication
When accessing a website, we often encounter pages that require authentication, where we need to enter a username and password to log in to the site. In this case, we can use the authentication feature that comes with Requests.

import requests
from requests.auth import HTTPBasicAuth

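# httpbin's basic-auth endpoint accepts exactly the credentials named in its path
res = requests.get('http://httpbin.org/basic-auth/username/password', auth=HTTPBasicAuth('username', 'password'))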
print(res.status_code)

200

If the username and password are correct, it will succeed and return a 200 status code. Otherwise, a 401 status code is returned.
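To make clearer what HTTPBasicAuth does, here is a small sketch that prepares a request without sending it and looks at the Authorization header it adds. The endpoint and credentials are placeholders assumed for illustration. It also shows that a plain (username, password) tuple can be passed as shorthand.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

# Placeholder endpoint and credentials, assumed for illustration only.
URL = "http://httpbin.org/basic-auth/username/password"

# Prepare the request without sending it, so we can inspect its headers.
explicit = requests.Request(
    "GET", URL, auth=HTTPBasicAuth("username", "password")
).prepare()

# A (username, password) tuple is shorthand for HTTPBasicAuth.
shorthand = requests.Request("GET", URL, auth=("username", "password")).prepare()

auth_header = explicit.headers["Authorization"]
print(auth_header)

# The header is "Basic " followed by base64("username:password").
decoded = base64.b64decode(auth_header.split(" ", 1)[1]).decode("latin1")
print(decoded)
```

Because the credentials are only base64-encoded, not encrypted, basic authentication should be used over HTTPS in practice.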

SSL Certificate Validation
Websites served over HTTPS are now everywhere. Requests can verify SSL certificates for HTTPS requests, just like a web browser. The verify parameter controls this check.

import requests

r = requests.get('https:
print(r.text)

With verify set to True, Requests checks the host’s SSL certificate and raises an SSLError if the certificate is invalid. To skip the check, set verify to False. The verify parameter defaults to True, so you only need to set it explicitly when you want to change this behavior.
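As a rough sketch of handling a failed certificate check, the helper below wraps requests.get and catches SSLError. The function name fetch_status, the 10-second timeout, and the example URL in the comment are assumptions made for illustration, not part of Requests.

```python
import requests


def fetch_status(url, verify=True):
    """Return the HTTP status code, or None if certificate validation fails.

    `verify` may be True (the default), False, or a path to a CA bundle file.
    """
    try:
        response = requests.get(url, verify=verify, timeout=10)
    except requests.exceptions.SSLError as exc:
        print(f"Certificate validation failed for {url}: {exc}")
        return None
    return response.status_code


# Example call (needs network access; the URL is a placeholder):
# print(fetch_status("https://example.com"))
```

Passing verify=False makes the request succeed even with an invalid certificate, but it also removes the protection that the check provides, so it is best kept to testing.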

Proxy Settings
For some sites, a handful of requests will return content normally. Once you start crawling at scale with frequent requests, however, the site may show a CAPTCHA, redirect to a login page, or block your IP address so that the site is inaccessible for a period of time. To work around this, you can route requests through a proxy by using the proxies parameter.

proxies = {
    'http': 'http:
    'https': 'https:
}
requests.get('http:

Both addresses here are made up for illustration. To run the code, replace them with valid proxy addresses.
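A complete version of the idea might look like the sketch below. The proxy addresses and the target URL are placeholders assumed for illustration. Besides passing proxies on a single call, the mapping can be attached to a Session so that every request sent through it uses the same proxies.

```python
import requests

# Placeholder proxy addresses, assumed for illustration; replace with real ones.
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

# Option 1: pass the mapping on a single call (needs network access):
# requests.get("http://example.org", proxies=proxies, timeout=10)

# Option 2: attach the mapping to a session so its requests reuse it.
session = requests.Session()
session.proxies.update(proxies)
print(session.proxies)
```

If the proxy requires credentials, they can be embedded in the proxy URL in the form http://user:password@host:port.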

SOCKS
In addition to basic HTTP proxies, Requests supports proxies that use the SOCKS protocol. This is an optional feature that requires a third-party library. You can use pip to install the dependencies:

$ pip install requests[socks]

Once the dependencies are installed, using the SOCKS proxy is as easy as using the HTTP proxy:

proxies = {
    'http': 'socks5:
    'https': 'socks5:
}
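The sketch below builds such a mapping from its parts. The helper name socks_proxies, the host, the port, and the credentials are all hypothetical. It also covers the socks5h scheme, which asks the proxy server to resolve hostnames, whereas socks5 resolves them on the client.

```python
def socks_proxies(host, port, user=None, password=None, remote_dns=False):
    """Build a `proxies` mapping for a SOCKS5 proxy.

    With remote_dns=True the scheme is socks5h, so hostnames are resolved
    by the proxy server instead of the client.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    credentials = f"{user}:{password}@" if user and password else ""
    proxy_url = f"{scheme}://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder host, port, and credentials, for illustration only.
proxies = socks_proxies("127.0.0.1", 1080, user="user", password="pass")
print(proxies)

# Usage (needs network access and `pip install requests[socks]`):
# import requests
# requests.get("http://example.org", proxies=proxies, timeout=10)
```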

Timeout Settings
In the basic usage of Requests, we introduced the timeout parameter, which sets how long a request waits before giving up. For example:

r = requests.get('https://github.com', timeout=5)

An HTTP request has two phases, connect and read. The example above applies the same value to both the connect timeout and the read timeout. To set them separately, pass in a tuple:

r = requests.get('https://github.com', timeout=(3.05, 27))

If the remote server is slow and you want the request to keep waiting for a response, you can set timeout to None.

r = requests.get('https://github.com', timeout=None)
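When a timeout expires, Requests raises an exception rather than returning a response. The following sketch shows one way to tell the two phases apart. The helper name get_with_timeouts and its return labels are assumptions made for illustration.

```python
import requests


def get_with_timeouts(url, connect_timeout=3.05, read_timeout=27):
    """Return the status code, or a short label if the request timed out."""
    try:
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # The connection to the server was not established in time.
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        # The server accepted the connection but did not send data in time.
        return "read timeout"
    return response.status_code


# Example call (needs network access):
# print(get_with_timeouts("https://github.com"))
```

Both exceptions derive from requests.exceptions.Timeout, so catching that single class is enough when the distinction does not matter.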

Summary
In this article, we’ve covered a few advanced features of Requests. By mastering these features, we’ve basically mastered the common functions of Requests and can use Requests to solve real-world problems. This is the end of our introduction to Requests. The rest is up to you to practice.
