The problem:

A response header contains GBK-encoded Chinese text, and requests cannot decode it when reading the headers.

The HTTP packet is shown in the figure:

    Python 3.4.3 (default, Aug 25 2017, 16:49:50)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import requests
    >>> res = requests.get('http://down.chinaz.com/download.asp?id=35&dp=1&fid=22&f=yes',headers={'Referer':'http://down.chinaz.com/soft/12162.htm'},allow_redirects=False)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 72, in get
        return request('get', url, params=params, **kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 58, in request
        return session.request(method=method, url=url, **kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 510, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 655, in send
        r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 125, in resolve_redirects
        url = self.get_redirect_target(resp)
      File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 116, in get_redirect_target
        return to_native_string(location, 'utf8')
      File "/usr/local/lib/python3.4/site-packages/requests/_internal_utils.py", line 25, in to_native_string
        out = string.decode(encoding)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 28: invalid continuation byte
    >>>

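The request fails outright. Google turned up nothing relevant, because most people run into encoding problems with the body of a response that was received successfully, whereas here the error is raised while the request itself is being made. Note that allow_redirects=False does not avoid it: as the traceback shows, Session.send still calls resolve_redirects (to populate r._next), which calls get_redirect_target and decodes the Location header.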

Testing showed that Python 2.7 does not have this problem.

The IPython traceback shows that the error comes from this section:

    /usr/local/lib/python3.4/site-packages/requests/sessions.py in get_redirect_target(self, resp)
        114             if is_py3:
        115                 location = location.encode('latin1')
    --> 116             return to_native_string(location, 'utf8')
        117             #return location
        118

Comparing the underlying requests code under Python 2.7 and Python 3.4 shows the difference:

The code requests uses under Python 3.4 to read the response Location:

By default, it is all decoded as utf8.

The Python 2.7 code:

 

Next, look at the get_redirect_target function:

 

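This essentially confirms the cause: on Python 3.4, requests re-encodes the Location header with latin1 and then decodes it as utf-8 by default, so if the Location contains GBK-encoded Chinese, decoding fails with the error shown at the start of this article. On Python 2.7 the header is already a native byte string, so to_native_string returns it unchanged and no decoding takes place.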
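The decode path can be reproduced offline with only the standard library. This is a minimal sketch under stated assumptions: the Location value `/soft/中文.zip` is a hypothetical stand-in (the real header from down.chinaz.com is not shown), and `http.client.parse_headers` is used to mimic how http.client, which urllib3 and therefore requests build on, turns raw header bytes into a latin-1 string. The last two steps mirror lines 115 and 116 of get_redirect_target.

```python
import http.client
import io

# Hypothetical redirect target with Chinese characters, sent by the server as raw GBK bytes.
# (The real Location value from the article's site is not shown; this is a stand-in.)
raw_location = '/soft/中文.zip'.encode('gbk')
raw_headers = b'Location: ' + raw_location + b'\r\n\r\n'

# http.client decodes raw header bytes as ISO-8859-1 (latin-1): one byte -> one character.
headers = http.client.parse_headers(io.BytesIO(raw_headers))
location = headers['Location']

# What requests' get_redirect_target does on Python 3:
# re-encode with latin1 to get the original bytes back, then decode them as UTF-8.
location_bytes = location.encode('latin1')
print('bytes recovered intact:', location_bytes == raw_location)

try:
    location_bytes.decode('utf8')
except UnicodeDecodeError as exc:
    # GBK lead bytes followed by other GBK bytes are not valid UTF-8 sequences.
    print('utf8 decode failed:', exc)

# The same bytes decode cleanly as GBK.
print('gbk decode:', location_bytes.decode('gbk'))
```

The latin1 round trip is lossless, which is why requests can recover the server's exact bytes; the failure happens only at the UTF-8 step.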

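As a temporary workaround, change utf-8 to GBK in the to_native_string(location, 'utf8') call. In addition, the following two places, which are used when requesting the Location address, also need to be modified: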
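Instead of editing the installed sessions.py, a subclass of requests.Session could override get_redirect_target, the method named in the traceback. This is a sketch under assumptions: the class name `GBKFallbackSession` and the attribute `location_encodings` are hypothetical, not part of requests, and treating bytes that are not valid UTF-8 as GBK is a heuristic for servers like this one, not an HTTP rule. It relies only on `resp.is_redirect` and `resp.headers`, which requests' own implementation also uses.

```python
import requests


class GBKFallbackSession(requests.Session):
    """Hypothetical Session subclass (not part of requests).

    Decodes the Location header as UTF-8 like requests does, but falls back to
    GBK when the bytes are not valid UTF-8. Treating such bytes as GBK is a
    heuristic assumption, not something the HTTP spec guarantees.
    """

    # Tried in order; UTF-8 first so servers that send UTF-8 behave as before.
    location_encodings = ('utf-8', 'gbk')

    def get_redirect_target(self, resp):
        """Return the redirect URL as a str, or None if resp is not a redirect."""
        if not resp.is_redirect:
            return None
        # http.client decoded the header bytes as latin-1; encoding with latin-1
        # recovers the exact bytes the server sent.
        raw = resp.headers['location'].encode('latin1')
        for encoding in self.location_encodings:
            try:
                return raw.decode(encoding)
            except UnicodeDecodeError:
                continue
        # Neither encoding fits: return the latin-1 text unchanged rather than raising.
        return raw.decode('latin1')
```

To use it, replace `requests.get(...)` in the session above with `GBKFallbackSession().get(...)` and the same arguments. One caveat: when the redirect is actually followed, requests re-quotes the URL before sending it, and `urllib.parse.quote` encodes non-ASCII characters as UTF-8 by default, so a server that only accepts GBK in the path may still need the raw bytes percent-encoded as they are.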
