In the internet-driven digital age, the flow and interaction of data have become the lifeblood of software systems. Whether integrating with third-party APIs to obtain real-time data, building web crawlers to scrape public information, or submitting forms and uploading files to remote servers, the HTTP protocol remains the foundational cornerstone of data exchange. The Requests library in the Python ecosystem has become the de facto standard for handling HTTP requests due to its minimalist API design, powerful functionality, and cross-platform compatibility. It simplifies the complex processes of network requests, allowing developers to focus on implementing business logic without worrying about low-level socket communication or cookie management details. From prototype development in startups to microservice architectures in large enterprises, Requests plays a crucial role in the code of tens of millions of developers worldwide, serving as a bridge connecting local programs to the web.
1. Introduction to the Library: Core Role in Real Life
The Requests library was developed by Kenneth Reitz in 2011 to address the cumbersome API and limited functionality of Python's built-in urllib modules. It supports all HTTP methods (GET/POST/PUT/DELETE, etc.), automatically decompresses gzip and deflate responses (and Brotli when the optional brotli package is installed), and offers cookie persistence, session management, HTTPS verification, proxy settings, and other advanced features, while providing user-friendly response objects that give direct access to JSON data, raw binary content, or decoded text (a minimal sketch of the response object follows the list below). In practical scenarios:
- API data retrieval: Calling weather APIs (like OpenWeatherMap), map APIs (like Gaode Map), e-commerce APIs (like JD Open Platform) to obtain structured data
- Web crawler development: Serving as a foundational dependency for crawling frameworks like Scrapy, or independently implementing simple data scraping (like news site content collection)
- Form submission: Simulating user login (handling sessions and cookies), submitting search requests or order data
- File operations: Downloading remote images/videos, uploading local files to servers
- Microservice communication: The preferred tool for HTTP calls between services in distributed systems
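As a concrete taste of the response object described above, here is a minimal sketch against the public httpbin.org echo service (the URL and query are purely illustrative):
import requests
# One GET request against a public echo service
r = requests.get("https://httpbin.org/get", params={"q": "demo"}, timeout=5)
print(r.status_code)              # numeric status code, e.g. 200
print(r.headers["Content-Type"])  # response headers behave like a case-insensitive dict
print(r.text[:80])                # body decoded as text
print(r.json()["args"])           # JSON body parsed into Python objects, here {'q': 'demo'}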
2. Installing the Library
Installing Requests is very convenient and can be done using Python’s official package manager pip:
pip install requests
Requests verifies HTTPS certificates out of the box (via the bundled certifi CA store). For OAuth authentication, install the official companion package:
pip install requests-oauthlib  # OAuth 1/2 support
Verify successful installation:
import requests
print(requests.__version__) # Outputs the current version number, e.g., 2.28.2
3. Basic Usage
1. Initiating a Basic GET Request
Retrieving webpage content is the most common scenario for Requests. The following code demonstrates how to fetch the Douban Movie Top 250 page and parse the response:
import requests
# Send GET request (some sites, including Douban, reject the default python-requests User-Agent,
# so a browser-like header is supplied here)
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get("https://movie.douban.com/top250", headers=headers)
# Check response status code (200 indicates success)
if response.status_code == 200:
    # Get text content (encoding is detected automatically)
    html_content = response.text
    print(f"Page size: {len(html_content)} characters")
    # Get binary content (suitable for image/file downloads)
    # content = response.content
    # with open("page.html", "wb") as f:
    #     f.write(content)
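When a page declares no charset, or declares the wrong one, the automatic decoding behind response.text can be inspected and overridden; a short continuation of the example above:
# .text is decoded using the charset from the Content-Type header; if that is missing
# or wrong, non-ASCII pages can come out garbled
print(response.encoding)            # encoding currently used for .text
print(response.apparent_encoding)   # encoding guessed from the body itself
response.encoding = response.apparent_encoding  # override before reading .text again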
2. Handling Request Parameters and URL Construction
Passing query parameters through params avoids the pitfalls of manually concatenating URLs (such as missing special-character escaping):
# Search keyword "Python编程" (Python programming), page 3 of Baidu search results
url = "https://www.baidu.com/s"
params = {
    "wd": "Python编程",
    "pn": 20  # 10 results per page, so pn=20 is page 3
}
response = requests.get(url, params=params)
print(f"Actual request URL: {response.url}")  # Outputs the automatically encoded URL
3. Sending POST Requests and Form Data
When submitting form data (like a login form), use the data parameter to pass key-value pairs, and for JSON format data, use the json parameter:
# Simulating login (form data)
login_url = "https://example.com/login"
form_data = {
    "username": "[email protected]",
    "password": "secure_password"
}
response = requests.post(login_url, data=form_data)
# Sending JSON data (commonly used with API endpoints)
api_url = "https://api.example.com/data"
json_data = {"key": "value", "array": [1, 2, 3]}
response = requests.post(api_url, json=json_data)  # Automatically sets Content-Type to application/json
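After a POST it is usually worth validating the result before using it; raise_for_status() converts 4xx/5xx responses into exceptions. A hedged sketch continuing the JSON example above:
try:
    response = requests.post(api_url, json=json_data, timeout=5)
    response.raise_for_status()            # raises requests.HTTPError for 4xx/5xx responses
    result = response.json()               # parse the response body as JSON
    print(result)
except requests.RequestException as exc:   # base class for connection errors, timeouts and HTTP errors
    print(f"Request failed: {exc}")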
4. Handling Response Headers and Cookies
Accessing response header information or managing cookie sessions:
# Get server information from response headers
server = response.headers.get("Server", "Unknown Server")
print(f"Server: {server}")
# Save cookies to Session object (cross-request persistence)
session = requests.Session()
session.get("https://example.com/login") # First request to get cookies
session.post("https://login.example.com", data=form_data) # Carry cookies to initiate login request
4. Advanced Usage
1. Session Management and Connection Pooling
Using a Session object to maintain state means cookies are handled once and the underlying TCP connections are reused across requests, improving performance:
with requests.Session() as session:
    # Set common request headers (like User-Agent)
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    })
    # Send multiple requests, sharing cookies and the connection pool
    response1 = session.get("https://api.example.com/data1")
    response2 = session.get("https://api.example.com/data2")
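The connection pool itself can be tuned by mounting an HTTPAdapter with explicit pool sizes, which helps when one Session talks to many hosts or is shared across threads. A minimal sketch:
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: how many host pools to cache; pool_maxsize: connections kept alive per host
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount("https://", adapter)
session.mount("http://", adapter)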
2. Proxy Settings and HTTPS Verification
Handling network environments that require proxies or custom HTTPS certificate verification:
# Using HTTP/HTTPS proxies (the value is the proxy URL; most proxies are addressed with an http:// scheme)
proxies = {
    "http": "http://user:[email protected]:8080",
    "https": "http://user:[email protected]:8080"
}
response = requests.get("https://www.example.com", proxies=proxies)
# Skip HTTPS certificate verification (testing environments only; triggers an InsecureRequestWarning)
response = requests.get("https://insecure.example.com", verify=False)
# Present a client certificate for mutual TLS
response = requests.get("https://api.example.com", cert=("client.crt", "client.key"))
3. Timeout Control and Retry Mechanism
To avoid requests blocking indefinitely, set timeouts, and combine HTTPAdapter with urllib3's Retry class (urllib3 ships as a dependency of Requests) for automatic retries:
# Basic timeout settings (connection timeout 3 seconds, read timeout 5 seconds)
response = requests.get("https://api.example.com", timeout=(3, 5))
# Configure a retry strategy (no extra package needed; Retry comes from urllib3)
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get("https://flaky.example.com")
4. File Upload and Chunked Transfer
Support for multi-file uploads and large file chunk processing:
import os

# Single file upload (multipart form style)
upload_url = "https://example.com/upload"
with open("report.pdf", "rb") as f:
    files = {"file": ("report.pdf", f, "application/pdf")}
    response = requests.post(upload_url, files=files)

# Chunked upload of large files (custom chunk size; the server must support resumable uploads)
def upload_large_file(url, file_path, chunk_size=1024 * 1024):
    total_size = os.path.getsize(file_path)
    with open(file_path, "rb") as f:
        while chunk := f.read(chunk_size):
            start = f.tell() - len(chunk)
            headers = {"Content-Range": f"bytes {start}-{f.tell() - 1}/{total_size}"}
            response = requests.post(url, data=chunk, headers=headers)
            if response.status_code not in (200, 201, 206, 308):  # accepted codes depend on the server's protocol
                break
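The receiving side of chunked transfer is just as common: stream=True defers downloading the body, and iter_content() reads it piece by piece so large files never sit fully in memory. A minimal download sketch (URL and filename are illustrative):
import requests

# Stream a large file to disk without holding it all in memory
download_url = "https://example.com/files/video.mp4"
with requests.get(download_url, stream=True, timeout=(3, 30)) as response:
    response.raise_for_status()
    with open("video.mp4", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)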
5. Practical Application Scenarios
1. Real-time Weather Query Tool
Using the OpenWeatherMap API to obtain city weather data, combined with Requests to implement a simple command-line tool:
import requests
API_KEY = "your_openweathermap_api_key"
CITY = "Beijing"
url = f"https://api.openweathermap.org/data/2.5/weather?q={CITY}&appid={API_KEY}&units=metric"
response = requests.get(url)
data = response.json()
print(f"{CITY} Weather: {data['weather'][0]['description']}")
print(f"Temperature: {data['main']['temp']}°C, Humidity: {data['main']['humidity']}%")
2. E-commerce Price Monitoring Bot
Regularly scraping product page prices and sending email notifications when prices drop below a threshold:
import requests
from bs4 import BeautifulSoup
import smtplib
from email.mime.text import MIMEText

def check_price():
    url = "https://item.jd.com/123456.html"
    headers = {
        "User-Agent": "Mozilla/5.0 ...",
        "Cookie": "your_cookie"  # Contains the logged-in session cookie
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    price = float(soup.find("span", class_="price-num").text.strip())
    if price < 999:
        send_email(f"Price dropped! Current price: {price} yuan")

def send_email(content):
    msg = MIMEText(content)
    msg["Subject"] = "Product Price Monitoring Notification"
    msg["From"] = "[email protected]"
    msg["To"] = "[email protected]"
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("[email protected]", "password")
        server.send_message(msg)

# Scheduled task (check once an hour)
import schedule
import time

schedule.every().hour.do(check_price)
while True:
    schedule.run_pending()
    time.sleep(1)
3. News Aggregation Platform Data Collection
Scraping the latest news from multiple news websites and displaying them after deduplication:
import requests
from bs4 import BeautifulSoup

def crawl_news(site_url):
    response = requests.get(site_url)
    soup = BeautifulSoup(response.text, "html.parser")
    news_list = []
    for article in soup.find_all("article", class_="news-item"):
        title = article.find("h2").text.strip()
        link = article.find("a")["href"]
        news_list.append({"title": title, "url": link})
    return news_list

# Aggregating multiple sources
sources = [
    "https://news.ycombinator.com",
    "https://www.nytimes.com/section/technology"
]
all_news = []
for source in sources:
    all_news.extend(crawl_news(source))

# Deduplication (based on title hash)
seen = set()
unique_news = []
for news in all_news:
    title_hash = hash(news["title"])
    if title_hash not in seen:
        seen.add(title_hash)
        unique_news.append(news)
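The deduplicated list still has to be displayed somewhere; a minimal continuation simply prints it (a real aggregator would feed a template or a database instead):
# Display the deduplicated headlines
for item in unique_news:
    print(f"{item['title']}  ->  {item['url']}")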
The Requests library, true to its self-described identity as "the only Non-GMO HTTP library for Python, safe for human consumption," genuinely delivers on its "HTTP for Humans" motto. It not only lowers the technical barrier to making network requests but, backed by comprehensive documentation and an active community, has become an essential tool for Python developers. Whether you are quickly validating an API or building a complex distributed data-exchange system, Requests offers a stable and reliable solution. If you have encountered interesting application scenarios in your own work, or want to dig into more advanced topics (building microservices with FastAPI, asynchronous request optimization, and so on), feel free to share in the comments section and explore the possibilities of Requests on the web together.