What do scraper authors fear most? Not a single 403 error, but getting banned so thoroughly that you start questioning your life choices.
The Denise module chains a high-anonymity proxy pool, fingerprint randomization, and a retry strategy into one pipeline, so that about three lines of code are enough to make your requests flow like a real user’s.
Today, I will show you how to use Denise to cloak your web scraper and safely harvest public data.
1. One-click Skin Change: User-Agent and Fingerprint Randomization
Denise ships with more than 3,000 real device fingerprints and rotates them automatically on every request, so you don’t have to maintain a User-Agent text file yourself.
The following snippet creates a session with a randomized browser fingerprint and requests the target site, so the server sees what looks like a different device each time.
from denise import StealthSession
s = StealthSession()  # automatically fetches the latest fingerprint database
r = s.get('https://httpbin.org/headers')
print(r.json()['headers']['User-Agent'])  # a randomized value, e.g. a Safari or Chrome User-Agent
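If you are curious what User-Agent rotation looks like without any helper library, here is a minimal sketch using only the Python standard library. It is not how Denise works internally; the three-entry `USER_AGENTS` list and the `build_request` helper are hypothetical names made up for illustration, and a real fingerprint covers far more than one header.

```python
import random
import urllib.request

# Hypothetical, hand-picked pool for illustration only; a real tool would
# ship a much larger list and keep it up to date.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
]


def build_request(url, rng=random):
    """Return a urllib Request that carries a randomly chosen User-Agent."""
    headers = {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


# Build a request and inspect the header; nothing is sent over the network here.
# urllib stores header names capitalized, hence "User-agent".
req = build_request("https://httpbin.org/headers")
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. Calling `build_request` again before each request gives you a fresh pick from the pool.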
2. Plug-and-Play Proxy Pool: Automatic Removal of Invalid Proxies
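Do your free proxies keep dropping connections? Denise checks latency with background coroutines and drops any proxy that fails to respond within 3 seconds.
The following snippet loads 20 candidate proxies (two placeholder addresses repeated ten times), keeps only the ones that respond in time, and hands the pool to a session. How many survive depends entirely on your own proxy list, but the code is simpler than washing the dishes.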
from denise import ProxyPool, StealthSession
pp = ProxyPool(['8.8.8.8:8080','1.1.1.1:3128']*10)  # placeholder addresses; use your own proxies
good = pp.check_all(timeout=3)  # returns the list of available proxies
print('Usable proxies:', len(good))  # the count varies with your list
s = StealthSession(proxy_pool=pp)  # the session automatically selects the fastest IP
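To make the idea of a coroutine-based health check concrete, here is a rough sketch built on `asyncio` alone. It is an illustration under stated assumptions, not Denise’s actual implementation: `probe`, `check_all`, and `tcp_check` are hypothetical helpers, and the check function is passed in as a parameter so you can swap in whatever test suits you.

```python
import asyncio


async def probe(proxy, check, timeout):
    """Return (proxy, latency_in_seconds), or (proxy, None) on failure or timeout."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    try:
        await asyncio.wait_for(check(proxy), timeout=timeout)
    except (asyncio.TimeoutError, OSError):
        return proxy, None
    return proxy, loop.time() - start


async def check_all(proxies, check, timeout=3.0):
    """Probe every proxy concurrently; return the live ones, fastest first."""
    results = await asyncio.gather(*(probe(p, check, timeout) for p in proxies))
    alive = [(p, latency) for p, latency in results if latency is not None]
    return [p for p, _ in sorted(alive, key=lambda item: item[1])]


async def tcp_check(proxy):
    """Simplest possible check: can a TCP connection to host:port be opened?

    This only shows the port is reachable, not that the proxy forwards traffic.
    """
    host, port = proxy.rsplit(":", 1)
    _reader, writer = await asyncio.open_connection(host, int(port))
    writer.close()
    await writer.wait_closed()


# Example call (performs real network connections, so it is left commented out):
# good = asyncio.run(check_all(["8.8.8.8:8080", "1.1.1.1:3128"], tcp_check, timeout=3))
```

Because all probes run concurrently, the whole check takes roughly as long as the timeout, no matter how many proxies are in the list.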
3. Intelligent Retry: Sleep and Retry on 429
When a site rate-limits you with a 429 response, Denise retries with exponential backoff and can add random jitter so that retries from many workers don’t all fire in lockstep.
The following loop scrapes 100 product pages and switches proxies automatically on failure, with no hand-written try/except; the data is stored in about the time it takes to drink a cup of coffee.
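# s is the StealthSession created in the previous section
for page in range(1, 101):
    r = s.get(f'https://shop.com/item?page={page}',
              retry=5, backoff=1.5,
              on_retry=lambda: print('Switching faces and continuing'))
    if r.ok:
        save(r.json())  # save() is your own storage function, not part of Denise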
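For readers who want to see the mechanics behind “exponential backoff with jitter,” here is a small standalone sketch. It assumes that a value like `backoff=1.5` acts as a growth multiplier, which may not match how Denise interprets it, and it uses the “full jitter” variant, where each sleep is drawn uniformly between zero and the current ceiling. `backoff_delays` and `fetch_with_retry` are hypothetical names.

```python
import random
import time


def backoff_delays(retries, base=1.0, factor=1.5, cap=30.0, rng=random):
    """Yield one sleep duration per retry.

    The ceiling grows as base * factor**attempt (capped at `cap`), and the
    actual delay is a random value between 0 and that ceiling.
    """
    for attempt in range(retries):
        ceiling = min(cap, base * factor ** attempt)
        yield rng.uniform(0, ceiling)


def fetch_with_retry(fetch, retries=5, sleep=time.sleep, **backoff):
    """Call fetch() and retry while it reports HTTP 429.

    `fetch` is any zero-argument callable returning (status_code, body).
    Returns the last (status_code, body) seen, which may still be a 429
    if every retry was used up.
    """
    status, body = fetch()
    for delay in backoff_delays(retries, **backoff):
        if status != 429:
            break
        sleep(delay)
        status, body = fetch()
    return status, body
```

The randomness is the important part: if a hundred workers all hit a 429 at the same moment and all sleep for exactly the same time, they will all come back together and get rate-limited again.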
4. Advantages Comparison
Compared with requests + fake-useragent + a hand-written proxy pool, Denise chains fingerprinting, proxies, and retries into a single pipeline, which cuts the code by about 70% and lets beginners get started in roughly 10 minutes.
The downside is the 18 MB package size, which may be too bulky for embedded scenarios.
Denise is a good fit for rapid prototyping and small to medium-sized scrapers; for jobs that crawl tens of millions of pages a day, consider Scrapy with a self-built proxy pool instead.
5. Conclusion
It’s that simple to cloak your web scraper.
Try Denise in your script tonight, then come back and tell us in the comments how many working proxies survived. Let’s share our “stealth” experiences!