GitHub: fuleinist/stealth-fetch

# stealth-fetch

**Rust library for anti-detection HTTP fetching.** Rotate user agents, manage browser fingerprints, and bypass common bot checks with a lightweight, composable reqwest wrapper.

[![Crates.io](https://img.shields.io/crates/v/stealth-fetch)](https://crates.io/crates/stealth-fetch)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Why

Web scraping with AI agents hits a wall: sites detect automation via UA strings, canvas/WebGL fingerprints, TLS fingerprints, and request patterns. Existing tools like `cloak-browser` bundle a full browser, which is heavy and hard to embed.

`stealth-fetch` is an HTTP client with no dependency on a browser binary that makes requests look human:

| Problem | What stealth-fetch does |
|---|---|
| UA detection | 20 real browser UA strings, round-robin or random |
| Canvas/WebGL fingerprints | Generates realistic fingerprint profiles |
| Request ordering | Configurable delays, randomized within ranges |
| Bot 403/429 responses | Auto-retries with a fresh fingerprint |
| robots.txt | Optional compliance layer |

## Install

```toml
# Cargo.toml
[dependencies]
stealth-fetch = "0.1"
tokio = { version = "1", features = ["full"] }
```

For the synchronous client (uses `ureq`):

```toml
[dependencies]
stealth-fetch = { version = "0.1", features = ["sync"] }
```

## Quick Start

### Async (reqwest)

```rust
use stealth_fetch::StealthClient;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), stealth_fetch::StealthError> {
    let client = StealthClient::builder()
        .user_agent_pool(true)
        .fingerprint_session(true)
        .delay_random(Duration::from_millis(300), Duration::from_millis(1000))
        .max_retries(3)
        .build()?;

    let resp = client.get("https://example.com").await?;
    println!("Status: {}", resp.status());

    // Rotate to a new fingerprint mid-session
    client.rotate_fingerprint();

    let resp = client.get("https://httpbin.org/headers").await?;
    println!("Headers: {}", resp.text().await?);

    Ok(())
}
```

### Sync (ureq)

```rust
use stealth_fetch::sync_client::SyncStealthClient;

fn main() {
    let client = SyncStealthClient::new(None);
    match client.get("https://example.com") {
        Ok(body) => println!("Got {} bytes", body.len()),
        Err(e) => eprintln!("Error: {}", e),
    }
}
```

## API Overview

### `StealthClient::builder()`

```rust
StealthClient::builder()
    .user_agent_pool(true)        // use built-in pool of 20 UAs
    .user_agents(vec![...])       // or provide your own
    .fingerprint_session(true)    // persistent fingerprint per session
    .delay_random(min, max)       // random delay range
    .delay_fixed(duration)        // or fixed delay
    .respect_robots_txt(true)     // check robots.txt before requests
    .max_retries(3)               // retry count on 403/429
    .timeout(Duration::from_secs(30))
    .build()
```

### Methods

```rust
let resp = client.get(url).await?;
let resp = client.post(url).await?;
let resp = client.put(url).await?;
let resp = client.delete(url).await?;

// Inspect current fingerprint
let fp = client.fingerprint();

// Force a new fingerprint (e.g. after a detection event)
client.rotate_fingerprint();

// Get next UA in round-robin pool
let ua = client.next_ua();

// Get random UA from pool
let ua = client.random_ua();
```

### UserAgentPool

```rust
use stealth_fetch::UserAgentPool;

let pool = UserAgentPool::default(); // 20 built-in browser UAs
let ua = pool.random();
let ua = pool.next_round_robin();
```

### Fingerprint

```rust
use stealth_fetch::Fingerprint;

let fp = Fingerprint::random();
// Fields: canvas_seed, webgl_renderer, webgl_vendor,
//         screen_resolution, color_depth, timezone,
//         languages, platform, hardware_concurrency, device_memory
```

### DelayStrategy

```rust
use stealth_fetch::DelayStrategy;
use std::time::Duration;

// Fixed
let fixed = DelayStrategy::Fixed(Duration::from_secs(1));

// Random range
let random = DelayStrategy::Random {
    min: Duration::from_millis(300),
    max: Duration::from_millis(1500),
};

// Exponential backoff
let backoff = DelayStrategy::Exponential {
    base: Duration::from_secs(1),
    max: Duration::from_secs(60),
};
```

### RateLimiter

```rust
use stealth_fetch::RateLimiter;
use std::time::Duration;

let limiter = RateLimiter::new(Duration::from_millis(500));
limiter.wait_before_request().await; // async, respects delay
```

### RobotsTxt

```rust
use stealth_fetch::RobotsTxt;

// Parse a robots.txt body that has already been fetched into `content`
let robots_txt = RobotsTxt::parse(&content);
if !robots_txt.is_allowed("/api/private/") {
    return Err("blocked by robots.txt");
}
let delay = robots_txt.crawl_delay(); // recommended request interval
```

## Browser UA Pool

The built-in pool includes **20 real UA strings** across browsers and platforms:

- Chrome 124-122 (Windows, macOS, Linux)
- Firefox 125-124 (Windows, macOS, Linux)
- Safari 17.4-17.3 (macOS, iOS)
- Edge 124-123 (Windows, macOS)
- Chrome Mobile (Android)
- Safari Mobile (iOS 17.4, iOS 17.3)

UA strings are rotated round-robin by default (ensuring equal distribution) or randomly.

## Fingerprint Fields

| Field | Example Values |
|---|---|
| `canvas_seed` | Random u64 → deterministic canvas hash |
| `webgl_renderer` | "ANGLE (NVIDIA GeForce GTX 1060...)", "llvmpipe..." |
| `webgl_vendor` | "Google Inc. (NVIDIA)", "Intel Inc.", "AMD" |
| `screen_resolution` | (1920, 1080), (2560, 1440), (1366, 768)... |
| `color_depth` | 24, 30, 32 |
| `timezone` | "America/New_York", "Europe/London", "Asia/Tokyo"... |
| `languages` | ["en-US", "en"], ["de-DE", "de"], ["ja-JP", "ja"]... |
| `platform` | "Win32", "MacIntel", "Linux x86_64"... |
| `hardware_concurrency` | 2, 4, 8, 12, 16 |
| `device_memory` | 2, 4, 8, 16 GB |

## Project Structure

```
src/
  lib.rs          # public API re-exports
  client.rs       # StealthClient async implementation
  builder.rs      # StealthClientBuilder
  user_agent.rs   # UserAgentPool, UA string constants
  fingerprint.rs  # Fingerprint generation
  headers.rs      # HTTP header building
  delay.rs        # DelayStrategy, ConcurrencyLimiter
  robots.rs       # RobotsTxt parser, RateLimiter
  sync_client.rs  # synchronous ureq wrapper
  error.rs        # StealthError types
examples/
  demo.rs         # async demo
  sync_demo.rs    # sync demo
```

## Design Goals

1. **Composable**: drop into any reqwest-based project; doesn't replace your HTTP stack
2. **Audit-friendly**: no magic; every header and fingerprint value is explicit and readable
3. **Testable**: 12 unit tests covering pool rotation, fingerprint generation, delay strategies, robots.txt parsing
4. **Performant**: zero heap allocation on the hot path; atomic operations for concurrency control

## Build & Test

```sh
cargo build --release
cargo test --lib
cargo clippy --lib
```

## Status

Cycle 1 complete. Functional MVP with:

- Async + sync clients
- UA pool with round-robin and random selection
- Fingerprint generation
- Retry logic on 403/429 with fingerprint rotation
- Rate limiting (fixed, random, exponential)
- robots.txt compliance layer
- 12 unit tests, all passing
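
The README says UA strings are rotated round-robin by default and that the crate relies on atomic operations for concurrency control. As an illustration only, here is a minimal standalone sketch of how such a rotation could work using nothing but the standard library. `RoundRobinPool` and the placeholder strings are hypothetical names chosen for this example; this is not the crate's actual `UserAgentPool` implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a round-robin user-agent pool.
/// Assumption: this mirrors the idea described in the README,
/// not the real internals of stealth-fetch.
pub struct RoundRobinPool {
    agents: Vec<&'static str>,
    cursor: AtomicUsize,
}

impl RoundRobinPool {
    /// Builds a pool; panics on an empty list because there would be nothing to rotate.
    pub fn new(agents: Vec<&'static str>) -> Self {
        assert!(!agents.is_empty(), "pool must contain at least one user agent");
        Self {
            agents,
            cursor: AtomicUsize::new(0),
        }
    }

    /// Returns the next entry, wrapping around at the end of the pool.
    pub fn next_round_robin(&self) -> &'static str {
        // fetch_add returns the previous value, so the first call yields index 0.
        let i = self.cursor.fetch_add(1, Ordering::Relaxed);
        self.agents[i % self.agents.len()]
    }
}

fn main() {
    // Placeholder strings, not real UA strings from the built-in pool.
    let pool = RoundRobinPool::new(vec!["ua-chrome", "ua-firefox", "ua-safari"]);
    for _ in 0..4 {
        println!("{}", pool.next_round_robin());
    }
}
```

Because the cursor is an `AtomicUsize`, `next_round_robin` only needs `&self`, so a single pool could be shared across tasks (for example behind an `Arc`) without a mutex.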