Crawling at scale
Extraction pipelines that stay reliable under load: millions of requests, shifting targets, and the usual chaos of the open web.
Crawling · systems · experiments
Technical Lead and Architect at Crawlbase. Shipping pipelines that survive the messy open web.
On the side: helping folks ship crawlers that touch billions of pages. I fix what breaks, route traffic through smarter proxies, and bolt on AI without the demo-day fairy dust.
I help teams crawl at ridiculous scale: unblock targets, fix broken pipelines, tame proxies, and wire in AI where it actually earns its keep. The counters creep up quietly all day long, because crawl ops never really sleep.
Illustrative counters: the shape of the work, not an audited filing. Want your own line to move for real? Hit Say hello and drop the messy URL.
Things that show up in my day job, consulting, and side projects.
Resilient fetchers, parsers, and the practical work around bot detection: systems that actually ship data, not excuses.
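A minimal sketch of the baseline those resilient fetchers start from: retrying transient failures with jittered exponential backoff. The function names and retry policy here are illustrative, not a specific client library.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, base=1.0):
    """Jittered exponential backoff: base * 2^attempt plus up to 1s of noise."""
    return base * (2 ** attempt) + random.random()


def fetch_with_backoff(url, max_attempts=4):
    """Fetch a URL, retrying throttling and transient server errors."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            # Only 429 and 5xx responses are worth retrying; the rest fail fast.
            if e.code not in (429, 500, 502, 503, 504) or attempt == max_attempts - 1:
                raise
        except urllib.error.URLError:
            # Network-level hiccup: retry until attempts run out.
            if attempt == max_attempts - 1:
                raise
        time.sleep(backoff_delay(attempt))
```

The jitter matters at scale: without it, a fleet of workers that all failed together will all retry together.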
Models plus agentic flows: tools, planners, and retries that summarize messy pages, extract structured data, and run multi-step crawl and research tasks without a human in every loop.
Anti-bot systems, WAFs, fingerprinting, CAPTCHAs: pragmatic paths when the work is allowed and the data still has to arrive clean.
Glue code, internal tools, and HTTP APIs in whatever stack fits: wiring crawlers, jobs, and backends so teams can trigger and observe work reliably.
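As a sketch of what "trigger and observe" means in practice, here is an illustrative in-memory job registry of the kind a thin HTTP layer would wrap; the `JobStore` class and its methods are hypothetical names, not an existing internal tool.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class JobStore:
    """In-memory job registry; an HTTP API would expose trigger/observe as endpoints."""
    jobs: dict = field(default_factory=dict)

    def trigger(self, target_url):
        """Register a crawl job and hand back an ID the caller can poll."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"url": target_url, "status": "queued"}
        return job_id

    def observe(self, job_id):
        """Report job state; unknown IDs get an explicit status instead of an error."""
        return self.jobs.get(job_id, {"status": "unknown"})
```

In a real deployment the dict becomes a database or queue, but the contract stays the same: every trigger returns a handle, and every handle can always be observed.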
Ingest, clean, enrich, store, and deliver: scraped and automated output lands in queues, warehouses, or products people actually use.
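A toy version of that ingest-to-deliver flow, with a plain list standing in for the queue or warehouse; every function name here is illustrative.

```python
def clean(record):
    """Strip whitespace and drop empty fields (a minimal stand-in for real cleaning)."""
    return {k: v.strip() for k, v in record.items() if v and v.strip()}


def enrich(record, source):
    """Tag each record with its source so downstream consumers can trace it."""
    return {**record, "source": source}


def deliver(records, queue):
    """Append to a list standing in for a queue, warehouse, or product feed."""
    queue.extend(records)


def run_pipeline(raw_records, source, queue):
    """Run each scraped record through clean -> enrich -> deliver, skipping empties."""
    for raw in raw_records:
        cleaned = clean(raw)
        if cleaned:
            deliver([enrich(cleaned, source)], queue)
```

Each stage stays a pure, small function, so swapping the delivery target or adding a validation step does not ripple through the rest of the pipeline.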