Crawling at scale
High-volume extraction that stays up under load: millions of requests, shifting targets, fragile sites, and the chaos of the open web, with observability so you know when something breaks.
Crawling · unblocking · systems
Technical Lead and Architect at Crawlbase. Shipping spiders, bypass layers, and pipelines that survive the messy open web.
Outside the day job: helping teams ship data pipelines that touch billions of pages, not only fetchers and parsers, but the full stack that tackles blocks: WAFs, fingerprints, CAPTCHAs, proxy routing, and the kind of targeted bypass work that turns a hard “no” into a reliable feed.
End-to-end work: I build spiders and the systems that unblock them — custom layers for hard targets, pragmatic bypass paths when the work is allowed, proxy orchestration, and clean structured delivery you can plug straight into prod.
Cumulative numbers across engineering programs and client projects. Need crawlers, unblocking, or both? Say hello.
Crawlers, unblocking, and the glue in between: day job, consulting, and side experiments.
Spiders and resilient fetchers that ship real data: parsing, scheduling, retries, and the hard edge where bot-detection actually hurts, not demo scripts that stop at hello world.
The systems that get you past the wall: WAFs, fingerprint and session tricks, smarter proxy and header strategy, CAPTCHA paths where allowed, and pragmatic custom layers when an off-the-shelf scraper is not enough.
Models plus agentic flows: tools, planners, and retries that summarize messy pages, structure extraction, and run multi-step crawl and research tasks without a human in every loop.
Glue code, internal tools, and HTTP APIs in whatever stack fits: wiring crawlers, unblocking jobs, and backends so teams can trigger, observe, and trust the pipeline.
Ingest, clean, enrich, store, and deliver, so scraped and automated output lands in queues, warehouses, or products people actually use.
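The retry-and-rotate loop behind several of the services above can be sketched as follows. This is an illustrative minimal version, not the production system: the proxy pool, user-agent list, and the injected `do_request` hook are hypothetical stand-ins.

```python
import random
import time

# Hypothetical proxy pool and user-agent list, for illustration only.
PROXIES = ["http://proxy-a:8000", "http://proxy-b:8000", "http://proxy-c:8000"]
USER_AGENTS = ["Mozilla/5.0 (X11; Linux x86_64)", "Mozilla/5.0 (Windows NT 10.0)"]

def fetch_with_rotation(url, do_request, max_attempts=4, base_delay=1.0):
    """Retry a fetch, rotating proxy and headers whenever the target blocks us.

    `do_request(url, proxy, headers)` is injected so the policy can be
    exercised without touching the network; it returns (status, body).
    """
    delay = base_delay
    for attempt in range(max_attempts):
        proxy = PROXIES[attempt % len(PROXIES)]           # rotate proxy per attempt
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        status, body = do_request(url, proxy, headers)
        if status == 200:
            return body
        if status in (403, 429):                          # blocked: back off, rotate, retry
            time.sleep(delay)
            delay *= 2                                    # exponential backoff
            continue
        raise RuntimeError(f"unrecoverable status {status} for {url}")
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

Real pipelines layer session reuse, residential fallback, and per-target fingerprint tuning on top of this skeleton, but the core loop, detect the block, change what the target sees, try again with backoff, stays the same.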

About
I’ve spent the better part of a decade building crawling systems from the ground up, but the interesting part is rarely only the spider: it’s the layers that unblock stubborn targets, adapt when defenses change, and still hand you clean, structured data.
At Crawlbase, I lead architecture for large-scale crawling and anti-bot mitigation: proxy orchestration, bypass and recovery paths, retry logic, and pipelines that tie fetch, unblock, and deliver together. Before that, I shipped complete projects for clients across e-commerce, real estate, finance, and travel, end-to-end: spiders, custom unblocking work, automations, scheduling, and production-ready datasets.
On the side, I experiment with AI agents and agentic workflows: where models genuinely help extraction and research, and where they are just noise on top of a broken fetch path.
From teams across crawling, SaaS, databases, APIs, and consulting engagements.
Crawling & Data: “We needed 40M product pages crawled weekly across six markets. Jamal built the spider fleet, a hardening layer for the worst anti-bot markets, and delivered clean CSVs to our S3 bucket on schedule. Our data team finally stopped complaining.”
Crawling & Data: “Our stack was failing on 60% of targets after Cloudflare updates. Jamal didn't just patch crawlers; he rebuilt the unblocking pipeline in two weeks: fingerprinting, smarter retries, residential fallback, and custom logic where generic tools died. Failure rate dropped to under 3%.”
Crawling & Data: “Jamal ships the full loop: scheduling, monitoring, alerts, and API delivery. But the difference was the blocking work: when everyone else shrugged at 403s, he had a system. Fourteen months, zero missed deliveries.”
Database & HA: “Our MySQL replicas were drifting 45 seconds behind under write-heavy load. Jamal restructured the replication topology, tuned InnoDB buffer pools, and moved the hottest tables to a dedicated cluster. Lag dropped to sub-second and stayed there through Black Friday.”
SaaS & Platform: “We were building multi-tenant SaaS from a single-tenant Rails monolith. Jamal designed the tenant isolation layer, data partitioning scheme, and migration path. We onboarded 200 tenants without a single data leak or downtime window.”
API & Performance: “P95 latency on our public API had crept to 1.8 seconds. Jamal profiled the hot paths, added Redis caching with smart invalidation, restructured three N+1 query patterns, and pushed us down to 180ms. Customers noticed before we even announced it.”
Database & HA: “After a major outage took our PostgreSQL primary down for two hours, we brought Jamal in to build a proper HA setup. Patroni cluster, automated failover, WAL archiving to S3, and a runbook the on-call team actually follows. We've had three hardware failures since then. Zero downtime.”
Crawling & Data: “We needed real estate listings from 14 fragmented MLS sources, each with different auth, pagination, and anti-scraping. Jamal built adapters for all of them, a unified schema, and deduplication logic. Our agents got a single clean feed for the first time ever.”
SaaS & Platform: “Moving from a monolith to event-driven microservices felt impossible with our team size. Jamal carved the domain boundaries, set up Kafka topics with proper schemas, and migrated the first three services while keeping the monolith running. The rest of the team picked up the pattern and kept going.”
18,067 contributions in the last year: issues, merge requests, commits, and code review across internal GitLab and private GitHub.