Crawling · unblocking · systems

I build crawlers and unblocking systems at scale, and I play with AI.

Technical Lead and Architect at Crawlbase. Shipping spiders, bypass layers, and pipelines that survive the messy open web.

On the consulting side, I help teams ship data pipelines that touch billions of pages. Not just fetchers and parsers, but the full stack that tackles blocks: WAFs, fingerprints, CAPTCHAs, proxy routing, and the kind of targeted bypass work that turns a hard “no” into a reliable feed.

Billions of pages, one stubborn engineer

End-to-end work: I build spiders and the systems that unblock them — custom layers for hard targets, pragmatic bypass paths when the work is allowed, proxy orchestration, and clean structured delivery you can plug straight into prod.

  • Pages crawled & parsed across client projects
  • HTTP & unblocking journeys: retries, fingerprints, redirects & proxies
  • AI extraction, clean-up & structured output passes

Cumulative numbers across engineering programs and client projects. Need crawlers, unblocking, or both? Say hello.
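The retry-and-unblock journey mentioned above can be sketched in a few lines. This is a minimal illustration, not production code: the proxy pool, user-agent list, and the `do_fetch` callable are all hypothetical stand-ins for real proxy orchestration and fingerprinting layers.

```python
import itertools
import random
import time

# Illustrative pools; a real system would manage these dynamically.
PROXIES = ["proxy-a:8080", "proxy-b:8080", "proxy-c:8080"]
USER_AGENTS = ["UA-desktop-1", "UA-desktop-2", "UA-mobile-1"]

BLOCK_STATUSES = {403, 429, 503}  # common "you are blocked" signals

def fetch_with_unblocking(url, do_fetch, max_attempts=5):
    """Try a fetch, rotating proxy + fingerprint on each block.

    `do_fetch(url, proxy, headers)` must return an object with a
    `.status_code` attribute (e.g. a requests.Response in real use).
    """
    proxy_cycle = itertools.cycle(PROXIES)
    for attempt in range(1, max_attempts + 1):
        proxy = next(proxy_cycle)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        resp = do_fetch(url, proxy, headers)
        if resp.status_code not in BLOCK_STATUSES:
            return resp  # success, or a non-block error worth surfacing
        # Blocked: back off briefly before rotating to a new identity.
        time.sleep(min(2 ** attempt * 0.1, 5.0))
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

Real deployments layer much more on top (session reuse, TLS fingerprints, residential fallback), but the shape of the loop — detect the block, rotate identity, back off, retry — stays the same.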

What I’m usually doing

Crawlers, unblocking, and the glue in between: day job, consulting, and side experiments.

  • Crawling at scale

    High-volume extraction that stays up under load: millions of requests, shifting targets, fragile sites, and the chaos of the open web, with observability so you know when something breaks.

  • Crawlers & fetchers

    Spiders and resilient fetchers that ship real data: parsing, scheduling, retries, and the hard edge where bot-detection actually hurts, not demo scripts that stop at hello world.

  • Unblocking & bypass systems

    The systems that get you past the wall: WAFs, fingerprint and session tricks, smarter proxy and header strategy, CAPTCHA paths where allowed, and pragmatic custom layers when an off-the-shelf scraper is not enough.

  • AI & agents

    Models plus agentic flows: tools, planners, and retries that summarize messy pages, turn them into structured extractions, and run multi-step crawl and research tasks without a human in every loop.

  • Automations & APIs

    Glue code, internal tools, and HTTP APIs in whatever stack fits: wiring crawlers, unblocking jobs, and backends so teams can trigger, observe, and trust the pipeline.

  • Data pipelines

    Ingest, clean, enrich, store, and deliver, so scraped and automated output lands in queues, warehouses, or products people actually use.
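The ingest, clean, enrich, deliver flow above can be sketched as a chain of plain generators. A minimal illustration under stated assumptions: the field names and the in-memory list standing in for a queue are hypothetical; in production each stage would feed Kafka, S3, or a warehouse writer.

```python
import json

def ingest(raw_records):
    # Raw scraped rows, possibly messy.
    for rec in raw_records:
        yield dict(rec)

def clean(records):
    # Drop rows missing a required field, normalize whitespace.
    for rec in records:
        if not rec.get("url"):
            continue
        rec["title"] = (rec.get("title") or "").strip()
        yield rec

def enrich(records):
    # Derive fields downstream consumers expect.
    for rec in records:
        rec["domain"] = rec["url"].split("/")[2]
        yield rec

def deliver(records, sink):
    # Serialize into the sink (stand-in for a queue or warehouse).
    for rec in records:
        sink.append(json.dumps(rec, sort_keys=True))

queue = []
raw = [
    {"url": "https://shop.example/p/1", "title": "  Widget  "},
    {"title": "no url, so this row is dropped"},
]
deliver(enrich(clean(ingest(raw))), queue)
```

Because each stage is a generator, records stream through one at a time: nothing buffers the whole crawl in memory, which is the property that matters once the input is billions of pages rather than two rows.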

Jamal Awad, Tech Lead & Architect

About

Crawlers, blocks, and the systems in between

I’ve spent the better part of a decade building crawling systems from the ground up, but the interesting part is rarely only the spider: it’s the layers that unblock stubborn targets, adapt when defenses change, and still hand you clean, structured data.

At Crawlbase, I lead architecture for large-scale crawling and anti-bot mitigation: proxy orchestration, bypass and recovery paths, retry logic, and pipelines that tie fetch, unblock, and deliver together. Before that, I shipped complete projects for clients across e-commerce, real estate, finance, and travel, end-to-end: spiders, custom unblocking work, automations, scheduling, and production-ready datasets.

On the side, I experiment with AI agents and agentic workflows, where models genuinely help extraction and research, vs. where they’re noise on top of a broken fetch path.

  • 10+ years in web crawling & data extraction
  • Billions of pages and countless unblock paths
  • Full stack: spider, bypass, delivery
  • Remote, working with teams worldwide

What people say

Company names are kept confidential. Most of this feedback came from sensitive crawling and data projects where clients prefer not to disclose the collaboration publicly.

From teams across crawling, SaaS, databases, APIs, and consulting engagements.

Crawling & Data

We needed 40M product pages crawled weekly across six markets. Jamal built the spider fleet, hardening layer for the worst anti-bot markets, and delivered clean CSVs to our S3 bucket on schedule. Our data team finally stopped complaining.

Head of Data, EU e-commerce platform · 40M+ pages/week project
Crawling & Data

Our stack was failing on 60% of targets after Cloudflare updates. Jamal didn't just patch crawlers; he rebuilt the unblocking pipeline in two weeks: fingerprinting, smarter retries, residential fallback, and custom logic where generic tools died. Failure rate dropped to under 3%.

CTO, price intelligence startup · Unblocking & pipeline rescue
Crawling & Data

Jamal ships the full loop: scheduling, monitoring, alerts, and API delivery. But the difference was the blocking work: when everyone else shrugged at 403s, he had a system. Fourteen months, zero missed deliveries.

VP Engineering, real estate data company · End-to-end crawl & unblock infrastructure
Database & HA

Our MySQL replicas were drifting 45 seconds behind under write-heavy load. Jamal restructured the replication topology, tuned InnoDB buffer pools, and moved the hottest tables to a dedicated cluster. Lag dropped to sub-second and stayed there through Black Friday.

DBA Lead, fintech marketplace · MySQL replication & performance
SaaS & Platform

We were building multi-tenant SaaS from a single-tenant Rails monolith. Jamal designed the tenant isolation layer, data partitioning scheme, and migration path. We onboarded 200 tenants without a single data leak or downtime window.

CEO, HR tech startup · Multi-tenant SaaS architecture
API & Performance

P95 latency on our public API had crept to 1.8 seconds. Jamal profiled the hot paths, added Redis caching with smart invalidation, restructured three N+1 query patterns, and pushed us down to 180ms. Customers noticed before we even announced it.

Product Lead, logistics API provider · API latency reduction
Database & HA

After a major outage took our PostgreSQL primary down for two hours, we brought Jamal in to build a proper HA setup. Patroni cluster, automated failover, WAL archiving to S3, and a runbook the on-call team actually follows. We've had three hardware failures since then. Zero downtime.

Infrastructure Lead, media streaming platform · PostgreSQL high availability
Crawling & Data

We needed real estate listings from 14 fragmented MLS sources, each with different auth, pagination, and anti-scraping. Jamal built adapters for all of them, a unified schema, and deduplication logic. Our agents got a single clean feed for the first time ever.

CTO, proptech startup · Multi-source data aggregation
SaaS & Platform

Moving from a monolith to event-driven microservices felt impossible with our team size. Jamal carved the domain boundaries, set up Kafka topics with proper schemas, and migrated the first three services while keeping the monolith running. The rest of the team picked up the pattern and kept going.

Engineering Manager, e-commerce company · Event-driven architecture migration

Stack & focus areas

Systems architecture · SaaS platform design · Technical leadership · Distributed systems · API design & integration · Data pipelines · LLMs & AI agents · Cloud infrastructure (AWS) · Container orchestration (K8s) · Self-hosted infrastructure · Event-driven architecture · Observability & monitoring · CI/CD & DevOps · Ruby · Python · PostgreSQL · Redis · Kafka · Elasticsearch

Activity & contributions

Issues, merge requests, commits, and code review across internal GitLab and private GitHub.

18,067 contributions in the last year