Scaling Lead Intelligence with Leadreaper - AWS Lambda, Serverless, & dbt


Facet developed Leadreaper, a scalable lead intelligence platform that identifies qualified lead opportunities for technology-driven companies. Leadreaper is built to power large-scale sales teams that need near real-time, high-quality data about their clients’ detectable technology and operations.

  • Hyper-scalable via the Serverless Framework, deployed on AWS Lambda.
  • Crawls scale up to 100,000 domains per day.
  • Performs dynamic segmentation and account scoring.
  • Layers data enrichment based on qualifying factors to better allocate costs.
  • Intelligently enriches, segments, and qualifies sales opportunities.
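
The idea of layering enrichment on top of qualifying factors can be illustrated with a minimal sketch. Everything named here (the `Account` type, the technology weights, the threshold, and the `paid_lookup` callable) is a hypothetical stand-in rather than Leadreaper's actual scoring model. The point is only that a costly enrichment call runs for accounts that have already qualified, and is skipped for the rest.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    domain: str
    technologies: set[str] = field(default_factory=set)
    enrichments: dict[str, str] = field(default_factory=dict)


# Hypothetical weights: points awarded per detected technology.
TECH_WEIGHTS = {"Drupal": 40, "Google Ads": 25, "HubSpot": 20}

# Hypothetical score at which paid enrichment is considered worth the cost.
ENRICH_THRESHOLD = 50


def score(account: Account) -> int:
    """Sum the weights of every detected technology that carries a weight."""
    return sum(TECH_WEIGHTS.get(tech, 0) for tech in account.technologies)


def segment(points: int) -> str:
    """Map a score to a coarse segment label."""
    if points >= ENRICH_THRESHOLD:
        return "qualified"
    if points > 0:
        return "nurture"
    return "unqualified"


def enrich_if_qualified(account: Account, paid_lookup) -> Account:
    """Call the costly enrichment step only for qualified accounts."""
    if segment(score(account)) == "qualified":
        account.enrichments.update(paid_lookup(account.domain))
    return account
```

With these assumed weights, an account showing both Drupal and Google Ads scores 65 and is enriched, while an account showing only HubSpot scores 20, lands in the "nurture" segment, and never triggers the paid lookup.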


  • Facet, like many web development studios, faced the challenge of inconsistent quality in the lead intelligence and lead signals used to qualify customers.
  • Many lead databases attributed the wrong technologies to prospective accounts, which wasted time on outreach to accounts that matched the qualifying criteria poorly.
  • Lead signals must be highly trusted, as timing is everything in determining the appropriate approach for a given client opportunity. Technology detection is important, and a transition from one technology to another is a key signal as well.


Even when a company was reported as using a technology, lead databases have no convention for identifying which of its websites uses that technology. Facet needed a way to target a specific department if it used Drupal alongside advertising or marketing technologies that would qualify it as spending on customer acquisition.


Facet engineered a scalable, serverless application with elastic infrastructure on AWS Lambda, enabling progressive profiling of target enterprise accounts and the collection of lead intelligence on prospective accounts.

Facet developed a range of web scraping plugins, including:

Technology Detection

  • Passive technology detection with Wappalyzer
  • IP WHOIS lookup for hosting infrastructure or WAF detection
  • Drupal version detection
  • WordPress version detection
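
One common way to detect a CMS version passively is to read the `generator` meta tag that Drupal and WordPress emit by default. The sketch below assumes that approach; the source does not say how Leadreaper's plugins work, and the `detect_cms` function and `GeneratorParser` class are hypothetical names. Sites can remove or alter this tag, and Drupal typically reports only its major version, so a result here is a hint rather than proof.

```python
import re
from html.parser import HTMLParser


class GeneratorParser(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; values keep their original case.
        if (attr.get("name") or "").lower() == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


# Matches values such as "Drupal 9 (https://www.drupal.org)" or "WordPress 6.4.2".
CMS_PATTERN = re.compile(r"^(Drupal|WordPress)\s+([\d.]+)", re.IGNORECASE)


def detect_cms(html: str):
    """Return (cms_name, version) from the generator tag, or None if absent."""
    parser = GeneratorParser()
    parser.feed(html)
    for value in parser.generators:
        match = CMS_PATTERN.match(value.strip())
        if match:
            return match.group(1), match.group(2)
    return None
```

A crawler would pass the fetched page body to `detect_cms` and treat `None` as "unknown", not as "not Drupal or WordPress".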

Metadata Capture

  • HTML Metatags Scraping
  • JSON-LD / Schema Metadata Scraping
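
Both kinds of metadata can be pulled from a page in a single pass. The following is a minimal sketch using only the Python standard library; the `MetadataParser` class and `extract_metadata` function are hypothetical names, and the choice to skip malformed JSON-LD silently is an assumption, not a documented Leadreaper behavior.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <meta> name/property pairs and parsed JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            # Standard tags use name=..., Open Graph tags use property=...
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.meta[key.lower()] = attr["content"]
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Assumption: skip malformed blocks rather than fail the crawl.


def extract_metadata(html: str) -> dict:
    """Return the meta tags and JSON-LD documents found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    return {"meta": parser.meta, "json_ld": parser.json_ld}
```

The returned dictionary keeps the two sources separate, so downstream scoring can decide how much weight to give self-declared schema data versus plain meta tags.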

User Experience Capture

  • Google Lighthouse score detection including PageSpeed, accessibility, security, and progressive web app scores.

Querying Third-Party APIs for Account Enrichment

Developing data transformation pipelines with dbt

  • By leveraging dbt (“data build tool”), Leadreaper can incrementally update data marts and fact tables, with dbt’s directed acyclic graphs (DAGs) tracking how the models depend on one another.
  • dbt ships with a number of tools for testing data sources and transformations, so Leadreaper’s data models can be updated continually and shipped with documentation as they change.
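
To show what a DAG of models buys in practice, here is a small sketch of dependency-ordered builds using Python's standard `graphlib` module. It is not dbt's code or API (dbt derives its graph from `ref()` calls inside SQL models), and the model names are invented for illustration. The sketch only demonstrates the underlying idea: upstream models build first, and a change to one model identifies exactly which downstream models need rebuilding.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key depends on the models in its set.
MODEL_DEPENDENCIES = {
    "stg_crawls": set(),
    "stg_enrichment": set(),
    "fct_technology_detections": {"stg_crawls"},
    "dim_accounts": {"stg_crawls", "stg_enrichment"},
    "mart_account_scores": {"fct_technology_detections", "dim_accounts"},
}


def build_order(dependencies):
    """Return the models in an order where dependencies always come first."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(changed, dependencies):
    """Return the changed model plus every model that depends on it."""
    affected = {changed}
    # Walking in build order guarantees parents are classified before children.
    for model in build_order(dependencies):
        if dependencies[model] & affected:
            affected.add(model)
    return affected
```

In this example, a change to `stg_enrichment` affects `dim_accounts` and `mart_account_scores` but leaves `fct_technology_detections` untouched, which is the property that makes incremental updates cheaper than full rebuilds.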


Through the development of Leadreaper, Facet gained not only a qualified lead pipeline but also a way to quantify the opportunity each client represents through additional quality-scoring metrics. Leadreaper has been developed with the intention of commercializing the offering and of making additional methods of web scraping possible through plugin development.

