In today's data-driven world, the quality of your information is a direct competitive advantage. A new lead signing up with just an email address is a starting point, but it's not enough. To personalize outreach, score leads effectively, and truly understand your customers, you need a complete picture. This is where real-time data enrichment comes in—the process of instantly augmenting your existing data with valuable, publicly available information.
Traditionally, building an enrichment pipeline was a complex engineering task. It involved writing brittle web scrapers for multiple sites, integrating with various APIs, and building complex parsers—all of which would break the moment a website changed its layout.
What if you could replace all that complexity with a single, intelligent API call? With extract.do, you can. Let's explore how our AI-powered agent can become the core of your real-time data enrichment strategy.
Real-time data enrichment is the automated process of taking a single piece of data (like an email or a company name) and instantly enhancing it with additional context from external sources.
Imagine a user signs up for your newsletter with jane.doe@acme.com. A real-time enrichment workflow could automatically:

- Identify the company domain (acme.com) and derive a searchable name ("jane doe").
- Extract the company's name and a one-line description from its website.
- Find Jane's LinkedIn profile via a web search.
- Pull her full name and current job title from that profile.
All of this happens in seconds, transforming a simple email into a rich, actionable lead without any manual intervention.
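The only fully deterministic piece of that workflow is the very first step: deriving a company domain and a searchable name from the raw email. That needs no AI at all; a plain helper covers it (a minimal sketch, with an illustrative function name):

```typescript
// Derive the company domain and a searchable name from a raw email address.
function parseEmail(email: string): { domain: string; searchName: string } {
  const [local, domain] = email.split('@');
  // "jane.doe" → "jane doe"
  return { domain, searchName: local.split('.').join(' ') };
}

console.log(parseEmail('jane.doe@acme.com'));
// → { domain: 'acme.com', searchName: 'jane doe' }
```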
Building the workflow above the old way is a headache. You'd need a scraper for the company website, another for Google search results, and a third for LinkedIn. Each would require constant maintenance.
extract.do simplifies this entire process into a series of declarative API calls. Instead of telling the machine how to find and parse the data with CSS selectors, you simply tell our AI agent what data you want.
This is the "Data as Code" philosophy in action. Your data enrichment logic becomes a simple, readable part of your application's codebase, not a separate, fragile infrastructure project.
Let's build the lead enrichment pipeline we described earlier using extract.do.
The Goal: Convert an email address into a full lead profile ready for our CRM.
The Workflow:

1. Derive the company domain from the email address.
2. Extract the company name and description from the company's homepage.
3. Search the web for the person's LinkedIn profile URL.
4. Extract the person's full name and current job title from that profile.
5. Assemble everything into a single structured object.
Here’s how you could implement this enrichment logic within your application.
First, let's define the final data structure we want.
```typescript
// Define the desired data structure for our enriched lead
interface EnrichedLeadProfile {
  fullName: string;
  email: string;
  jobTitle: string;
  linkedInUrl: string;
  company: {
    name: string;
    website: string;
    description: string;
  };
}
```
Now, let's create a function that orchestrates the calls to the extract.do agent.
```typescript
import { DO } from '@do-inc/sdk';

// Initialize the .do client
const secret = process.env.DO_SECRET;
const digo = new DO({ secret });

async function enrichLead(email: string): Promise<EnrichedLeadProfile | null> {
  try {
    const domain = email.split('@')[1];
    const companyUrl = `https://${domain}`;
    // "clark.k" → "clark k"
    const searchName = email.split('@')[0].split('.').join(' ');

    // Step 1: Extract Company Info
    const companyInfo = await digo
      .agent<{ name: string; description: string }>('extract')
      .run({
        source: companyUrl,
        description: 'Extract the company name and a one-sentence summary from their about us or homepage.',
      });

    // Step 2: Find LinkedIn Profile URL
    const query = encodeURIComponent(`${searchName} ${domain} linkedin`);
    const searchUrl = `https://www.google.com/search?q=${query}`;
    const profileInfo = await digo
      .agent<{ profileUrl: string }>('extract')
      .run({
        source: searchUrl,
        description: `Find the LinkedIn profile URL for a person associated with ${domain}.`,
      });

    if (!profileInfo.profileUrl) return null;

    // Step 3: Extract Details from LinkedIn Profile
    const personDetails = await digo
      .agent<{ fullName: string; jobTitle: string }>('extract')
      .run({
        source: profileInfo.profileUrl,
        description: 'Extract the full name and current job title of this person.',
      });

    // Step 4: Assemble the final object
    const finalProfile: EnrichedLeadProfile = {
      fullName: personDetails.fullName,
      email,
      jobTitle: personDetails.jobTitle,
      linkedInUrl: profileInfo.profileUrl,
      company: {
        name: companyInfo.name,
        website: companyUrl,
        description: companyInfo.description,
      },
    };

    console.log('Enriched Profile:', finalProfile);
    // Next step: await sendToCrmApi(finalProfile);
    return finalProfile;
  } catch (error) {
    console.error('Enrichment failed:', error);
    return null;
  }
}

// Run the enrichment
enrichLead('clark.k@dailyplanet.com');
```
In just a few lines of declarative code, we've built a powerful, resilient enrichment pipeline that queries multiple web sources and returns perfectly structured data, ready for any system you want to connect it to.
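Live web sources can be slow or transiently unavailable, so in production you would likely wrap each agent call in a retry with backoff. A generic, SDK-agnostic sketch (the helper name and defaults are illustrative, not part of the extract.do SDK):

```typescript
// Retry an async operation with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Backoff doubles each attempt: 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Each step in `enrichLead` could then be written as `await withRetry(() => digo.agent('extract').run({ ... }))` without changing the surrounding logic.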
While lead enrichment is a killer app, this pattern applies anywhere you need to connect and structure disparate information.
With extract.do, you have a universal tool to turn the unstructured web into a structured database, available on-demand via a simple API.
Ready to stop maintaining brittle scrapers and start building intelligent data workflows? Get started with extract.do today.
Q: What kind of data sources can extract.do handle?
A: extract.do is designed to be source-agnostic. You can provide raw text, HTML content, URLs to websites, or even text from documents and images. The AI agent intelligently parses the content to find the data you need.
Q: How do I define the structure of the extracted data?
A: You define the output structure by providing a simple JSON schema or a TypeScript interface. The AI agent uses this schema to understand what fields to look for (e.g., 'name', 'email', 'invoice_amount') and returns the data in that exact format.
Q: Is this just for web scraping?
A: While excellent for web scraping, extract.do is much more. It's a comprehensive extraction and transformation tool. Use it to parse emails, process invoices, standardize user-generated content, or any task that requires turning unstructured information into structured data, like the real-time enrichment pipelines shown above.
Q: How does this compare to traditional ETL tools?
A: extract.do replaces complex ETL pipelines with a simple API call. Instead of building and maintaining brittle parsers and scripts, you simply describe the data you want. Our AI agent handles the heavy lifting, adapting to changes in source format automatically.