In today's data-driven world, the quality of your information is a direct competitive advantage. A new lead signing up with just an email address is a starting point, but it's not enough. To personalize outreach, score leads effectively, and truly understand your customers, you need a complete picture. This is where real-time data enrichment comes in—the process of instantly augmenting your existing data with valuable, publicly available information.
Traditionally, building an enrichment pipeline was a complex engineering task. It involved writing brittle web scrapers for multiple sites, integrating with various APIs, and building complex parsers—all of which would break the moment a website changed its layout.
What if you could replace all that complexity with a single, intelligent API call? With extract.do, you can. Let's explore how our AI-powered agent can become the core of your real-time data enrichment strategy.
Real-time data enrichment is the automated process of taking a single piece of data (like an email or a company name) and instantly enhancing it with additional context from external sources.
Imagine a user signs up for your newsletter with jane.doe@acme.com. A real-time enrichment workflow could automatically:

- Identify the company domain (acme.com) and derive a searchable name ("jane doe").
- Extract the company's name and a one-line description from its website.
- Find Jane's LinkedIn profile via a web search.
- Pull her full name and current job title from that profile.
All of this happens in seconds, transforming a simple email into a rich, actionable lead without any manual intervention.
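The only fully deterministic piece of that workflow is the very first step: deriving a company domain and a searchable name from the raw email. That needs no AI at all; a plain helper covers it (a minimal sketch, with an illustrative function name):

```typescript
// Derive the company domain and a searchable name from a raw email address.
function parseEmail(email: string): { domain: string; searchName: string } {
  const [local, domain] = email.split('@');
  // "jane.doe" → "jane doe"
  return { domain, searchName: local.split('.').join(' ') };
}

console.log(parseEmail('jane.doe@acme.com'));
// → { domain: 'acme.com', searchName: 'jane doe' }
```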
Building the workflow above the old way is a headache. You'd need a scraper for the company website, another for Google search results, and a third for LinkedIn. Each would require constant maintenance.
extract.do simplifies this entire process into a series of declarative API calls. Instead of telling the machine how to find and parse the data with CSS selectors, you simply tell our AI agent what data you want.
This is the "Data as Code" philosophy in action. Your data enrichment logic becomes a simple, readable part of your application's codebase, not a separate, fragile infrastructure project.
Let's build the lead enrichment pipeline we described earlier using extract.do.
The Goal: Convert an email address into a full lead profile ready for our CRM.
The Workflow:

1. Derive the company domain from the email address.
2. Extract the company name and description from the company's homepage.
3. Search the web for the person's LinkedIn profile URL.
4. Extract the person's full name and current job title from that profile.
5. Assemble everything into a single structured object.
Here’s how you could implement this enrichment logic within your application.
First, let's define the final data structure we want.
```typescript
// Define the desired data structure for our enriched lead
interface EnrichedLeadProfile {
  fullName: string;
  email: string;
  jobTitle: string;
  linkedInUrl: string;
  company: {
    name: string;
    website: string;
    description: string;
  };
}
```
Now, let's create a function that orchestrates the calls to the extract.do agent.
```typescript
import { DO } from '@do-inc/sdk';

// Initialize the .do client
const secret = process.env.DO_SECRET;
const digo = new DO({ secret });

async function enrichLead(email: string): Promise<EnrichedLeadProfile | null> {
  try {
    const domain = email.split('@')[1];
    const companyUrl = `https://${domain}`;
    // "clark.k" → "clark k"
    const searchName = email.split('@')[0].split('.').join(' ');

    // Step 1: Extract Company Info
    const companyInfo = await digo
      .agent<{ name: string; description: string }>('extract')
      .run({
        source: companyUrl,
        description: 'Extract the company name and a one-sentence summary from their about us or homepage.',
      });

    // Step 2: Find LinkedIn Profile URL
    const query = encodeURIComponent(`${searchName} ${domain} linkedin`);
    const searchUrl = `https://www.google.com/search?q=${query}`;
    const profileInfo = await digo
      .agent<{ profileUrl: string }>('extract')
      .run({
        source: searchUrl,
        description: `Find the LinkedIn profile URL for a person associated with ${domain}.`,
      });

    if (!profileInfo.profileUrl) return null;

    // Step 3: Extract Details from LinkedIn Profile
    const personDetails = await digo
      .agent<{ fullName: string; jobTitle: string }>('extract')
      .run({
        source: profileInfo.profileUrl,
        description: 'Extract the full name and current job title of this person.',
      });

    // Step 4: Assemble the final object
    const finalProfile: EnrichedLeadProfile = {
      fullName: personDetails.fullName,
      email,
      jobTitle: personDetails.jobTitle,
      linkedInUrl: profileInfo.profileUrl,
      company: {
        name: companyInfo.name,
        website: companyUrl,
        description: companyInfo.description,
      },
    };

    console.log('Enriched Profile:', finalProfile);
    // Next step: await sendToCrmApi(finalProfile);
    return finalProfile;
  } catch (error) {
    console.error('Enrichment failed:', error);
    return null;
  }
}

// Run the enrichment
enrichLead('clark.k@dailyplanet.com');
```
In just a few lines of declarative code, we've built a powerful, resilient enrichment pipeline that queries multiple web sources and returns perfectly structured data, ready for any system you want to connect it to.
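Live web sources can be slow or transiently unavailable, so in production you would likely wrap each agent call in a retry with backoff. A generic, SDK-agnostic sketch (the helper name and defaults are illustrative, not part of the extract.do SDK):

```typescript
// Retry an async operation with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Backoff doubles each attempt: 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Each step in `enrichLead` could then be written as `await withRetry(() => digo.agent('extract').run({ ... }))` without changing the surrounding logic.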
While lead enrichment is a killer app, this pattern applies anywhere you need to connect and structure disparate information.
With extract.do, you have a universal tool to turn the unstructured web into a structured database, available on-demand via a simple API.
Ready to stop maintaining brittle scrapers and start building intelligent data workflows? Get started with extract.do today.
Q: What kind of data sources can extract.do handle?
A: extract.do is designed to be source-agnostic. You can provide raw text, HTML content, URLs to websites, or even text from documents and images. The AI agent intelligently parses the content to find the data you need.
Q: How do I define the structure of the extracted data?
A: You define the output structure by providing a simple JSON schema or a TypeScript interface. The AI agent uses this schema to understand what fields to look for (e.g., 'name', 'email', 'invoice_amount') and returns the data in that exact format.
Q: Is this just for web scraping?
A: While excellent for web scraping, extract.do is much more. It's a comprehensive extraction and transformation tool. Use it to parse emails, process invoices, standardize user-generated content, or any task that requires turning unstructured information into structured data, like the real-time enrichment pipelines shown above.
Q: How does this compare to traditional ETL tools?
A: extract.do replaces complex ETL pipelines with a simple API call. Instead of building and maintaining brittle parsers and scripts, you simply describe the data you want. Our AI agent handles the heavy lifting, adapting to changes in source format automatically.