In today's data-driven world, one of the biggest bottlenecks for developers isn't the lack of data, but its format. Vital information is often trapped in unstructured sources: emails, product descriptions, support tickets, PDFs, and messy web pages. Getting this data into a clean, structured JSON format that your applications can actually use often means writing brittle regex, complex custom parsers, or maintaining clunky ETL pipelines.
What if you could skip all that?
What if you could simply describe the data you need in plain English and have an AI agent extract and format it for you instantly? That's the promise of extract.do—an AI-powered API that transforms unstructured text, documents, and websites into clean, structured JSON.
This is intelligent data extraction, where you define your data as code.
Let's dive right into the code. The title promises a solution in three lines, and we're here to deliver. Imagine you have a piece of text and you need to pull out specific contact information.
Here’s how you do it with the extract.do SDK:
import { DO } from '@do-inc/sdk';

// 1. Initialize the .do client
const secret = process.env.DO_SECRET;
const digo = new DO({ secret });

// The source text and our desired data structure
const sourceText = 'Contact John Doe at j.doe@example.com. He is the CEO of Acme Inc.';

interface ContactInfo {
  name: string;
  email: string;
  company: string;
}

// 2. Run the extraction agent
const extractedData = await digo
  .agent<ContactInfo>('extract')
  .run({
    source: sourceText,
    description: 'Extract the full name, email, and company from the text.'
  });

// 3. Use your perfectly structured data
console.log(extractedData);
And the output is exactly what you asked for:
{
  "name": "John Doe",
  "email": "j.doe@example.com",
  "company": "Acme Inc."
}
Let's break down what just happened:
No regex, no manual string splitting, no complex logic. Just a clear definition of the desired outcome.
While the example above is powerful, it's just scratching the surface. The true power of extract.do lies in its versatility.
extract.do is source-agnostic. The source property can be raw unstructured text, a document, or a website.
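As a quick sketch of that flexibility, the same call pattern from the quickstart could take raw text or a page URL. Note that the exact accepted shapes of source are an assumption here; the SDK's full input types aren't shown in this post.

```typescript
// Hypothetical source values. The post says extract.do handles text,
// documents, and websites; the precise accepted shapes of `source`
// are an assumption.
const rawText = 'Invoice #8841 due 2024-03-01, total $1,250.00';
const pageUrl = 'https://example.com/products/widget-42';

// Each could be passed as `source` in the quickstart call, e.g.:
// await digo.agent<InvoiceInfo>('extract').run({ source: rawText, ... });
const sources = [rawText, pageUrl];
```

The application code stays identical either way; only the source value changes.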
Q: Is extract.do just for web scraping?

A: No. While extract.do is a game-changer for web scraping (goodbye, brittle CSS selectors!), it's a comprehensive data extraction and transformation engine. Think of it as a universal parser for emails, support tickets, product descriptions, PDFs, log files, and messy web pages.
For decades, developers have relied on Extract, Transform, Load (ETL) tools to move and process data. These tools are often complex, requiring you to build and maintain rigid pipelines. If the source website changes a layout div, your scraper breaks. If a new field is added to a log file, your parser fails.
extract.do replaces these fragile pipelines with a single, intelligent API call.
Instead of coding the process of extraction, you simply describe the result you want. The AI agent handles the "how," adapting to variations in the source format. This is a fundamental shift towards a more resilient, developer-friendly "Business-as-Code" approach.
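To make that shift concrete, here is a minimal sketch in the style of the quickstart. The OrderEvent shape and the sample log line are hypothetical; the point is that the description, not a parser, carries the extraction logic.

```typescript
// Describe the result, not the parsing steps. The OrderEvent fields and
// the log line below are hypothetical; the call shape mirrors the
// quickstart example.
interface OrderEvent {
  orderId: string;
  status: string;
}

const request = {
  source: '2024-01-15T09:12:03Z INFO order #A-1009 moved to SHIPPED',
  description: 'Extract the order ID and its new status from the log line.',
};

// As in the quickstart (assumes an initialized `digo` client):
// const event = await digo.agent<OrderEvent>('extract').run(request);
```

If the log format drifts (a new timestamp style, reordered fields), the description still holds, so there is no parser to rewrite.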
Q: How do I define the structure of the extracted data?
A: You define the output structure by providing a simple JSON schema or, as shown in our example, a TypeScript interface. The AI agent uses this schema to understand exactly what fields to look for and returns the data in that precise format.
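For the JSON-schema route, a schema mirroring the quickstart's ContactInfo interface might look like the following. How exactly the SDK accepts it (the parameter name, any validation) is an assumption; only the field list comes from the example above.

```typescript
// A JSON Schema equivalent of the ContactInfo interface from the
// quickstart. The way the SDK consumes a schema is an assumption here.
const contactSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string' },
    company: { type: 'string' },
  },
  required: ['name', 'email', 'company'],
};
```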
Q: How does extract.do compare to traditional ETL tools?
A: It replaces complex ETL pipelines with a simple API call. Instead of building and maintaining brittle parsers, you simply describe the data you want. Our AI agent handles the heavy lifting, adapting to changes in source format automatically, saving you countless hours of maintenance.
Stop writing parsers. Stop wrestling with unstructured data. Start building.
With extract.do, you have an intelligent agent ready to turn any data source into the clean, structured JSON your applications need.
Ready to transform your data workflow? Visit extract.do to get your API key and run your first extraction in minutes.