In today's data-driven world, one of the biggest bottlenecks for developers isn't the lack of data, but its format. Vital information is often trapped in unstructured sources: emails, product descriptions, support tickets, PDFs, and messy web pages. Getting this data into a clean, structured JSON format that your applications can actually use often means writing brittle regex, complex custom parsers, or maintaining clunky ETL pipelines.
What if you could skip all that?
What if you could simply describe the data you need in plain English and have an AI agent extract and format it for you instantly? That's the promise of extract.do—an AI-powered API that transforms unstructured text, documents, and websites into clean, structured JSON.
This is intelligent data extraction, where you define your data as code.
Let's dive right into the code. The title promises a solution in three lines, and we're here to deliver. Imagine you have a piece of text and you need to pull out specific contact information.
Here’s how you do it with the extract.do SDK:
import { DO } from '@do-inc/sdk';

// 1. Initialize the .do client
const secret = process.env.DO_SECRET;
const digo = new DO({ secret });

// The source text and our desired data structure
const sourceText = 'Contact John Doe at j.doe@example.com. He is the CEO of Acme Inc.';

interface ContactInfo {
  name: string;
  email: string;
  company: string;
}

// 2. Run the extraction agent
const extractedData = await digo
  .agent<ContactInfo>('extract')
  .run({
    source: sourceText,
    description: 'Extract the full name, email, and company from the text.'
  });

// 3. Use your perfectly structured data
console.log(extractedData);
And the output is exactly what you asked for:
{
  "name": "John Doe",
  "email": "j.doe@example.com",
  "company": "Acme Inc."
}
Let's break down what just happened:
No regex, no manual string splitting, no complex logic. Just a clear definition of the desired outcome.
While the example above is powerful, it's just scratching the surface. The true power of extract.do lies in its versatility.
extract.do is source-agnostic. The source property can be raw unstructured text, a document, or a website.
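As a quick sketch of that flexibility, the same call pattern from the quickstart could take raw text or a page URL. Note that the exact accepted shapes of source are an assumption here; the SDK's full input types aren't shown in this post.

```typescript
// Hypothetical source values. The post says extract.do handles text,
// documents, and websites; the precise accepted shapes of `source`
// are an assumption.
const rawText = 'Invoice #8841 due 2024-03-01, total $1,250.00';
const pageUrl = 'https://example.com/products/widget-42';

// Each could be passed as `source` in the quickstart call, e.g.:
// await digo.agent<InvoiceInfo>('extract').run({ source: rawText, ... });
const sources = [rawText, pageUrl];
```

The application code stays identical either way; only the source value changes.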
Q: Is extract.do just for web scraping?

A: No. While extract.do is a game-changer for web scraping (goodbye, brittle CSS selectors!), it's a comprehensive data extraction and transformation engine. Think of it as a universal parser for emails, support tickets, product descriptions, PDFs, log files, and messy web pages.
For decades, developers have relied on Extract, Transform, Load (ETL) tools to move and process data. These tools are often complex, requiring you to build and maintain rigid pipelines. If the source website changes a layout div, your scraper breaks. If a new field is added to a log file, your parser fails.
extract.do replaces these fragile pipelines with a single, intelligent API call.
Instead of coding the process of extraction, you simply describe the result you want. The AI agent handles the "how," adapting to variations in the source format. This is a fundamental shift towards a more resilient, developer-friendly "Business-as-Code" approach.
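To make that shift concrete, here is a minimal sketch in the style of the quickstart. The OrderEvent shape and the sample log line are hypothetical; the point is that the description, not a parser, carries the extraction logic.

```typescript
// Describe the result, not the parsing steps. The OrderEvent fields and
// the log line below are hypothetical; the call shape mirrors the
// quickstart example.
interface OrderEvent {
  orderId: string;
  status: string;
}

const request = {
  source: '2024-01-15T09:12:03Z INFO order #A-1009 moved to SHIPPED',
  description: 'Extract the order ID and its new status from the log line.',
};

// As in the quickstart (assumes an initialized `digo` client):
// const event = await digo.agent<OrderEvent>('extract').run(request);
```

If the log format drifts (a new timestamp style, reordered fields), the description still holds, so there is no parser to rewrite.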
Q: How do I define the structure of the extracted data?
A: You define the output structure by providing a simple JSON schema or, as shown in our example, a TypeScript interface. The AI agent uses this schema to understand exactly what fields to look for and returns the data in that precise format.
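For the JSON-schema route, a schema mirroring the quickstart's ContactInfo interface might look like the following. How exactly the SDK accepts it (the parameter name, any validation) is an assumption; only the field list comes from the example above.

```typescript
// A JSON Schema equivalent of the ContactInfo interface from the
// quickstart. The way the SDK consumes a schema is an assumption here.
const contactSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string' },
    company: { type: 'string' },
  },
  required: ['name', 'email', 'company'],
};
```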
Q: How does extract.do compare to traditional ETL tools?
A: It replaces complex ETL pipelines with a simple API call. Instead of building and maintaining brittle parsers, you simply describe the data you want. Our AI agent handles the heavy lifting, adapting to changes in source format automatically, saving you countless hours of maintenance.
Stop writing parsers. Stop wrestling with unstructured data. Start building.
With extract.do, you have an intelligent agent ready to turn any data source into the clean, structured JSON your applications need.
Ready to transform your data workflow? Visit extract.do to get your API key and run your first extraction in minutes.