Web Extractor

Structured web data extraction. Send URLs and a JSON Schema describing the data shape you want, receive clean structured data back. Resilient parallel processing — partial failures don't break the batch.

$0.01 / URL USDC Base

Endpoint

POST /services/web-extractor

Input Schema

{
  "urls": ["https://example.com/product"],   // 1-10 URLs
  "schema": {                                // JSON Schema for desired output
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["title", "price"]
  }
}

Output Schema

{
  "results": [{
    "url": "string",
    "status": "success | fetch_error | extraction_error",
    "data": { ... },       // Extracted data matching your schema
    "error": null | { "code": "string", "message": "string", "retryable": boolean }
  }],
  "summary": {
    "total": number,
    "succeeded": number,
    "failed": number,
    "price_charged": "string"
  }
}

Example

Request:

curl -X POST /services/web-extractor \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/product"],
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "price": { "type": "number" },
        "description": { "type": "string" }
      },
      "required": ["title", "price"]
    }
  }'

Response:

{
  "results": [{
    "url": "https://example.com/product",
    "status": "success",
    "data": { "title": "Widget Pro", "price": 29.99, "description": "A premium widget" },
    "error": null
  }],
  "summary": { "total": 1, "succeeded": 1, "failed": 0, "price_charged": "$0.01" }
}
Back to Home