Structured web data extraction. Send URLs and a JSON Schema describing the data shape you want, receive clean structured data back. Resilient parallel processing — partial failures don't break the batch.
POST /services/web-extractor
{
"urls": ["https://example.com/product"], // 1-10 URLs
"schema": { // JSON Schema for desired output
"type": "object",
"properties": {
"title": { "type": "string" },
"price": { "type": "number" }
},
"required": ["title", "price"]
}
}
{
"results": [{
"url": "string",
"status": "success | fetch_error | extraction_error",
"data": { ... }, // Extracted data matching your schema
"error": null | { "code": "string", "message": "string", "retryable": boolean }
}],
"summary": {
"total": number,
"succeeded": number,
"failed": number,
"price_charged": "string"
}
}
Request:
curl -X POST /services/web-extractor \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/product"],
"schema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["title", "price"]
}
}'
Response:
{
"results": [{
"url": "https://example.com/product",
"status": "success",
"data": { "title": "Widget Pro", "price": 29.99, "description": "A premium widget" },
"error": null
}],
"summary": { "total": 1, "succeeded": 1, "failed": 0, "price_charged": "$0.01" }
}
Back to Home