External API Guide

Public API integration guide

External systems can query platform status, sources, jobs, and latest runs, and can call safe simulation endpoints for crawling, parsing, deduplication, compliance, and scheduling logic.

API integration interface

Base URL and response format

Base URL: https://crawler.sun-bd.com

Response envelope:
{
  "success": true,
  "traceId": "00-...",
  "data": { },
  "warnings": []
}
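As a minimal sketch of how a client might consume this envelope, the helper below parses a response body and returns the data and warnings, raising when success is false. The names ApiError and unwrap are hypothetical, and the shape of a failed response is not documented here, so the error carries only the traceId.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope reports success = false (hypothetical helper)."""


def unwrap(body: str) -> tuple[Any, list]:
    """Parse a response envelope and return (data, warnings)."""
    envelope = json.loads(body)
    trace_id = envelope.get("traceId")
    if not envelope.get("success", False):
        # The failure shape is an assumption: only traceId is relied on here.
        raise ApiError(f"request failed (traceId={trace_id})")
    return envelope.get("data"), envelope.get("warnings", [])


# Usage with the sample envelope shown above.
sample = '{"success": true, "traceId": "00-...", "data": {}, "warnings": []}'
data, warnings = unwrap(sample)
print(data, warnings)
```

Keeping the traceId in the error message makes it easier to correlate a failed call with server-side logs.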

Member authentication

This project does not provide local member login or registration. Protected management APIs require the access token returned by the central member API. Public guide endpoints remain available for demonstrations and safe simulations.
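The sketch below shows one way a client might attach the access token to a protected management request. This guide does not state how the token is transmitted, so the "Authorization: Bearer" header is an assumption to be confirmed against the central member API. The function name authorized_request is hypothetical, and the URL is left as a parameter because the protected paths are not listed here.

```python
import urllib.request


def authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries the member access token.

    Assumption: the token is sent as "Authorization: Bearer <token>".
    """
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {access_token}")
    request.add_header("Accept", "application/json")
    return request
```

The request is only constructed, not sent, so the token handling can be checked before any network call is made.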

Common endpoints

GET /api/public/v1/health
Returns platform health, store mode, index prefix, and storage status.

GET /api/public/v1/catalog
Lists available APIs with their methods, purposes, and sample payloads.

GET /api/public/v1/features
Returns platform capability summaries for product pages or external clients.

GET /api/public/v1/sources?tenantId=tenant-hq
Returns each source's URL, legal status, robots status, and scheduling details.

GET /api/public/v1/jobs?tenantId=tenant-hq
Returns each job's cadence, enabled state, source mapping, and parser rules.

GET /api/public/v1/runs/latest?tenantId=tenant-hq
Returns the latest worker run status, including success, failure, HTTP 403, CAPTCHA, and quality indicators.

POST /api/public/v1/crawl/simulate
Runs a crawl dry run and returns estimated records, warnings, and a quality summary.

POST /api/public/v1/parser-rule/suggest
Suggests CSS selector and XPath rules from a URL or sample HTML.

POST /api/public/v1/quality/dedupe
Checks whether content is likely a duplicate, using hash and similarity checks.

POST /api/public/v1/compliance/robots-check
Checks whether robots.txt allows a specified user agent to crawl a URL.

POST /api/public/v1/scheduler/preview
Generates upcoming run times from a cron expression.
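The three tenant-scoped read endpoints above share the same tenantId query parameter, so a client can build their URLs from one small helper. The following is a sketch only: the names TENANT_ENDPOINTS and tenant_url are hypothetical, while the paths and the tenant-hq value come from the list above.

```python
from urllib.parse import urlencode

BASE_URL = "https://crawler.sun-bd.com"

# Tenant-scoped read endpoints taken from the endpoint list.
TENANT_ENDPOINTS = {
    "sources": "/api/public/v1/sources",
    "jobs": "/api/public/v1/jobs",
    "latest_runs": "/api/public/v1/runs/latest",
}


def tenant_url(name: str, tenant_id: str) -> str:
    """Return the full URL for a tenant-scoped endpoint."""
    # urlencode escapes the tenant ID so unusual characters stay valid.
    query = urlencode({"tenantId": tenant_id})
    return f"{BASE_URL}{TENANT_ENDPOINTS[name]}?{query}"


for name in TENANT_ENDPOINTS:
    print(tenant_url(name, "tenant-hq"))
```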

Quick tests

curl https://crawler.sun-bd.com/api/public/v1/health

curl https://crawler.sun-bd.com/api/public/v1/catalog

curl "https://crawler.sun-bd.com/api/public/v1/sources?tenantId=tenant-hq"

curl -X POST https://crawler.sun-bd.com/api/public/v1/crawl/simulate \
  -H "Content-Type: application/json" \
  -d '{"tenantId":"tenant-hq","jobId":"job-news-demo","maxRecords":8}'
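For clients that prefer Python over curl, the crawl simulation call can be sketched with the standard library alone. The function names build_simulate_request and send are hypothetical. The payload fields mirror the curl command, and the response is assumed to follow the envelope described earlier in this guide.

```python
import json
import urllib.request

BASE_URL = "https://crawler.sun-bd.com"


def build_simulate_request(tenant_id: str, job_id: str, max_records: int) -> urllib.request.Request:
    """Build the POST request for a crawl dry run without sending it."""
    payload = {"tenantId": tenant_id, "jobId": job_id, "maxRecords": max_records}
    return urllib.request.Request(
        f"{BASE_URL}/api/public/v1/crawl/simulate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request offline; call send(request) to actually run the dry run.
request = build_simulate_request("tenant-hq", "job-news-demo", 8)
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and to test without touching the network.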