CT ingestor
atlas-ct-ingestor streams Certificate Transparency logs directly into Postgres. It is the upstream data source for the ct collector and subdomain pivot APIs.
Atlas does not scrape crt.sh.
How it works
- Fetch https://www.gstatic.com/ct/log_list/v3/log_list.json
- Upsert logs into ct_logs with state (active, readonly, inactive)
- For each log: read get-sth tree size, fetch get-entries in batches
- Decode leaf_input → DER certificate
- Extract SAN/CN, issuer, validity, SHA-256 fingerprint
- Store in certificates and certificate_names (deduplicated)
- Advance last_fetched_index
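The decode step can be illustrated with a minimal Python sketch of the RFC 6962 MerkleTreeLeaf layout that get-entries returns in leaf_input. This is not the ingestor's own code; the function name decode_leaf_input and the returned dictionary keys are hypothetical, and how the ingestor fingerprints precertificate entries is not stated here, so the sketch leaves that value empty.

```python
import base64
import hashlib
import struct

X509_ENTRY = 0     # leaf carries the full DER certificate
PRECERT_ENTRY = 1  # leaf carries issuer_key_hash followed by a TBSCertificate


def decode_leaf_input(leaf_input_b64: str) -> dict:
    """Parse an RFC 6962 MerkleTreeLeaf taken from a get-entries response."""
    raw = base64.b64decode(leaf_input_b64)
    if len(raw) < 15:
        raise ValueError("leaf_input too short")
    # version (1 byte) | leaf_type (1) | timestamp in ms (8) | entry_type (2)
    version, leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    if version != 0 or leaf_type != 0:
        raise ValueError("unsupported MerkleTreeLeaf version or leaf type")
    offset = 12
    if entry_type == PRECERT_ENTRY:
        offset += 32  # skip the 32-byte issuer_key_hash
    elif entry_type != X509_ENTRY:
        raise ValueError(f"unknown entry type {entry_type}")
    # The certificate body is prefixed with a 3-byte big-endian length.
    length = int.from_bytes(raw[offset:offset + 3], "big")
    body = raw[offset + 3:offset + 3 + length]
    if len(body) != length:
        raise ValueError("truncated leaf_input")
    return {
        "timestamp_ms": timestamp_ms,
        "entry_type": entry_type,
        "der": body,
        # Assumption: only full certificates are fingerprinted in this sketch.
        "fingerprint_sha256": (
            hashlib.sha256(body).hexdigest() if entry_type == X509_ENTRY else None
        ),
    }
```

For precertificate entries the body is a TBSCertificate rather than a complete certificate; the full precertificate travels in the entry's extra_data field.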
Runtime configuration
Config is stored in Postgres (ingestor_config, key ct) and read every poll cycle.
| Field | Default | Description |
|---|---|---|
| target_tlds | com, net, org, io, co.uk, com.au | Only store certs with names in these TLDs |
| backfill_mode | true (seed) | Process multiple batches per log per cycle |
| include_readonly | true (seed) | Also ingest historical readonly logs |
| batches_per_cycle | 20 | Batches per log when backfill enabled |
| batch_size | 512 | Entries per get-entries request |
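To show how batch_size and batches_per_cycle might interact, here is a rough Python sketch that turns them into get-entries ranges for one log in one cycle. The function plan_batches is hypothetical, and two points are assumptions rather than documented behavior: that next_index is the first entry not yet fetched, and that a log gets a single batch per cycle when backfill_mode is off.

```python
def plan_batches(next_index, tree_size, batch_size=512,
                 batches_per_cycle=20, backfill_mode=True):
    """Return inclusive (start, end) ranges for get-entries in one cycle."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Assumption: without backfill, only one batch is taken per log per cycle.
    limit = batches_per_cycle if backfill_mode else 1
    ranges = []
    start = next_index
    while start < tree_size and len(ranges) < limit:
        # get-entries uses 0-based, inclusive start and end parameters.
        end = min(start + batch_size, tree_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

With the defaults, one cycle requests at most 20 × 512 = 10,240 entries per log. A log may return fewer entries than requested, so a real loop would advance by the number of entries actually received rather than by the planned range.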
View config
```bash
curl http://localhost:8090/ct/config
```
Update config
```bash
curl -X PUT http://localhost:8090/ct/config \
  -H "Content-Type: application/json" \
  -d '{
    "target_tlds": ["com", "io"],
    "backfill_mode": true,
    "include_readonly": true,
    "batches_per_cycle": 25,
    "batch_size": 512
  }'
```
Start backfill
POST /ct/backfill enables backfill mode and sets TLD targets in one call:
```bash
curl -X POST http://localhost:8090/ct/backfill \
  -H "Content-Type: application/json" \
  -d '{
    "target_tlds": ["com", "io", "co.uk", "net", "org"],
    "include_readonly": true,
    "batches_per_cycle": 30,
    "batch_size": 512
  }'
```
Monitor progress
```bash
curl http://localhost:8090/ct/status
```
Returns per-log last_fetched_index, last_tree_size, and progress_pct, plus global certificate and domain counts.
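The exact formula behind progress_pct is not given here. One plausible reading, sketched in Python under the assumption that it is simply the fetched index as a share of the last known tree size, would be:

```python
def progress_pct(last_fetched_index: int, last_tree_size: int) -> float:
    """Percentage of a log ingested so far (assumed formula, not confirmed)."""
    if last_tree_size <= 0:
        return 0.0  # tree size not known yet
    pct = 100.0 * last_fetched_index / last_tree_size
    # Clamp in case the stored tree size lags behind the fetch position.
    return round(min(100.0, pct), 2)
```

Because active logs keep growing, a value computed this way can drop between cycles when a new get-sth reports a larger tree.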
TLD filtering
When target_tlds is non-empty, certificates are stored only if at least one SAN/CN maps to a registrable domain ending in a listed TLD.
Examples with target_tlds: ["com", "co.uk"]:
| Name | Stored? |
|---|---|
| api.example.com | yes |
| shop.example.co.uk | yes |
| node.example.io | no |
Empty target_tlds disables filtering (store all decoded names).
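The filtering rule can be approximated with plain suffix matching, as in the Python sketch below. The helper names are hypothetical, and the ingestor may resolve registrable domains differently (for example against the Public Suffix List), so treat this as an illustration of the rule rather than its implementation.

```python
def normalize_name(name: str) -> str:
    """Lowercase a SAN/CN value and drop a wildcard prefix and trailing dot."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def name_matches(name: str, target_tlds: list[str]) -> bool:
    """True if the name has at least one label in front of a listed TLD."""
    if not target_tlds:
        return True  # empty list disables filtering
    host = normalize_name(name)
    return any(host.endswith("." + tld.lower().lstrip(".")) for tld in target_tlds)


def should_store(names: list[str], target_tlds: list[str]) -> bool:
    """A certificate is kept if any of its names matches."""
    if not target_tlds:
        return True
    return any(name_matches(n, target_tlds) for n in names)
```

Applied to the table above, name_matches returns True for api.example.com and shop.example.co.uk and False for node.example.io when target_tlds is ["com", "co.uk"].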
Rate limiting
- 200–500 ms delay between batches per log
- 15–30 s poll interval between full cycles (CT_POLL_INTERVAL_SECS)
- Respects CT log operator rate limits; backs off on HTTP errors
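A pacing policy along these lines could look like the Python sketch below. The 200–500 ms range comes from the list above; the exponential backoff with a 1 s base and 60 s cap is purely an assumption, since the actual backoff schedule is not documented here, and both function names are hypothetical.

```python
import random


def batch_delay_secs(rng: random.Random) -> float:
    """Random pause between batches for one log, within 200 to 500 ms."""
    return rng.uniform(0.2, 0.5)


def backoff_secs(consecutive_errors: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Assumed exponential backoff after HTTP errors: 1 s, 2 s, 4 s, up to cap."""
    if consecutive_errors <= 0:
        return 0.0
    return min(cap, base * 2 ** (consecutive_errors - 1))
```

A caller would sleep for batch_delay_secs after each successful batch and for backoff_secs after a failed one, resetting the error count on the next success.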
Local development
```bash
cd worker && cargo run --bin atlas-ct-ingestor
```
Environment:
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgres://atlas:atlas@postgres:5432/atlas | Postgres connection |
| CT_POLL_INTERVAL_SECS | 30 | Seconds between ingest cycles |
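For tooling that runs alongside the ingestor, the same two variables and defaults can be read as in this small Python sketch. The Settings class and load_settings function are hypothetical helpers, not part of the ingestor.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    poll_interval_secs: int


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgres://atlas:atlas@postgres:5432/atlas"
        ),
        poll_interval_secs=int(env.get("CT_POLL_INTERVAL_SECS", "30")),
    )
```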
Docker
```yaml
ct-ingestor:
  build:
    context: ./worker
    dockerfile: Dockerfile.ct-ingestor
  environment:
    DATABASE_URL: postgres://atlas:atlas@postgres:5432/atlas?sslmode=disable
    CT_POLL_INTERVAL_SECS: "15"
```
Operational notes
- Cold start: Subdomains appear only after relevant certs are ingested. Backfill .com first for broad coverage.
- Readonly logs: Historical logs are large; progress_pct shows per-log completion.
- Dedup: Certificates deduplicate on fingerprint_sha256; names on (certificate_id, name).
- Malformed entries: Skipped silently; ingestion continues.
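The dedup behavior can be demonstrated with uniqueness constraints on the two keys named above. The Python sketch below uses an in-memory SQLite database as a stand-in for Postgres; the table layout beyond fingerprint_sha256, certificate_id, and name is assumed, and the helper names are hypothetical. In Postgres the equivalent insert would typically use ON CONFLICT DO NOTHING.

```python
import sqlite3

SCHEMA = """
CREATE TABLE certificates (
    id INTEGER PRIMARY KEY,
    fingerprint_sha256 TEXT NOT NULL UNIQUE
);
CREATE TABLE certificate_names (
    certificate_id INTEGER NOT NULL REFERENCES certificates(id),
    name TEXT NOT NULL,
    UNIQUE (certificate_id, name)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the assumed two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_certificate(conn, fingerprint_sha256, names) -> int:
    """Insert a certificate and its names, ignoring rows that already exist."""
    conn.execute(
        "INSERT OR IGNORE INTO certificates (fingerprint_sha256) VALUES (?)",
        (fingerprint_sha256,),
    )
    (cert_id,) = conn.execute(
        "SELECT id FROM certificates WHERE fingerprint_sha256 = ?",
        (fingerprint_sha256,),
    ).fetchone()
    conn.executemany(
        "INSERT OR IGNORE INTO certificate_names (certificate_id, name) VALUES (?, ?)",
        [(cert_id, name) for name in names],
    )
    return cert_id
```

Storing the same fingerprint twice returns the same certificate id, and only names not already linked to that certificate add new rows. This matters because the same certificate commonly appears in several CT logs.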
Related docs
| Guide | Description |
|---|---|
| Collectors | CT collector (local lookup) |
| Data model | ct_logs, certificates schema |
| Operations | Stack deployment |