Collectors

Atlas workers run collectors asynchronously via NATS. Each collector accepts a seed entity, performs external lookups, stores an observation, and returns discoveries for graph expansion.
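The contract described above can be sketched roughly as follows. This is an illustration only, not Atlas's actual code: the `Seed`, `Discovery`, `CollectorResult`, `Collector`, and `StaticDNS` names and their fields are hypothetical, and NATS transport and storage are left out. Only the `atlas.jobs.<collector>` subject pattern comes from this page.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Seed:
    """Entity handed to a collector (hypothetical shape)."""
    type: str   # e.g. "domain", "subdomain", "ip"
    value: str


@dataclass(frozen=True)
class Discovery:
    """New entity found while processing a seed (hypothetical shape)."""
    type: str
    value: str
    edge: str   # relationship back to the seed, e.g. "RESOLVES_TO"


@dataclass
class CollectorResult:
    observation: dict                                   # what would be stored
    discoveries: list[Discovery] = field(default_factory=list)


class Collector(Protocol):
    name: str

    def collect(self, seed: Seed) -> CollectorResult: ...


class StaticDNS:
    """Stand-in collector that returns canned data instead of doing lookups."""
    name = "dns"

    def collect(self, seed: Seed) -> CollectorResult:
        ips = ["192.0.2.10"]  # documentation-range address, not a real lookup
        return CollectorResult(
            observation={"seed": seed.value, "A": ips},
            discoveries=[Discovery("ip", ip, "RESOLVES_TO") for ip in ips],
        )


def subject_for(collector: Collector) -> str:
    # Subject naming follows the pattern listed in the Overview.
    return f"atlas.jobs.{collector.name}"
```

Keeping the observation and the discoveries in one return value mirrors the "store, then expand" split in the description: the first is persisted, the second feeds the graph.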

Overview

| Collector | NATS subject | Seed types | Intelligence tables |
|-----------|--------------|------------|---------------------|
| dns | atlas.jobs.dns | domain, subdomain, nameserver | dns_records, graph_edges |
| http | atlas.jobs.http | domain, subdomain, url, ip | http_fingerprints, graph_edges |
| tls | atlas.jobs.tls | domain, subdomain, url, ip | graph_edges |
| ct | atlas.jobs.ct | domain, subdomain | reads local CT store |
| rdap | atlas.jobs.rdap | domain, subdomain (→ apex) | rdap_records, graph_edges |

Direct enrichment (POST /domains) uses the same collector logic via atlas.enrich.domain.

DNS

Resolves:

  • A / AAAA — IPv4/IPv6 addresses
  • NS — nameservers
  • MX — mail exchangers
  • TXT — text records
  • CNAME — canonical name aliases

Discoveries: ip, nameserver, mx, txt, domain/subdomain (via CNAME).

Campaign edges: RESOLVES_TO, USES_NS, USES_MX, HAS_TXT, CNAME_TO.
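One way to picture how resolved records become discoveries and edges is the mapping below. It is a sketch under assumptions: the pairing of record types to edge names is inferred from the names listed above, and `RECORD_MAP`, `classify_name`, and `discoveries_from_records` are hypothetical helpers that take already-resolved records rather than doing any DNS lookups.

```python
# Record type -> (discovery type, campaign edge), inferred from the lists above.
# CNAME targets are classified as domain or subdomain separately.
RECORD_MAP = {
    "A": ("ip", "RESOLVES_TO"),
    "AAAA": ("ip", "RESOLVES_TO"),
    "NS": ("nameserver", "USES_NS"),
    "MX": ("mx", "USES_MX"),
    "TXT": ("txt", "HAS_TXT"),
}


def classify_name(name: str) -> str:
    """Naive guess: two labels means apex domain, more means subdomain.

    Assumption for illustration only; a real implementation would need the
    Public Suffix List to handle suffixes such as co.uk.
    """
    return "domain" if name.count(".") == 1 else "subdomain"


def discoveries_from_records(records: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Turn resolved records into (discovery type, value, edge) tuples."""
    found = []
    for rtype, values in records.items():
        for value in values:
            if rtype == "CNAME":
                # Strip the trailing root dot and normalise case.
                target = value.rstrip(".").lower()
                found.append((classify_name(target), target, "CNAME_TO"))
            elif rtype in RECORD_MAP:
                dtype, edge = RECORD_MAP[rtype]
                found.append((dtype, value, edge))
    return found
```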

HTTP

Fetches http:// and https:// (with redirect following).

Extracts:

  • Status code, final URL, redirect chain
  • Page <title>
  • Response headers (Server, etc.)
  • Favicon SHA-256 hash
  • Analytics IDs (UA-, GTM-, G- patterns)
  • Linked subdomains from page body (same apex)

Campaign edges: REDIRECTS_TO, SHARES_FAVICON, SHARES_ANALYTICS_ID, LINKED_FROM.
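Several of the extractions above are pure functions of the response body or favicon bytes, so they can be sketched without any network code. The regular expressions below are assumptions about what the UA-, GTM-, and G- patterns look like, and `favicon_hash`, `analytics_ids`, and `linked_subdomains` are hypothetical names, not the collector's real functions.

```python
import hashlib
import re
from urllib.parse import urlparse

# Assumed patterns for Universal Analytics, Tag Manager, and GA4 IDs.
ANALYTICS_RE = re.compile(
    r"\b(UA-\d{4,10}-\d{1,4}|GTM-[A-Z0-9]{4,10}|G-[A-Z0-9]{6,12})\b"
)
# Absolute URLs in href/src attributes; relative links are ignored here.
HREF_RE = re.compile(r"""(?:href|src)=["'](https?://[^"'\s>]+)""", re.IGNORECASE)


def favicon_hash(favicon_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw favicon bytes."""
    return hashlib.sha256(favicon_bytes).hexdigest()


def analytics_ids(body: str) -> set[str]:
    """Distinct analytics IDs found anywhere in the page body."""
    return set(ANALYTICS_RE.findall(body))


def linked_subdomains(body: str, apex: str) -> set[str]:
    """Hostnames linked from the page that fall under the same apex."""
    hosts = set()
    for url in HREF_RE.findall(body):
        host = (urlparse(url).hostname or "").lower()
        if host.endswith("." + apex):
            hosts.add(host)
    return hosts
```

Two hosts that return the same `favicon_hash` or share an analytics ID are the kind of overlap that SHARES_FAVICON and SHARES_ANALYTICS_ID edges would record.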

TLS

Connects to port 443 and inspects the leaf certificate.

Extracts:

  • SHA-256 fingerprint
  • Subject, issuer, organisation
  • Validity dates
  • Subject Alternative Names (SANs)

Campaign edges: HAS_CERT, CERT_HAS_SAN, REGISTERED_WITH.
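A leaf-certificate inspection along these lines can be written with Python's standard `ssl` module. This is a sketch, not the collector's implementation: `fetch_leaf_cert`, `fingerprint`, and `summarise` are hypothetical names, and the sketch assumes the fingerprint is taken over the DER-encoded certificate.

```python
import hashlib
import socket
import ssl


def fetch_leaf_cert(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bytes, dict]:
    """Return the leaf certificate as DER bytes plus Python's parsed dict.

    Uses default verification, so the handshake fails on untrusted
    certificates; a collector may well want to inspect those too.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True), tls.getpeercert()


def fingerprint(der: bytes) -> str:
    """SHA-256 hex digest of the DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def summarise(cert: dict) -> dict:
    """Pick the fields listed above out of the dict returned by getpeercert()."""
    def flatten(rdns):
        # getpeercert() nests names as a tuple of RDNs, each a tuple of pairs.
        return {key: value for rdn in rdns for key, value in rdn}

    subject = flatten(cert.get("subject", ()))
    issuer = flatten(cert.get("issuer", ()))
    return {
        "subject_cn": subject.get("commonName"),
        "organisation": subject.get("organizationName"),
        "issuer_cn": issuer.get("commonName"),
        "not_before": cert.get("notBefore"),
        "not_after": cert.get("notAfter"),
        "sans": [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"],
    }
```

Usage would look like `der, parsed = fetch_leaf_cert("example.com")` followed by `fingerprint(der)` and `summarise(parsed)`.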

CT (local store)

Does not call crt.sh. Queries the Postgres CT ingestion store:

  • certificate_names for subdomains under the seed apex
  • certificates for metadata (issuer, validity, fingerprint)

Requires ct-ingestor to have ingested relevant certificates first. Use CT backfill to populate TLDs of interest.

Campaign edges: FOUND_IN_CT.
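The lookup against the local store might resemble the query below. Treat it as a sketch with loud assumptions: only the table names `certificate_names` and `certificates` come from this page, every column name is hypothetical, and `sqlite3` stands in for Postgres so the example runs without a database server.

```python
import sqlite3

# sqlite3 stands in for Postgres so the sketch runs anywhere.
# Column names are hypothetical; only the two table names come from the docs.
SCHEMA = """
CREATE TABLE certificates (
    id INTEGER PRIMARY KEY, issuer TEXT, not_after TEXT, fingerprint TEXT
);
CREATE TABLE certificate_names (certificate_id INTEGER, name TEXT);
"""

QUERY = """
SELECT DISTINCT n.name, c.issuer, c.not_after, c.fingerprint
FROM certificate_names AS n
JOIN certificates AS c ON c.id = n.certificate_id
WHERE n.name LIKE ?
ORDER BY n.name
"""


def subdomains_in_ct(conn: sqlite3.Connection, apex: str) -> list[tuple]:
    # '%.example.com' matches names under the apex but not the apex itself.
    return conn.execute(QUERY, (f"%.{apex}",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO certificates VALUES (1, 'Example CA', '2025-01-01', 'ab12')")
conn.executemany(
    "INSERT INTO certificate_names VALUES (?, ?)",
    [(1, "api.example.com"), (1, "example.com"), (1, "api.example.org")],
)
rows = subdomains_in_ct(conn, "example.com")
```

If no matching certificates have been ingested, a query like this returns nothing, which is why the ingestion prerequisite above matters.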

RDAP

Resolves the RDAP server via the IANA bootstrap registry, fetches the domain's JSON record, and normalises:

  • Registrar, registry
  • Registration / update / expiry dates
  • Nameservers, statuses
  • Visible entity handles (with redaction flags)

Caching: 7-day TTL in rdap_records. Respects 429 rate limits with backoff.

Campaign edges: USES_NS, REGISTERED_WITH.
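The normalisation and caching rules above can be illustrated with a few small functions. This is a sketch under assumptions: the field names read from the response follow the RDAP JSON format (RFC 9083), but the output keys, the exponential backoff policy, and the 300-second cap are hypothetical, since this page only says that 429s are retried "with backoff". Registry and redaction handling are omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CACHE_TTL = timedelta(days=7)  # the 7-day TTL stated above


def is_fresh(fetched_at: datetime, now: datetime) -> bool:
    """True while a cached record is younger than the TTL."""
    return now - fetched_at < CACHE_TTL


def backoff_delay(attempt: int, retry_after: Optional[str] = None, cap: float = 300.0) -> float:
    """Seconds to wait after a 429: honour a numeric Retry-After, else 2**attempt."""
    if retry_after and retry_after.isdigit():
        return min(float(retry_after), cap)
    return min(2.0 ** attempt, cap)


def normalise(rdap: dict) -> dict:
    """Flatten an RDAP domain object into a few of the fields listed above."""
    events = {e.get("eventAction"): e.get("eventDate") for e in rdap.get("events", [])}
    registrar = next(
        (e for e in rdap.get("entities", []) if "registrar" in e.get("roles", [])),
        {},
    )
    return {
        "registrar_handle": registrar.get("handle"),
        "registered": events.get("registration"),
        "updated": events.get("last changed"),
        "expires": events.get("expiration"),
        "nameservers": sorted(
            ns["ldhName"].lower() for ns in rdap.get("nameservers", []) if "ldhName" in ns
        ),
        "statuses": rdap.get("status", []),
    }
```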

Collector routing

Not every entity type runs every collector:

| Entity | Collectors |
|--------|------------|
| domain, subdomain | dns, http, tls, ct, rdap |
| ip | http, tls |
| nameserver | dns |
| url | dns, http, tls |
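Expressed as data, the routing table above is a simple lookup. The sketch below copies the table into a dictionary; `ROUTES` and `subjects_for` are hypothetical names, and how Atlas actually stores or applies the routing is not stated here.

```python
# Entity type -> collectors, copied from the routing table above.
ROUTES = {
    "domain": ("dns", "http", "tls", "ct", "rdap"),
    "subdomain": ("dns", "http", "tls", "ct", "rdap"),
    "ip": ("http", "tls"),
    "nameserver": ("dns",),
    "url": ("dns", "http", "tls"),
}


def subjects_for(entity_type: str) -> list[str]:
    """NATS subjects to publish to for an entity; unknown types get none."""
    return [f"atlas.jobs.{name}" for name in ROUTES.get(entity_type, ())]
```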

Campaign expansion

Discoveries at depth N+1 are held as expansion suggestions until approved. Approve them by posting their entity IDs to the campaign's expand endpoint:

```bash
curl -X POST http://localhost:8090/campaigns/{id}/expand \
  -H "Content-Type: application/json" \
  -d '{ "entity_ids": ["ent_..."] }'
```

Respects max_depth and max_entities campaign limits.
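How the two limits might interact can be sketched as a filter over pending suggestions. The exact semantics are not documented here, so this is one plausible reading: a suggestion is accepted only if its depth does not exceed `max_depth` and the campaign still has room under `max_entities`. The `Suggestion` type and `approvable` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    entity_id: str
    depth: int


def approvable(suggestions, current_entities: int, max_depth: int, max_entities: int):
    """Split suggestions into (accepted, rejected) under the campaign limits."""
    accepted, rejected = [], []
    for s in suggestions:
        within_depth = s.depth <= max_depth
        # Entities accepted earlier in this call count against the cap too.
        has_room = current_entities + len(accepted) < max_entities
        (accepted if within_depth and has_room else rejected).append(s)
    return accepted, rejected
```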

Global graph sync

When a campaign collector completes, discoveries are mirrored into:

  • domains, hosts
  • graph_edges (source: campaign:{collector})

This lets pivot APIs surface infrastructure found during campaigns without re-querying campaign tables.

| Guide | Description |
|-------|-------------|
| CT ingestor | Feeding the CT collector |
| API reference | Starting campaigns and enrichment |
| Data model | Where observations land |