Skip to content

Data model

Postgres schema for the intelligence graph and campaign orchestration layers.

Intelligence graph (global)

These tables are cross-campaign — the long-lived source of truth for pivots and domain queries.

ct_logs

Tracks public Certificate Transparency log sources and ingestion cursors.

ColumnTypeDescription
idTEXT PKInternal ID
urlTEXT UNIQUECT log base URL
descriptionTEXTLog name from Google log list
stateTEXTactive, readonly, inactive
last_tree_sizeBIGINTLatest observed tree size
last_fetched_indexBIGINTNext entry index to fetch
updated_atTIMESTAMPTZLast sync time

certificates

Deduplicated certificate metadata from CT ingestion.

ColumnTypeDescription
fingerprint_sha256TEXT UNIQUESHA-256 of DER
subject_cnTEXTSubject common name
issuerTEXTIssuer CN
not_before / not_afterTIMESTAMPTZValidity window
raw_derBYTEAOptional raw certificate
first_seenTIMESTAMPTZFirst observation
source_log_idTEXT FKOriginating CT log

certificate_names

SAN/CN names extracted from certificates.

ColumnTypeDescription
certificate_idTEXT FKParent certificate
nameTEXTDNS name (lowercase)
registered_domainTEXTRegistrable apex
is_wildcardBOOLWildcard cert flag
first_seenTIMESTAMPTZFirst observation

Unique on (certificate_id, name).

domains

Registrable domains discovered or seeded.

ColumnTypeDescription
domainTEXT UNIQUEApex domain
first_seen / last_seenTIMESTAMPTZObservation window

rdap_records

Cached RDAP responses per fetch.

ColumnTypeDescription
domain_idTEXT FKParent domain
registrar / registryTEXTNormalised registration info
created_at / updated_at / expires_atTIMESTAMPTZRegistration dates
statusesJSONBRDAP status array
nameserversJSONBNS hostnames
entitiesJSONBVisible entity handles/orgs
redactedBOOLRegistrant data redacted
raw_jsonJSONBFull RDAP JSON
fetched_atTIMESTAMPTZFetch timestamp

dns_records

Resolved DNS observations.

ColumnTypeDescription
nameTEXTQuery name
record_typeTEXTA, AAAA, NS, MX, TXT, CNAME
valueTEXTRecord value
ttlINTOptional TTL
first_seen / last_seenTIMESTAMPTZObservation window

Unique on (name, record_type, value).

hosts

Discovered hostnames (apex or subdomain).

ColumnTypeDescription
hostnameTEXT UNIQUEFQDN
registered_domainTEXTParent apex

http_fingerprints

HTTP surface observations per host.

ColumnTypeDescription
host_idTEXT FKParent host
schemeTEXThttp or https
status_codeINTHTTP status
titleTEXTPage title
server_headerTEXTServer header
headersJSONBResponse headers
favicon_hashTEXTSHA-256 of favicon
technologiesJSONBReserved for future tech detection
tracker_idsJSONBAnalytics IDs (UA-, GTM-, G-)
final_urlTEXTURL after redirects
fetched_atTIMESTAMPTZFetch time

graph_edges

Typed relationships for pivot queries.

ColumnTypeDescription
source_type / source_idTEXTe.g. domain + dom_...
relationshipTEXTe.g. uses_ns, has_certificate
target_type / target_idTEXTe.g. nameserver + ns1.example.net
confidenceREALDefault 1.0
sourceTEXTAttribution (rdap, dns, ct, campaign:dns, …)
first_seen / last_seenTIMESTAMPTZEdge lifetime

Unique on (source_type, source_id, relationship, target_type, target_id).

Common relationships

RelationshipSource → TargetExample
has_subdomaindomain → hostexample.comapi.example.com
has_certificatedomain → certificatefingerprint SHA-256
registered_withdomain → registrarRDAP registrar name
uses_nsdomain → nameserverns1.cloudflare.com
uses_mxdomain → mxmail.example.net
resolves_todomain → ip203.0.113.1
has_entitydomain → entityRDAP handle (non-redacted)
shares_favicondomain → favicon_hashSHA-256
shares_trackerdomain → tracker_idUA-12345
cname_todomain → hostCNAME target
redirects_todomain → urlHTTP redirect

ingestor_config

Runtime configuration for ct-ingestor (key ct).

json
{
  "target_tlds": ["com", "net", "org", "io", "co.uk"],
  "backfill_mode": true,
  "include_readonly": true,
  "batches_per_cycle": 20,
  "batch_size": 512
}

Campaign layer (per run)

TablePurpose
campaignsDiscovery run config and status
entitiesCanonical entities (type, value)
campaign_entitiesCampaign membership + depth
observationsRaw collector JSON per entity
edgesCampaign-scoped relationships with evidence
crawl_jobsPer-entity, per-collector job state
campaign_eventsAudit/event log
expansion_suggestionsPending/approved expansion candidates

Campaign discoveries are synced into the intelligence graph (domains, graph_edges, etc.) when collectors complete.

Entity types (campaign)

TypeExamples
domainexample.com
subdomainapi.example.com
ip203.0.113.1
nameserverns1.example.net
mxmail.example.net
certificateSHA-256 fingerprint
organisationTLS cert org
registrarRDAP registrar
analytics_idUA-12345
favicon_hashSHA-256
urlhttps://example.com/
emailseed classification only
GuideDescription
ArchitectureHow data flows into tables
PivotsQuerying graph_edges
CT ingestorPopulating certificates

Native tools, weird experiments, and practical performance work.