DNSDB
The DNSDB dataset contains multiple types of files:
- Weekly Parquet files (
hostnames.parquet.bz2,ip_hostname.parquet.bz2)
Compressed Parquet files optimized for analytical queries. The data is based on the most recent 30 days of data with a smaller download size compared to CSV/ SQLite formats - Weekly SQLite database (
dnsdb.sqlite.zst)
The SQLite file only contains ahostnamestable to lookup DNS records based on hostname. - Weekly CSV (
dnsdb.csv.gz)
The CSV contains the DNS records collected in the past 30 days. - Daily CSV (
yyyy-mm-dd.csv.zst)
The daily CSV contains all the DNS records collected for the given date. - Daily Parquet (
yyyy-mm-dd.parquet)
Daily file in Parquet format compressed with Zstandard (ZSTD).
File Formats
Parquet Files
Parquet is columnar and compressed, so these files download quickly and let tools like DuckDB/Athena filter billions of rows efficiently.
hostnames.parquet: Contains more columns compare to SQLite file: hostname, domain, type, value, options, last_seen
ip_hostname.parquet: Contains IP-to-hostname mappings with columns: ip, hostname, ip_type
SQLite
If you prefer to work with SQL, we would recommend the SQLite file. It comes pre-indexed for fast lookups by domain and can be synchronized weekly to stay up to date. The schema for the SQLite file is:
CREATE TABLE IF NOT EXISTS hostnames ( hostname TEXT, domain TEXT, type TEXT, value TEXT);CREATE INDEX IF NOT EXISTS domain_index ON hostnames (domain);Examples
SQLite Examples
Get a list of subdomains
The SQL query to fetch the information is simple:
SELECT hostname FROM hostnames WHERE domain='shodan.io';To give you an idea of the performance of the dnsdb.sqlite file here are some numbers for grabbing all the data for the amazonaws.com domain:
$ time sqlite3 dnsdb.sqlite "SELECT count(*) FROM hostnames WHERE domain='amazonaws.com'"4267421real 0m0.193suser 0m0.142ssys 0m0.051s
$ time sqlite3 dnsdb.sqlite "SELECT * FROM hostnames WHERE domain='amazonaws.com'" > /dev/nullreal 0m1.692suser 0m1.432ssys 0m0.260sThere are around 4.2 million records for the domain and it takes around 1.7 seconds to iterate over all of them using a mid-range laptop.
DuckDB Examples
The Parquet files can be queried directly using the DuckDB CLI. DuckDB is lightweight and fast for analytical queries on large datasets.
Get a count of hostnames by domain
duckdb -c "SELECT domain, COUNT(*) as count FROM read_parquet('hostnames.parquet/**') GROUP BY domain ORDER BY count DESC LIMIT 10"100% ▕██████████████████████████████████████▏ (00:01:56.28 elapsed)┌─────────────────────────────────┬───────────┐│ domain │ count ││ varchar │ int64 │├─────────────────────────────────┼───────────┤│ haplorrhini.com │ 137610566 ││ hstgr.net │ 115531802 ││ keenetic.io │ 44049533 ││ zendesk.com │ 31444341 ││ rose-virginia.herokuapp.com │ 25327304 ││ nknlabs.io │ 23584319 ││ herokuappdev.com │ 21955434 ││ pear-virginia.herokuapp.com │ 21817875 ││ magnolia-virginia.herokuapp.com │ 18356650 ││ redbud-virginia.herokuapp.com │ 17662499 │├─────────────────────────────────┴───────────┤│ 10 rows 2 columns │└─────────────────────────────────────────────┘Finding other websites hosted on the same server
DNSDB can be used to quickly identify other websites that are using the same public IP. For example, lets see which IPs that shodan.io is using:
duckdb -c "SELECT hostname,domain,type,value FROM read_parquet('hostnames.parquet/**') WHERE hostname='www' AND domain='shodan.io'"┌──────────┬───────────┬─────────┬─────────────────────┐│ hostname │ domain │ type │ value ││ varchar │ varchar │ varchar │ varchar │├──────────┼───────────┼─────────┼─────────────────────┤│ www │ shodan.io │ A │ 104.18.12.238 ││ www │ shodan.io │ A │ 104.18.13.238 ││ www │ shodan.io │ AAAA │ 2606:4700::6812:cee ││ www │ shodan.io │ AAAA │ 2606:4700::6812:dee │└──────────┴───────────┴─────────┴─────────────────────┘And now we can query ip_hostname.parquet to quickly find all other hostnames associated with those IPs:
duckdb -c "SELECT ip, hostname FROM read_parquet('ip_hostname.parquet/**') WHERE ip='104.18.12.238'"┌───────────────┬───────────────────────────────────────────────────────────┐│ ip │ hostname ││ varchar │ varchar │├───────────────┼───────────────────────────────────────────────────────────┤│ 104.18.12.238 │ imgenergysolutions.com ││ 104.18.12.238 │ www.imgenergysolutions.com ││ 104.18.12.238 │ platform.demo.imaging.dedalus.com ││ · │ · ││ · │ · ││ · │ · ││ 104.18.12.238 │ wire.shodan.io │├───────────────┴───────────────────────────────────────────────────────────┤│ 138 rows (40 shown) 2 columns │└───────────────────────────────────────────────────────────────────────────┘The results above have been truncated because Shodan uses Cloudflare as its CDN so there are a lot of other websites with the same public IP.
Stream and filter DNS records by type
#!/usr/bin/env python3import globimport duckdb
conn = duckdb.connect(":memory:")
# Iterate over all Parquet filesfor f in glob.glob("hostnames.parquet/*", recursive=True): result = conn.execute( f"SELECT * FROM read_parquet('{f}') WHERE type IN ('A', 'AAAA')" )
# Fetch in Arrow record batches arrow_result = result.fetch_record_batch(rows_per_batch=10000)
while True: try: batch = arrow_result.read_next_batch()
# Convert batch to Python and print rows for i in range(len(batch)): row = [batch.column(j)[i].as_py() for j in range(batch.num_columns)] # Auto-convert None to empty strings print(",".join("" if val is None else str(val) for val in row)) except StopIteration: breakThe list of hostnames and domains that DNSDB fetches information for is based on a variety of algorithms that follow common OSINT best practices. Those algorithms run continuously and are constantly adding new hostnames/ domains to the list of targets.
We refresh the DNS records for those targets in the following timeframes:
- Monitored domains/ hostnames have their
AandAAAArecords refreshed every 3 hours - Weekly refresh of all record types
- Semi-weekly refresh of
AandAAAArecords - Daily refresh for
AandAAAAfor hostnames on priority list
⇒ To add hostnames/ domains to the priority list please send list of hostnames enterprise@shodan.io with subject “DNSDB: Priority List”