Skip to content

DNSDB

The DNSDB dataset contains multiple types of files:

  • Weekly Parquet files (hostnames.parquet.bz2, ip_hostname.parquet.bz2)
    Compressed Parquet files optimized for analytical queries. The data is based on the most recent 30 days of data with a smaller download size compared to CSV/ SQLite formats
  • Weekly SQLite database (dnsdb.sqlite.zst)
    The SQLite file only contains a hostnames table to lookup DNS records based on hostname.
  • Weekly CSV (dnsdb.csv.gz)
    The CSV contains the DNS records collected in the past 30 days.
  • Daily CSV (yyyy-mm-dd.csv.zst)
    The daily CSV contains all the DNS records collected for the given date.
  • Daily Parquet (yyyy-mm-dd.parquet)
    Daily file in Parquet format compressed with Zstandard (ZSTD).

File Formats

Parquet Files

Parquet is columnar and compressed, so these files download quickly and let tools like DuckDB/Athena filter billions of rows efficiently.

hostnames.parquet: Contains more columns compare to SQLite file: hostname, domain, type, value, options, last_seen

ip_hostname.parquet: Contains IP-to-hostname mappings with columns: ip, hostname, ip_type

SQLite

If you prefer to work with SQL, we would recommend the SQLite file. It comes pre-indexed for fast lookups by domain and can be synchronized weekly to stay up to date. The schema for the SQLite file is:

CREATE TABLE IF NOT EXISTS hostnames (
hostname TEXT,
domain TEXT,
type TEXT,
value TEXT
);
CREATE INDEX IF NOT EXISTS domain_index ON hostnames (domain);

Examples

SQLite Examples

Get a list of subdomains

The SQL query to fetch the information is simple:

SELECT hostname FROM hostnames WHERE domain='shodan.io';

To give you an idea of the performance of the dnsdb.sqlite file here are some numbers for grabbing all the data for the amazonaws.com domain:

Terminal window
$ time sqlite3 dnsdb.sqlite "SELECT count(*) FROM hostnames WHERE domain='amazonaws.com'"
4267421
real 0m0.193s
user 0m0.142s
sys 0m0.051s
$ time sqlite3 dnsdb.sqlite "SELECT * FROM hostnames WHERE domain='amazonaws.com'" > /dev/null
real 0m1.692s
user 0m1.432s
sys 0m0.260s

There are around 4.2 million records for the domain and it takes around 1.7 seconds to iterate over all of them using a mid-range laptop.

DuckDB Examples

The Parquet files can be queried directly using the DuckDB CLI. DuckDB is lightweight and fast for analytical queries on large datasets.

Get a count of hostnames by domain

Terminal window
duckdb -c "SELECT domain, COUNT(*) as count FROM read_parquet('hostnames.parquet/**') GROUP BY domain ORDER BY count DESC LIMIT 10"
100% ▕██████████████████████████████████████▏ (00:01:56.28 elapsed)
┌─────────────────────────────────┬───────────┐
│ domain │ count │
varchar │ int64 │
├─────────────────────────────────┼───────────┤
haplorrhini.com137610566
hstgr.net115531802
keenetic.io44049533
zendesk.com31444341
│ rose-virginia.herokuapp.com │ 25327304
nknlabs.io23584319
herokuappdev.com21955434
│ pear-virginia.herokuapp.com │ 21817875
│ magnolia-virginia.herokuapp.com │ 18356650
│ redbud-virginia.herokuapp.com │ 17662499
├─────────────────────────────────┴───────────┤
10 rows 2 columns │
└─────────────────────────────────────────────┘

Finding other websites hosted on the same server

DNSDB can be used to quickly identify other websites that are using the same public IP. For example, lets see which IPs that shodan.io is using:

Terminal window
duckdb -c "SELECT hostname,domain,type,value FROM read_parquet('hostnames.parquet/**') WHERE hostname='www' AND domain='shodan.io'"
┌──────────┬───────────┬─────────┬─────────────────────┐
│ hostname │ domain │ typevalue
varcharvarcharvarcharvarchar
├──────────┼───────────┼─────────┼─────────────────────┤
│ www │ shodan.io │ A │ 104.18.12.238
│ www │ shodan.io │ A │ 104.18.13.238
│ www │ shodan.io │ AAAA │ 2606:4700::6812:cee │
│ www │ shodan.io │ AAAA │ 2606:4700::6812:dee │
└──────────┴───────────┴─────────┴─────────────────────┘

And now we can query ip_hostname.parquet to quickly find all other hostnames associated with those IPs:

Terminal window
duckdb -c "SELECT ip, hostname FROM read_parquet('ip_hostname.parquet/**') WHERE ip='104.18.12.238'"
┌───────────────┬───────────────────────────────────────────────────────────┐
│ ip │ hostname │
varcharvarchar
├───────────────┼───────────────────────────────────────────────────────────┤
104.18.12.238imgenergysolutions.com
104.18.12.238www.imgenergysolutions.com │
104.18.12.238platform.demo.imaging.dedalus.com │
│ · │ · │
│ · │ · │
│ · │ · │
104.18.12.238wire.shodan.io
├───────────────┴───────────────────────────────────────────────────────────┤
138 rows (40 shown) 2 columns │
└───────────────────────────────────────────────────────────────────────────┘

The results above have been truncated because Shodan uses Cloudflare as its CDN so there are a lot of other websites with the same public IP.

Stream and filter DNS records by type

#!/usr/bin/env python3
import glob
import duckdb
conn = duckdb.connect(":memory:")
# Iterate over all Parquet files
for f in glob.glob("hostnames.parquet/*", recursive=True):
result = conn.execute(
f"SELECT * FROM read_parquet('{f}') WHERE type IN ('A', 'AAAA')"
)
# Fetch in Arrow record batches
arrow_result = result.fetch_record_batch(rows_per_batch=10000)
while True:
try:
batch = arrow_result.read_next_batch()
# Convert batch to Python and print rows
for i in range(len(batch)):
row = [batch.column(j)[i].as_py() for j in range(batch.num_columns)]
# Auto-convert None to empty strings
print(",".join("" if val is None else str(val) for val in row))
except StopIteration:
break

The list of hostnames and domains that DNSDB fetches information for is based on a variety of algorithms that follow common OSINT best practices. Those algorithms run continuously and are constantly adding new hostnames/ domains to the list of targets.

We refresh the DNS records for those targets in the following timeframes:

  • Monitored domains/ hostnames have their A and AAAA records refreshed every 3 hours
  • Weekly refresh of all record types
  • Semi-weekly refresh of Aand AAAArecords
  • Daily refresh for A and AAAAfor hostnames on priority list
    ⇒ To add hostnames/ domains to the priority list please send list of hostnames enterprise@shodan.io with subject “DNSDB: Priority List”