Bulk Data Files
Shodan provides a few different datasets as bulk data files:
- `banners-daily`, `banners-hourly`: contain all the banner/service information that crawlers collected during a given day/hour. Each file is compressed using `zstd` and contains a single JSON-encoded banner per line. The most recent 30 days are always available for download. This data powers the `/shodan/host/` set of API endpoints. Visit the data dashboard to get a sense of what the latest `banners-daily` file contains.
- `raw-daily`: the legacy dataset containing the banner data. It's compressed using `gzip`. We continue to support this dataset, but for new projects we recommend the `banners-daily`/`banners-hourly` datasets.
- `dnsdb`: DNS data gathered using OSINT techniques. This data powers the `/dns/domain/` endpoint of the API.
- `internetdb`: ready-to-go database file that contains minimal service information but is small enough to fit into memory. It powers the InternetDB API.
- `cvedb`: SQLite database containing information about the CVEs published to NVD. It powers our public CVEDB API.
- `internet-scanners`: contains a list of IPs that have been observed scanning the Internet within the past 24 hours. This data is used to add the `scanner` tag on the banners.
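As a rough illustration of the one-banner-per-line layout, the sketch below streams a gzip-compressed file using only the Python standard library. It assumes the legacy `raw-daily` files use the same one-JSON-object-per-line layout described for `banners-daily`, which this page does not state explicitly. The function names `iter_banners` and `count_ports` are made up for this example. The `zstd`-compressed datasets would need a zstd decompressor in place of `gzip`.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def iter_banners(path: str) -> Iterator[dict]:
    """Yield one banner (a dict) per non-empty line of a gzip-compressed file."""
    # Mode "rt" decompresses on the fly and decodes to text, so the file is
    # streamed line by line instead of being loaded into memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_ports(path: str) -> Counter:
    """Count how many banners were seen per port (hypothetical helper)."""
    return Counter(
        banner["port"] for banner in iter_banners(path) if "port" in banner
    )
```

Because the file is processed as a stream, memory use stays flat regardless of file size, which matters for daily files that can exceed 100 GB.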
Bulk Data API
The Bulk Data API methods provide a programmatic way to discover and download all the raw data files that Shodan generates. The data itself is stored in the cloud for optimized delivery across regions. The current methods for the API are documented on the developer website.
The `/shodan/data` method returns a list of available datasets and metadata about them:
[
  {
    "scope": "monthly",
    "name": "internetdb",
    "description": "Minified database containing network information about all IPs on the Internet"
  },
  {
    "scope": "monthly",
    "name": "dnsdb",
    "description": "DNS data for active domains on the Internet"
  },
  {
    "scope": "daily",
    "name": "banners-daily",
    "description": "Data files containing all the information collected during a day"
  }
]
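For readers who want to call the method without the CLI, here is a minimal sketch using only the Python standard library. It assumes the API is served from `https://api.shodan.io` and authenticated with a `key` query parameter; check both against the developer documentation. The helper names `data_url` and `fetch_json` are invented for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: base URL of the Shodan REST API.
API_BASE = "https://api.shodan.io"


def data_url(api_key: str, dataset: Optional[str] = None) -> str:
    """Build the URL for /shodan/data, or /shodan/data/{dataset} if given."""
    path = "/shodan/data"
    if dataset is not None:
        path += "/" + urllib.parse.quote(dataset, safe="")
    # Assumption: the API key is passed as the "key" query parameter.
    return f"{API_BASE}{path}?{urllib.parse.urlencode({'key': api_key})}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

With a valid key, `[d["name"] for d in fetch_json(data_url(key))]` would list the dataset names, and passing a dataset name to `data_url` targets the per-dataset file listing.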
The `/shodan/data/{dataset}` method returns a list of URLs that can be used to download the files within a dataset. For example, the following shows part of the response for the `/shodan/data/raw-daily` request:
[
  {
    "url": "https://...",
    "timestamp": 1611711401000,
    "sha1": "5a91f49c90da5ab8856c83c84846941115c55441",
    "name": "2021-01-26.json.gz",
    "size": 104650655998
  },
  {
    "url": "https://...",
    "timestamp": 1611655444000,
    "sha1": "ea29acc25fc154ac64dde0ab294824ae7f1f64c9",
    "name": "2021-01-25.json.gz",
    "size": 152517565458
  },
  {
    "url": "https://...",
    "timestamp": 1611540775000,
    "sha1": "aed18f2a952df7731fec447d81ead8a96907000d",
    "name": "2021-01-24.json.gz",
    "size": 161275556509
  },
  ...
]
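The `timestamp` and `sha1` fields in such a listing lend themselves to two small checks: picking the newest file and verifying a finished download. The sketch below assumes that `sha1` is the hex-encoded SHA-1 of the file contents, which the sample response suggests but this page does not confirm. The function names `newest_file` and `sha1_matches` are hypothetical.

```python
import hashlib
from typing import Dict, List


def newest_file(files: List[Dict]) -> Dict:
    """Return the listing entry with the largest timestamp."""
    return max(files, key=lambda entry: entry["timestamp"])


def sha1_matches(path: str, expected_sha1: str, chunk_size: int = 1 << 20) -> bool:
    """Hash a local file in chunks and compare it to the listed sha1 value."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so very large files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Assumption: the listing's "sha1" is a hex digest of the whole file.
    return digest.hexdigest() == expected_sha1.lower()
```

Hashing a file of 100 GB or more takes a while, so it may make sense to run the check once, right after the download completes.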
Downloading the Data
The Bulk Data API files are hosted on Backblaze B2, which supports downloading data in chunks. This means you can use multiple connections to download a single file, which significantly speeds up downloads, especially as the bulk data files continue to grow. The recommended tool for downloading the data is aria2c. The following sample command uses `aria2c` to download a file with 4 concurrent connections to the server:
aria2c -x 4 -s 4 -o filename.json.gz http://<bulk-data-url>
The `aria2c` process will pre-allocate the entire data file and then fill in the data as it is downloaded.
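To make the idea of chunked downloads concrete, the sketch below splits a file of known size into contiguous byte ranges, one per connection, and formats each as an HTTP `Range` header value. It only illustrates the general technique and is not a description of how aria2c divides the work internally. The function names are invented for this example.

```python
from typing import List, Tuple


def split_ranges(size: int, connections: int) -> List[Tuple[int, int]]:
    """Split bytes 0..size-1 into contiguous inclusive (start, end) ranges."""
    if size <= 0 or connections <= 0:
        raise ValueError("size and connections must be positive")
    # Never create more ranges than there are bytes.
    connections = min(connections, size)
    base, extra = divmod(size, connections)
    ranges = []
    start = 0
    for index in range(connections):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Each connection would request its own range and write the response at the matching offset of a pre-allocated output file, which is consistent with the pre-allocation behavior noted above.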
Quickstart
If you're just getting started and want to try out the Bulk Data API, check out the Shodan CLI. It supports all the Bulk Data API methods. For example, to get a list of the available datasets:
shodan data list
To get a list of files for a dataset:
shodan data list --dataset=banners-daily
And then to download a specific file within a dataset:
shodan data download internetdb internetdb.sqlite.bz2
However, the `shodan data download` command downloads the data using a single connection, which will be significantly slower than using a tool such as `aria2c`. Below is the equivalent `aria2c` command to download the InternetDB SQLite file using 4 concurrent connections:
aria2c -x 4 -s 4 -o internetdb.sqlite.bz2 https://f001.backblazeb2.com/b2ap...
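When scripting downloads, the same command line can be assembled from a file's `url` and `name` as returned by the file listing. The helper below is a hypothetical sketch. It relies only on the `-x`, `-s` and `-o` options already used on this page and assumes `aria2c` is installed and on the `PATH`.

```python
import subprocess
from typing import List


def aria2c_command(url: str, output_name: str, connections: int = 4) -> List[str]:
    """Return the argument list for an aria2c download with N connections."""
    # aria2c accepts between 1 and 16 connections per server for -x.
    if not 1 <= connections <= 16:
        raise ValueError("connections must be between 1 and 16")
    count = str(connections)
    # -x: max connections per server, -s: number of splits, -o: output name
    return ["aria2c", "-x", count, "-s", count, "-o", output_name, url]


def download(url: str, output_name: str, connections: int = 4) -> None:
    """Run aria2c and raise CalledProcessError if it exits with an error."""
    subprocess.run(aria2c_command(url, output_name, connections), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with download URLs that contain special characters.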
Useful Links
- Datapedia: https://datapedia.shodan.io/
- aria2c: https://aria2.github.io/
- Developer documentation: https://developer.shodan.io/api
- Postman collection for REST API: https://www.postman.com/shodanhq/workspace/shodan/folder/5677612-ed460277-6845-4a40-9f5e-ba803cfa9f74
- Shodan CLI and Python library: https://github.com/achillean/shodan-python