Real-Time Firehose

The Shodan Firehose is a real-time data feed containing all the information that the Shodan crawlers are collecting. The stream is provided as a web service: once a client connects to the Streaming API, it receives JSON-encoded banners until it disconnects from the server. Banners are separated by newlines, so you can process the firehose line by line, with each line containing exactly one banner. The latest information about the Streaming API is available on our developer website.
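Because each line of the stream is one JSON document, a consumer only needs a line iterator and a JSON decoder. The following is a minimal sketch of that idea, not the official client: `iter_banners` is a hypothetical helper name, the sample lines are invented, and skipping blank lines is a defensive assumption rather than documented behavior.

```python
import json
from typing import Dict, Iterable, Iterator, Union


def iter_banners(lines: Iterable[Union[bytes, str]]) -> Iterator[Dict]:
    """Yield one decoded banner per non-empty line of a newline-delimited stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: tolerate empty lines instead of failing on them.
            continue
        yield json.loads(line)


# Invented sample data standing in for the lines of a live connection.
sample = [
    b'{"ip_str": "198.51.100.7", "port": 443}\n',
    b'\n',
    b'{"ip_str": "203.0.113.9", "port": 22}\n',
]
banners = list(iter_banners(sample))
for banner in banners:
    print(banner["ip_str"], banner["port"])
```

Any object that yields lines works as input, for example an open file containing a saved portion of the stream.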

On average, 1,200-1,500 banners per second flow through the firehose. The exact throughput varies as we complete specialized data collection, such as the domain crawl, during the month. When a new port or service is added, we usually perform an immediate crawl, which can push throughput up to 1,800 banners per second.
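To compare what your consumer actually receives against these figures, you can wrap the stream in a small pass-through meter. This is only a sketch: `metered`, `on_rate`, `window` and `clock` are hypothetical names introduced for illustration and are not part of the Shodan library.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def metered(
    banners: Iterable[Dict],
    on_rate: Callable[[float], None],
    window: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Dict]:
    """Pass banners through unchanged and report banners/second once per window."""
    count = 0
    started = clock()
    for banner in banners:
        count += 1
        now = clock()
        elapsed = now - started
        if elapsed >= window:
            # Report the average rate over the window that just ended.
            on_rate(count / elapsed)
            count = 0
            started = now
        yield banner


# Usage with the Shodan library (needs an API key with firehose access):
#   for banner in metered(api.stream.banners(), lambda r: print(f"{r:.0f} banners/s")):
#       ...
```

The rate is only evaluated when a banner arrives, so a completely stalled connection produces no report; a production monitor would need a separate timeout for that case.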

Streaming Queries

The /shodan/banners method is an unfiltered feed of all the data that is collected. That is the method you should consume if you want to build an on-premise copy of the Shodan database. However, there are other methods that contain a filtered feed which can be helpful depending on your use cases. For example, the Streaming API lets you create a data feed of search results using the /shodan/custom method. Instead of running a search query every day to ask for new results you can stay connected to the /shodan/custom API endpoint and Shodan will send you any banners that meet the search criteria.

There is one difference, though, between the search query syntax of the REST API/website and that of the /shodan/custom Streaming API method: streaming queries are case-sensitive. Otherwise, you should be able to take your existing search query, plug it into the Streaming API and get a real-time data feed.

For example, the following command subscribes to industrial control systems that are located in Germany, Switzerland or France:

shodan stream --custom-filters "tag:ics country:CH,DE,FR"
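The same subscription can be opened from Python through `api.stream.custom()`, the library method used by the scripts on this page. The sketch below wraps it in a hypothetical helper, `first_matches`, that stops after a fixed number of banners; the `limit` parameter is an assumption added for illustration. The query string is passed through verbatim, so the case-sensitivity noted above applies.

```python
from itertools import islice
from typing import Any, Dict, List


def first_matches(api: Any, query: str, limit: int = 5) -> List[Dict]:
    """Collect the first `limit` banners that match a streaming query.

    `api` is expected to be a shodan.Shodan instance; any object exposing
    `stream.custom(query)` as an iterator of banners works as well.
    """
    return list(islice(api.stream.custom(query), limit))


# Usage (needs network access and an API key with firehose access):
#   from shodan import Shodan
#   api = Shodan("MY API KEY")
#   for banner in first_matches(api, "tag:ics country:CH,DE,FR"):
#       print(banner["port"])
```

Bounding the number of banners is convenient while testing a new query, since an unbounded loop over the stream never returns.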

Quickstart

The Shodan CLI supports all methods of the Streaming API. To get started, simply run the command:

shodan stream

If the CLI is correctly configured and your API key has access to the firehose, you should see an endless stream of data flowing.

The equivalent Python code looks like this:

import shodan

# Find your API key on https://account.shodan.io
API_KEY = ''

# Setup the API wrapper
api = shodan.Shodan(API_KEY)

# Start subscribing to the full firehose
for banner in api.stream.banners():
    # We are going to simply print the banner to the terminal.
    # In your application you will submit the banner to your streaming data pipeline.
    print(banner)

Examples

Tracking Compromised Machines

The following script tracks IPs running VMware ESXi that have been compromised and extracts the ransomware Bitcoin address from the HTML of each result:

#!/usr/bin/env python
from shodan import Shodan
from shodan.cli.helpers import get_api_key
from shodan.helpers import get_ip

# If your machine doesn't have the CLI configured then just enter your API key directly
# below as a string, e.g.:
# api = Shodan("MY API KEY")
api = Shodan(get_api_key())

# Text that immediately precedes the wallet address in the ransom note
MARKER = 'bitcoins to the wallet <b>'

while True:
    try:
        for banner in api.stream.custom('http.title:"How to Restore Your Files"'):
            ip = get_ip(banner)

            # Find the BTC wallet
            html = banner['http']['html']
            start = html.find(MARKER)
            if start == -1:
                # The page doesn't contain the expected ransom note text
                continue
            end = html.find('</b>', start)
            wallet = html[start + len(MARKER):end]
            print(f'{ip}:{banner["port"]}\t Bitcoin address: {wallet}')
    except Exception:
        # Reconnect if the stream is interrupted or a banner can't be parsed
        pass

Monitor the Vulnerabilities for a Country

The firehose can be used to identify issues across a large IP space. The following script was written to help CERTs keep track of IPs that are exposing a confirmed vulnerable service:

from shodan import Shodan
from shodan.helpers import get_ip

API_KEY = ''
COUNTRY = 'US'


def main():
    api = Shodan(API_KEY)

    # Get a list of the verified vulns that Shodan is currently testing
    print('# Grabbing list of verified vulnerabilities...')
    response = api.count('net:0/0', facets=[('vuln.verified', 100)])['facets']['vuln.verified']
    verified_vulns = set([item['value'] for item in response])

    while True:
        try:
            print('# Subscribing to firehose...')
            for banner in api.stream.vulns(list(verified_vulns)):
                # Ignore results that aren't within our desired country
                if 'location' not in banner or banner['location']['country_code'] != COUNTRY:
                    continue

                # Grab the fields we care about in the CSV file
                ip = get_ip(banner)
                port = banner['port']
                country = banner['location']['country_code']
                org = banner.get('org', '')

                # Remove unverified vulnerabilities
                all_vulns = set(banner['vulns'].keys())
                critical_vulns = verified_vulns.intersection(all_vulns)
                vulns_str = ';'.join(critical_vulns)

                print(u'{},{},{},{},{}'.format(ip, port, country, org, vulns_str))
        except Exception as e:
            print('# Error: {}'.format(e))

    return 0


if __name__ == '__main__':
    import sys
    sys.exit(main())

Frequently Asked Questions

  1. How do I know if I’m getting all the data?
    We offer the optional debug query parameter that makes the firehose send a special event with information about the number of banners that were dropped:

    { "event": "debug", "discarded": 41 }

    If you’re consistently seeing fewer than 800 banners per second then please reach out to Shodan support.
  2. Is it possible to replay data after a network disruption?
    No, the firehose doesn’t currently support replaying data from a specific date/time. Please download the bulk data file for that date to fill any gaps. Every banner has a unique _shodan.id property which can be used to dedupe.
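Both answers can be combined in a consumer, sketched below under a few assumptions. It assumes the debug event arrives as a decoded object shaped like the example above, that regular banners carry no "event" key, and that the ID lives at `banner["_shodan"]["id"]`. The names `DropCounter`, `dedupe_banners` and `max_ids` are hypothetical, and the sample data is invented.

```python
from collections import OrderedDict
from itertools import chain
from typing import Dict, Iterable, Iterator


class DropCounter:
    """Filter debug events out of a decoded stream and total their "discarded" counts."""

    def __init__(self) -> None:
        self.discarded = 0

    def banners(self, events: Iterable[Dict]) -> Iterator[Dict]:
        for event in events:
            if event.get("event") == "debug":
                self.discarded += event.get("discarded", 0)
                continue
            yield event


def dedupe_banners(banners: Iterable[Dict], max_ids: int = 1_000_000) -> Iterator[Dict]:
    """Drop banners whose _shodan.id was already seen, remembering at most max_ids IDs."""
    seen: "OrderedDict[str, None]" = OrderedDict()
    for banner in banners:
        banner_id = banner.get("_shodan", {}).get("id")
        if banner_id is None:
            # Nothing to compare against, so pass the banner through.
            yield banner
            continue
        if banner_id in seen:
            continue
        seen[banner_id] = None
        if len(seen) > max_ids:
            seen.popitem(last=False)  # forget the oldest ID to bound memory
        yield banner


# Invented sample: a live stream containing one debug event, then a bulk-file backfill.
live = [
    {"_shodan": {"id": "a"}, "port": 80},
    {"event": "debug", "discarded": 41},
    {"_shodan": {"id": "b"}, "port": 443},
]
backfill = [{"_shodan": {"id": "b"}, "port": 443}, {"_shodan": {"id": "c"}, "port": 22}]

counter = DropCounter()
merged = list(dedupe_banners(chain(counter.banners(live), backfill)))
print(len(merged), "unique banners;", counter.discarded, "reported as discarded")
```

The bounded `OrderedDict` trades completeness for memory: a duplicate that arrives after its ID has been evicted is let through, so size `max_ids` to cover the gap you are backfilling.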