OSINT Reconnaissance: Tools and Techniques
Key Takeaways
- WHOIS records expose registrant details, registration history, name servers, and registrar information.
- Shodan continuously scans the entire IPv4 address space and indexes banners, certificates, and service metadata.
- LinkedIn is arguably the most data-rich corporate intelligence source available.
- Checking whether employee credentials have appeared in previous breaches is a core OSINT technique for social engineering assessments, phishing simulations, and attack path discovery.
- Use Shodan/Censys first (passive, no target traffic) to understand the attack surface, then verify with Nmap only on systems you're authorized to actively scan.
Open Source Intelligence is how every serious offensive engagement begins. Before you write a single payload, before you send a single request to a target system, you spend time learning everything that's publicly accessible. What servers are exposed. What software versions are running. What employees work there and what tools they use. What subdomains exist that the security team may have forgotten about. What credentials have appeared in previous breaches.
The intelligence gathered in this phase determines the quality of everything that follows. Engagements where recon is rushed produce generic findings that miss the high-value targets. Engagements where recon is thorough produce the account-takeover chains and critical-infrastructure findings that appear in breach postmortems.
This guide covers professional-level OSINT methodology: domain intelligence, infrastructure mapping, people reconnaissance, credential exposure analysis, and the legal framework that keeps you operating on the right side of the law.
The Intelligence Cycle Applied to OSINT
OSINT isn't random Googling. It follows a structured process:
1. Direction — define precisely what you need to know. "Investigate example.com" is too vague. "Find all external infrastructure, employee email addresses, and evidence of credential exposure for example.com and its subsidiaries" is an actionable objective.
2. Collection — systematic data gathering from primary and secondary sources. Primary sources are directly accessible data (DNS records, certificate logs, WHOIS). Secondary sources are aggregated data (Shodan, Censys, breach databases).
3. Processing — normalize, deduplicate, and structure raw data. A list of 50,000 subdomains is noise. A filtered, resolved, probed list of 200 live hosts is intelligence.
4. Analysis — draw conclusions and identify attack paths. An exposed Kubernetes dashboard on a non-production subdomain is interesting. Credentials from a 2022 breach that match the company's email format are a lead worth pursuing.
5. Dissemination — document findings with timestamps. OSINT data has a shelf life — open ports get closed, credentials get rotated, subdomains change. Capture everything with the date and time it was observed.
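The processing step above can be sketched with standard tools: a raw multi-source dump becomes a deduplicated, normalized, in-scope list. Filenames and the demo input here are placeholders, not output from any real tool run.

```shell
# Demo input: raw output from two hypothetical enumeration tools
# (stand-ins for subfinder/amass dumps).
printf 'API.Example.com\n*.dev.example.com.\nevil.com\n' > tool1_raw.txt
printf 'api.example.com\n' > tool2_raw.txt

# Processing: lowercase, strip wildcard prefixes and trailing dots,
# keep only in-scope names, deduplicate.
cat tool1_raw.txt tool2_raw.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sed -e 's/^\*\.//' -e 's/\.$//' \
  | grep -E '\.example\.com$' \
  | sort -u > processed_subdomains.txt

cat processed_subdomains.txt
```

Three raw lines collapse to two clean, in-scope entries; the out-of-scope evil.com is dropped before it can pollute later phases.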
Domain Reconnaissance
WHOIS and Registration Data
WHOIS records expose registrant details, registration history, name servers, and registrar information. Privacy protection services hide most modern registrations, but corporate domains, legacy registrations, and careless small businesses often still expose useful data.
# Basic WHOIS lookup
whois example.com
whois -h whois.arin.net 93.184.216.34 # IP block owner
# Historical WHOIS — see registration history before privacy services
# whoxy.com, DomainTools, SecurityTrails all maintain historical data
# These often expose the registrant email before they switched to privacy protection
What to extract from WHOIS:
- Registrant email (pivot to breach databases, other registrations, LinkedIn)
- Name servers (reveal hosting infrastructure, CDN usage, secondary DNS providers)
- Registration date (older domains often have weaker security posture)
- Registrar abuse contact (useful for legitimate disclosure)
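As a sketch, those fields can be pulled out of saved whois output with grep. The whois.txt content below is fabricated for illustration (in practice it comes from `whois example.com > whois.txt`), and label names vary by registrar, so adjust the pattern to what yours returns.

```shell
# Fabricated whois output standing in for a real lookup.
cat > whois.txt <<'EOF'
Domain Name: EXAMPLE.COM
Creation Date: 1995-08-14T04:00:00Z
Registrar Abuse Contact Email: abuse@registrar.example
Registrant Organization: Example Corporation
Registrant Email: hostmaster@example.com
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
EOF

# Extract only the pivot-worthy fields; -i tolerates case differences
# between registrars.
grep -iE '^(Registrant Email|Registrant Organization|Name Server|Creation Date|Registrar Abuse Contact Email):' whois.txt
```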
Cross-reference registrant emails against other domain registrations using DomainTools reverse WHOIS:
registrant_email:"registrant@company.com" — shows all domains registered with that email
This frequently reveals related companies, subsidiary brands, development environments registered under personal emails, and testing domains with weaker security.
DNS Enumeration
DNS is an intelligence goldmine. The record types themselves reveal infrastructure choices:
# Retrieve all available record types
# (note: many servers now refuse or minimize ANY queries per RFC 8482 — query individual types if so)
dig example.com ANY +noall +answer
# MX records — mail server infrastructure
dig example.com MX +short
# If you see: mail.example.com, this is self-hosted
# If you see: *.protection.outlook.com, they use Microsoft 365
# If you see: *.google.com, they use Google Workspace
# The email provider is now known — focus phishing appropriately
# SPF record — reveals all services authorized to send email
dig example.com TXT | grep spf
# "v=spf1 include:mailchimp.com include:sendgrid.net include:_spf.google.com ~all"
# Now you know: Mailchimp (email marketing), Sendgrid (transactional email), Google Workspace
# DMARC policy — tells you how aggressively they enforce email authentication
dig _dmarc.example.com TXT
# "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
# p=reject: strict enforcement
# p=none: no enforcement — email spoofing may work for phishing
# Zone transfer attempt (rarely succeeds but always worth trying)
# A successful zone transfer reveals the entire DNS namespace at once
dig axfr @ns1.example.com example.com
dig axfr @ns2.example.com example.com
# DNSSEC validation
dig example.com DNSKEY
# Lack of DNSSEC means DNS cache poisoning is theoretically possible
# Check for wildcard DNS (affects subdomain enumeration validity)
dig nonexistentsubdomain123.example.com A
# If this resolves, wildcard DNS is configured — all discovered subdomains may be false positives
Certificate Transparency Logs
Every TLS certificate issued by a trusted CA is logged in public Certificate Transparency logs. This means every subdomain that has ever had an HTTPS certificate can be discovered without touching the target — zero network traffic to the target, permanent historical record.
# crt.sh — query certificate transparency logs
curl -s "https://crt.sh/?q=%.example.com&output=json" | \
jq -r '.[].name_value' | \
sed 's/\*\.//g' | \
sort -u > ct_subdomains.txt
# Look for:
# - subdomain patterns revealing internal naming conventions (dev-, stage-, internal-, admin-)
# - Recently issued certs on previously unknown subdomains (new attack surface)
# - Wildcard certificates (*.example.com) — confirms wildcard DNS
# Facebook's CT log search (different database, different results)
# https://developers.facebook.com/tools/ct/
# crtfinder — automates CT log discovery with additional filtering
crtfinder -d example.com -o ct_results.txt
# Check for SAN (Subject Alternative Name) domains in cert
openssl s_client -connect example.com:443 2>/dev/null | \
openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
# Reveals all domains on the same certificate — often exposes subsidiaries and internal hostnames
Subdomain Enumeration at Scale
# Phase 1: Passive collection (no traffic to target)
subfinder -d example.com -all -recursive -o subfinder_results.txt
# -all: uses all configured sources (requires API keys for best results)
# -recursive: also enumerate subdomains of discovered subdomains
amass enum --passive -d example.com -o amass_passive.txt
theHarvester -d example.com -b all -f harvester_results
# Phase 2: Combine all passive results
cat subfinder_results.txt amass_passive.txt ct_subdomains.txt | sort -u > all_passive.txt
wc -l all_passive.txt # Typical range: 100-10,000 depending on target size
# Phase 3: DNS resolution — filter to actually-resolving subdomains
# puredns with a large public resolver list
puredns resolve all_passive.txt \
-r ~/resolvers.txt \
--resolvers-trusted ~/resolvers-trusted.txt \
-w resolved_subs.txt
# ~/resolvers.txt is a large list of public resolvers (10,000+ entries)
# (a comment after a trailing backslash would break the command, so it lives here)
# Phase 4: Active brute-force (only with explicit authorization)
puredns bruteforce /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt \
example.com \
-r ~/resolvers.txt \
-w brute_results.txt
# Phase 5: Permutation generation
# Common patterns: api-v2, dev-api, internal-app, staging-api
# gotator generates these automatically from known subdomains
gotator -sub resolved_subs.txt \
-perm permutations_list.txt \
-depth 2 \
-numbers 5 \
-md \
-prefixes | \
puredns resolve -r ~/resolvers.txt > permuted_results.txt
# Merge all discovered subdomains
cat resolved_subs.txt brute_results.txt permuted_results.txt | sort -u > all_subdomains.txt
Live Host Probing and Fingerprinting
# httpx — HTTP probe with rich metadata
httpx -l all_subdomains.txt \
-ports 80,443,8080,8443,8888,9000,3000,5000,4000,8000,9090,9443 \
-status-code \
-title \
-tech-detect \
-content-length \
-follow-redirects \
-threads 50 \
-o live_hosts_full.txt
# Extract interesting targets from live host data
cat live_hosts_full.txt | grep -i "admin" # Admin panels
cat live_hosts_full.txt | grep -i "jenkins" # CI/CD systems
cat live_hosts_full.txt | grep -i "grafana" # Monitoring dashboards
cat live_hosts_full.txt | grep -i "kibana" # Log management
cat live_hosts_full.txt | grep "401\|403" # Possibly restricted but accessible
cat live_hosts_full.txt | grep "200" | grep -v "Login\|Sign In" # Accessible without auth
Infrastructure OSINT: Shodan, Censys, and FOFA
Shodan
Shodan continuously scans the entire IPv4 address space and indexes banners, certificates, and service metadata. It answers the question: "What internet-connected services does this organization expose, and what versions are they running?"
# Install Shodan CLI
pip install shodan
shodan init YOUR_API_KEY
# Search by organization name
shodan search "org:\"Example Corporation\""
# Search by ASN
shodan search "asn:AS12345"
# Search for specific software in an organization's IP space
shodan search "org:\"Example Corporation\" product:\"Apache httpd\""
shodan search "org:\"Example Corporation\" http.title:\"Dashboard\""
shodan search "org:\"Example Corporation\" Jenkins"
# Find hosts with known CVEs (requires enterprise Shodan plan)
shodan search "vuln:CVE-2021-44228 org:\"Example Corporation\""
shodan search "vuln:CVE-2023-44487 org:\"Example Corporation\"" # HTTP/2 Rapid Reset
# Domain-based search (finds all IPs serving certs for that domain)
shodan search "ssl.cert.subject.cn:example.com"
# Full host information
shodan host 93.184.216.34 # returns all open ports and banner data
# Download and parse results
shodan download --limit 1000 example_org "org:\"Example Corporation\""
shodan parse --fields ip_str,port,transport,product,version example_org.json.gz
# Search for exposed admin interfaces
shodan search "org:\"Example Corporation\" title:\"Admin\""
shodan search "org:\"Example Corporation\" title:\"Kubernetes Dashboard\""
shodan search "org:\"Example Corporation\" title:\"Grafana\""
shodan search "org:\"Example Corporation\" title:\"Kibana\""
shodan search "org:\"Example Corporation\" title:\"phpMyAdmin\""
High-value Shodan filters:
| Filter | What It Finds |
|---|---|
| ssl.cert.subject.cn:example.com | IP infrastructure behind CDN |
| http.title:"Index of /" | Directory listing enabled |
| product:"Elasticsearch" | Potentially unauthenticated database |
| http.html:"X-Powered-By: PHP/5" | Outdated PHP versions |
| vuln:CVE-XXXX-XXXX | Systems with specific CVEs (Shodan Enterprise) |
| has_screenshot:true | Hosts with screenshots (visual scanning) |
| country:"US" port:5432 | Exposed PostgreSQL worldwide |
| org:X os:"Windows Server 2008" | End-of-life systems |
Censys
Censys offers complementary coverage to Shodan with stronger certificate enumeration and a more structured query language. Run both — their crawling schedules differ and they often find different hosts.
# Censys CLI
pip install censys
export CENSYS_API_ID=your_api_id
export CENSYS_API_SECRET=your_secret
# Search for hosts serving certificates for a domain
censys search "services.tls.certificate.parsed.subject.common_name: example.com" \
--index-type hosts \
--fields ip,services.port,services.service_name
# Find hosting infrastructure behind a CDN
# CDNs serve their own certs on edge nodes, but the origin server often has its own cert
# Search for certs issued before the CDN deployment date
censys search "services.tls.certificate.parsed.subject.common_name: example.com AND \
services.tls.certificate.parsed.validity.start: [2020-01-01 TO 2022-01-01]"
# Search Censys in-browser at search.censys.io
# Query: parsed.subject.common_name: example.com AND parsed.issuer.organization: "Let's Encrypt"
# This finds Let's Encrypt certs for a domain — origin servers, APIs, internal services
Finding Origin IPs Behind CDN
A common CDN misconfiguration: the origin server's IP is discoverable through Shodan/Censys despite being "hidden" behind Cloudflare, Fastly, or AWS CloudFront.
# Method 1: Certificate history search
# The origin server may have had a cert before the CDN was deployed
# Search Censys or SecurityTrails for historical DNS and cert data
# Method 2: Find subdomains not behind CDN
# Many targets protect their main domain with CDN but leave API/admin subdomains exposed
# api.example.com, direct.example.com, origin.example.com, mail.example.com
# Method 3: SPF record reveals origin IP
dig example.com TXT | grep spf
# "v=spf1 ip4:203.0.113.1 include:mailchimp.com -all"
# 203.0.113.1 is likely the origin mail server, possibly also the web origin
# Method 4: Previous DNS history
# SecurityTrails, WhoisXML API, and PassiveTotal maintain historical DNS records
# The IP before Cloudflare was configured is often still the origin IP
# Test: curl -H "Host: example.com" http://203.0.113.1/
People OSINT
LinkedIn Intelligence
LinkedIn is arguably the most data-rich corporate intelligence source available. Manual review of a target organization's presence can enumerate:
- Total employee count and growth rate (headcount indicates company size)
- Organizational hierarchy (who reports to whom)
- Technical stack from engineer profiles ("Working with Kubernetes, Terraform, Golang")
- Technologies in job postings ("2+ years experience with HashiCorp Vault preferred")
- Recent hires (new CISO = security priorities changing; new DevOps hires = cloud migration)
- Former employees (may retain VPN credentials, know internal systems)
- Contractors and consultants (often have broader access with weaker security controls)
Google dorks for LinkedIn intelligence:
site:linkedin.com/in/ "example.com" "software engineer"
site:linkedin.com/in/ "example.com" "security" "CISO OR director"
site:linkedin.com/jobs/ "example.com" "aws" "kubernetes" "terraform"
Email Format Identification
Once you have employee names from LinkedIn, you need the email format to generate valid addresses.
# Hunter.io — finds email format and lists known addresses
# CLI version:
curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=YOUR_KEY" | \
jq '.data.pattern'
# Returns the pattern, e.g., "first.last" or "first_initial+last"
# Use the pattern to generate email addresses for all employees found on LinkedIn:
# John Smith → john.smith@example.com (first.last pattern)
# John Smith → jsmith@example.com (first_initial+last pattern)
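A minimal sketch of that pattern expansion follows; the names file, the pattern value, and the domain are all assumptions for illustration, with names normally harvested from LinkedIn.

```shell
# Demo name list (in practice, harvested from LinkedIn profiles).
printf 'John Smith\nJane Doe\n' > names.txt

pattern="first.last"   # assumed: as reported by Hunter.io for the target
while read -r first last; do
  # normalize to lowercase before building the address
  first=$(printf '%s' "$first" | tr '[:upper:]' '[:lower:]')
  last=$(printf '%s' "$last" | tr '[:upper:]' '[:lower:]')
  case "$pattern" in
    first.last)         echo "${first}.${last}@example.com" ;;
    first_initial+last) echo "$(printf '%.1s' "$first")${last}@example.com" ;;
  esac
done < names.txt > candidate_emails.txt
```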
# Verify emails without sending anything:
# MX record lookup confirms domain has mail servers
# SMTP VRFY command (most servers disable this)
# haveibeenpwned.com API (checks if email appeared in a breach — confirms existence)
Username Enumeration
A single username bridges platforms. Researchers correlate usernames across social media, forums, code repositories, and breach data.
# Sherlock — checks 400+ platforms simultaneously
git clone https://github.com/sherlock-project/sherlock
python3 sherlock/sherlock.py john.doe
# Returns: GitHub, Twitter, Reddit, HackerNews, etc. where the username exists
# WhatsMyName — additional platform coverage
# https://whatsmyname.app
# Manual investigation platforms:
# GitHub: github.com/johndoe — check public repos, gists, starred repos
# HackerNews: https://hn.algolia.com/?q=johndoe (search comments and posts)
# Reddit: old.reddit.com/user/johndoe (profile, comment history)
# Keybase: keybase.io/johndoe (may have PGP key, linked accounts)
GitHub and Code Repository Intelligence
Code repositories are consistently the richest source of exposed secrets, internal architecture documentation, and misconfigured access.
# Google dorks for GitHub exposure
# site:github.com "example.com" AND (password OR secret OR key OR token)
# site:github.com "example.com" AND ".env"
# site:github.com "example.com" AND "BEGIN RSA PRIVATE KEY"
# GitHub search (use web interface or API)
# In GitHub search:
# org:example-company language:Python db_password
# org:example-company filename:.env
# org:example-company "amazonaws.com/s3" bucket_name
# GitDorker — automated GitHub dork search
python3 gitdorker.py -tf EXAMPLE_COMPANY_TOKEN \
-q "example.com" \
-d dorks.txt \
-o github_results.txt
# truffleHog — scan GitHub repos for secrets
trufflehog github --org=example-company --concurrency=20 --json | \
jq 'select(.verified == true)' > verified_secrets.json
# gitrob — discover repos and scan for sensitive files
gitrob analyze --access-token GITHUB_TOKEN example-company
# Specific file types to search:
# *.env, .env.local, .env.production
# docker-compose.yml (often contains credentials)
# terraform.tfvars (cloud credentials)
# kubernetes/secrets.yaml
# config/database.yml
# application.properties / application.yml
# settings.py (Django — DEBUG mode, SECRET_KEY)
Credential Exposure Analysis
Checking whether employee credentials have appeared in previous breaches is a core OSINT technique for social engineering assessments, phishing simulations, and attack path discovery.
# haveibeenpwned.com API — check if emails appeared in breaches
curl "https://haveibeenpwned.com/api/v3/breachedaccount/user@example.com" \
-H "hibp-api-key: YOUR_API_KEY" | jq '.[].Name'
# Bulk email check for corporate accounts
# Get employee email list from Hunter.io / LinkedIn
# Check each against HIBP
for email in $(cat employee_emails.txt); do
result=$(curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/$email" \
-H "hibp-api-key: YOUR_KEY" -H "User-Agent: OrgRecon/1.0")
if [ "$result" != "[]" ]; then
echo "$email: $result"
fi
sleep 1.5 # Respect HIBP rate limits
done
# Domain-level breach check
curl "https://haveibeenpwned.com/api/v3/breacheddomain/example.com" \
-H "hibp-api-key: YOUR_API_KEY"
# Returns a list of all breaches where @example.com addresses appeared
# DeHashed — commercial breach data search (more comprehensive, requires subscription)
# Returns actual plaintext/hashed passwords from breach databases
# Useful for: demonstrating that specific employee passwords are exposed
What to do with breach data:
On an authorized red team engagement, breach data becomes:
- Password spray input (test the most common passwords from the breach against corporate SSO)
- Credential stuffing basis (test recovered plaintext credentials against VPN, Outlook Web Access)
- Social engineering context (reference the breach in a phishing pretext to build credibility)
Credential stuffing against systems you are not explicitly authorized to test is illegal, and even within an authorized engagement it is off-limits unless the Rules of Engagement explicitly permit it. Always confirm with the client what credential-based testing is permitted before attempting any form of password spray or credential stuffing.
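Within those constraints, preparing spray input from breach data can be sketched with coreutils. The password file here is fabricated, the 8-character minimum is an assumed target policy, and the naive `$2` extraction would break on passwords containing spaces.

```shell
# Fabricated breach dump (stand-in for DeHashed or similar export,
# used only within the agreed Rules of Engagement).
printf 'password123\npassword123\npassword123\nshort\nSummer2024!\nSummer2024!\n' > breach_passwords.txt

# Drop passwords below the assumed minimum length, then rank by
# frequency so the most-reused candidates are tried first.
awk 'length($0) >= 8' breach_passwords.txt \
  | sort | uniq -c | sort -rn \
  | awk '{print $2}' > spray_candidates.txt   # note: breaks on passwords with spaces

head -1 spray_candidates.txt
```

Ranking by reuse frequency matters: a spray attempt budget of one or two passwords per account should go to the passwords most likely to recur across the workforce.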
Infrastructure Scanning and Service Fingerprinting
Network Mapping with Shodan vs. Active Nmap
| | Shodan/Censys (Passive) | Nmap (Active) |
|---|---|---|
| Traffic to target | None | Yes — appears in firewall logs |
| Data freshness | Hours to weeks old | Real-time |
| Coverage | IPv4 space, indexed ports | Only ports you specify |
| Stealth | Complete | Detectable |
| Speed | Instant | Minutes to hours |
| Auth required | API key | None (but authorization required) |
Use Shodan/Censys first (passive, no target traffic) to understand the attack surface, then verify with Nmap only on systems you're authorized to actively scan.
# Nmap — comprehensive service fingerprinting (authorized targets only)
# Full TCP port scan with version detection
nmap -sS -sV -sC -p- -T4 --min-rate 1000 -oA full_scan TARGET_IP
# Aggressive fingerprinting on specific ports
nmap -sV -sC -p 80,443,8080,8443 --script "http-*" -oA http_scan TARGET_IP
# UDP services (often missed)
nmap -sU -sV -p 53,123,161,500,1900,5353 TARGET_IP
# SMB enumeration
nmap -p 445 --script smb-enum-shares,smb-enum-users,smb-security-mode TARGET_IP
# Service-specific scripts
nmap -p 5432 --script pgsql-brute TARGET_IP # PostgreSQL
nmap -p 6379 --script redis-info TARGET_IP # Redis
nmap -p 9200 --script http-elasticsearch-info TARGET_IP # Elasticsearch
nmap -p 27017 --script mongodb-info TARGET_IP # MongoDB
Exposed Internal Services
The most common high-value findings from infrastructure OSINT:
# Redis — often no authentication, full read/write access
redis-cli -h TARGET_IP ping
redis-cli -h TARGET_IP KEYS "*" # list all keys
redis-cli -h TARGET_IP GET "session:user:1234" # read session data
# Elasticsearch — frequently no authentication
curl http://TARGET_IP:9200/_cat/indices?v # list all indices
curl http://TARGET_IP:9200/users/_search # query user data
# MongoDB — no auth common in development environments
mongosh --host TARGET_IP --port 27017
> show dbs
> use production
> db.users.findOne()
# Kubernetes API — exposed dashboard or API
curl https://TARGET_IP:6443/api/v1/pods # list pods
curl https://TARGET_IP:6443/api/v1/secrets # list secrets (if unauthenticated access)
# Kubernetes dashboard exposed on :8001 or :30000+ with no auth = critical finding
# Jupyter Notebook — often exposed with code execution
curl http://TARGET_IP:8888/api/kernels # if returns kernel list, execution is possible
OSINT Frameworks and Tools
Maltego
Maltego is the standard tool for visualizing and pivoting through OSINT data. It uses "transforms" to automatically convert one data type to another — domain to IP, IP to ASN, email to social profiles — and displays relationships as a graph.
The community (free) edition is limited but functional for learning. Maltego CE allows transforms against public sources including WHOIS, DNS, Shodan, and Pipl.
Useful Maltego transform sequences:
- Domain → DNS Name → IP Address → Netblock → Organization
- Email Address → Person → Social Network Profile → Related Emails
- Organization → Domain → All Subdomains → Live Web Servers → Technologies
SpiderFoot
SpiderFoot automates collection across 200+ data sources and is excellent for comprehensive, unattended reconnaissance:
# SpiderFoot command line
spiderfoot -s example.com \
-t INTERNET_NAME \
-m sfp_whois,sfp_dns,sfp_cert,sfp_shodan,sfp_hackertarget \
-f json \
-o results.json
# SpiderFoot web interface (more feature-rich)
python3 sf.py -l 127.0.0.1:5001
# Browse to http://127.0.0.1:5001
# New Scan → Enter domain → Select modules → Start
# Useful SpiderFoot modules:
# sfp_shodan: Shodan search for target IP space
# sfp_hunter: Hunter.io email enumeration
# sfp_dnscommonsrv: common service subdomain bruteforce
# sfp_cert: certificate transparency logs
# sfp_leakedcredentials: breach database checks
# sfp_linkedinmatch: LinkedIn profile discovery
# sfp_gitreposearcher: GitHub code search
OSINT Framework
OSINT Framework (osintframework.com) is a categorized tree of OSINT tools and techniques. It's not a tool itself — it's a structured directory. Useful for:
- Finding tools in specific categories you're less familiar with
- Identifying sources you haven't checked for a particular data type
- Teaching OSINT methodology to new analysts
Legal and Ethical Boundaries
OSINT operates on publicly available data, but the line between passive intelligence gathering and active reconnaissance — and between legal and illegal — is important to understand.
Generally permissible without authorization:
- Querying DNS records, WHOIS, and certificate transparency logs
- Searching indexed web content (Google, Bing, DuckDuckGo)
- Using Shodan and Censys to query their existing indexes
- Reviewing public social media profiles and posts
- Searching GitHub for publicly committed code
- Checking breach databases for your own organization's exposure
Requires explicit authorization:
- Active DNS brute-forcing against a target's name servers
- Subdomain enumeration via HTTP probing that generates target traffic
- Automated scraping of platforms that prohibit it in their ToS (LinkedIn explicitly prohibits automated scraping)
- Network scanning with Nmap or similar tools
Clearly illegal (regardless of intent):
- Accessing systems or accounts using credentials found during OSINT
- Exploiting discovered vulnerabilities without written authorization
- Bypassing authentication to view data (even if the account appears abandoned)
- Social engineering individuals without explicit scope coverage in an authorized engagement
GDPR/CCPA considerations: If your OSINT surfaces personal data (home addresses, personal phone numbers, medical information), handle it appropriately. Don't aggregate, store, or republish personal data beyond what the engagement requires. Document what was found, inform the client, and don't dig further into personal information that's outside your engagement scope.
Bug bounty program scopes frequently exclude automated scanning. "Active recon" and "automated tools" are commonly listed as out-of-scope. Read the scope carefully. Passive OSINT (crt.sh, Shodan, WHOIS) is almost universally permitted. Active DNS brute-forcing and HTTP probing often are not without explicit permission. When in doubt, ask the program before testing.
Building a Repeatable Recon Workflow
The difference between researchers who find consistent results and those who don't is a repeatable, documented methodology. Every step should be reproducible.
#!/bin/bash
# Professional OSINT workflow for authorized engagements
TARGET_DOMAIN=$1
CLIENT=$2
DATE=$(date +%Y%m%d)
OUTPUT="recon/${CLIENT}-${DATE}"
mkdir -p "$OUTPUT"/{dns,subs,infra,people,web}
echo "=== PHASE 1: DNS INTELLIGENCE ==="
whois "$TARGET_DOMAIN" > "$OUTPUT/dns/whois.txt"
dig "$TARGET_DOMAIN" ANY +noall +answer > "$OUTPUT/dns/dns_records.txt"
dig "$TARGET_DOMAIN" MX +short >> "$OUTPUT/dns/dns_records.txt"
dig "_dmarc.$TARGET_DOMAIN" TXT +short > "$OUTPUT/dns/dmarc.txt"
echo "=== PHASE 2: CERTIFICATE TRANSPARENCY ==="
curl -s "https://crt.sh/?q=%.${TARGET_DOMAIN}&output=json" | \
jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > "$OUTPUT/subs/ct_subs.txt"
echo "=== PHASE 3: PASSIVE SUBDOMAIN ENUMERATION ==="
subfinder -d "$TARGET_DOMAIN" -all -silent -o "$OUTPUT/subs/subfinder.txt"
amass enum --passive -d "$TARGET_DOMAIN" -o "$OUTPUT/subs/amass.txt"
cat "$OUTPUT/subs/"*.txt | sort -u > "$OUTPUT/subs/all_passive.txt"
echo "Passive subdomains found: $(wc -l < $OUTPUT/subs/all_passive.txt)"
echo "=== PHASE 4: INFRASTRUCTURE MAPPING ==="
# Shodan search (requires API key)
shodan search "org:\"$CLIENT\"" --fields ip_str,port,transport,product \
> "$OUTPUT/infra/shodan_results.txt"
echo "=== PHASE 5: LIVE HOST PROBING ==="
puredns resolve "$OUTPUT/subs/all_passive.txt" -r ~/resolvers.txt \
-w "$OUTPUT/subs/resolved.txt" --quiet
httpx -l "$OUTPUT/subs/resolved.txt" \
-ports 80,443,8080,8443,3000,5000 \
-status-code -title -tech-detect \
-o "$OUTPUT/web/live_hosts.txt" --silent
echo "=== PHASE 6: EMAIL ENUMERATION ==="
theHarvester -d "$TARGET_DOMAIN" -b all -f "$OUTPUT/people/harvester"
echo "=== COMPLETE ==="
echo "Results in: $OUTPUT/"
echo "Live hosts: $(wc -l < $OUTPUT/web/live_hosts.txt)"
echo "Subdomains resolved: $(wc -l < $OUTPUT/subs/resolved.txt)"
Documentation Standard
Document every finding with:
- Timestamp — OSINT data ages; what was true today may not be tomorrow
- Source — which tool or platform produced this finding
- Raw data — a screenshot or copy of the original data, not a summary
- Interpretation — what this finding implies and what leads it suggests
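A minimal logging helper along these lines keeps all four fields together per finding; the pipe-delimited format and filename are arbitrary choices, and the sample findings are purely illustrative.

```shell
# Append one pipe-delimited record per finding:
# UTC timestamp | source | raw data | interpretation
log_finding() {
  printf '%s|%s|%s|%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" >> findings.log
}

# Illustrative usage:
log_finding "crt.sh" "dev-api.example.com" "previously unknown staging API"
log_finding "shodan" "203.0.113.7:9200 Elasticsearch" "possibly unauthenticated index"
```

Because every record carries its observation time, stale findings (closed ports, rotated credentials) can be distinguished from current ones at reporting time.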
The goal of OSINT is not to collect data — it's to develop actionable intelligence. Every piece of data you gather should either answer a specific question or generate a new, more specific question to pursue. Work the intelligence cycle, not the tool list.