Comprehensive Guide to Information Gathering Beyond the Internet

From a personal perspective, this article briefly summarizes the information gathering done before pointing (gaining an initial foothold). In essence, most targets are handled in similar ways; when you run into a solid wall, there is no need to keep struggling against it, because all roads lead to Rome. (Experts are welcome to share their views. Always learning from the experts!)

Foreword

To build a “God’s-eye view” of a target, preliminary information gathering is essential: whether you want to understand a person, a business, or a system, you need information. Experts have said that the essence of penetration testing is information gathering, and my own impression is that the findings are usually “unexpected yet reasonable”.

General red team workflow:

External Information Gathering -> Pointing (Initial Access) -> Persistence -> Privilege Escalation -> Internal Information Gathering -> Lateral Movement -> Trace Cleanup

Key asset information for pointing:

  • Identify the real IP of the website, explore neighboring network segments, and bypass security devices

  • Determine whether the target is a honeypot

  • Locate internal IPs and systems

  • Locate key application systems

  • Locate key enterprise information

External Information Gathering

External information gathering focuses on a few things: IPs, domains, enterprise asset information and the corresponding ports/services, fingerprints, sensitive information, credential collisions against social engineering databases, and any other easily attacked surface.

Tips:

  • Use a proxy pool and spread requests across it to avoid IP bans.

  • Once the target is confirmed, analyze it to find breakthrough points and exploitable weaknesses.

  • Regarding concealment during pointing:

    Before the attack: use a virtual machine as a global proxy, browser privacy mode or Tor, scripts, accounts, etc. not belonging to oneself

    After the attack: Rootkit……

0x00 Common Tools and Resources Brief:

Tools, Websites | Remarks
Site Analysis, Webmaster Tools | Whois, Record Number, Weight, Company Name, etc.
Tianyancha, Qichacha, Sogou Search Engine | Company registered domain name, WeChat official account, APP, software copyright, etc.
ZoomEye, Shodan, FOFA, 0.Zone, Quake | Network space asset search engine
ENScanGo, ICP Record | Main domain collection
OneForAll, Layer, Rapid7’s open data project, ctfr, EyeWitness | Subdomain collection
Kscan, ShuiZe_0x727, ARL Lighthouse, Goby | Automation, batch information collection
Bufferfly, Ehole | Asset processing, information screening
dnsdb, CloudFlair | CDN-related
VirusTotal, Weibu, ip2domain | C segment, domain/IP intelligence information
Nmap, Masscan | Port service information
Google Hacking, dirsearch, URLFinder | WEB site information, API interfaces, etc.
wafw00f | WAF recognition
Yunxi, Tide, WhatWeb | Online CMS recognition
Qimai, Xiaolanben | APP assets
ApkAnalyser | App sensitive information
Wuyun Vulnerability Library, CNVD, waybackurls | Historical vulnerabilities, historical assets, etc.
n0tr00t/Sreg, reg007 | Personal privacy information
GitDorker | Asset information, source code leaks
theHarvester, Snov.io | Email information collection
OSINT open-source intelligence and reconnaissance tools | Open-source intelligence resource navigation
anti-honeypot, Honeypot Hunter | Honeypot recognition

0x01 Main Domain Information

A domain name stands in for an IP address so that users can find and remember a site more easily.

For a “non-user” such as a tester, domain information yields the main domains, live sites, related information, and material for phishing, all of which provide data support for later vulnerability discovery.

1. ICP Record

Websites hosted on servers in mainland China must complete ICP filing (the “record” referred to throughout this article) before they can go online.

Record Information Query:

  • ICP Record Query

https://beian.miit.gov.cn/#/Integrated/index

  • Public Security Bureau Record Query

Reverse Lookup of Main Domains via Records

Reverse lookup falls into two cases: domains that have an ICP record and domains that do not.

Recorded Domain Name Query

  • Third-party sites

    • ICP Record Query – Webmaster Tools

    • SEO Comprehensive Query

  • Public Security Bureau Record Query

  • Enterprise Information Query Website

Unrecorded Domain Name Query

  • Harvest links from sites you already know (for example, a site's navigation page may link to unrecorded sites)

  • Use a network space search engine to pivot on certificates and icons. For example, the FOFA search rule is_domain=true returns only results that have a domain name *(personally, I prefer Shodan for foreign targets and FOFA for domestic ones)*.
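
The certificate and icon pivot above can be sketched as a small query builder. This is only an illustration: the cert= and icon_hash= filters and the base64 step reflect FOFA's query syntax and API as I understand them and should be checked against the current FOFA documentation, and the function names are hypothetical.

```python
import base64


def build_fofa_query(cert_keyword=None, icon_hash=None, domains_only=True):
    """Combine certificate / icon pivots into one FOFA query string."""
    clauses = []
    if cert_keyword:
        clauses.append(f'cert="{cert_keyword}"')
    if icon_hash is not None:
        clauses.append(f'icon_hash="{icon_hash}"')
    if not clauses:
        raise ValueError("need a certificate keyword or an icon hash")
    query = " || ".join(clauses)
    if len(clauses) > 1:
        query = f"({query})"
    if domains_only:
        # Rule quoted in the article: only return results that have a domain name.
        query += " && is_domain=true"
    return query


def encode_for_api(query):
    """Base64-encode the query (assumption: the API takes it in this form)."""
    return base64.b64encode(query.encode("utf-8")).decode("ascii")
```

Calling build_fofa_query(cert_keyword="example.com") gives a query that can be pasted into the search box as-is; the encoded form is only needed when calling the API.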

2. Whois

Whois is a protocol for querying the registration and ownership information of a domain name or IP.

Whois data reveals key information about the registrant, such as the registrar, contact person, contact email, and contact phone number. The registrant, email, or phone number can in turn be used to reverse look up other domains, and search engines can dig further into the domain owner. All of this supports both social engineering and vulnerability discovery.

  • Webmaster Home

    http://whois.chinaz.com

  • Bugscanner

    http://whois.bugscaner.com

  • Foreign BGP

    https://bgp.he.net

  • who.is

    https://who.is/

  • IP138 website

    https://site.ip138.com/

  • Domain Information Query – Tencent Cloud

    https://whois.cloud.tencent.com/

  • ICANN LOOKUP

    https://lookup.icann.org/

  • Dog Query

    https://www.ggcx.com/main/integrated

ps:

  • Some whois sites hide part of the information; in that case, try the query on another site.

  • Whois data mainly covers the registrar, registrant, email, DNS resolution servers, and the registrant's contact phone number.

  • Because of GDPR, ICANN requires all domain registrars to protect the private information in domain whois records, so whois returns less and less…… but some whois lookup systems still hold old cached data.
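
For reference, a whois lookup is simple enough to script directly: per RFC 3912 the client opens TCP port 43, sends the query followed by CRLF, and reads until the server closes the connection. The sketch below assumes a .com domain (whois.verisign-grs.com is the registry server for .com/.net); the list of interesting field names is my own guess, since output formats differ between registries.

```python
import socket

# Field-name fragments worth keeping (an assumption; adjust per registry).
WANTED = ("registrar", "registrant", "email", "phone", "name server")


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send one whois request and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_fields(raw):
    """Collect 'Key: value' lines whose key mentions a field of interest."""
    found = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            continue
        if any(word in key.lower() for word in WANTED):
            found.setdefault(key, []).append(value)
    return found
```

Typical use would be extract_fields(whois_query("example.com")). Redacted values such as "REDACTED FOR PRIVACY" are returned as-is, which makes it easy to see when another source is needed.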

3. IP Reverse Lookup

ps:

  • The target may have multiple domain names bound to the same IP; an IP reverse lookup reveals these other domains, for example side stations (other sites hosted on the same server).

    • Performing the reverse lookup against the target's real IP gives more reliable side-station results.

  • Query several different sites; any single site may fail to return reverse lookup information.

    • Large enterprises may have different records on different sites

    • I personally recommend using three different sites for reverse lookup
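
Since the advice is to cross-check several sites, it helps to merge their answers and see which domains more than one source agrees on. A minimal sketch, assuming the results from each site have already been collected into plain lists (the source names below are placeholders):

```python
from collections import defaultdict


def merge_reverse_lookups(results_by_source):
    """Map each normalized domain to the set of sources that reported it."""
    seen = defaultdict(set)
    for source, domains in results_by_source.items():
        for domain in domains:
            domain = domain.strip().lower().rstrip(".")
            if domain:
                seen[domain].add(source)
    return dict(seen)


def rank_by_agreement(seen):
    """Domains reported by the most sources first, then alphabetically."""
    return sorted(seen, key=lambda d: (-len(seen[d]), d))
```

Domains reported by only one source are not necessarily wrong; they are simply the ones worth verifying first.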

How to Query

Online Query Website:

  • Same IP site query, same server site query – Webmaster Tools

  • Dnslytics

  • IP or domain query

Search Engine:

  • shodan

  • bing

  • fofa

4. HOST Collision

During information gathering, there are often hidden assets left behind by configuration errors or by services that were not decommissioned in time. Accessing them directly runs into restrictions such as the following:

  • Accessing the IP directly mostly returns an nginx default page, 4xx, 500, 503, or vague routing errors in JSON.

  • The domain name resolves to an internal address.

  • The real server IP is known, but the internal domain name cannot be found.

The usual cause is middleware that rejects access by IP, so the service must be reached by domain name. If no matching domain appears in the DNS resolution records, the HOST collision technique can be used: pair candidate domain names with the IP and send requests. Once a pair matches a domain binding configured on the back-end proxy server, the corresponding business system becomes reachable, which uncovers the hidden asset.

Method:

Take the collected target IPs and a dictionary of candidate domain names (collected by crawling or custom-built, i.e. an internal host pool), and run the collision with a script. The script binds each IP to each host name and sends the request; hits can be judged by the page title or the response size. As long as the dictionary is strong enough, one or two can usually be found. It is also best to try TLS during the run, since some hosts are only served over TLS.

To verify a result, add the host-to-IP binding to the local hosts file and observe how access changes.
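
The method above can be sketched with the standard library alone. This is not how the tools listed below are implemented; it only illustrates the idea. The function names, the 64 KB read limit, and the size tolerance are my own choices, and the baseline is assumed to be the response for a random host name that cannot exist.

```python
import http.client
import re
import ssl


def summarize(status, body):
    """Reduce a response to (status, page title, body length)."""
    match = re.search(r"<title[^>]*>(.*?)</title>", body, re.I | re.S)
    title = match.group(1).strip() if match else ""
    return (status, title, len(body))


def probe(ip, host, use_tls=False, timeout=5):
    """Request "/" from `ip` while sending `host` in the Host header."""
    if use_tls:
        # Certificate checks are disabled because we connect by IP on purpose.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(ip, 443, timeout=timeout, context=ctx)
    else:
        conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        body = resp.read(65536).decode("utf-8", errors="replace")
        return summarize(resp.status, body)
    finally:
        conn.close()


def is_hit(baseline, candidate, size_tolerance=32):
    """A candidate is interesting if status or title differ from the
    baseline, or if the body size moved by more than the tolerance."""
    if candidate[0] != baseline[0] or candidate[1] != baseline[1]:
        return True
    return abs(candidate[2] - baseline[2]) > size_tolerance
```

A run would take the baseline once per IP, then call probe for every host name in the dictionary and keep the pairs for which is_hit returns True. Note that this sketch does not set the TLS SNI field, so servers that route on SNI rather than on the Host header would need extra handling.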

Automation:

  • ARL (Lighthouse)

  • ShuiZe

  • https://github.com/cckuailong/hostscan

  • https://github.com/fofapro/Hosts_scan

  • https://github.com/smxiazi/host_scan

5. DNS Shared Records

About DNS

DNS (Domain Name System) converts between domain names and their corresponding IP addresses. A DNS server keeps a table that maps domain names to IP addresses and uses it to resolve the names in queries. A domain name is the name of a computer or group of computers on the Internet, used to identify where the computer is during data transmission (sometimes it also hints at a geographical location). It consists of a string of labels separated by dots, usually contains the organization name, and ends with a suffix (the top-level domain) that indicates the type of organization or the country or region the domain belongs to. Thanks to DNS, accessing a service only requires remembering the domain name rather than an irregular IP address.

  • DNS server port: tcp/udp 53.

  • Common DNS Records:

    Record Type | Description
    A | Points the domain name to an IPv4 address (external address).
    CNAME | Points the domain name to another domain name, which in turn provides the IP address (external address).
    MX | Mail exchange record: names the mail servers that receive email for the domain.
    NS | Name server record: states which name server resolves the domain. For example, it can delegate a subdomain to a specific DNS service provider.
    AAAA | Points the domain name to an IPv6 address.
    SRV | Identifies which server provides a given service; commonly seen in Microsoft’s directory management (Active Directory).
    TXT | Free-text notes for the domain; the vast majority of TXT records are used for SPF (anti-spam).

Utilization Value

Looking up which domains are served by the same DNS (name) server can reveal related domains. This works best when the target runs its own name servers; if it uses a public DNS provider, the results are too noisy to be useful.

Method
  • Check whether the target has a self-built NS server

nslookup -query=ns baidu.com 8.8.8.8

  • Submit the NS servers obtained to https://hackertarget.com/find-shared-dns-servers/ to list the domains that share them
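
The nslookup command in the first step can also be reproduced by hand, which shows what the query looks like on the wire. The sketch below builds a single-question DNS query in the RFC 1035 format and sends it over UDP port 53; it only reads the answer count from the reply, because decoding the answer section needs name decompression, which is left out here. The function names are illustrative.

```python
import socket
import struct

# Numeric codes of the record types from the table above (RFC 1035 / 3596 / 2782).
QTYPE = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15, "TXT": 16, "AAAA": 28, "SRV": 33}


def build_query(name, record_type="NS", query_id=0x1234):
    """Build a DNS query packet containing one question."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE[record_type], 1)  # class IN


def answer_count(response):
    """Read ANCOUNT (bytes 6-7) from a response header."""
    return struct.unpack("!H", response[6:8])[0]


def query_udp(name, record_type="NS", resolver="8.8.8.8", timeout=5):
    """Send the query to a resolver and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, record_type), (resolver, 53))
        response, _ = sock.recvfrom(4096)
    return response
```

For example, answer_count(query_udp("baidu.com", "NS")) returns the number of NS records in the reply. For real work a full DNS library is the better choice; this only makes the protocol visible.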

6. Google

Search directly for key content related to the target, such as the company name, the ICP record number, or distinctive JS files that its sites reference.

There are many search engines; Google is used as the example here.
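
As a rough illustration, the searches described above can be generated from a few inputs. The site: and intext: operators are standard Google search operators; which combinations are actually useful for a given target is a judgment call, and the function names here are made up.

```python
from urllib.parse import quote_plus


def build_dorks(domain, company=None, icp_number=None):
    """Return a few search strings for the given target."""
    # Subdomains other than www.
    dorks = [f"site:{domain} -site:www.{domain}"]
    if company:
        # Pages mentioning the company that live outside the known domain.
        dorks.append(f'intext:"{company}" -site:{domain}')
    if icp_number:
        # Other sites that display the same ICP record number.
        dorks.append(f'intext:"{icp_number}"')
    return dorks


def search_url(dork):
    """Turn a search string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(dork)
```

The same strings work with minor changes on other engines, which is worth trying because each engine indexes a different slice of the target.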

7. Configuration Information

Because of information leakage, certain configuration or policy files may contain domain names related to the target, such as subdomains or code hosting platforms. The information they hold is usually limited, and such files should not be publicly exposed in the first place.

Policy files that may leak domain names include:

  • crossdomain.xml file

    • Usually found by appending /crossdomain.xml directly to the domain name

  • sitemap file

    Common sitemap paths include:

    • sitemap.xml, sitemap.txt, sitemap.html, sitemapindex.xml
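
Once these files have been downloaded, pulling the domain names out of them takes only a few lines. A minimal sketch, assuming the files are well-formed XML (sitemap.txt and sitemap.html would need different handling); the function names are illustrative:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def domains_from_crossdomain(xml_text):
    """Return the domain attribute of every <allow-access-from> element."""
    root = ET.fromstring(xml_text)
    return sorted(
        {el.get("domain") for el in root.iter("allow-access-from") if el.get("domain")}
    )


def hosts_from_sitemap(xml_text):
    """Return the distinct host names found in <loc> elements."""
    root = ET.fromstring(xml_text)
    hosts = set()
    for el in root.iter():
        # Tags are namespace-qualified, e.g. "{http://www.sitemaps.org/...}loc".
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            host = urlparse(el.text.strip()).hostname
            if host:
                hosts.add(host)
    return sorted(hosts)
```

Wildcard entries such as *.example.com in crossdomain.xml are kept as written, since they indicate a whole subdomain space rather than a single host.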

In terms of policy configuration:

  • Content Security Policy (CSP)

This is a declarative security mechanism that lets website operators control the behavior of CSP-compliant user agents (usually browsers). By controlling which features are enabled and where content may be loaded from, it reduces the site's attack surface. The main purpose of CSP is to defend against cross-site scripting (XSS). For example, CSP can completely prohibit inline JavaScript, control where external code is loaded from, and prohibit dynamic code execution. With these attack sources disabled, XSS becomes much harder. Common CSP directives include default-src, img-src, object-src, and script-src; the values of the *-src directives may contain domain names.

Key Point:

The Content-Security-Policy HTTP response header
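
Given the value of that header, the host names in the *-src directives can be extracted as follows. This is a simplified sketch: it ignores keywords such as 'self', bare schemes such as data:, and IPv6 literals, and it strips a leading "*." so that wildcard entries are reported as their parent domain. The function name is illustrative.

```python
def hosts_from_csp(header_value):
    """Return the host names listed in the *-src directives of a CSP header."""
    hosts = set()
    for directive in header_value.split(";"):
        parts = directive.split()
        if len(parts) < 2 or not parts[0].lower().endswith("-src"):
            continue
        for source in parts[1:]:
            # Skip keywords ('self', 'unsafe-inline'), schemes (data:), and "*".
            if source.startswith("'") or source.endswith(":") or source == "*":
                continue
            host = source.split("://", 1)[-1]  # drop the scheme
            host = host.split("/", 1)[0]       # drop the path
            host = host.split(":", 1)[0]       # drop the port
            if host.startswith("*."):
                host = host[2:]
            if "." in host:
                hosts.add(host.lower())
    return sorted(hosts)
```

The header itself can be read from any normal response, for example with curl -I or the browser's developer tools.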

8. Crowdsourcing

Crowdsourced testing platforms such as Bug Bounty, Vulnerability Bank, Prophet, and HackerOne publish the domains that are in scope for each program.

For example, Alibaba's program on HackerOne.

9. Enterprise Asset Information

By querying the target enterprise’s organizational structure, equity information, equity penetration diagram, subsidiaries, investments of more than 50%, and so on, you can obtain its products and businesses, domain names, and email assets, and thereby expand the attack surface. The main collection points for enterprise assets are roughly as follows:

  • Enterprise Data:

    • Email Query: https://hunter.io/

    • Email Collection

    • Enterprise Structure Portrait

    • Direct units, institutional settings, suppliers *(related contracts, personnel, systems, software, etc.)*, partners, etc.

    • Business Information: Securities, Express, Dedicated Network, Colleges, etc.

  • Personnel Data, such as statistics, responsibilities, departments, personnel’s historical leaked passwords, browsing habits, etc.

  • Device Information:

    • WiFi

    • Common passwords, departmental device information

    • OA/ERP/CRM/SSO/Mail/VPN entrance

    • Network security devices (WAF, IPS, IDS, routers, etc.)

    • Internally used code hosting platforms (gitlab, daocloud, etc.), bug management platforms, monitoring platforms, etc.

    • Server domain name assets

    • site:xxx

Equity Investment Information

Generally, a company counts as a testing target only if the target enterprise holds 50% or 100% of its equity.

  1. Tianyancha

https://www.tianyancha.com/

  2. Qichacha

https://www.qcc.com/

  3. DingTalk Qidian

https://www.dingtalk.com/qidian/home?spm=a213l2.13146415.4929779444.89.7f157166W6H4YZ

Public Account Information

  1. Sogou Search Engine

https://wx.sogou.com/

  2. Qichacha

https://www.qcc.com/

Mini Programs

  1. Qichacha

https://www.qcc.com/

  2. WeChat App

  3. Alipay App

Application Information

  1. Tianyancha

https://www.tianyancha.com/

  2. Qimai Data

https://www.qimai.cn/

  3. Qichacha

https://www.qcc.com/

  4. Xiaolanben

https://www.xiaolanben.com/pc

Tools

Continuously updating……

  • ShuiZe Information Collection Automation Tool

https://github.com/0x727/ShuiZe_0x727

Author: Ske

Team: 0x727, which plans to open-source more tools over time; see https://github.com/0x727

Positioning: Assist red team personnel in quickly collecting information, mapping target assets, and finding weak points

Language: Python 3

Function: One-stop service; enter a root domain to comprehensively collect related assets and detect vulnerabilities. Multiple domains, C segment IPs, etc. are also accepted as input.

Credits: The script calls ksubdomain to brute-force subdomains and theHarvester to collect emails; thanks to the authors of ksubdomain and theHarvester.

  • ARL Asset Reconnaissance Lighthouse System

https://github.com/TophantTechnology/ARL

ARL aims to quickly discover the internet assets associated with a target and build a basic asset information database. It helps the client’s security team or penetration testers scout and retrieve assets effectively and discover existing weak points and attack surfaces.

  • ENScanGo

https://github.com/wgpsec/ENScan_GO

A tool written by Keac of the Wolf Group Security Team specifically to solve the problem of collecting corporate information. With one command it collects the ICP records, apps, mini programs, WeChat official accounts, etc. of the target and the companies it holds, then aggregates and exports the results.

Author: 0nlyuAar0n, Qi Anxin Attack and Defense Community, https://forum.butian.net/share/1976
