hacktricks/network-services-pentesting/pentesting-web/uncovering-cloudflare.md

136 lines
11 KiB
Markdown
Raw Normal View History

2022-05-24 10:26:01 +00:00
# Uncovering CloudFlare
2022-04-28 16:01:33 +00:00
<details>
2023-04-25 18:35:28 +00:00
<summary><a href="https://cloud.hacktricks.xyz/pentesting-cloud/pentesting-cloud-methodology"><strong>☁️ HackTricks Cloud ☁️</strong></a> -<a href="https://twitter.com/hacktricks_live"><strong>🐦 Twitter 🐦</strong></a> - <a href="https://www.twitch.tv/hacktricks_live/schedule"><strong>🎙️ Twitch 🎙️</strong></a> - <a href="https://www.youtube.com/@hacktricks_LIVE"><strong>🎥 Youtube 🎥</strong></a></summary>
2022-04-28 16:01:33 +00:00
2023-01-11 16:53:45 +00:00
* Do you work in a **cybersecurity company**? Do you want to see your **company advertised in HackTricks**? or do you want to have access to the **latest version of the PEASS or download HackTricks in PDF**? Check the [**SUBSCRIPTION PLANS**](https://github.com/sponsors/carlospolop)!
* Discover [**The PEASS Family**](https://opensea.io/collection/the-peass-family), our collection of exclusive [**NFTs**](https://opensea.io/collection/the-peass-family)
* Get the [**official PEASS & HackTricks swag**](https://peass.creator-spring.com)
* **Join the** [**💬**](https://emojipedia.org/speech-balloon/) [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** me on **Twitter** [**🐦**](https://github.com/carlospolop/hacktricks/tree/7af18b62b3bdc423e11444677a6a73d4043511e9/\[https:/emojipedia.org/bird/README.md)[**@carlospolopm**](https://twitter.com/hacktricks\_live)**.**
2023-01-11 16:53:45 +00:00
* **Share your hacking tricks by submitting PRs to the** [**hacktricks repo**](https://github.com/carlospolop/hacktricks) **and** [**hacktricks-cloud repo**](https://github.com/carlospolop/hacktricks-cloud).
2022-04-28 16:01:33 +00:00
</details>
2020-12-08 12:25:09 +00:00
Techniques to try to uncover web servers behind cloudflare:
### Techniques
* You can also use some service that gives you the **historical DNS records** of the domain. Maybe the web page is running on an IP address used before.
* Same could be achieve **checking historical SSL certificates** that could be pointing to the origin IP address.
* Check also **DNS records of other subdomains pointing directly to IPs**, as it's possible that other subdomains are pointing to the same server (maybe to offer FTP, mail or any other service).
* If you find a **SSRF inside the web application** you can abuse it to obtain the IP address of the server.
*
Search a unique string of the web page in browsers such as shodan (and maybe google and similar?). Maybe you can find an IP address with that content.
* In a similar way instead of looking for a uniq string you could search for the favicon icon with the tool: [https://github.com/karma9874/CloudFlare-IP](https://github.com/karma9874/CloudFlare-IP) or with [https://github.com/pielco11/fav-up](https://github.com/pielco11/fav-up)
* This won't work be very frequently because the server must send the same response when it's accessed by the IP address, but you never know.
### Tools
2022-08-16 17:55:51 +00:00
* Search for the domain inside [http://www.crimeflare.org:82/cfs.html](http://www.crimeflare.org:82/cfs.html) or [https://crimeflare.herokuapp.com](https://crimeflare.herokuapp.com). Or use the tool [CloudPeler](https://github.com/zidansec/CloudPeler) (which uses that API)
2020-12-29 00:31:19 +00:00
* Search for the domain in [https://leaked.site/index.php?resolver/cloudflare.0/](https://leaked.site/index.php?resolver/cloudflare.0/)
2022-04-05 22:24:52 +00:00
* [**CloudFlair**](https://github.com/christophetd/CloudFlair) is a tool that will search using Censys certificates that contains the domain name, then it will search for IPv4s inside those certificates and finally it will try to access the web page in those IPs.
* [Censys](https://search.censys.io/)
* [Shodan](https://shodan.io/)
* [Bypass-firewalls-by-DNS-history](https://github.com/vincentcox/bypass-firewalls-by-DNS-history)
2022-05-24 10:26:01 +00:00
* If you have a set of potential IPs where the web page is located you could use [https://github.com/hakluke/hakoriginfinder](https://github.com/hakluke/hakoriginfinder)
2022-04-28 16:01:33 +00:00
2022-08-17 12:21:23 +00:00
```bash
2022-08-19 16:13:59 +00:00
# You can check if the tool is working with
prips 1.0.0.0/30 | hakoriginfinder -h one.one.one.one
2022-08-17 12:21:23 +00:00
# If you know the company is using AWS you could use the previous tool to search the
## web page inside the EC2 IPs
DOMAIN=something.com
WIDE_REGION=us
for ir in `curl https://ip-ranges.amazonaws.com/ip-ranges.json | jq -r '.prefixes[] | select(.service=="EC2") | select(.region|test("^us")) | .ip_prefix'`; do
echo "Checking $ir"
prips $ir | hakoriginfinder -h "$DOMAIN"
done
```
2023-01-11 16:53:45 +00:00
### Uncovering Cloudflare from AWS machines
2022-04-28 16:01:33 +00:00
2023-02-16 14:44:06 +00:00
For a better description of this process check:
{% embed url="https://trickest.com/blog/cloudflare-bypass-discover-ip-addresses-aws/?utm_campaign=hacktrics&utm_medium=banner&utm_source=hacktricks" %}
2023-01-11 16:53:45 +00:00
```bash
# Find open ports
2023-01-12 12:11:28 +00:00
sudo masscan --max-rate 10000 -p80,443 $(curl -s https://ip-ranges.amazonaws.com/ip-ranges.json | jq -r '.prefixes[] | select(.service=="EC2") | .ip_prefix' | tr '\n' ' ') | grep "open" > all_open.txt
2023-01-11 16:53:45 +00:00
# Format results
cat all_open.txt | sed 's,.*port \(.*\)/tcp on \(.*\),\2:\1,' | tr -d " " > all_open_formated.txt
# Search actual web pages
httpx -silent -threads 200 -l all_open_formated.txt -random-agent -follow-redirects -json -no-color -o webs.json
# Format web results and remove eternal redirects
2023-01-11 16:57:23 +00:00
cat webs.json | jq -r "select((.failed==false) and (.chain_status_codes | length) < 9) | .url" | sort -u > aws_webs.json
# Search via Host header
httpx -json -no-color -list aws_webs.json -header Host: cloudflare.malwareworld.com -threads 250 -random-agent -follow-redirects -o web_checks.json
2023-01-11 16:53:45 +00:00
```
2022-04-28 16:01:33 +00:00
## Bypass Cloudflare for scraping
### Cache
Sometimes you just want to bypass Cloudflare to only scrape the web page. There are some options for this:
* Use Google cache: `https://webcache.googleusercontent.com/search?q=cache:https://www.petsathome.com/shop/en/pets/dog`
* Use other cache services such as [https://archive.org/web/](https://archive.org/web/)
### Cloudflare Solvers
There have been a number of Cloudflare solvers developed:
* [FlareSolverr](https://github.com/FlareSolverr/FlareSolverr)
* [cloudscraper](https://github.com/VeNoMouS/cloudscraper) [Guide here](https://scrapeops.io/python-web-scraping-playbook/python-cloudscraper/)
* [cloudflare-scrape](https://github.com/Anorov/cloudflare-scrape)
* [CloudflareSolverRe](https://github.com/RyuzakiH/CloudflareSolverRe)
* [Cloudflare-IUAM-Solver](https://github.com/ninja-beans/cloudflare-iuam-solver)
* [cloudflare-bypass](https://github.com/devgianlu/cloudflare-bypass) \[Archived]
* [CloudflareSolverRe](https://github.com/RyuzakiH/CloudflareSolverRe)
### Fortified Headless Browsers <a href="#option-4-scrape-with-fortified-headless-browsers" id="option-4-scrape-with-fortified-headless-browsers"></a>
The other option is to do the entire scraping job with a headless browser that has been fortified to look like a real users browser:
* **Puppeteer:** The [stealth plugin](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth) for [puppeteer](https://github.com/puppeteer/puppeteer).
* **Playwright:** The [stealth plugin](https://www.npmjs.com/package/playwright-stealth) is coming to Playwright soon. Follow developments [here](https://github.com/berstend/puppeteer-extra/issues/454) and [here](https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra).
* **Selenium:** The [undetected-chromedriver](https://github.com/ultrafunkamsterdam/undetected-chromedriver) an optimized Selenium Chromedriver patch.
### Smart Proxy With Cloudflare Built-In Bypass <a href="#option-5-smart-proxy-with-cloudflare-built-in-bypass" id="option-5-smart-proxy-with-cloudflare-built-in-bypass"></a>
The alternative to using open source Cloudflare bypasses, is to use smart proxies that develop and maintain their own private Cloudflare bypasses.
These are typically more reliable as it is harder for Cloudflare to develop patches for them, and they are developed by proxy companies who are financially motivated to stay 1 step ahead of Cloudflare and fix their bypasses the very minute they stop working.
Most smart proxy providers ([ScraperAPI](https://www.scraperapi.com/?fp\_ref=scrapeops), [Scrapingbee](https://www.scrapingbee.com/?fpr=scrapeops), [Oxylabs](https://oxylabs.go2cloud.org/aff\_c?offer\_id=7\&aff\_id=379\&url\_id=32), [Smartproxy](https://prf.hn/click/camref:1100loxdG/\[p\_id:1100l442001]/destination:https%3A%2F%2Fsmartproxy.com%2Fscraping%2Fweb)) have some form of Cloudflare bypass that work to varying degrees and vary in cost.
However, one of the best options is to use the [ScrapeOps Proxy Aggregator](https://scrapeops.io/proxy-aggregator/) as it integrates over 20 proxy providers into the same proxy API, and finds the best/cheapest proxy provider for your target domains.
### Reverse Engineer Cloudflare Anti-Bot Protection <a href="#option-6-reverse-engineer-cloudflare-anti-bot-protection" id="option-6-reverse-engineer-cloudflare-anti-bot-protection"></a>
This approach works (and is what many smart proxy solutions do), however, it is not for the faint hearted.
**Advantages:** The advantage of this approach, is that if you are scraping at large scales and you don't want to run hundreds (if not thousands) of costly full headless browser instances. You can instead develop the most resource efficient Cloudflare bypass possible. One that is solely designed to pass the Cloudflare JS, TLS and IP fingerprint tests.
**Disadvantages:** The disadvantages to this approach is that you will have to dive deep into a anti-bot system that has been made purposedly hard to understand from the outside, and split test different techniques to trick their verification system. Then maintain this system as Cloudflare continue to develop their anti-bot protection.
## References
* [https://scrapeops.io/web-scraping-playbook/how-to-bypass-cloudflare/](https://scrapeops.io/web-scraping-playbook/how-to-bypass-cloudflare/)
2023-01-11 16:53:45 +00:00
<details>
2022-04-28 16:01:33 +00:00
2023-04-25 18:35:28 +00:00
<summary><a href="https://cloud.hacktricks.xyz/pentesting-cloud/pentesting-cloud-methodology"><strong>☁️ HackTricks Cloud ☁️</strong></a> -<a href="https://twitter.com/hacktricks_live"><strong>🐦 Twitter 🐦</strong></a> - <a href="https://www.twitch.tv/hacktricks_live/schedule"><strong>🎙️ Twitch 🎙️</strong></a> - <a href="https://www.youtube.com/@hacktricks_LIVE"><strong>🎥 Youtube 🎥</strong></a></summary>
2022-04-28 16:01:33 +00:00
2023-01-11 16:53:45 +00:00
* Do you work in a **cybersecurity company**? Do you want to see your **company advertised in HackTricks**? or do you want to have access to the **latest version of the PEASS or download HackTricks in PDF**? Check the [**SUBSCRIPTION PLANS**](https://github.com/sponsors/carlospolop)!
* Discover [**The PEASS Family**](https://opensea.io/collection/the-peass-family), our collection of exclusive [**NFTs**](https://opensea.io/collection/the-peass-family)
* Get the [**official PEASS & HackTricks swag**](https://peass.creator-spring.com)
* **Join the** [**💬**](https://emojipedia.org/speech-balloon/) [**Discord group**](https://discord.gg/hRep4RUj7f) or the [**telegram group**](https://t.me/peass) or **follow** me on **Twitter** [**🐦**](https://github.com/carlospolop/hacktricks/tree/7af18b62b3bdc423e11444677a6a73d4043511e9/\[https:/emojipedia.org/bird/README.md)[**@carlospolopm**](https://twitter.com/hacktricks\_live)**.**
2023-01-11 16:53:45 +00:00
* **Share your hacking tricks by submitting PRs to the** [**hacktricks repo**](https://github.com/carlospolop/hacktricks) **and** [**hacktricks-cloud repo**](https://github.com/carlospolop/hacktricks-cloud).
2022-04-28 16:01:33 +00:00
</details>