Many web bots, crawlers, vulnerability scanners, and data collectors behave badly: they ignore your robots.txt
settings and skip the crawl delay value, e.g. crawl-delay: 60
In the end, that causes high CPU utilization, especially on a website with a lot of content.
Catch The Bad Boy's IP Address That Causes High CPU Utilization
Using Netstat
You can list all IP addresses connected to your server, then group them to find the one with the most connections.
# netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
The result is a list of IP addresses, each shown next to its number of connections and sorted in ascending order, so the busiest addresses appear at the bottom. The output looks like the screenshot below (from ServerFault).
Of course, if we find a high number of connections coming from the same IP address, it is probably the Bad Boy we need to catch.
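As a rough sketch of the same idea, the counting can be wrapped in two small shell functions that read netstat output on standard input. The function names and the threshold argument are made up for this example, and the filter assumes the usual net-tools column layout (protocol in column 1, foreign address in column 5).

```shell
# count_connections: read `netstat -ntu` output on stdin and print
# "count ip" pairs, lowest count first (same order as the one-liner above).
count_connections() {
    # Keep only tcp/udp rows (this drops the two header lines), take the
    # foreign address in column 5, and strip everything after the last ":".
    awk '$1 ~ /^(tcp|udp)/ { sub(/:[^:]*$/, "", $5); print $5 }' |
        sort | uniq -c | sort -n | awk '{ print $1, $2 }'
}

# top_talkers N: print only the addresses with at least N connections.
# N is a hypothetical threshold; pick a value that suits your own traffic.
top_talkers() {
    count_connections | awk -v min="$1" '$1 >= min'
}
```

For example, `netstat -ntu | top_talkers 50` would list only the addresses holding 50 or more connections. Stripping from the last colon instead of cutting at the first one also keeps IPv6 addresses in one piece.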
The next step is to make a final check before we take action and block it.
We can check where the IP address comes from using Whois and lookup services like MYIP.MS and IPLocation.
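The same check can also be done from the command line with the whois client, if it is installed. The helper below is only a sketch: its name is invented for this example, and the field names it matches (OrgName, org-name, NetName, descr, Country) are common ones, but they differ between registries, so treat the list as an assumption to adjust.

```shell
# whois_summary: read raw whois output on stdin and keep only the lines that
# usually identify the owner of an address block. The field list is an
# assumption; registries label these fields differently.
whois_summary() {
    grep -i -E '^(orgname|org-name|netname|descr|country):'
}
```

Typical use would be `whois 203.0.113.5 | whois_summary`, where 203.0.113.5 is a placeholder address standing in for the one found with netstat.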
Firewall The Bad Boy's IP Address
After checking the IP address source, if you find that it does not belong to a well-known service, or that it belongs to bots and services you can safely ignore and ban, apply the following iptables
rule to block it:
# iptables -I INPUT -s [IP Address] -j REJECT
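Because a typo in the address can block the wrong host, it may help to validate the address before inserting the rule. The function below is a minimal sketch under a few assumptions: it handles IPv4 only, its name is invented here, and the APPLY variable is a hypothetical switch, so by default the rule is only printed and nothing is changed.

```shell
# block_ip IP: validate an IPv4 address, then print (or apply) the REJECT rule.
# APPLY is a hypothetical switch introduced for this sketch: the rule is only
# printed unless APPLY=1 is set, because changing iptables requires root.
block_ip() {
    ip=$1
    # Accept exactly four dot-separated numbers, each between 0 and 255.
    if ! printf '%s\n' "$ip" | awk -F. '
            NF != 4 { exit 1 }
            { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
    then
        echo "not a valid IPv4 address: $ip" >&2
        return 1
    fi
    if [ "${APPLY:-0}" = "1" ]; then
        iptables -I INPUT -s "$ip" -j REJECT
    else
        echo "iptables -I INPUT -s $ip -j REJECT"
    fi
}
```

Running `block_ip 203.0.113.5` prints the rule for review, and `APPLY=1 block_ip 203.0.113.5` as root inserts it. If an address turns out to be blocked by mistake, the same rule specification with `-D` in place of `-I` removes it.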
Catch DDoS and Flooding IP Addresses
The same technique can be used to detect DDoS and flooding attacks, as we can refine our netstat
command to catch the Bad Boy's flooding or DDoS attempts.
# netstat -ntu | grep ESTAB | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
This command checks only connections in the ESTABLISHED state instead of all connections, and displays the connection count for each IP. The same can be done for the SYN_RECV state by replacing ESTAB with SYN_RECV in the grep filter.
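To avoid editing the grep pattern for each state, the filter can be sketched as a function that takes the state name as an argument. The function name is made up for this example, and it assumes the net-tools layout where column 6 holds the TCP state and the state is spelled in full (ESTABLISHED, SYN_RECV).

```shell
# count_by_state STATE: read `netstat -ntu` output on stdin and count, per
# remote IP, only the TCP connections whose state column equals STATE.
count_by_state() {
    awk -v state="$1" '$1 ~ /^tcp/ && $6 == state {
            sub(/:[^:]*$/, "", $5); print $5 }' |
        sort | uniq -c | sort -n | awk '{ print $1, $2 }'
}
```

For example, `netstat -ntu | count_by_state SYN_RECV` lists the addresses holding half-open connections, and `netstat -ntu | count_by_state ESTABLISHED` matches the command above. Comparing the state column exactly, instead of grepping the whole line, avoids accidental matches elsewhere in the row.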
If the output shows one or more IP addresses with a high number of connections, and running the same command again shows the same IP with even more connections, then that IP is most likely the one making the attack.
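The "run it again and compare" step can be sketched as a small function that takes two saved outputs of the netstat command and reports the addresses whose count went up. The function name and the file names in the usage note are made up for this example; each file is assumed to hold lines with the count first and the IP second, which is what `uniq -c` produces.

```shell
# growing_ips OLD NEW: OLD and NEW are files holding "count ip" lines from two
# runs taken some seconds apart. Prints "ip old_count new_count" for every
# address whose count rose; an address missing from OLD is treated as 0.
growing_ips() {
    awk 'FILENAME == ARGV[1] { old[$2] = $1; next }
         $1 > old[$2] + 0    { print $2, old[$2] + 0, $1 }' "$1" "$2"
}
```

A possible workflow: save the command's output to `before.txt`, wait a little, save it again to `after.txt`, then run `growing_ips before.txt after.txt`. An address that keeps climbing between runs is a stronger signal than a single high count.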