How to detect & capture probing bots

Bots are small pieces of software designed to perform simple, repetitive, automated tasks quickly.

Many bot attacks are simply probing for vulnerabilities that the attacker can exploit later. It's very important to detect and capture these probing bots.

Shield Security is focused on helping you detect and block bad bots, whatever they're up to. To achieve this, we use bot detection rules, or "bot signals".

Signals are behaviours that suggest a visitor could be a bot. The more of these behaviours a visitor shows, the more confident we can be that it's a bad bot.


Shield provides 4 effective ways ("bot signals") to detect and capture probing bots, all through its Detect Probing Bots feature. You can use this feature to:

Identify a bot when it hits a 404
Detect when a visitor tries to load a non-existent page.
Note: If you haven’t kept your site’s URLs and links in sync, you could well have broken links on your site that return 404s and that normal users are going to click. 404s are best considered a “signal”, not definitive proof of a bot. But if you’re absolutely sure you’re on top of your broken links and handling everything properly, you could treat a 404 as more than just a signal.

Important: 404 errors generated for the following file types won't trigger an offense:

js, css, woff, woff2, gif, jpg, jpeg, png, map.
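To make the exclusion rule above concrete, here is a minimal Python sketch of how a 404 could be filtered before it counts as a bot signal. It is not Shield's code: the function name `counts_as_404_signal` is hypothetical, and ignoring query strings and matching extensions case-insensitively are assumptions. Only the extension list comes from this article.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# File types this article says never trigger a 404 offense.
EXCLUDED_404_EXTENSIONS = {"js", "css", "woff", "woff2", "gif", "jpg", "jpeg", "png", "map"}


def counts_as_404_signal(url: str) -> bool:
    """Hypothetical helper: should a 404 for this URL count as a bot signal?"""
    # Drop the scheme, host and query string; keep only the path.
    path = urlsplit(url).path
    # ".png" for "/img/logo.png", "" for "/missing-page/" (assumed case-insensitive).
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    return extension not in EXCLUDED_404_EXTENSIONS
```

With this rule, a missing "/old-page/" would count as a signal, while a missing "/wp-content/uploads/logo.png" or "/assets/app.js.map" would not.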
Tempt a bot with a fake link to follow (Mouse Trap)
Detect a bot when it follows a fake 'nofollow' link. This works because legitimate web crawlers respect 'robots.txt' and 'nofollow' directives.

Note: This is a Mouse Trap feature. It works by leaving a bit of “cheese” in the form of a link that alerts Shield to a bad bot whenever it’s accessed.
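The sketch below illustrates the idea behind the Mouse Trap using Python's standard `urllib.robotparser`: a trap path is disallowed in robots.txt and linked with rel="nofollow", so a crawler that honours those directives never requests it, and anything that does request it can be flagged. The trap path, `TRAP_LINK` and the helper names are hypothetical; this article doesn't describe how Shield actually generates or places its link.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap path; Shield's real trap URL is not described in this article.
TRAP_PATH = "/hypothetical-trap-abc123/"

# robots.txt tells every compliant crawler to stay away from the trap path.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# The "cheese": a nofollow link pointing at the trap path.
TRAP_LINK = f'<a href="{TRAP_PATH}" rel="nofollow">trap</a>'


def compliant_crawler_would_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return whether a crawler that obeys robots.txt would request this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def is_mouse_trap_hit(request_path: str) -> bool:
    """Any request for the trap path is treated as a bot signal."""
    return request_path.startswith(TRAP_PATH)
```

A well-behaved crawler checking robots.txt gets `False` for the trap URL and skips it, so a request that reaches the trap path comes from a client that ignored both directives.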
Identify a bot when it accesses XML-RPC
Your site probably receives some legitimate XML-RPC requests, even if you have no use for XML-RPC yourself. But the XML-RPC endpoint is also a tempting target for bots, so repeated requests to it are likely to come from a bot.

Note: If you don't use XML-RPC, there's no reason anything should be accessing it.
If your site does need XML-RPC, be careful to ensure you don't block legitimate XML-RPC traffic.
Unless you're sure, we recommend responding with an offense here rather than an immediate block, in case you block valid requests.

To learn more about XML-RPC attacks and how to block them, read the blog article here.
Invalid Script Load

Detect when a bot tries to load WordPress directly from a file that isn't normally used to load WordPress.
If you look at your WordPress files, you'll see, for example, index.php, wp-config.php, wp-load.php, wp-settings.php and so on. Generally, index.php is the main file that always loads, but other files load WordPress in specific scenarios, like wp-comments-post.php when you post a comment.

A bot might try to load wp-settings.php or wp-load.php. There's no need to load these files directly, so if a visitor does, it's probably a bot.

For example, if you load your-site.com/wp-load.php, you'll trigger this offense.
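As a rough illustration, the check below flags direct requests for the two core files named above. It's a hypothetical Python sketch, not Shield's rule set: the full list of files that count as an invalid script load isn't given in this article, so `DIRECT_LOAD_SUSPECTS` and `is_invalid_script_load` are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Core files this article names as not meant to be loaded directly.
# Illustrative only: a real rule set would likely cover more files.
DIRECT_LOAD_SUSPECTS = {"wp-load.php", "wp-settings.php"}


def is_invalid_script_load(url: str) -> bool:
    """Hypothetical check: is this a direct request for a suspect core file?"""
    # Compare only the final path segment, e.g. "wp-load.php".
    filename = PurePosixPath(urlsplit(url).path).name.lower()
    return filename in DIRECT_LOAD_SUSPECTS
```

Requests for index.php or wp-comments-post.php pass, because those are normal entry points for loading WordPress.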

Please note that Detect Probing Bots settings don't apply to whitelisted IPs.

To access the Detect Probing Bots options, go to the main Config menu > Bot Blocking > Bot Behaviours section.

Here you can configure each bot signal independently and decide how you want Shield to respond. You have 4 options to choose from:

Activity Log Only. This option records these bots' activity in the Activity Log without applying any offenses or blocks. It lets you test-drive the signal before making it take effect.
Increment Offense (by 1). This option puts another black mark against an IP. As always with the offense system, once the limit is reached for an IP address, it is blocked from accessing the site.
Double Offense (by 2). We’ve added the ability to give weight to certain behaviours. By allowing the offense counter to increment by 2, the IP will reach the limit more quickly, and be blocked sooner.
Immediate block. If you decide that a particular signal on your site is severe enough, you can have Shield immediately mark that IP as blocked.
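The four options can be modelled as different weights on a per-IP offense counter. The Python sketch below is a simplified, hypothetical model: the `OffenseTracker` class and `Response` names are illustrative, not Shield's code. It assumes an IP is blocked as soon as its count reaches the limit, logs every signal, and skips whitelisted IPs, as described elsewhere in this article.

```python
from dataclasses import dataclass, field
from enum import Enum


class Response(Enum):
    """Response options; each value is how much the offense count increases."""
    LOG_ONLY = 0
    INCREMENT = 1
    DOUBLE = 2
    IMMEDIATE_BLOCK = -1  # sentinel: block without counting


@dataclass
class OffenseTracker:
    """Hypothetical per-IP offense counter (illustrative, not Shield's code)."""
    limit: int = 6
    whitelist: set = field(default_factory=set)
    counts: dict = field(default_factory=dict)
    blocked: set = field(default_factory=set)
    activity_log: list = field(default_factory=list)

    def record(self, ip: str, signal: str, response: Response) -> bool:
        """Handle one bot signal from `ip`; return True if the IP is now blocked."""
        # Detect Probing Bots settings don't apply to whitelisted IPs.
        if ip in self.whitelist:
            return False
        # Every option, including LOG_ONLY, leaves an Activity Log entry.
        self.activity_log.append((ip, signal, response.name))
        if response is Response.IMMEDIATE_BLOCK:
            self.blocked.add(ip)
        elif response is not Response.LOG_ONLY:
            # INCREMENT adds 1, DOUBLE adds 2.
            self.counts[ip] = self.counts.get(ip, 0) + response.value
            if self.counts[ip] >= self.limit:
                self.blocked.add(ip)
        return ip in self.blocked


if __name__ == "__main__":
    tracker = OffenseTracker(limit=6)
    for attempt in range(1, 4):
        blocked = tracker.record("203.0.113.7", "404", Response.DOUBLE)
        print(f"404 #{attempt}: blocked={blocked}")
```

With the offense limit at 6, three Double Offense hits block an IP (2 + 2 + 2 = 6), whereas Increment Offense takes six hits and Activity Log Only never blocks.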

Read more about the offense limit here and the Automatic IP Blacklist System here.

For example, if you set the 404 Detect option to Activity Log Only and a visitor tries to load a non-existent page (404), they won't be blocked or blacklisted. Each 404 page load attempt is only recorded in the Activity Log.

Or, let's say you set the 404 Detect option to Double Offense and your offense limit is 6. Each time a visitor tries to load a non-existent page (404), the offense count increments by 2 instead of 1, so the visitor's IP reaches the limit of 6 after three 404s rather than six, and is blocked sooner. You can see these activities in your Activity Log as well.

Or, if you set the 404 Detect option to Immediate block, a visitor who loads a non-existent page is blocked/blacklisted straight away, regardless of your offense limit (e.g. 6). You can see these activities in your Activity Log.

Note: You can review and analyse blacklisted IPs under the IP Management and Analysis section.

Important: If permalinks are set to "plain", 404 tracking may not be possible. The Activity Log will show something like:

404 detected at "/".

Hint: You may also want to use the Traffic Watch Viewer to review logs of all HTTP requests made to your WordPress site.


Note: ShieldPRO is required for the Detect Probing Bots feature. To find out what the extra ShieldPRO features are and how to purchase, please follow this link here.