The new meaning of robots.txt
Figure 1: "Forgive us our trespasses as we forgive those who trespass against us"
Given a massive and growing stockpile of cyberweapons now designed to take down bots, I've been pondering the real meaning of robots.txt in the "AI" age.
The case of British farmer Tony Martin sparked debate over people's rights to defend their own homes. He served five years for shooting two burglars in his home in August 1999, having been burgled ten times with the police doing nothing. Martin fired his pump-action Winchester 12-gauge, killing one burglar and injuring the other.
Similar questions about what counts as reasonable and proportionate self-defence seem as pressing in the digital domain as in the physical one.
It is common for "AI" training crawlers to ignore website preferences and plunder content, and there is no recourse to any internet police or authority. Crawler bots are destroying small websites while stealing content to enrich giant corporations. Such computer intrusion and theft demands robust and novel responses, since conventional security measures cannot reasonably distinguish legitimate visitors from burglars.
It's become common to lay down "digital poison" and booby traps to fend off these devouring interlopers.
In most jurisdictions, for physical intruder deterrence and animal control (bots might well be considered nuisance animals), it's okay to set up hazards at the boundary of a property if prominent warning signs are displayed.
Defences cannot be hidden, elaborate, or cruel and unusual. Anti-climb spikes on top of your fence are visible and commonplace enough, whereas it's probably not reasonable and proportionate to impale Amazon delivery drivers on the sharpened bamboo sticks of a Viet Cong punji pit because they strayed off the path at your north London flat.
On the other hand, one would not reasonably make rat poison smell and taste disagreeable to rats in order to 'humanely' warn them off. Since the invader is now presumed artificially "intelligent" and capable of "machine learning", it needs memorable stimuli to learn from. To the extent they are capable of reasoning, learning and simulating "fear" (caution), bots must come to 'fear' entering a digital property by themselves experiencing, or seeing other bots meet with, extremely negative outcomes. Picture a row of bot "heads on spikes" adorning the entrance to your website.
This requires a serious review of, and change to, the expectations around the robots.txt file. Whether your defences include fork bombs, zip bombs, malicious prompt injection, or other forms of poison intended to destroy or disable visiting bots, a key requirement is that they serve as a deterrent rather than a poisoned honeypot.
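To illustrate the class of deterrent under discussion, here is a minimal sketch (in Python, with illustrative names and sizes of my choosing) of how a decompression "zip bomb" payload can be prepared: a few kilobytes on the wire that balloon into a large allocation inside any crawler that naively decompresses whatever it fetches.

```python
import gzip
import io

def make_gzip_bomb(uncompressed_mb: int = 10) -> bytes:
    """Build a gzip payload that is tiny on the wire but expands
    to uncompressed_mb mebibytes when decompressed. Served with
    Content-Encoding: gzip, a naive crawler inflates it in memory."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as f:
        chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros compresses extremely well
        for _ in range(uncompressed_mb):
            f.write(chunk)
    return buf.getvalue()

bomb = make_gzip_bomb()
ratio = (10 * 1024 * 1024) / len(bomb)
print(f"{len(bomb)} bytes compressed, roughly {ratio:.0f}:1 expansion")
```

Real deployments scale the payload far beyond this and gate it behind bot detection, so that compliant browsers and well-behaved crawlers never receive it.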
Furthermore, since the full effects of any countermeasure cannot be ascertained, the concept of proportionality we might apply to human or animal injury seems somewhat moot. Bots cannot suffer. Naturally their operators can expect, and deserve, to experience financial suffering as a consequence. Some bots may be unharmed by devices lethal to other types, and we should expect all invaders to evolve and adapt to deterrents. It would not be reasonable to insist that doctors start every patient on the weakest antibiotic and escalate the dose only as it fails.
Deterrence is a highly effective defensive move. Convincing the enemy that it isn't worth the risk of encroaching is a very different prospect from letting them in and then inflicting punishment. Autonomous systems that identify, aggressively engage with, fight off or destroy invasive bots seem analogous to a guard dog. Such a system should be able to "bark" or fire a warning shot.
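That "bark before bite" escalation can be sketched as follows; this is a toy illustration with hypothetical names and responses, not a real framework, and a production system would key on more than the self-reported user-agent string.

```python
from collections import defaultdict

# First contact from a disallowed crawler gets a warning (the "bark");
# repeat offenders get the deterrent payload (the "bite").
WARNING = (403, "Crawling disallowed; countermeasures active. See /robots.txt.")
DETERRENT = (200, "<decompression bomb or poisoned content here>")

offenses = defaultdict(int)  # user-agent -> warnings already issued

def respond(user_agent: str, disallowed: set) -> tuple:
    """Return an (HTTP status, body) pair for this request."""
    if user_agent not in disallowed:
        return (200, "normal page")
    offenses[user_agent] += 1
    if offenses[user_agent] == 1:
        return WARNING     # bark first: a clear, cheap refusal
    return DETERRENT       # then bite: the crawler was warned
```

The design choice is the point: the warning shot documents that the intruder had a chance to retreat, which matters for the proportionality argument above.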
Hereafter robots.txt takes on a new meaning.
A clearly written robots.txt proclaiming that a site contains defensive countermeasures against "AI" crawlers seems the minimum bar for gentlemanly, sportsman-like rules of engagement. Crawlers that ignore such warnings are thereafter fair game: the gloves are off with respect to how much lethality a trespassing agent may expect to meet.
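As a sketch, such a proclamation might look like the following. GPTBot and CCBot are published crawler user-agent tokens; the comment wording is of course the site owner's own, since the Robots Exclusion Protocol defines no formal warning mechanism beyond comments.

```
# WARNING: This site deploys active countermeasures against
# non-compliant "AI" training crawlers. Disallowed agents that
# proceed past this point do so entirely at their own risk.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```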
Do any readers know of relevant cyber case law, or existing well-written articles, that might give guidance to website and media owners deploying crawler countermeasures?