Hacker News
A short history of web bots and bot detection techniques (sinja.io)
73 points by OlegWock 5 months ago | 16 comments


Back in the early 2000s, lots of websites had an unauthenticated "guestbook" feature where visitors could leave a message. As soon as Google and PageRank became a thing, bots would drive by and leave links to the website they were promoting. The idea was to increase the number of backlinks and thus improve the promoted site's Google rank.

The fix to this was shockingly simple. Add an input box with a standard name like "title" and then hide it with CSS. The bots would always provide a value for every input. If you saw a value for your hidden input you returned 200 but never added the post to your website.
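For anyone who wants to try it, here is a minimal sketch of that hidden-field (honeypot) check in Python. The `Guestbook` class, the `submit` method, the form markup, and the `/guestbook` path are all hypothetical names made up for illustration; only the field name "title" and the "return 200 but drop the post" behavior come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical honeypot field name; it is hidden from humans with CSS.
HONEYPOT_FIELD = "title"

# Illustrative form markup (assumed, not from any real site).
FORM_HTML = """
<form method="post" action="/guestbook">
  <input name="name">
  <textarea name="message"></textarea>
  <!-- Honeypot: humans never see this input, so they leave it empty. -->
  <input name="title" style="display:none" tabindex="-1" autocomplete="off">
  <button type="submit">Sign</button>
</form>
"""


@dataclass
class Guestbook:
    entries: list = field(default_factory=list)

    def submit(self, form: dict) -> int:
        """Return an HTTP status code; store the post only if the honeypot is empty."""
        if form.get(HONEYPOT_FIELD, "").strip():
            # Probably a bot that filled every input. Report success so it
            # gets no signal, but never publish the post.
            return 200
        self.entries.append(
            {"name": form.get("name", ""), "message": form.get("message", "")}
        )
        return 200
```

Returning the same status in both branches is the point: the bot cannot tell from the response that its post was discarded.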


I implemented this very technique last year after getting some crypto spam on the guestbook of my personal website. It works like a charm.


This is bringing me back to running my own site back in the day.


I needed a new GitHub account the other day. The "are you human" tests were so hard that I almost gave up. I think a new way to do this will be needed soon.


Great high-level overview. One of the challenges of learning about bot detection is that it's adversarial, and revealing info about your techniques can help the attackers evade you.

I do work on a bot detection product, and I've seen some group chats where crackers are sharing notes about how they're evading detection tools. The more unnerving part is that the public groups are less serious, and there are certainly better private groups aiming at anything with a good financial reward.


I'm curious about how this world will evolve in the era of AI agents/MCP. It is not entirely unlikely that AI agents will have access to limited wallets, etc., to facilitate a broader set of use cases. In that case, a one-shot bot vs. human decision may not make sense, and corporations may need a more nuanced split of human / bot-we-like / bot-we-don't-like. This would especially be the case for unofficial MCP servers that use technologies like headless browsing to support an API.


I'm not sure I understand the mental model you're basing your inferences on, but my model leads to a far different outcome:

If you've got a good enough bot and it's pre-qualified to spend money, then it can use the special "register as a bot" API and provide personal information and whatever else I need to confirm that there is a "real human" behind the curtain. A credit card alone is not enough; they can be (trivially) stolen. The way I see it, using agentic bots will ultimately require you to provide more personal details than an actual human would.


If I'm running bots that reliably evade bot detection, what would motivate me to provide all that information when I could just ... not?


"Robots spending money" has been going on since the 1980s in algorithmic trading.


Maybe I missed it, but I didn't see a mention of the permanent token cell network providers inject into client requests. Knowing what these are and mocking them is another thing a bot might do to impersonate a real device.


Does anyone know of a good reference on the topic of fingerprinting?


https://github.com/gautamkrishnar/nothing-private

I recently used the DDG browser, and it just can't get some sites to clear! I tried the flame button, but I was still logged in after clearing browser data a few times.

Are companies resorting to this kind of tactic to keep users remembered? It's a major lodging booking site!

Qubes OS seems more and more attractive.


https://abrahamjuliot.github.io/creepjs/ https://github.com/abrahamjuliot/creepjs

Usually my go-to. The readme, source code, and GitHub issues are a great source of information, and the website itself is useful to test against.

edit:

For anything related to network fingerprinting, especially censorship, I usually browse https://github.com/net4people/bbs/issues


I liked the depiction of different TCP SYN packets ;)


How do systems like OpenAI Operator bypass bot protection for the entire web?


> Orchestraion frameworks

Small typo here



