You think you're verifying your identity by clicking on the "traffic lights," but you're actually working for AI for free.
What's your take on this being the next-generation graphic captcha?
It seems you'd run into "Your response to the CAPTCHA appears to be invalid. Please re-verify that you're not a robot below." countless times. I wonder what cat owners think.
This is a recently popular post poking fun at the "I'm not a robot" verification process. In the video, users have to click on those gray "cat poop clumps" one by one, drag them into the trash can beside the litter box, and finally, after clearing the level, tick "I'm not a cat".
The post blew up, racking up over a million views.
The comment section is lively. Some think it beats squinting at pixelated traffic lights with ambiguous tile boundaries.
Some people also associate it with the data refinement work in the American TV series "Severance".
Some even joke: "So only cat owners are real humans."
One of the highly discussed topics is: "Image verification is helping AI train data for free."
Helping AI train data? Let's delve into this.
As we all know, whether you're registering an account or posting a message, captchas are unavoidable. The formal name is CAPTCHA, short for "Completely Automated Public Turing test to tell Computers and Humans Apart". As the name suggests, its job is to distinguish humans from bots, stopping bots from spamming, rigging votes, or otherwise causing damage.
Initially, it mainly took the form of distorted text or images; the heavier the distortion, the harder it was to recognize.
But soon, a genius named Luis von Ahn (who would later found Duolingo) stepped forward. He realized that hundreds of millions of people (now billions) around the world were clicking through these captchas every day, and the wasted time added up to millions of hours. Isn't that a pure waste of "human brain cycles"?
Thus a genius, two-birds-one-stone idea was born: reCAPTCHA.
From version 1 onward, this system was never just a "gatekeeper". It was a large-scale human-computation crowdsourcing project.
Each time, the system shows you two distorted words. Only one is a "control word" whose answer the system already knows, used to check that you're human. The other, the "unknown word", is Google's hidden agenda: it comes from an old scanned book or newspaper that OCR (Optical Character Recognition) software couldn't handle.
You have no idea which is which, so you'll honestly fill in both correctly.
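The control/unknown pairing can be sketched in a few lines. This is a minimal, hypothetical reconstruction of the idea (the function names, the vote threshold, and the data structures are my own assumptions, not reCAPTCHA's actual implementation): the known word gates the human check, while transcriptions of the unknown word are collected as votes until enough independent humans agree.

```python
# Hypothetical sketch of reCAPTCHA v1's dual-word logic (not Google's code):
# the control word is graded; the unknown word's answer is harvested as a
# crowdsourced OCR transcription vote, but only from users who pass.
KNOWN_WORDS = {"morning": "morning"}           # control words with known answers
unknown_votes: dict[str, dict[str, int]] = {}  # word id -> {transcription: votes}

def grade_response(control_word: str, control_answer: str,
                   unknown_word: str, unknown_answer: str) -> bool:
    """Pass the user if the control word matches; record the unknown word's
    transcription as a vote only when they pass (failed users are untrusted)."""
    if control_answer.strip().lower() != KNOWN_WORDS[control_word]:
        return False  # failed the human check; discard the transcription
    votes = unknown_votes.setdefault(unknown_word, {})
    votes[unknown_answer] = votes.get(unknown_answer, 0) + 1
    return True

def agreed_transcription(unknown_word: str, threshold: int = 2):
    """Accept a transcription once enough independent humans agree on it."""
    votes = unknown_votes.get(unknown_word, {})
    best = max(votes, key=votes.get, default=None)
    return best if best is not None and votes[best] >= threshold else None
```

Since you can't tell which word is the control, cheating on the unknown word risks failing the check, which is exactly why the harvested transcriptions are reliable.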
As a result, without even realizing it, netizens worldwide performed "unconscious labor" that transcribed, word by word and for free, the entire archive of The New York Times back to 1851 and a vast chunk of the Google Books project into digital form.
However, the AI we fed (Google's own OCR) ended up beating its old master (v1 text verification) to death.
In 2014, Google itself publicly admitted that its AI could crack the most heavily distorted text with 99.8% accuracy, thanks to convolutional neural networks (CNNs). Academic research (such as CapNet) has long confirmed that such models crack text captchas with accuracy around 98% or even 100%. Technically, the v1 defense had completely collapsed.
Google Blog: https://security.googleblog.com/2014/04/street-view-and-recaptcha-technology.html
The defense line must be upgraded. So, v2 image verification arrived.
Sound familiar? "Select all the cars", "Select all the traffic lights", "Select all the crosswalks". So here's the question: around that same time (circa 2014), which project was Google burning money on like crazy?
That's right, Autonomous Driving (Waymo).
What does a self-driving AI most need to learn to recognize? Cars, traffic lights, crosswalks, and bicycles, of course. In other words, every time billions of netizens log in, register, or post, they are working for Google's autonomous-driving AI for free.
How big is this "human computation" project? Scholars estimate that over the past decade or so, the unpaid labor humans have contributed is worth more than $6.1 billion.
In 2024, AI finally "graduated" and knocked out the second old master: the v2 image puzzle.
Researchers at ETH Zurich published a paper titled "Breaking reCAPTCHA v2". Using the YOLOv8 object-detection model, they cracked v2's image challenges with 100% accuracy.
Paper link: https://arxiv.org/abs/2409.08831
These models are so powerful precisely because they were trained on vast, precisely labeled datasets (the very kind reCAPTCHA v2 helped create).
The research even shows that AI's performance on these puzzles is "not significantly different" from humans'. Then you might ask: "If AI can crack it 100% of the time, why am I still clicking those damned traffic lights every day?"
Because the image puzzle is no longer the real line of defense.
This 2024 research also confirms an "open secret": the real point of reCAPTCHA v2 is analyzing your private data.
Remember that "I'm not a robot" checkbox? Google's "advanced risk analysis engine" cares less about whether you click it than about how you click it. In the background, it's watching:
- Mouse trajectory: Is your movement a smooth curve with a bit of human-like jitter, or a robot's perfect straight line or instant teleport?
- Click position: Do you click somewhere inside the box, or at its pixel-perfect center (a robot's tell)?
- Browser fingerprint: your screen resolution, plugins, fonts...
- Google cookie: the trump card. A user long signed in to a Google account with a "clean" browsing history looks "more human" than one who just cleared their cookies or is on a VPN.
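To make the signals above concrete, here is a crude, purely illustrative scoring sketch. It is not Google's engine; the weights, the wobble heuristic, and the boolean inputs are all my own assumptions. The trajectory check captures the core idea: a human's mouse path is longer than the straight-line distance between its endpoints, while a scripted path is not.

```python
import math

# Toy human-likeness score (assumed heuristics, not reCAPTCHA's real model):
# combine a trajectory-wobble measure with click and cookie signals into [0, 1].
def human_likeness(points: list[tuple[float, float]],
                   clicked_dead_center: bool,
                   has_google_cookie: bool) -> float:
    if len(points) < 2:
        return 0.0  # no real movement at all: maximally suspicious
    # 1. Mouse trajectory: path length vs. straight-line distance.
    #    A robot-straight line gives wobble == 0; humans meander a little.
    path = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    direct = math.dist(points[0], points[-1])
    wobble = (path - direct) / max(direct, 1e-9)
    score = min(wobble * 5, 1.0) * 0.5        # trajectory contributes up to 0.5
    # 2. Click position: pixel-perfect center clicks look scripted.
    score += 0.0 if clicked_dead_center else 0.2
    # 3. Cookie/session history stands in for fingerprint-based trust.
    score += 0.3 if has_google_cookie else 0.0
    return round(min(score, 1.0), 2)
```

A perfectly straight path, a dead-center click, and no cookie score 0.0; a meandering path with a cookie lands well above 0.5.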
This offense-defense battle has already reached a white-hot stage in academia.
Offense (AI attack): Attackers face a chicken-and-egg problem: you need an AI solver to automatically collect vast numbers of samples, but you need vast numbers of samples to train that solver.
The answer is the generative adversarial network (GAN). Research shows an attacker needs only a small number of real samples (e.g., 500) to train a GAN: the generator forges new captchas while the discriminator learns to tell them apart from real ones. The process can churn out synthetic training data indefinitely, and the AI attacker's arsenal is built.
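The GAN bootstrap can be sketched in a toy PyTorch loop. This is an assumed, miniature setup (random vectors standing in for captcha features, tiny networks, made-up hyperparameters), not the code from any cited attack; it only shows the shape of the trick: alternate discriminator and generator updates, then sample the trained generator endlessly.

```python
import torch
from torch import nn

torch.manual_seed(0)
REAL = torch.randn(500, 16)  # stand-in for ~500 real captcha feature vectors

# Tiny generator (noise -> fake captcha) and discriminator (captcha -> real?)
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
D = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):  # a few adversarial rounds
    fake = G(torch.randn(64, 8))
    # Discriminator step: push real toward 1, fakes toward 0.
    d_loss = (bce(D(REAL[:64]), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: fool the discriminator into predicting 1 on fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# The payoff: effectively unlimited synthetic samples for the solver's training set.
synthetic = G(torch.randn(10_000, 8)).detach()
```

In the real attack the generator forges captcha images rather than feature vectors, but the seed-then-amplify economics are the same.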
Defense (the v3 shift): Since the image puzzle can't hold the line, defense has shifted entirely to reCAPTCHA v3, whose core is what academics call behavioral biometrics.
reCAPTCHA v3 is completely invisible and runs on every page you visit. Like a supervisor, it silently watches everything you do (mouse movements, scrolling, typing rhythm) and assigns you a "credibility score" from 0.0 (bot) to 1.0 (human).
The cost of this shift is huge:
- Privacy nightmare: this large-scale monitoring has been called "spyware" and clashes head-on with privacy regulations such as the GDPR.
- Privacy paradox: the harder you try to protect your privacy (VPN, cleared cookies, privacy browser), the less trust-building data you leave behind, the lower v3 scores you, and the "more like a robot" you become.
- Torture-level difficulty: the only remaining way to punish AI is to make the puzzles brutally hard. The result: they still fail to stop AI, but they completely lock out users with visual or hearing impairments, or reading disabilities such as dyslexia.
So what happens when v3's behavioral monitoring fails, squeezed between privacy pressure and AI that can fake human behavior?
It's the same ETH Zurich team mentioned above that proposed the most Matrix-like answer: the adversarial CAPTCHA.
- Paper title: Seeing Through the Mask: Rethinking Adversarial Examples for CAPTCHAs
- Paper link: https://arxiv.org/abs/2409.05558v1
The idea exploits a fatal weakness of AI: it is easily fooled by "adversarial examples". These are noisy images that look meaningless to the human eye, yet a model such as a CNN will mistake them for a specific object with 99.9% confidence.
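The classic recipe for such examples is the Fast Gradient Sign Method (FGSM). The sketch below is a toy illustration of that general technique, not the cited paper's method: the model, image, and epsilon are all made up. FGSM nudges every pixel a tiny, bounded step in whichever direction most increases the model's loss, so the change is nearly invisible to a human but can flip the model's prediction.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Toy untrained classifier standing in for the AI under attack.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def adversarial_noise(image: torch.Tensor, true_label: int,
                      epsilon: float = 0.1) -> torch.Tensor:
    """FGSM: step each pixel by +/- epsilon in the direction that most
    increases the model's loss on the true label."""
    image = image.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image),
                                       torch.tensor([true_label]))
    loss.backward()  # gradient of the loss w.r.t. the input pixels
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

clean = torch.rand(1, 28, 28)   # stand-in for a captcha tile
adv = adversarial_noise(clean, true_label=3)
```

Because no pixel moves more than epsilon, a human sees essentially the same tile, which is exactly the asymmetry an adversarial CAPTCHA wants to exploit.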
In the future, captchas may no longer ask "can you solve a human problem?" but "will you make a mistake only an AI would make?"
So, let's go back to that "shoveling cat poop" captcha at the beginning.
Think you're just having fun with cats? Maybe you're actually providing free pre-employment training to some future "AI litter-scooping" robot. Or maybe you're proving to the system that you're not dumb enough to click on TV static that an AI mistakes for "cat poop".