CAPTCHA: Human Verification in Online Interactions
Most of you reading this article will have already stumbled upon a CAPTCHA. Just think back to that bus or stop sign you had to identify to prove that you are human. But what’s behind this digital test, and how does it work? This article explains what you need to know about CAPTCHA and why these tests are critical in preventing spam and data scraping in today’s digital environments.
What Is CAPTCHA?
CAPTCHA, which stands for Completely Automated Public Turing test to Tell Computers and Humans Apart, is more than just a series of characters and visuals that test your typing and cognitive skills. It serves as a critical tool in distinguishing true human users from automated bots. In fact, in today’s digital battleground between real users and bots, the role of verification systems has never been more important.
To put that into perspective, let’s take a look at some statistics from a recent report by Imperva. According to Imperva, a staggering 47.4% of all internet traffic in 2022 came from bots, an increase of 5.1 percentage points over the previous year. The report also reveals that human traffic decreased to 52.6%, its lowest point in almost a decade.
What does that mean? Countless bots, largely driven by spammers and cybercriminals, are continuously targeting different platforms with spam, data theft attempts, and brute force attacks. That’s why it is so important to understand CAPTCHA and how it defends us against this new age of digital attackers.
Let’s start by looking at what CAPTCHA is used for before we dive into how it works and the different types of the test.
What Are CAPTCHAs Used For?
With over one-third of the top 100,000 websites using these human verification systems, the number of industries that currently leverage this technology is extremely diverse, spanning from gaming, payments, and social media, all the way to the entertainment and media industries.
CAPTCHAs are used by web developers and website owners to limit the use of bots on their websites. The specific application depends on the industry and on the type of online activity being protected. They are typically used to help with the following:
- Enhancing security – They can be used as an added security measure during authentication to limit brute force attacks. In addition, they can help protect websites from DDoS attacks by ensuring that only humans are able to access the site.
- Restricting account creation – Bots frequently spam registration systems, creating fake accounts and straining service providers’ resources. In addition to this, these accounts contribute to an increase in fraud. Human verification tests help service providers by preventing bots from registering and mass generating accounts.
- Preventing spam and mass commenting – Bots target message boards, contact forms and review sites with spam comments. These comments are often used for malicious purposes including spreading malware and phishing campaigns, trolling or disruption, competitive sabotage, and search engine manipulation.
- Limiting survey or poll skewing – These verification systems help prevent bots from submitting votes on online surveys or polls. Requiring a human to complete a challenge for each vote also discourages multiple submissions, because every vote takes longer to enter.
- Preventing ticket and product scalping – Online ticket sales for events and online retailers often use verification systems to stop bots from buying tickets and products in bulk. Scalpers use bots to mass purchase products that are in high demand and resell them at higher prices.
- Preventing automated data extraction – Digital verification tests can also prevent bots from scraping websites for data such as prices, addresses, and contact information.
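To make the first use case above more concrete, here is a minimal sketch of how a login flow might demand a challenge only after repeated failures, which slows down brute force attempts without adding friction for every user. The class name, the threshold of three failures, and the idea of keying on an IP address are all assumptions made for illustration, not something prescribed by any particular CAPTCHA product.

```python
from collections import defaultdict

# Hypothetical threshold: failed logins tolerated before a challenge is required.
MAX_FAILURES_BEFORE_CAPTCHA = 3


class LoginGuard:
    """Tracks failed logins per client and decides when to demand a CAPTCHA."""

    def __init__(self, max_failures=MAX_FAILURES_BEFORE_CAPTCHA):
        self.max_failures = max_failures
        # Maps a client key (for example an IP address or username) to a failure count.
        self.failures = defaultdict(int)

    def captcha_required(self, client):
        """Return True once the client has reached the failure threshold."""
        return self.failures[client] >= self.max_failures

    def record_failure(self, client):
        self.failures[client] += 1

    def record_success(self, client):
        # A successful login resets the counter for that client.
        self.failures.pop(client, None)


guard = LoginGuard()
for _ in range(3):
    guard.record_failure("203.0.113.7")

print(guard.captcha_required("203.0.113.7"))   # True
print(guard.captcha_required("198.51.100.2"))  # False
```

A real deployment would likely keep these counters in shared storage and expire them over time; the in-memory dictionary here only illustrates the decision logic.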
How Does CAPTCHA Work?
Now that we have gone through the basics, let’s get into how the technology behind these human-detecting tests actually works. From a high-level perspective, human verification systems ask a user to prove that they are human by demonstrating human action or “intelligence”. This can happen in many ways, such as interpreting an image, a sound, or a group of letters and numbers that has been distorted or overlaid with noise.
Adding a digital challenge means adding some friction for your users, which is something businesses usually try to avoid. It can still be worthwhile for organizations that want to slow down malicious behavior or limit automated actions: in some business cases, the cost of leaving automated abuse unchecked outweighs the cost of that friction.
By definition, these tests are completely automated and usually do not require human intervention for administration or maintenance, which is a clear advantage in terms of cost and reliability.
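The challenge-response flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ChallengeStore` class, the two-minute lifetime, and the case-insensitive comparison are hypothetical choices, and a production system would also need to render the challenge itself. The sketch shows the server issuing a challenge, keeping only a digest of the expected answer, and accepting each challenge at most once.

```python
import hashlib
import hmac
import secrets
import time

# Assumed lifetime of a challenge; not taken from any specific product.
CHALLENGE_TTL_SECONDS = 120


class ChallengeStore:
    """Issues challenges and verifies answers, each challenge usable once."""

    def __init__(self, ttl=CHALLENGE_TTL_SECONDS):
        self.ttl = ttl
        self._pending = {}  # challenge_id -> (answer_digest, expires_at)

    @staticmethod
    def _digest(answer):
        # Normalize so that case and surrounding whitespace do not matter.
        normalized = answer.strip().lower().encode("utf-8")
        return hashlib.sha256(normalized).hexdigest()

    def issue(self, expected_answer, now=None):
        """Register a new challenge and return its opaque identifier."""
        now = time.time() if now is None else now
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (self._digest(expected_answer), now + self.ttl)
        return challenge_id

    def verify(self, challenge_id, submitted, now=None):
        """Check an answer; the challenge is consumed whether or not it matches."""
        now = time.time() if now is None else now
        entry = self._pending.pop(challenge_id, None)
        if entry is None:
            return False  # unknown or already used
        digest, expires_at = entry
        if now > expires_at:
            return False  # expired
        return hmac.compare_digest(digest, self._digest(submitted))


store = ChallengeStore()
challenge_id = store.issue("W7xK")
print(store.verify(challenge_id, "w7xk "))  # True
print(store.verify(challenge_id, "w7xk"))   # False, the challenge was already used
```

Consuming the challenge on the first attempt and expiring it after a short time are design choices that make it harder to reuse a solved challenge later.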
When it comes to types of CAPTCHAs, there are several methods used to distinguish between human and bot users. Most modern iterations fall into three main buckets (text-based, image-based, and audio-based), alongside a few less common variants:
- Text-based CAPTCHAs assume that humans can read distorted text much better than computers, so a distorted image of a set of characters is displayed for the user to transcribe. The text is usually warped, obscured by lines, or displayed against a noisy background image to make it more challenging for bots to decipher. The text-based verification systems listed below were each developed to address the shortcomings of earlier versions and to keep up with the increasing sophistication of automated solvers.
  - Gimpy: Developed by researchers at Carnegie Mellon University in collaboration with Yahoo, Gimpy is one of the earliest human verification systems. It uses distorted images of several overlapping words from an 850-word dictionary. Users must identify three words to pass.
  - EZ-Gimpy: A simplified version of Gimpy that presents a single distorted word. It is easier for both humans and bots to solve.
  - Gimpy-r: Known as reloaded Gimpy, this advanced version of Gimpy features more complex distortions and obfuscations to counter sophisticated automated solvers.
  - Simard’s HIP: This system requires users to identify letters and numbers within distorted and noisy images.
- Image-based CAPTCHAs require the identification of objects or patterns within images. For example, users may be asked to select all parts of the image that contain cars, bicycles, or traffic lights.
- Math problem CAPTCHAs ask users to solve simple addition, subtraction, or multiplication questions to prove they are human. Similarly, some of these systems use trivia questions to challenge users and ensure they are not bots.
- Audio-based tests were created for visually impaired users who have difficulty reading text or seeing pictures. With this type, a series of spoken words is played for the user, who types them in to complete the task. It often benefits from the fact that fewer bots are equipped with speech recognition than with image recognition.
- 3D CAPTCHAs present objects in 3D and require users to manipulate them. These are less common, as they require additional computational resources to generate.
- Gesture-based CAPTCHAs challenge the user to perform specific actions with their mouse or touchscreen, such as dragging and dropping objects into specific locations or drawing basic shapes.
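Of the types listed above, the math problem variant is the simplest to illustrate in code. The sketch below generates a one-step arithmetic question and checks a submitted answer. The function names and the single-digit operand range are assumptions chosen for the example.

```python
import operator
import random

# The three operations mentioned in the list above.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_math_challenge(rng=None):
    """Return a (question, expected_answer) pair for a simple arithmetic test."""
    if rng is None:
        rng = random.Random()
    symbol = rng.choice(sorted(OPERATIONS))
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    if symbol == "-" and b > a:
        a, b = b, a  # keep subtraction results non-negative
    question = f"What is {a} {symbol} {b}?"
    return question, OPERATIONS[symbol](a, b)


def check_math_answer(expected, submitted):
    """Compare the user's typed answer with the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False  # non-numeric input never passes


question, answer = make_math_challenge()
print(question)
print(check_math_answer(answer, str(answer)))  # True
print(check_math_answer(answer, "not a number"))  # False
```

A question sent as plain text like this is trivial for a script to parse, so in practice such challenges are typically rendered as a distorted image or combined with other signals.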
What is reCAPTCHA?
reCAPTCHA is described by Google, which acquired the service in 2009, as “a seamless fraud detection service that stops bots and other automated attacks while approving valid users.” The main idea behind it is to leverage a risk-based algorithm that uses machine learning to adapt to every customer and bot interaction without creating the same level of friction that traditional human verification tests introduce into the user experience.
Users might simply be asked to check a box that says “I’m not a robot”, and reCAPTCHA analyzes their behavior to determine their authenticity.
reCAPTCHA usually provides better levels of security through IP tracking, cookie analysis, and user behavior analysis. Even though it produces an outcome very similar to that of a traditional CAPTCHA, reCAPTCHA is generally considered to be better in terms of both user experience and security.
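On the server side, a risk-based check of this kind usually ends in a simple decision: accept the request, or fall back to a harder challenge. The sketch below assumes the publicly documented `siteverify` endpoint and its `success` and `score` response fields; the 0.5 threshold and both function names are illustrative assumptions, and the exact fields returned depend on the reCAPTCHA version in use.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint as publicly documented by Google.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

# Assumed cut-off; each site would tune this to its own traffic.
SCORE_THRESHOLD = 0.5


def fetch_verification(secret_key, client_token, timeout=5):
    """POST the token received from the browser and return the parsed JSON reply."""
    data = urllib.parse.urlencode(
        {"secret": secret_key, "response": client_token}
    ).encode("ascii")
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def is_probably_human(result, threshold=SCORE_THRESHOLD):
    """Decide whether to accept a request based on a verification result."""
    if not result.get("success"):
        return False
    score = result.get("score")
    if score is None:
        # Checkbox-style responses carry no score; success alone decides.
        return True
    return score >= threshold


print(is_probably_human({"success": True, "score": 0.9}))  # True
print(is_probably_human({"success": True, "score": 0.1}))  # False
print(is_probably_human({"success": False}))               # False
```

Keeping the decision in a separate function from the network call makes it easy to adjust the threshold, or to route low scores to an additional challenge instead of rejecting them outright.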
Limitations of CAPTCHAs
CAPTCHAs are invaluable tools for internet security, and function quite well in differentiating genuine users from automated bots. However, with the rise of AI technology and machine learning, challenge-response tests are now struggling to fend off more sophisticated attacks which use modern technology to spoof image recognition and other cognitive and visual tests.
Firstly, it’s important to clarify the difference between CAPTCHA and the famous Turing test, introduced by Alan Turing in 1950. The former is specifically designed to draw a distinction between human users and bots, which makes it different from the Turing test which evaluates whether a computer is able to mimic human intelligence. With the continued progress of Artificial Intelligence and Machine Learning, the grey area between the two tests is becoming increasingly difficult to navigate. This brings forth one of the biggest limitations of CAPTCHA tests and a string of related issues for the popular bot defense.
That said, let’s get into a more detailed overview of the limitations of this technology:
- Blurred Lines with AI – With the recent evolution and accessibility of AI and Machine Learning, it has become significantly harder to identify human users with image recognition and visual challenges, highlighting a continuous fight between CAPTCHA creators and AI developers.
- Cybercriminal Exploitations – Cybercrime groups have been beating this technology for years now, utilizing targeted system exploits, inexpensive gig-economy platforms, and even paid humans to solve challenges presented to users.
- Relay Attacks – Sometimes loosely called replay attacks, these rely on a malicious service that mirrors the target site’s CAPTCHA challenges onto its own platform, has its visitors solve them, and then forwards the solutions to the target site.
- AI Training – Challenge-response systems have recently become data sources for AI training projects, strengthening the very kind of algorithms that are then used to bypass them.
The key element to remember is that CAPTCHAs are not authentication mechanisms, but are more akin to liveness detection systems. If targeted by malicious threat actors, they lose their effectiveness quickly, either through inexpensive human labor or through more sophisticated automated software. It is therefore important to treat these systems as an added layer of complexity that slows opponents down, not as a defensive system on their own.
Drawbacks of Using CAPTCHAs
Because of their limitations, some websites are exploring alternative methods of bot detection, such as behavioral analysis, honeypots, or other authentication systems.
Additional drawbacks that are influencing this include:
- User Friction and Inconvenience – One of the most problematic aspects of this security mechanism is the amount of friction it creates for users visiting a website or accessing an online resource.
- Impact on Conversion Rates – They can discourage legitimate users from completing actions or making purchases, lowering the conversion rate for businesses.
- Accessibility Issues and Concerns – Recognizing the distorted text in these tests can be challenging, if not impossible, for many users, even those with no visual impairment. The tests are even more troublesome for people with visual or cognitive impairments. Moreover, alternative types, such as audio-based challenges, can still create difficulties for those with hearing impairments.
- CAPTCHAs that rely on a specific language or cultural references can also create issues for users from different countries or cultures.
- Bot-detection tests can also be difficult to complete on mobile devices because of their limited screen size and touch interface.
- False Positives / Negatives – Sophisticated bots can sometimes bypass these challenges and successfully respond, while legitimate users might be mistakenly identified as bots. The newest generation of these systems is designed to resist the most sophisticated text recognition programs, but every increase in challenge complexity also increases friction. Over time, this tradeoff tends to push the tests toward a point where humans become less successful than bots at solving them.
- Computational Costs – Implementing challenge-response tests can increase server load, require additional computer resources, and potentially slow down website performance.
- Privacy Concerns – Some services track user behavior to target users with advertisements.
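One of the alternatives mentioned at the start of this section, the honeypot, avoids user friction entirely: the form includes a field that is hidden from human visitors, so only automated scripts tend to fill it in. The following is a minimal sketch, assuming a hypothetical hidden field named `website` and form data that arrives as a dictionary of strings.

```python
# Hypothetical name of a form field hidden from human visitors with CSS.
HONEYPOT_FIELD = "website"


def looks_like_bot(form_data, honeypot_field=HONEYPOT_FIELD):
    """Flag a submission when the hidden honeypot field has been filled in."""
    return bool(form_data.get(honeypot_field, "").strip())


human_submission = {"name": "Ada", "comment": "Great article", "website": ""}
bot_submission = {"name": "x", "comment": "Buy now", "website": "http://spam.example"}

print(looks_like_bot(human_submission))  # False
print(looks_like_bot(bot_submission))    # True
```

Like a CAPTCHA, a honeypot only raises the cost for unsophisticated bots; a script written specifically for the site can simply leave the hidden field empty.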
Throughout the years, CAPTCHA has provided online businesses with an essential tool to protect their websites, online resources, and digital infrastructure. As technology evolves, the future of these human verification systems will likely be influenced by the ongoing battle between cyber-security experts, malicious actors and the latest advancements.
Instead of relying on these liveness detection systems alone, websites might adopt more robust authentication systems like passwordless MFA, passkeys, and next-gen authentication systems that authenticate users with biometrics and modern cryptography, and eliminate AI and machine learning susceptibilities.
Kelvin Zero’s Multi-Pass provides just that – a next-gen authentication solution that integrates cutting-edge offline fingerprint biometrics and cryptography.