What is screen scraping and how does it relate to APIs?
Screen scraping is a common challenge for businesses with a large online presence, such as financial services and e-commerce businesses. It can be referred to by many different names like web data mining, web scraping, web harvesting, etc. While screen scraping was once seen primarily as a front-end web application security challenge, the changing nature of line-of-business applications is bringing the issue of scraping to the realm of API security.
For example, business-to-consumer (B2C) architectures have evolved over time from monolithic web applications to new API-based front-end frameworks that can meet the needs of web and mobile applications. Meanwhile, the growing use of business-to-business (B2B) APIs by industry ecosystem partners creates even more potential scenarios for scraping to occur.
B2B APIs have different API consumers than B2C APIs, which expands the universe of potential data retrieval scenarios. Some forms of scraping can be legitimate, but more often than not scraping is used to abuse APIs. Examples may include:
- Aggregating information, such as product descriptions and product reviews, for use in unauthorized ways
- Collecting pricing information from e-commerce sites to inform competitive pricing strategies and offers, especially in sectors with constantly changing prices such as travel, hotels, and car rentals, to name only a few
- Accessing frequently updated information, such as interest rates from financial sites or betting odds from gambling sites, for competitive advantage
In addition to unwanted forms of data leakage, API scraping can place a heavy resource load on application infrastructure. And unfortunately, mitigating it isn’t as simple as setting up rate limits or quotas. Many sophisticated players are adept at scraping in a “low and slow” manner that falls below existing limits and quotas. This makes it difficult to shut down without disrupting legitimate API usage.
Additionally, the fact that API scraping likely works within these existing rate limit and quota settings means that most organizations have no visibility into what is really going on.
How do most organizations protect against API scraping?
Most organizations rely on rate limits and quotas to restrict the ability to scrape their APIs. While these controls are not a silver bullet, for the reasons outlined above, they are nonetheless an important first step. At a minimum, they set an upper bound on the volume of scraping that can occur.
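As an illustration, rate limits of this kind are often implemented with a token-bucket algorithm. The sketch below is a minimal, illustrative Python version (the class name and parameters are assumptions, not anything prescribed here): each API key gets a bucket that refills at a steady rate and permits short bursts up to a cap.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per API key; requests beyond the burst would be rejected (e.g., HTTP 429).
bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
# The first 10 calls succeed on the initial burst; later calls depend on refill timing.
```

Note that this is exactly the control a "low and slow" scraper evades: a client issuing one request every few seconds never empties the bucket, which is why rate limiting alone cannot close the gap.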
Another crucial best practice is to ensure that clients connecting to APIs are valid. For example, if APIs are typically accessed by mobile devices, steps should be taken to ensure that the mobile client accessing the API has not been hacked, that the integrity of the mobile device has not been compromised by jailbreaking, and so on.
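One common pattern for validating clients is to have the app sign a server-issued nonce and verify that signature on the server. The sketch below is purely illustrative and uses a shared HMAC secret for brevity; real deployments rely on platform attestation services (such as Apple App Attest or Google Play Integrity) rather than a secret embedded in the client.

```python
import hashlib
import hmac

# Hypothetical provisioning secret for illustration only; real client
# attestation uses hardware-backed keys, not a shared secret.
SECRET = b"device-provisioning-secret"

def sign_attestation(nonce: str) -> str:
    """What a genuine client would compute over the server-issued nonce."""
    return hmac.new(SECRET, nonce.encode(), hashlib.sha256).hexdigest()

def verify_client(nonce: str, signature: str) -> bool:
    """Server-side check that the signature matches, in constant time."""
    expected = sign_attestation(nonce)
    return hmac.compare_digest(expected, signature)

good = verify_client("abc123", sign_attestation("abc123"))  # genuine client
bad = verify_client("abc123", "forged")                     # tampered client
```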
Some organizations may also use specialized bot mitigation tools to protect their web applications from automated scraping. These solutions bring value for B2C API traffic. But because they require specific browser or mobile app instrumentation, they are ineffective against B2B API scraping, where there is no browser or mobile app to instrument and traffic typically comes from programmatic clients. Similarly, compromised Internet of Things (IoT) or Internet of Everything (IoE) devices can be used to create “swarms” that do not originate from standard web or mobile application clients.
So, in summary, even if you have rate limits and quotas in place, you will still end up with two main points of exposure:
- You remain open to “low and slow” scraping on B2C APIs.
- Authenticated B2B API traffic is completely unmonitored.
And these risks are more than theoretical. Earlier this year, a malicious actor was able to exploit a vulnerability in the Twitter API to retrieve account details of approximately 5.4 million users.
How does Neosec’s approach close these critical protection gaps?
Neosec’s most significant advance in API security is the extension of API monitoring and analysis to authenticated traffic. B2B APIs represent a much larger attack surface and a potential route to more valuable enterprise assets.
Behavioral analysis at the authenticated user level is the key to monitoring B2B APIs. This is the only way to know when a seemingly legitimate and authenticated API consumer using no known attack pattern is scraping your APIs. This requires a context that can only come from analyzing API requests from the same user over a long period of time, even if they have changed access tokens more than 100 times.
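A toy illustration of the idea (the data, threshold, and flagging rule below are hypothetical, not Neosec's actual analytics): key activity on the resolved user identity rather than the access token, accumulate behavior over a long window, and flag consumers whose resource enumeration is anomalous even though each individual token stays under the rate limit.

```python
from collections import defaultdict

# Hypothetical request log: (user_id, token, resource). In practice the user
# identity must be resolved across token rotations (e.g., from subject claims).
requests = [
    # Scraper: rotates through 120 tokens while enumerating the catalog.
    ("partner-a", f"tok{i % 120}", f"/products/{i}") for i in range(5000)
] + [
    # Normal consumer: one token, revisits a small set of resources.
    ("partner-b", "tok-x", f"/products/{i % 40}") for i in range(300)
]

distinct_resources = defaultdict(set)
for user, _token, resource in requests:
    distinct_resources[user].add(resource)  # key on the user, not the token

# Flag users whose long-window enumeration of unique resources is anomalous.
THRESHOLD = 1000  # illustrative cutoff; real systems baseline this per cohort
flagged = {u for u, seen in distinct_resources.items() if len(seen) > THRESHOLD}
```

Per-token rate limiting would see 120 well-behaved clients here; only aggregating by the underlying user exposes the enumeration pattern.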
Below is a summary of how Neosec’s approach can extend your API protection capabilities beyond traditional bot mitigation techniques.
Comparison of Bot Mitigation and API Data Scraping Protection

| | Bot Mitigation | Neosec |
|---|---|---|
| What | UI-based APIs (B2C only) | All APIs (B2C, B2B) |
| Where | In the browser | At the API |
| How | Detects signals from browser or mobile app and human users – assumes all humans are good | Behavioral profiling of users and IPs |
| Impact on user experience | High | Low |
| Resilience | Easier to circumvent | Robust |
| Strengths | Blocks high-volume automated scraping on websites | Detects a wide range of abuse and misuse by malicious insiders and attackers impersonating legitimate users |
| Common scraping use cases | Scraping prices on websites (for example: airlines, PlayStation 5) | Scraping any API resource by any authenticated user – from resellers, partners, and vendors to customers |
*** This is a syndicated blog post from the Security Bloggers Network written by the Neosec team. Read the original post at: https://www.neosec.com/blog/how-do-you-protect-an-api-from-scraping