An organization’s web presence is the public face of the business and its main point of contact with customers. A website that is slow to load, difficult to use, or missing the information that users require can cost the business sales and customers.
Many organizations consider the impact of unpatched web application vulnerabilities on their security and profitability; however, the potential costs of web scraping can be just as significant. Protecting web applications with a web application firewall (WAF) capable of identifying and blocking malicious web scrapers is essential to defending the business’s bottom line.
What is Web Scraping?
The use of automation is growing rapidly on the Internet. An estimated 37.1% of Internet traffic is driven by bots: automated programs designed to interact with websites to accomplish certain goals. Most of this bot traffic is malicious, with over a quarter of all Internet traffic attributed to “bad” bots.
Web scrapers are one example of automation on the Internet. A web scraper is a program designed to visit a website and collect the information publicly posted on it. Web scrapers are not necessarily malicious: the crawlers used by search engines like Google are web scrapers. They collect the information posted on a webpage in order to index it and make it discoverable through search.
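To make the concept concrete, here is a minimal sketch of a web scraper in Python. The URL and the use of the third-party requests and beautifulsoup4 libraries are illustrative assumptions, not something specified in this article:

```python
# Minimal web scraper sketch: fetch a page, print its title and its links.
# Assumes the third-party `requests` and `beautifulsoup4` packages are installed.
import requests
from bs4 import BeautifulSoup

def scrape(url: str) -> None:
    # Identify the bot honestly in the User-Agent header; benign crawlers do this.
    headers = {"User-Agent": "ExampleScraper/1.0 (+https://example.com/bot)"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    print("Title:", soup.title.string if soup.title else "(none)")

    # Collect every hyperlink on the page; a crawler would queue these for later visits.
    for link in soup.find_all("a", href=True):
        print(link["href"])

if __name__ == "__main__":
    scrape("https://example.com")  # example URL, not a real target
```

A benign crawler would additionally honor the site’s robots.txt and throttle its request rate; malicious scrapers typically skip both steps.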
However, not all web scrapers are designed for benign purposes. A company may use a web scraper on a competitor’s site to collect intellectual property for use in its own product development and marketing campaigns. A cybercriminal may use web scraping to look for sensitive information accidentally exposed on a company’s web presence or to build a profile designed to improve the effectiveness of a spear phishing attack.
The Growth of Web Scraping
Web scraping bots are not commonly thought of as a security threat to an organization. After all, bots do the same thing as a legitimate user: visit a webpage and view its publicly available content. However, bots do so more quickly and at a far greater volume than human users.
In fact, web scrapers account for 40-60% of all traffic to organizations’ websites in certain industries. This means that a high percentage (potentially even the majority) of the overhead of maintaining a web presence does not benefit the target audience at all.
The financial repercussions of bad bots can vary greatly based upon an organization’s industry and the purpose of the webpage. In the ecommerce space, an organization can lose up to 80% of the profitability of its web presence due to the impacts of web scraping.
Web Scraping Significantly Harms Targeted Businesses
Web scrapers account for a significant percentage of an organization’s web traffic. This leads to a number of financial impacts on an organization:
- Web Server Utilization: An organization’s web servers must process and respond to every request made by scraping bots, consuming computational resources and network bandwidth. This waste increases overhead and may degrade the experience of legitimate website visitors.
- Competitive Advantage: An organization must post product and pricing information to attract customers and make sales; however, this same data is valuable to competitors, who can use web scrapers to harvest it and craft advertising and pricing designed to undercut the organization’s current marketing strategy.
- Skewed Analytics: Many organizations collect analytics on the usage of their website to inform marketing campaigns, product decisions, etc. Web scrapers skew these analytics, potentially causing an organization to overestimate the impact of a particular marketing strategy or underestimate conversion rates of certain content.
- SEO Rankings: Web scrapers collect information from an organization’s website, potentially for republication elsewhere. If the organization’s copyrighted images and content appear on other pages, its search engine rankings can suffer due to duplicate content.
The problem with many of these impacts is that they are largely invisible to the organization. Without the ability to differentiate automated, bot-driven traffic from human users, it is impossible to determine whether web scraping (malicious or otherwise) is significantly inflating the overhead of the organization’s web presence or skewing the analytics on which business decisions are based.
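As a very rough first pass at that differentiation, the sketch below scans a web server access log for two naive bot signals: self-identifying User-Agent strings and unusually high request volume per client IP. The log path, regular expression, and threshold are illustrative assumptions; production-grade bot detection relies on far richer signals (browser and TLS fingerprints, behavioral analysis, IP reputation).

```python
# Naive bot-traffic estimate from a web server access log (combined log format).
# The log path, regex, keyword list, and threshold below are illustrative assumptions.
import re
from collections import Counter

LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("bot", "crawler", "spider", "scrapy", "python-requests", "curl")
RATE_THRESHOLD = 1000  # requests per IP over the log window; tune per site

def estimate_bot_share(log_path: str) -> float:
    per_ip = Counter()
    flagged = set()
    total = 0
    with open(log_path) as f:
        for line in f:
            m = LINE.match(line)
            if not m:
                continue
            total += 1
            ip, agent = m["ip"], m["agent"].lower()
            per_ip[ip] += 1
            # Signal 1: the client declares itself a bot in its User-Agent.
            if any(hint in agent for hint in BOT_HINTS):
                flagged.add(ip)
    # Signal 2: request volume far beyond what a human visitor generates.
    flagged.update(ip for ip, n in per_ip.items() if n > RATE_THRESHOLD)
    bot_requests = sum(per_ip[ip] for ip in flagged)
    return bot_requests / total if total else 0.0

print(f"Estimated bot share: {estimate_bot_share('access.log'):.1%}")
```

Both signals are trivially evaded by a scraper that spoofs a browser User-Agent and rotates IP addresses, which is exactly why log analysis alone cannot answer the question.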
Protecting Against Web Scraping
Traffic to an organization’s website falls into three main categories: human users (who may or may not be potential customers), benign bots (like Google’s crawlers), and malicious bots (like many web scrapers).
Blocking human users or benign bots can harm an organization, since it may drive away potential customers or lower the website’s SEO rankings. However, a failure to block malicious bots also harms an organization’s profitability through increased operational expenditure and business decisions based upon invalid analytics data.
Addressing these concerns requires the ability to protect an organization’s web presence against bad bots. A WAF has the visibility required to differentiate types of traffic; however, visibility alone is not enough. When evaluating options for protecting a web presence from bad bots, look for a solution with deep expertise in bot detection, capable of distinguishing bad bots from both human users and benign automated visitors.
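One concrete differentiation technique such a solution applies is verifying that a visitor claiming to be a benign crawler really is one. Google documents a reverse-then-forward DNS check for Googlebot; the sketch below is a minimal illustration of that check using only Python’s standard library, not production WAF code, and the example IP is illustrative:

```python
# Verify a claimed Googlebot by reverse DNS, then a confirming forward lookup.
# Google documents this check: the reverse name must end in googlebot.com or
# google.com, and resolving that name must return the original IP address.
import socket

def is_verified_googlebot(ip: str) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
        return ip in forward_ips
    except socket.herror:   # no reverse record: cannot be a verified crawler
        return False
    except socket.gaierror:  # forward lookup failed
        return False

# A scraper spoofing Googlebot's User-Agent from an unrelated IP fails this check.
print(is_verified_googlebot("66.249.66.1"))  # example IP for illustration
```

Requests that present a crawler User-Agent but fail this check can be treated as bad bots and blocked without risking the SEO damage described above.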