Billions of people use the web on a daily basis. However, most of them usually consume less than 5 percent of its content. This 5 percent is known as the Surface Web, the part of the web whose content can be indexed and found by standard search engines that use link-crawling techniques, like Google, Bing, Yahoo, etc. These search engines use automated robots, called “crawlers”, which move from link to link in order to reach as much content as possible, and index it in the search engines’ special databases.
The remaining 95 percent (which cannot be found by search engines) is known as the Deep Web. In fact, it is impossible to calculate the precise size, but most experts believe the percentage falls somewhere between 95 percent and 99 percent.
This Deep Web can be divided into several types and depth levels:
- The 1st layer of content may be relatively accessible for humans to consume, but not so easy for web crawlers to discover and index. This includes unlinked content, non-text content (multimedia files or other file formats not handled by search engines), databases, academic journals and so forth. This content is typically found on dynamic web pages, private or blocked websites (like those that require login credentials or a solved CAPTCHA to access), web archives, interactive tools, contextual websites or limited-access websites with non-standard DNS/TLDs.
- The 2nd layer is known as the Darknet and includes peer-to-peer (P2P) networks as well as anonymity networks, such as Tor or I2P. These networks often use non-standard communications protocols and ports and can only be accessed with specific software, configurations or authorizations. The content found in the Darknet (as well as on other sites hosted on infrastructure that requires specific software to access) is called the Dark Web. Many people actually mean the Darknet or Dark Web when they use the term “Deep Web”, even though the former is only a small part of the latter.
- The 3rd and 4th layers consist of classified and proprietary information (respectively). The content of these layers, hosted on alternative or private networks, can be accessed only through exploitation of security flaws or direct access (if there’s an internet-facing device on the network). Examples of this kind of network are the U.S. Department of Defense’s SIPRNet (Secret Internet Protocol Router Network) and the U.S. Defense Intelligence Agency-run JWICS (Joint Worldwide Intelligence Communications System).
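To make the difference between the Surface Web and the 1st layer concrete, here is a minimal sketch of the link-to-link crawling described earlier. The page names and the link graph are made up for illustration, and a real crawler fetches and parses live pages rather than reading a dictionary. The point is only that anything not reachable by following links from a starting page never gets indexed.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
LINKS = {
    "home": ["news", "shop"],
    "news": ["home", "article"],
    "shop": ["login"],
    "article": [],
    "login": [],            # the login form is linked; what sits behind it is not
    "account": ["orders"],  # only reachable after authenticating
    "orders": [],
    "unlinked-report": [],  # no page links here
}


def crawl(links, seed):
    """Breadth-first link crawl: return every page reachable from seed."""
    indexed = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in indexed:
                indexed.add(target)
                queue.append(target)
    return indexed


indexed = crawl(LINKS, "home")
not_indexed = set(LINKS) - indexed

print("Indexed (Surface Web):", sorted(indexed))
print("Never reached (Deep Web):", sorted(not_indexed))
```

In this toy graph the crawler never reaches the pages behind the login or the unlinked report, which is the same reason login-protected and unlinked content stays out of standard search engines.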
A commonly leveraged attribute of the Deep Web is its high level of anonymity, which lends itself to a number of use cases. Individuals or organizations looking to avoid censorship or shield their communications from government intervention and surveillance are likely to use the Deep Web. The same applies to dissidents in restrictive regimes looking to share information and news that is otherwise censored in their countries, and to those intending to anonymously leak any kind of information to journalists.
The Dark Web, while comprising only a small percentage of the Deep Web, attracts considerable attention due to the illicit purposes it serves. These include illegal trade in drugs, weapons, credit cards, personal information, fake identifications, travel documents, banned content like child pornography, criminal services such as assassinations and more.
While standard search engines cannot find Deep Web content, there are tools to expose it. Many libraries, for example, offer clients authenticated access to databases and academic journals. Many of these resources include search tools with access to web archives that collect information (not exposed on public-facing web properties) for long-term preservation.
The Dark Web, as its name suggests, makes finding specific information more difficult. For starters, the manner in which data is stored and accessed requires special software that provides anonymity. The Tor Browser and onion routing are the common elements, which will be described in greater detail in an upcoming post. Many users navigate their way on the Dark Web simply by clicking from link to link or accessing pages that list popular site addresses. Reliability isn’t nearly what Surface Web users have come to expect, mostly due to the transient and illicit nature of the businesses. Addresses and domains commonly change, for example, and services come and go. There are also special search engines that maintain an index of sites dealing in specific categories (e.g., drugs and weapons markets) and render results according to relevance. There are also gateway services, such as Tor2Web, which offer access to content hosted on hidden services via common browsers.
In the next post, we’ll further explore the technology and tools, the available services and products, currencies, geography, and evolution of the Dark Web.
This blog was contributed by Danielle Kreinin, a senior intelligence analyst with FraudAction Research Labs.
RSA’s FraudAction team continuously monitors dark web activity to gather intelligence on a number of threats to financial institutions including new malware, phishing, and rogue apps. This current, actionable information feeds a variety of RSA products and services including RSA Fraud Risk and Intelligence Services.