When we search the Web (commonly through Google) most of what we see is on the surface. That's why it is called "the Surface Web." The tools we use are not able to navigate the Deep Web. Also known as "the Invisible Web," the Deep Web includes:
The yottabyte is NOT named after the Star Wars character Yoda. This commonly cited "fact" appears to have originated from a joke in an article that has since been referenced several times.
Its name comes from the prefix ‘Yotta’, derived from the Ancient Greek οκτώ (októ), meaning “eight”, because a yottabyte is equal to 1,000^8 bytes.
Data taken from Globally Interconnected Object Databases by Julian Bunn, and The Zettabyte Era Officially Begins (How Much is That?) by Thomas Barnett Jr.
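The "eight" in the yottabyte's name can be checked with a little arithmetic. The sketch below assumes the decimal (SI) prefixes, where each step up multiplies by 1,000; the list and the function name bytes_in are illustrative only and do not come from the sources cited above.

```python
# Decimal (SI) byte prefixes, smallest to largest.
# Each step up the list multiplies the size by 1,000.
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte (illustrative helper)."""
    power = SI_PREFIXES.index(prefix) + 1  # kilo is 1000**1, yotta is 1000**8
    return 1000 ** power


if __name__ == "__main__":
    for position, name in enumerate(SI_PREFIXES, start=1):
        print(f"1 {name}byte = 1,000^{position} = {bytes_in(name):,} bytes")
```

Yotta is the eighth prefix in the list, so one yottabyte works out to 1,000^8 bytes, which is the same as 10^24 bytes.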
What is the “deep web”? Briefly, it is the information that general search engines (e.g., Google) do not find. The “deep web” generally consists of alternative (non-text) formats, including audio, video, and images, and of dynamic sites whose information can be reached only by searching the site itself. Specific examples are people-finder directories (AMA, ABA, etc.), patent databases, legal and medical databases, multimedia collections, job search sites, and travel sites. Deep web information is frequently proprietary (library databases are an example).
Estimates vary as to the percentage that remains untapped by general search engines. Some put the share of information that those engines cannot access at 80-90% or more.
General search engines are, at present, largely unable to enter databases or retrieve their contents, because database pages are often generated dynamically, lack fixed URLs, or need a user-constructed search to function. Google, for example, is quite closed about which parts of the web it is able to mine and about the algorithms behind its searches. We know that Google can mine (or index) parts of these sites, but Google is elusive about the depth of its crawl. Its reach into sites like .gov (census, loc, etc.) is only a partial indexing. Sites that require registration (NYTimes), are fee-based (Lexis-Nexis), are interactive, or sit behind firewalls are often inaccessible to Google and the like.
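To see why dynamically generated pages tend to stay hidden, consider a deliberately simplified sketch of a link-following crawler. Everything in it is hypothetical: the toy site, its URLs, and the crawl function are invented for illustration, and real search engines are far more sophisticated than this. The point is only that a crawler that follows links never reaches result pages that exist solely in response to a user's search.

```python
from html.parser import HTMLParser

# Hypothetical toy site: keys are URLs, values are the HTML served there.
TOY_SITE = {
    "/": '<a href="/about">About</a> <a href="/search">Search the database</a>',
    "/about": '<a href="/">Home</a>',
    "/search": '<form action="/search"><input name="q"></form>',
    # Result pages are generated only when a user submits a query;
    # no page on the site links to them.
    "/search?q=census": "<p>Record 1042: census table</p>",
    "/search?q=patent": "<p>Record 77: patent filing</p>",
}


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start="/"):
    """Follow links breadth-first from start; return the set of URLs reached."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in site:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(site[url])
        queue.extend(parser.links)  # the crawler never fills in the form
    return seen


if __name__ == "__main__":
    indexed = crawl(TOY_SITE)
    print("Reached by following links:", sorted(indexed))
    print("Never reached:", sorted(set(TOY_SITE) - indexed))
```

In this toy example the crawler reaches the home page, the about page, and the search form, but neither of the two record pages, because nothing links to them and the crawler does not type queries into the form.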
Here are some examples of deep web sites.
The Census Bureau: http://www.census.gov
FBI In the Vault: http://vault.fbi.gov/
GPO Access: http://www.gpo.gov/fdsys/
Bureau of Labor Statistics: http://www.bls.gov/
Bureau of Justice Statistics: http://www.bjs.gov/
Library of Congress: http://www.loc.gov/
PubMed: http://www.ncbi.nlm.nih.gov/pubmed (found on the ZL Online Resources Page)
Gallup Poll News Service (Lexis Nexis database found on the ZL Online Resources Page): http://www.lexisnexis.com/hottopics/lnacademic/? In the Source Directory, choose Find, enter the keyword "Gallup Poll News Service", and select OK to continue. Then search by keyword for a topic, e.g. social media.
Taylor Publications Index: http://www2.taylor.edu/pubindex/
The Echo (1913-Spring 1922 and 1972 – present)*, Taylor Magazine (1963-present), the Gem (1898-1920)*, and the Express (1996-2007) are being indexed and can be searched via the Index to Publications. (Note this index is incomplete.)
Taylor Yearbooks: The Gem and Illium, found in the Internet Archive: https://archive.org/