
JRN 315 - Advanced Media Writing

This guide is designed to provide resources for JRN 315.

Deep Web


Introduction to the Deep Web

When we search the Web (commonly through Google), most of what we see is on the surface, which is why it is called "the Surface Web." General search tools are not able to navigate the Deep Web. Also known as "the Invisible Web," the Deep Web includes:

  • Dynamic content sources
  • Unlinked content pages 
  • Password-protected websites
  • Limited access content pages
  • Non-HTML content: text encoded in multimedia files or in other formats that search engines cannot access


Data on the Web

Zettabytes

More about big data

http://mobirank.pl/2015/01/10/internet-w-2015-roku/

 

Deep Web Infographic


What is a Zettabyte?

Understanding Byte Sizes


 

Data taken from Globally Interconnected Object Databases by Julian Bunn, and The Zettabyte Era Officially Begins (How Much is That?) by Thomas Barnett Jr.
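To make the scale concrete, the short Python sketch below converts one zettabyte into smaller units. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the names `UNITS`, `bytes_in`, and `convert` are illustrative only and do not come from the sources credited above.

```python
# Decimal (SI) byte units, smallest to largest.
# Assumption: each step up is a factor of 1,000 (not 1,024).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the given unit."""
    return 1000 ** UNITS.index(unit)


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount from one unit in UNITS to another."""
    return amount * bytes_in(from_unit) / bytes_in(to_unit)


if __name__ == "__main__":
    # One zettabyte expressed in bytes, then in gigabytes and terabytes.
    print(bytes_in("zettabyte"))
    print(convert(1, "zettabyte", "gigabyte"))
    print(convert(1, "zettabyte", "terabyte"))
```

Under these assumptions, one zettabyte works out to 10^21 bytes, or a trillion gigabytes.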

Deep Websites


Examples of Deep Web Sites

What is the “deep web”? Briefly, it is information that general search engines (e.g., Google) do not find. The “deep web” generally consists of content in alternative (non-text) formats, including audio, video, and images, and of dynamic sites whose information must be retrieved through a search. Specific examples are people-finder directories (AMA, ABA, etc.); patent, legal, medical, and multimedia databases; job search sites; and travel sites. Deep web information is frequently proprietary (library databases are an example).

Estimates vary as to the percentage that remains untapped by general search engines. Some say that 80-90% or more of information is not accessible to them.

General search engines are, at this moment, unable to enter databases or retrieve their contents, because database pages are often generated dynamically, lack fixed URLs, or need a user-constructed search to function. Google, for example, is quite closed about which parts of the web it is able to mine and about the algorithms behind its searches. We know that Google can mine (or index) parts of these sites, but Google is elusive about the depth of its crawl. It indexes sites such as those in the .gov domain (census, loc, etc.) only partially. Sites that require registration (NYTimes), are fee-based (Lexis-Nexis), are interactive, or sit behind firewalls are often inaccessible to Google and similar search engines.
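The point about user-constructed searches can be illustrated with a small Python sketch. It builds the kind of URL a database produces only after someone fills in a search form. The host `example.org`, the `/search` path, and the field names `topic` and `year` are hypothetical and are not taken from any site listed in this guide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical database endpoint, used for illustration only.
BASE_URL = "https://example.org/search"


def build_search_url(**fields: str) -> str:
    """Build the URL a search form would request for the given fields."""
    return f"{BASE_URL}?{urlencode(fields)}"


# A person types these values into a form; the database then generates
# a results page for exactly this combination of inputs.
url = build_search_url(topic="social media", year="2015")
print(url)

# Reading the query back shows what the page depends on. A crawler that
# only follows links, and never fills in the form, has no way to reach it.
print(parse_qs(urlsplit(url).query))
```

No request is sent here; the sketch only shows that the address of a results page depends on what the searcher enters, which is why link-following crawlers tend to miss such pages.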

Here are some examples of deep websites.


The Census Bureau: http://www.census.gov

FBI: http://www.fbi.gov/foia/ 

FBI In the Vault: http://vault.fbi.gov/ 

GPO Access: http://www.gpo.gov/fdsys/

Bureau of Labor Statistics: http://www.bls.gov/

Bureau of Justice Statistics: http://www.bjs.gov/

Library of Congress: http://www.loc.gov/

PubMed: http://www.ncbi.nlm.nih.gov/pubmed (found on the ZL Online Resources Page)

Gallup Poll News Service (Lexis Nexis database found on the ZL Online Resources Page): in the Source Directory, use Find with the keyword "Gallup Poll News Service" and select OK to continue, then run a keyword search on a topic (e.g., social media). http://www.lexisnexis.com/hottopics/lnacademic/?


Trending Information Sites


Deep Web Taylor Sites & Sources

Taylor Publications Index: http://www2.taylor.edu/pubindex/

The Echo (1913-Spring 1922 and 1972 – present)*, Taylor Magazine (1963-present), the Gem (1898-1920)*, and the Express (1996-2007) are being indexed and can be searched via the Index to Publications. (Note: this index is incomplete.)

Taylor Yearbooks: The Gem and Illium, found in the Internet Archive: https://archive.org/
