NIH Library Web Archives Help

Content

What is included in the NIH Library Web Archives

Currently, the NIHL Archives consists of 2 collections that cover 40 web sites. We crawl every link that is part of the originating site, including images, text, and video.

How is web-content captured?

Web content is captured through a process known as crawling. Most files with a URL are captured during a crawl. Crawls are customizable and vary according to the particular collection. For example, a crawl may run daily, weekly, or monthly. For more information, the Internet Archive maintains a complete glossary of web archiving terms.
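As a rough illustration of what "crawling every link that is part of the originating site" involves, the sketch below collects the links and embedded files from one page and keeps only those on the same host. It is not the archive's actual crawler. The class name, function name, sample HTML, and example.org URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values as absolute URLs (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return only the links that stay on the originating host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]


# Hypothetical sample page: two same-site resources and one external link.
sample_html = (
    '<a href="/about.html">About</a>'
    '<img src="logo.png">'
    '<a href="https://other.example.com/">Other</a>'
)
found = same_site_links(sample_html, "https://www.example.org/index.html")
print(found)
```

A real crawler would repeat this step for each same-site link it finds, on whatever schedule the collection uses.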

What does it mean when there's an asterisk (*) next to a date on the search results page?

An asterisk indicates that the content has been updated from the previously archived copy. If you don't see an asterisk next to an archived page, then the content on the archived page is probably identical to the previously archived copy.

How do I compare two archived versions of a web site?

First, search for a page. On the results page for a particular URL, click compare archive pages at the top of the screen.

The page will reload with check boxes next to each date the site was archived. Choose the two versions you would like to compare and hit the compare two dates button (remember that if you don't see an asterisk next to an archived document, then the content on the archived page is probably identical to the previously archived copy).

Deletions will appear in blue with a line through the text and additions will appear in green.
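The archive's comparison tool is not documented on this page. As a minimal sketch of how deletions and additions between two captures can be identified, the example below compares two texts word by word using Python's standard difflib module. The function name and the sample texts are hypothetical and do not come from the archive.

```python
import difflib


def compare_captures(old_text, new_text):
    """Return (deletions, additions) between two captures, word by word."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    deletions, additions = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" means words were removed from the old text
        # and different words were added in the new text.
        if tag in ("delete", "replace"):
            deletions.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            additions.extend(new_words[j1:j2])
    return deletions, additions


# Hypothetical text from two captures of the same page.
old = "The library opens at 9 am on weekdays"
new = "The library opens at 8 am on weekdays and Saturdays"
deleted, added = compare_captures(old, new)
print("Deleted:", deleted)
print("Added:", added)
```

In the archive's display, the words in the first list would correspond to the struck-through blue text and the words in the second list to the green text.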

How do I know the date a site was archived?

The banner at the top of an archived page lists the date and time when the site was captured.

Why isn't the site I am looking for in the archives?

There are three reasons why a site may not be in the archives.

  1. It may not meet our criteria for archiving.
  2. Technological reasons may have prevented us from archiving the page.
  3. The site owner may have requested that the site not be included in the collection.

What types of web content cannot be captured?

As a crawler visits a site, it will gather and organize the contents it encounters. This is known as harvesting. However, there are certain types of content that our crawler cannot harvest. These are:

  • Robots.txt: A site owner puts a robots.txt file on a site to keep crawlers from crawling the site. Our crawler will not harvest a site that has a robots.txt file.

  • JavaScript: JavaScript elements are often hard to archive and even harder to display. In addition, if the JavaScript needs to contact the originating server in order to work, the archived version will not work correctly. If the site contains a lot of JavaScript, the user will be sent to the live web instead. If the site contains only a small amount of JavaScript (e.g. a visit counter), the site will display properly but the JavaScript element will not.

  • Date Displays: If a site contains code to calculate the current date, the current date will appear on the site regardless of the date it was actually added to the collection. You should check the yellow band at the top of the archived site to determine the date the page was archived.

  • Server Side Image Maps: If the site needs to contact the originating server in order to work, it will fail when archived.

  • Streaming Media: This is a one-way transmission over a data network that is played as it is received and is not stored permanently on the requesting computer. While we can’t harvest streaming media, we can harvest downloadable media files.

  • Password Protected Sites: The crawler cannot collect any site that requires a password or that is database driven, because such sites require user input. This includes HTTPS sites.

  • Form Driven Content: If you need to fill in a form to get access to the content, the crawler typically cannot retrieve this content.
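To illustrate the robots.txt item in the list above, here is a minimal sketch of how a crawler can check a site's robots.txt rules before fetching a page, using Python's standard urllib.robotparser module. This shows the general mechanism only. As described above, the archive's crawler skips any site that has a robots.txt file, which is stricter than the rule-by-rule check shown here. The function name, crawler name, sample rules, and example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse rules from a list of lines instead of downloading them,
    # so the example runs without network access.
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks every crawler from /private/.
sample_robots = [
    "User-agent: *",
    "Disallow: /private/",
]

allowed = may_crawl(
    sample_robots, "example-crawler", "https://www.example.org/index.html"
)
blocked = may_crawl(
    sample_robots, "example-crawler", "https://www.example.org/private/report.html"
)
print("index.html allowed:", allowed)
print("private/report.html allowed:", blocked)
```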

Access

Who has access to the archives?

The pages in the archives are made available to the public for use in research, teaching, and private study, pursuant to U.S. copyright law. The user must assume full responsibility for any use of the materials, including but not limited to infringement of copyright and publication rights of reproduced materials. For more information about use, please contact us.

Can I search the full text of the NIH Library Web Archives?

Full text search capability is available for the archives. The archives can also be searched by URL and file type. For more help with searching, consult the searching help page.

Can I download content from the archives?

We do not prohibit downloading from our collection; however, the user must assume full responsibility for any use of the materials, including but not limited to infringement of copyright and publication rights of reproduced materials. For more information about use, please consult our copyright statement. Whenever materials from our collection are used in a publication or other product, we request that the copy carry a credit line stating "Courtesy of the NIH Library Web Archive".

Errors

Common Error Messages

Below is a list of common error messages you may encounter while searching the archives. If you see an error message that does not have the Internet Archive Wayback Machine logo in the upper left corner, you are most likely looking at an archived error page or the live web.

Failed Connection: The server that the particular piece of information is stored on is down. Generally these errors clear up within two weeks.

Robots.txt Query Exclusion: A site owner puts a robots.txt file on a site to keep crawlers from crawling the site. Our crawler will not harvest a site that has a robots.txt file.

Blocked Site Error: Site owners or copyright holders have requested that the site be excluded from the collection. It is possible that the State Archives obtained a copy of the web site you are looking for directly from the agency without using the automated crawler. Please contact us to determine if the web site is available.

Path Index Error: A path index error message refers to a problem in our database. These errors may take time to fix. If you encounter this error message please alert us to the problem by contacting us and identifying the link that you were trying to reach and the page that you were trying to link from.

Not in Archive: The page you are trying to access is not part of the archives. Refer to the question "Why isn't the site I am looking for in the archives?" in the Content section for reasons why a site may not be included.

Why can't I see the images on an archived site?

Most images display properly in the archives. A small red "x" where the image should be means that technological issues prevented the capture of the image content. A grayed-out image means that the site owner used robots.txt exclusions to block access to the images directory.

 

NIH Library | Office of History, National Institutes of Health
This page was last updated on 4/1/2011