Protecting Information from Being Cached by Public Search Engines

For Webmasters

To protect a website from being cached.

Through robots.txt

Most search engine crawlers obey the Robots Exclusion Standard. You can simply place a "robots.txt" file in the root directory of the web server to tell crawlers which pages to exclude. The "robots.txt" file should look like the following:

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
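
After the file is in place, you can verify whether a given URL is blocked for crawlers using Python's standard urllib.robotparser module. The following is a minimal sketch; the host https://www.example.com is a placeholder, so substitute your own site.

import urllib.robotparser

# Point the parser at the site's robots.txt (placeholder host used here)
parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# "*" matches the "User-agent: *" rule in the file above
print(parser.can_fetch("*", "https://www.example.com/private/data.html"))  # expected: False
print(parser.can_fetch("*", "https://www.example.com/index.html"))         # expected: True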

For Webmasters and Web Publishers

To protect individual web pages from being cached.

Through Meta tags

Another method is to add a NOINDEX meta tag to the web pages we want to exclude.

For Yahoo and Google, the tag should look like
<META NAME="robots" CONTENT="noindex">

For MSN (now Bing), the tag can be addressed to its crawler by name and should look like
<META NAME="msnbot" CONTENT="noindex">
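
To confirm that a published page actually carries the tag, you can inspect its source with a small script. The sketch below uses Python's standard urllib and html.parser modules; the URL is a placeholder, and the list of meta names checked ("robots", "msnbot", "googlebot") is only illustrative.

import urllib.request
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Collects the content of robots-related <meta> tags found in a page."""
    def __init__(self):
        super().__init__()
        self.robots_directives = []

    def handle_starttag(self, tag, attrs):
        if tag.lower() != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # Covers the generic "robots" tag as well as crawler-specific ones
        if name in ("robots", "msnbot", "googlebot"):
            self.robots_directives.append((name, attrs.get("content", "")))

# Placeholder URL; replace with the page you want to verify
html = urllib.request.urlopen("https://www.example.com/private-page.html").read().decode("utf-8", "replace")
checker = RobotsMetaChecker()
checker.feed(html)
print(checker.robots_directives)   # e.g. [('robots', 'noindex')]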

References

  1. Yahoo
    https://help.yahoo.com/kb/search-for-desktop/remove-search-results-yahoo-search-sln4530.html
  2. Bing
    https://www.bing.com/webmasters/help/content-removal-broken-links-or-outdated-cache-cb6c294d
  3. Google
    https://support.google.com/websearch/troubleshooter/3111061


IT.ServiceDesk@cityu.edu.hk