Table of Contents

  • Introduction
  • Administrator
  • User
    · Introduction
    · Making a searchpage
    · Other character sets
    · Indexing
  • Appendix
  • Indexing
    This chapter explains how to help IntraSeek and its users find more accurate information, and how to control the crawler's progress through your web site in detail.

    Supported standards
    IntraSeek supports the "/robots.txt" standard. You may wish to consult the Web Robots Pages for more information.

    However, we recommend that you use the "META=robots" tags (described below) instead, as they are more flexible. To maintain the "/robots.txt" file, you must have root privileges, whereas any HTML writer can control the META method locally, page by page.
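
    To see how "/robots.txt" rules are evaluated for a given URL, the following minimal sketch uses Python's standard urllib.robotparser module. The rules, the example.com URLs and the user-agent token "IntraSeek" are assumptions made for illustration; the manual does not state which user-agent name the crawler actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt content; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def is_allowed(url, user_agent="IntraSeek"):
    """Return True if the rules above permit user_agent to fetch url.

    The default user-agent token is an assumption, not a documented value.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/mail.html"):
        print(url, is_allowed(url))
```

    With these rules, pages under /private/ and /cgi-bin/ are off limits to every well-behaved crawler, and the rest of the site is open.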

    META Tags
    IntraSeek supports the "noindex" and "nofollow" terms for the "META=robots" tag (more information can be found at Lund University Library).

    Assume the following scenario:

    You have a large mailing list archive on the Web, and want to make it searchable with IntraSeek. The main page is an index consisting of a lot of titles and hypertext links to separate pages holding the actual mail content. You index the archive with IntraSeek, which wanders through all pages starting with the index page. Once you make this material searchable, the index web page will probably prove to be the best 'hit' for your searches.

    The reason for this is that the main index page holds nearly all keywords and terms. But as this page is of no interest to people who search for a specific topic, we want to exclude it from the IntraSeek search results. The best way to do this is to add:

    <meta name="ROBOTS" content="NOINDEX">

    ... to the page, which tells IntraSeek to skip the content of this page but still follow the hypertext links found there.

    The syntax for the "META=robots" tag is:

    <meta name="ROBOTS" CONTENT="robots-terms">

    where "robots-terms" is a comma-separated list of one or more of the following terms:

    ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

    NONE
    Tells the IntraSeek crawler to ignore the page, and not retrieve any new hypertext links from it. This works in the same way as: NOINDEX, NOFOLLOW.
    ALL
    No restrictions (default). Index all words, and follow all new links found. This works in the same way as: INDEX, FOLLOW.
    INDEX
    Index this page.
    NOINDEX
    Do not index this page.
    FOLLOW
    Follow all new hypertext links found at this page.
    NOFOLLOW
    Do not follow any new hypertext links found at this page.
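
    The terms above boil down to two yes/no decisions: whether to index the page and whether to follow its links. The sketch below shows one way to interpret a robots-terms string. It assumes that terms are case-insensitive, that they are applied from left to right (so a later term overrides an earlier one), and that unknown terms are ignored; the manual does not say how IntraSeek resolves conflicting terms, and the function name is hypothetical.

```python
def parse_robots_terms(content):
    """Map a robots-terms string to a pair of booleans (index, follow)."""
    index, follow = True, True  # default is ALL: index and follow
    for term in content.split(","):
        term = term.strip().upper()
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Anything else is ignored (an assumption of this sketch).
    return index, follow


if __name__ == "__main__":
    for terms in ("NOINDEX", "NONE", "INDEX, NOFOLLOW"):
        print(terms, parse_robots_terms(terms))
```

    For the mailing list scenario, parse_robots_terms("NOINDEX") yields (False, True): the index page is left out of the search results, but its links are still followed.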

    If you just want to remove certain areas of a document from the index and summaries, you can use the <no_index> ... </no_index> tag. This tag is only used by IntraSeek and is not standard by any means. Browsers might dislike it, and HTML checkers may complain. It is implemented this way because there is no official standard for solving this issue.

    By default, new links found within the no_index area are followed. That is why <no_index> takes one optional attribute called nofollow. If the tag is used like this: <no_index nofollow>, new links found within the no_index area will not be visited.
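
    The effect of <no_index> can be illustrated with a small filter built on Python's standard html.parser module. This is only a sketch of the behaviour as understood here, not IntraSeek's implementation: text inside a no_index area is dropped, links inside it are kept unless the nofollow attribute is present, and nested no_index areas are not handled. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NoIndexFilter(HTMLParser):
    """Collect indexable text and followable links, honouring <no_index>."""

    def __init__(self):
        super().__init__()
        self.text = []    # text found outside <no_index> areas
        self.links = []   # href values a crawler would follow
        self.in_no_index = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "no_index":
            self.in_no_index = True
            # A bare attribute such as "nofollow" arrives with value None.
            self.nofollow = "nofollow" in attrs
        elif tag == "a" and "href" in attrs:
            if not (self.in_no_index and self.nofollow):
                self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "no_index":
            self.in_no_index = False
            self.nofollow = False

    def handle_data(self, data):
        if not self.in_no_index and data.strip():
            self.text.append(data.strip())


def filter_page(html):
    """Return (indexable_text, followable_links) for an HTML string."""
    parser = NoIndexFilter()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links
```

    A navigation menu wrapped in <no_index> would thus stay out of the index and summaries while its links remain reachable; adding nofollow also keeps the crawler away from those links.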

    For an example of how <no_index> is used, see any artist's page at Lothlorien (see References).

    Other META tags
    IntraSeek supports META keywords, and we recommend that you use them.

    <meta name="keywords" Content="KEYWORD,KEYWORD,KEYWORD....">

    These keywords should describe the content of the page, and are considered extra important by IntraSeek. For an example of how keywords can be used successfully for better search results, please check the Lothlorien gallery (see References). In this gallery, every picture page contains 5-15 meta keywords which describe the subject of the picture shown.

    IntraSeek also supports meta descriptions. If a page has a meta description, it is used as the page summary when search results are presented. If no meta description is found, IntraSeek creates a summary from the text that appears at the top of the page.

    A good use for meta descriptions is to keep the text from navigation interfaces at the top of a page out of the summary.

    The syntax is:

    <meta name="description" Content="Here you write the summary of page content.....">
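
    The fallback rule for summaries can be sketched as follows, again with Python's standard html.parser module. The sketch reads the keywords and description META tags and falls back to the leading page text when no description is present. The 200-character limit, the skipping of title, script and style content, and all names are assumptions made for illustration; the manual does not specify how IntraSeek builds its fallback summary.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect META name/content pairs and the visible page text."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # lower-cased meta name -> content
        self.body_text = []   # text chunks in document order
        self._skip = 0        # depth inside <title>, <script> or <style>

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases attribute names, so Content= works too.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in ("title", "script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("title", "script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.body_text.append(data.strip())


def summarize(html, max_chars=200):
    """Return (keywords, summary) for an HTML string."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    raw = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    # Prefer the author's description; otherwise use the top of the page.
    summary = (parser.meta.get("description")
               or " ".join(parser.body_text)[:max_chars])
    return keywords, summary
```

    On a page whose visible text starts with a navigation bar, the fallback summary begins with that navigation text, which is exactly the situation a meta description avoids.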

    Frames & Indexing
    Frames are a big problem for all search engines. Most site maintainers optimize their web pages for the latest versions of Netscape and IE, ignoring the <noframes> option. Right now, no search engine on the market can follow frames and reconstruct the exact view (frameset). It is very difficult, if not impossible, to keep track of all dynamic frames and framesets, and if JavaScript is used to generate links, it becomes even harder.

    IntraSeek follows links found in framesets. This is an acceptable solution, albeit not a good one.
    For more information on this topic, check out the page 'How to use frames and still be spidered' at the Search Engine Watch pages (see References).
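
    Following the links found in a frameset amounts to treating each frame source as an ordinary link. The sketch below shows that idea with Python's standard library; it is an illustration under that assumption, not IntraSeek's implementation, and the names and example.com URLs are hypothetical. Each framed page is fetched and indexed on its own, which is why the original frameset view cannot be reconstructed from a search hit.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkCollector(HTMLParser):
    """Collect absolute URLs from the <frame src="..."> tags of a frameset."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            # Resolve relative frame sources against the frameset's URL.
            self.links.append(urljoin(self.base_url, attrs["src"]))


def frame_links(html, base_url):
    """Return the URLs a crawler would visit for this frameset page."""
    collector = FrameLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```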