Indexing
This chapter explains how to help IntraSeek and its users find more accurate information, and how to control the crawler's progress through your web site in detail.
Supported standards
We recommend using the "META=robots" tags (described below) instead of a "/robots.txt" file, as they are more flexible. To maintain the "/robots.txt" file you must have root privileges, while any HTML author can control the META method locally.
META Tags
Assume the following scenario: you have a large mailing list archive on the Web and want to make it searchable with IntraSeek. The main page is an index consisting of many titles and hypertext links to separate pages holding the actual mail content. You index the archive with IntraSeek, which wanders through all pages starting with the index page. Once this material is searchable, the index page will probably turn out to be the best 'hit' for most searches, because it holds nearly all keywords and terms. But as this page is of no interest to people searching for a specific topic, we want to exclude it from the IntraSeek search results.
The best way to do this is to add a robots META tag with the terms NOINDEX and FOLLOW, for instance <META NAME="robots" CONTENT="NOINDEX,FOLLOW">, to the page. This tells IntraSeek to skip the content of this page but still follow the hypertext links found there.
The syntax for the "META=robots" tag is <META NAME="robots" CONTENT="robots-terms">, where "robots-terms" is a comma-separated list of one or more of the following terms: ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.
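As an illustration of the mailing list scenario, the index page could look roughly like the sketch below. It uses the conventional robots META syntax; the page title and the link targets (msg0001.html, msg0002.html) are hypothetical and only stand in for a real archive.

```html
<html>
<head>
  <title>Mailing list archive: index</title>
  <!-- Do not index this page itself, but do follow the links it contains -->
  <meta name="robots" content="noindex,follow">
</head>
<body>
  <!-- Hypothetical links to the individual message pages -->
  <a href="msg0001.html">Re: first thread</a>
  <a href="msg0002.html">Re: second thread</a>
</body>
</html>
```

The message pages themselves would carry no robots tag here, on the assumption that pages without one are indexed and followed as usual.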
If you only want to remove certain areas of a document from the index and summaries, you can use the <no_index> ... </no_index> tag. This tag is used only by IntraSeek and is not part of any standard. Browsers might dislike it and HTML checkers may complain. It is implemented this way because there is no official standard for solving this issue. By default, new links found within the no_index area are followed. That is why <no_index> takes one optional attribute called nofollow. If the tag is used like this: <no_index nofollow>, new links found within the no_index area will not be visited. For an example of the use of <no_index>, see any artist's page at Lothlorien (see References).
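A minimal sketch of both forms of the tag is shown below. It assumes, as the attribute name suggests, that nofollow stops IntraSeek from visiting links inside the area; the page content and file names are hypothetical.

```html
<body>
  <h1>Artist name</h1>
  <p>Biography text that should be indexed and may appear in summaries.</p>

  <!-- Kept out of the index and summaries; links inside are still followed -->
  <no_index>
    <a href="index.html">Home</a> | <a href="gallery.html">Gallery</a>
  </no_index>

  <!-- Kept out of the index and summaries; links inside are not followed -->
  <no_index nofollow>
    <a href="guestbook.html">Guestbook</a>
  </no_index>
</body>
```

Wrapping a navigation bar in <no_index> keeps its repeated link text out of every page's index entry while still letting the crawler reach the linked pages.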
Other META tags
Meta keywords should describe the content of the page, and are
considered extra important by IntraSeek. We recommend using meta
keywords. For an example of how keywords can be used successfully
for better search results, please check the Lothlorien gallery (see
References). In this gallery, each picture page contains 5 to 15 meta
keywords that describe the subject of the picture shown.
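A picture page in such a gallery might declare its keywords as sketched below. This uses the conventional keywords META form; the title and the keywords themselves are hypothetical examples, not taken from the Lothlorien gallery.

```html
<head>
  <title>Sunset over the fjord</title>
  <!-- Hypothetical keywords describing the subject of the picture -->
  <meta name="keywords" content="sunset, fjord, mountains, water, reflection, evening">
</head>
```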
Meta descriptions are used as page summaries when presenting search results. If no meta description is found, IntraSeek creates a summary from the text that appears at the top of the page. A good use for meta descriptions is to keep the text of navigation interfaces at the top of a page from becoming the summary.
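The sketch below shows a page whose body starts with a navigation bar, using the conventional description META form. The title, description text, and links are hypothetical.

```html
<html>
<head>
  <title>Installation guide</title>
  <!-- Used as the summary in search results instead of the text at the top of the page -->
  <meta name="description" content="Step-by-step instructions for installing and configuring the product.">
</head>
<body>
  <!-- Without the description above, this navigation text would open the summary -->
  <a href="index.html">Home</a> | <a href="products.html">Products</a> | <a href="support.html">Support</a>
  <h1>Installation guide</h1>
  <p>The actual content of the page starts here.</p>
</body>
</html>
```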
Frames & Indexing
IntraSeek follows links found in framesets. This is an acceptable
solution, albeit not a good one.