Advanced profile configuration
This page describes the more advanced configuration options that are not
covered on the "New profile" page.
- Activated profile
-
If "active", the IntraSeek engine will try to mount the data base at
startup, and the profile will also be scheduled for automatic crawler launches.
- Automated update of the data bases
-
Defines the intervals at which the data bases are updated
automatically. If you want the crawler for a profile to be launched at
regular intervals to keep the index data bases up to date, use this
function. The selections should be fairly self-explanatory. If you
select Never! in the first selection, the two following selections
(days and time) are ignored and crawlers are not launched
automatically. For more status reports on scheduled crawlers, see the
logs page in the IntraSeek configuration interface.
- Crawler log detail level
-
Selects how much information the crawler should write in the log.
Level          | Fatal errors | Errors | Warnings, reports, scheduler info. | Rejects, accepts
Full (Default) | Yes          | Yes    | Yes                                | Yes
Medium         | Yes          | Yes    | Yes                                | No
Short          | Yes          | Yes    | No                                 | No
None           | Yes          | No     | No                                 | No
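The table above can be read as a lookup from detail level to the message categories that get written. The sketch below encodes exactly that table. The `Category` enum, `LOG_LEVELS` and `should_log` are hypothetical names chosen for illustration, not part of IntraSeek.

```python
from enum import Enum


class Category(Enum):
    """Message categories, one per column of the table above."""
    FATAL = "fatal errors"
    ERROR = "errors"
    WARNING = "warnings, reports, scheduler info."
    REJECT = "rejects, accepts"


# Which categories each crawler log detail level writes (from the table).
LOG_LEVELS = {
    "Full": {Category.FATAL, Category.ERROR, Category.WARNING, Category.REJECT},
    "Medium": {Category.FATAL, Category.ERROR, Category.WARNING},
    "Short": {Category.FATAL, Category.ERROR},
    "None": {Category.FATAL},
}


def should_log(level: str, category: Category) -> bool:
    """Return True if a message of `category` is written at `level`."""
    return category in LOG_LEVELS[level]
```

Note that fatal errors are logged even at the None level.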
- Crawler walk pause
-
The crawler is quite fast. If you index pages outside your own network,
you should slow it down somewhat by setting the "Crawler walk pause",
which is the number of seconds the crawler pauses after each document
download before fetching the next document. This keeps it from placing
heavy loads on other people's web servers.
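A crawl loop that honours a walk pause can be sketched as follows. This is an illustrative sketch, not IntraSeek's crawler. `crawl_with_pause` is a hypothetical name, and the `fetch` and `sleep` callables are injected so that the pause logic can be checked without network access.

```python
import time
from typing import Callable, Iterable, List


def crawl_with_pause(urls: Iterable[str], fetch: Callable[[str], str],
                     pause_seconds: float,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, pausing `pause_seconds` between downloads."""
    documents = []
    for i, url in enumerate(urls):
        if i > 0:
            # The "Crawler walk pause": wait before fetching the next document.
            sleep(pause_seconds)
        documents.append(fetch(url))
    return documents
```

With three URLs and a pause of 2 seconds, the loop sleeps twice, once between each pair of downloads.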
- Crawler nice increment
-
Sets the "nice level" of the crawler. If you are not familiar with
UNIX, the UNIX command "man nice" explains this further. In short, the
nice increment defines how considerate the process should be in its
use of system resources: a crawler with a high nice value will happily
give away processor time to other processes. The level is zero by
default, and the maximum level is 19. If the web server runs as root,
the process can be made more "mean" (given a higher priority) by
setting the nice increment as low as -19.
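On UNIX, a process can adjust its own nice level through the standard `nice()` call, which Python exposes as `os.nice`. The sketch below clamps a configured increment to the range given above and then applies it. The helper names are hypothetical. `os.nice` is available only on UNIX, and a negative increment raises `PermissionError` unless the process runs as root.

```python
import os

# Range described above; negative values require root privileges.
MIN_NICE_INCREMENT = -19
MAX_NICE_INCREMENT = 19


def clamp_nice_increment(increment: int) -> int:
    """Limit a configured increment to the documented range."""
    return max(MIN_NICE_INCREMENT, min(MAX_NICE_INCREMENT, increment))


def apply_nice(increment: int) -> int:
    """Add the (clamped) increment to this process's niceness.

    Returns the new niceness. UNIX only.
    """
    return os.nice(clamp_nice_increment(increment))
```

A crawler process would call `apply_nice` once at startup. Because the change is inherited by child processes, any helpers the crawler spawns run at the same priority.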
- Stop lists
-
Specifies which lists of short, common words, e.g. "and" or "again",
should not be indexed. Several stop lists are available here. Select
one or more (or none) depending on which languages are used on your
server's pages. The stop lists are stored as ordinary text files in
the directory ENGINE_HOME/resource/.
(For further technical details on this function, see the chapter
"IntraSeek and memory usage".)
- Additional stop words
-
Specifies extra stop words to filter out. If "Yoyodyne Productions"
appears on every one of your pages, it may be a good idea to specify
"yoyodyne" and "productions" here. The disadvantage is that it will
then no longer be possible to search for the words "yoyodyne" or
"productions". The advantage is that the data base files will be
smaller and searches faster.
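Stop lists and additional stop words combine naturally into one filter applied at indexing time. The sketch below assumes each stop-list file holds one word per line. The text above only says the files are ordinary text files, so the file format, the tokenising regex and the helper names are all assumptions.

```python
import re
from pathlib import Path
from typing import Iterable, List, Set


def load_stop_words(paths: Iterable[Path], extra: Iterable[str] = ()) -> Set[str]:
    """Merge stop-list files (assumed one word per line) with extra words."""
    words: Set[str] = set()
    for path in paths:
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip().lower()
            if word:  # skip blank lines
                words.add(word)
    # "Additional stop words" are matched case-insensitively as well.
    words.update(w.lower() for w in extra)
    return words


def index_terms(text: str, stop_words: Set[str]) -> List[str]:
    """Split text into lowercase words and drop the stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]
```

Every word dropped here is one fewer entry in the data base, which is where the size and speed gains come from. The same dropped words can no longer be searched for.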
- Query Logs active
-
If "Yes", queries are logged to disk for use in top-100 statistics
and similar reports.
- Safety save
-
Specifies how many pages a crawler should process before it
automatically saves and reorganizes its data base.
For further technical details on this function, see the chapter
"IntraSeek and memory usage".
- Max documents to download
-
Specifies the maximum number of pages the crawler should index. It is
a good idea to set a maximum here: if something goes wrong, you avoid
having the entire partition filled by a huge data base. Going wrong
usually means that the robot has become lost on the Internet because
of incorrectly written accept and avoid patterns.
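The "Safety save" and "Max documents to download" settings both act on the crawl loop, and can be sketched together. This is a hypothetical loop, not IntraSeek's implementation. `crawl`, `fetch` and `save` are illustrative names, and `save` stands in for whatever saving and reorganizing of the data base the real crawler performs.

```python
from typing import Callable, Iterable, List


def crawl(urls: Iterable[str], fetch: Callable[[str], str],
          max_documents: int, safety_save: int,
          save: Callable[[List[str]], None]) -> List[str]:
    """Index at most `max_documents` pages, saving every `safety_save` pages."""
    indexed: List[str] = []
    for url in urls:
        if len(indexed) >= max_documents:
            break  # "Max documents to download" reached: stop crawling
        indexed.append(fetch(url))
        if len(indexed) % safety_save == 0:
            save(indexed)  # "Safety save": periodic save of the data base
    save(indexed)  # final save when the crawl ends
    return indexed
```

With `max_documents=5` and `safety_save=2`, the loop saves after pages 2 and 4, stops at page 5 even if more URLs remain, and then saves once more. A runaway crawl caused by a faulty accept pattern is therefore cut off at a known size.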
- Crawler page fetch Timeout
-
Defines how many seconds may pass before the download of a page is
aborted. For example, if a crawler can connect to a web server but
receives nothing from it, it would otherwise wait patiently for data
forever. Enter the number of seconds you allow the fetcher to spend
on its work.
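The same idea appears in Python's standard library, where `urllib.request.urlopen` accepts a `timeout` in seconds. The sketch below shows a fetcher that gives up on a silent server. `fetch_page` is a hypothetical name. Note one difference from a strict per-page limit: the standard-library timeout applies to each blocking socket operation (connecting, each read), not to the total download time, so a server that trickles data slowly could keep a download alive longer than the configured value.

```python
import http.client
import urllib.request
from typing import Optional


def fetch_page(url: str, timeout_seconds: float) -> Optional[bytes]:
    """Download `url`, giving up if the server stays silent too long.

    Returns None if the connection fails, times out, or the server
    answers with an HTTP error status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and TimeoutError are all subclasses of OSError.
        return None
```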
- Site structure logging
-
Creates logs of web site errors and warnings. If active, site
structure logs will be generated for this profile. See the logs chapter for information on site structure
logs. If you are not interested in the site structure log, you can
turn it off here to save crawler time, disk space and memory. (The
crawler uses less memory, no log files take up space on disk, and the
operations that maintain the log are disabled as well.)
- Max size of query logs
-
The maximum size of a query log, in bytes. When a query log exceeds
this size, it is moved to a .bak file, and the previous .bak file is
removed.
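This rotation rule can be sketched with the standard library. The sketch assumes one query per line and assumes that the size check happens after each write. The function name and the log file name are hypothetical. `os.replace` overwrites an existing destination, which matches "the previous .bak file is removed".

```python
import os
from pathlib import Path


def append_query(log_path: Path, query: str, max_bytes: int) -> None:
    """Append a query, then rotate the log to .bak once it exceeds max_bytes."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(query + "\n")
    if log_path.stat().st_size > max_bytes:
        backup = log_path.with_name(log_path.name + ".bak")
        # os.replace overwrites any old .bak file, so only one backup is kept.
        os.replace(log_path, backup)
```

Because only one .bak file is kept, the total disk space used by query logs stays at roughly twice the configured maximum.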
- Number of max quick links displayed
-
If a search returns several hundred pages, a list of links to the
following result pages is displayed below the list of summaries. The
"quick links" referred to here are those links, and this setting is the
maximum number of them to show.
- Number of documents summaries
-
Defines how many search summaries to display on each result page.
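The two settings above interact: the number of summaries per page determines how many result pages exist, and the quick-link maximum limits how many of them are linked. The sketch below shows one way to compute which page numbers to link. It places a window centred on the current page where possible, but that layout is an assumption, and `quick_links` and its parameters are hypothetical names.

```python
from typing import List


def quick_links(total_hits: int, per_page: int, current_page: int,
                max_links: int) -> List[int]:
    """Return the 1-based result-page numbers to link below the summaries.

    Assumed layout: a window of at most `max_links` pages, centred on
    the current page where possible.
    """
    page_count = max(1, -(-total_hits // per_page))  # ceiling division
    start = max(1, current_page - max_links // 2)
    end = min(page_count, start + max_links - 1)
    start = max(1, end - max_links + 1)  # shift back if clipped at the end
    return list(range(start, end + 1))
```

For 300 hits shown 10 per page with at most 10 quick links, page 15 would link pages 10 to 19, and page 30 would link pages 21 to 30.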
- Quoted search enabled
-
If set to Yes, users of the search engine can put quotation marks
around words to search for a phrase. For example, a search for
"John Carl Smith" will find pages mentioning a person with exactly
this name. Without the quotes, the search would return any page that
uses any of those common names.
Note that if this setting is enabled, an extra data base is used to
store the additional information. With the current implementation of
full text searches, we cannot guarantee good performance for data
bases covering more than 1000 documents. If you have more documents,
turn this option off. Otherwise IntraSeek may sometimes spend several
seconds on heavy calculations.
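One common technique for phrase search is a positional index, which records where each word occurs in each document. It is offered here only as an illustration of why phrase search needs an extra data base. The text above does not say how IntraSeek stores this information. A document matches when the phrase's words appear at consecutive positions. All names below are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# word -> {doc_id -> [positions]}; the positions are the extra information.
PositionalIndex = Dict[str, Dict[int, List[int]]]


def build_index(docs: Dict[int, str]) -> PositionalIndex:
    """Record the position of every word in every document."""
    index: PositionalIndex = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index: PositionalIndex, phrase: str) -> Set[int]:
    """Return ids of documents where the phrase's words appear consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return set()
    # Candidate documents must contain every word of the phrase.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = set()
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits
```

The position checks are what make phrase search costly on large data bases. Every candidate document needs per-word position comparisons, which is consistent with the performance warning above.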
- Wildcards enabled
-
If set to Yes, users of the search engine can use question marks (?)
and asterisks (*) to broaden searches. A search for net* might match
"netscape", "nethack", "network" and so on. A search for int??net
matches "intranet" as well as "internet".
Note that IntraSeek requires the user to specify at least three
characters in front of the *, and that searches are not
case-sensitive. Also note that if this setting is enabled, an extra
data base is used to store the additional information.
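The wildcard rules above can be sketched by translating a search term into a regular expression. This is a sketch under stated assumptions: ? matches exactly one word character (as the int??net example suggests), * matches any run of word characters, and the three-character rule is checked against the first *. The helper names are hypothetical, and IntraSeek's own matcher may differ.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a ?/* search term into a case-insensitive regex.

    Raises ValueError if fewer than three characters precede a '*'.
    """
    star = pattern.find("*")
    if star != -1 and star < 3:
        raise ValueError("at least three characters must precede '*'")
    body = "".join(
        r"\w" if ch == "?" else r"\w*" if ch == "*" else re.escape(ch)
        for ch in pattern
    )
    return re.compile(body, re.IGNORECASE)


def matches(pattern: str, word: str) -> bool:
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

The three-character minimum keeps a term such as n* from expanding into a large share of the index.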
- Summary text length
-
The length, in characters, of the summaries displayed with the search
results, alongside the link and the hit percentage. If the web page
has a Meta description, that description is used as the summary.
Otherwise the first part of the document is used.
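The summary rule can be sketched as a short function. It assumes that the Meta description is cut to the configured length just like body text, and that runs of whitespace are collapsed. Neither detail is stated above. `make_summary` is a hypothetical name.

```python
from typing import Optional


def make_summary(meta_description: Optional[str], body_text: str,
                 length: int) -> str:
    """Build a result summary of at most `length` characters.

    Uses the page's Meta description if present, otherwise the start of
    the document text. Whitespace runs are collapsed to single spaces.
    """
    source = meta_description if meta_description else body_text
    text = " ".join(source.split())
    return text[:length]
```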