Other character sets

This information is only useful if you plan to run IntraSeek in a country that uses a character set other than iso-8859-1 (Latin 1). If you only plan to run IntraSeek on English text, you can disregard this chapter. Other charsets supported by IntraSeek are:
Note that "iso-8859-1" is the default charset. It is used unless something else is specified.
If you have documents in, say, Latin 2 (iso-8859-2), you have to make sure that they are delivered with the correct charset declared. One way is to use the Netscape-proposed META tag for this, as in this example:
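A minimal sketch of such a META tag, assuming the document is written in iso-8859-2 (substitute the charset your documents actually use):

```html
<!-- Declares the charset from within the document itself.
     iso-8859-2 is an assumed example value. -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
```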
A better way is to use RXML to put the content type information in the HTTP header itself. This makes the charset declaration less dependent on browsers that support META tags like the one above. It can be done like this:
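A sketch of what this could look like, assuming your Roxen version provides the RXML <header> tag with name and value attributes (check the RXML reference for your version), and again assuming iso-8859-2 as the charset:

```html
<!-- Assumed RXML: adds a Content-Type header to the HTTP response.
     The <header> tag and its attributes are an assumption here;
     iso-8859-2 is an example value. -->
<header name="Content-Type" value="text/html; charset=iso-8859-2">
```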
Note that both examples above should be placed within the <head> container.

With that in place, the documents are handled correctly by the IntraSeek crawler. The crawler needs the charset in order to build a correct lowercase representation of all words: if you search for "FOO" or "FoO" you should get the same result, and to lowercase international characters correctly the crawler has to know which character set is in use.

Finally, there is one last problem. When the user enters a query, the query itself must be lowercased as well. This is a problem since we do not know which character set the user is using, and as far as I know there is no good way to determine it. A workaround, which is not necessarily the best way, is to use a new attribute to the <intraseek_results> tag called charset=. It can be used in this way:
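A minimal sketch, assuming queries should be lowercased with the iso-8859-2 table. Only the opening tag is shown; any other attributes, and whatever the tag contains on your results page, stay as you already have them:

```html
<!-- charset= selects the lowercase table applied to the user's query.
     iso-8859-2 is an assumed example value. -->
<intraseek_results charset="iso-8859-2">
```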
Take careful note that the name of the charset must be written in lowercase letters (i.e., not "ISO-8859-2" or "Iso-8859-2"). You have probably figured out the drawback already: you get stuck with one particular charset. Luckily, this is not a major problem in this example, because the iso-8859-2 lowercase table also covers all the correct lowercase mappings for iso-8859-1. Have a look at the modules/languages.h file for more information on the different lowercase tables.