|Understanding Keyword Indexing and the HKWT (Helix Keyword Separator Table) Resource|
In Helix, a keyword is any contiguous series of characters that does not include a keyword separator. Helix uses the term “keyword” rather than simply “word” because any character can be set to act as a word character. Characters not defined as word characters are called keyword separators, because they determine the breaks between keywords.
|About the Helix Keyword Separator Table||
Helix maintains an internal table that specifies which characters are treated as word characters and which as keyword separators. This table, known as the Keyword Separator Table, is stored in an HKWT resource in the resource fork of the application and/or collection. The end user can modify the table, but Helix has no built-in interface for doing so, so the table is typically overlooked or left to technically savvy users.
Helix users may want to modify the Keyword Separator Table in order to change the behavior of Helix’s Mixed Case ◊ and word ◊ … tiles, and of keyword-based queries. For example, a designer who wishes to allow keyword searches on names such as “O’Malley” could modify the Keyword Separator Table to include the apostrophe as a word character.
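The effect of the table on a name like this can be illustrated with a minimal Python sketch. It is not Helix code: the function `split_keywords` and the set `DEFAULT_WORD_CHARS` are hypothetical names, and the "default" table used here (ASCII letters and digits only) is an assumption made for illustration, not the actual HKWT contents.

```python
import string

# Hypothetical stand-in for a separator table: the set of word characters.
# Every character NOT in this set acts as a keyword separator.
DEFAULT_WORD_CHARS = set(string.ascii_letters + string.digits)


def split_keywords(text, word_chars):
    """Return the maximal runs of word characters found in text."""
    keywords, current = [], []
    for ch in text:
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A separator ends the keyword collected so far.
            keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    return keywords


# With the apostrophe acting as a separator, the name breaks into two keywords.
print(split_keywords("O'Malley", DEFAULT_WORD_CHARS))

# With the apostrophe added as a word character, the name is one keyword.
print(split_keywords("O'Malley", DEFAULT_WORD_CHARS | {"'"}))
```

The sketch uses the plain ASCII apostrophe; a typographic apostrophe is a different character and would need its own entry in the table.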
|Application-Based Rules Issue in Helix 6.0 & Prior||
In Helix 6.0 and prior, the HKWT resource that defines the Keyword Separator Table is stored within the Helix application, not the collection. If a collection relies on a modified Keyword Separator Table, the user must remember to update the HKWT resource with every new version of Helix installed. Failure to maintain the HKWT resource results in inconsistent results in Keyword-based searches.
Starting in Helix 6.1 (and RADE 6.1.1): Each collection can now optionally have its own HKWT resource, providing a customized Keyword Separator Table. Helix now checks for the HKWT resource inside the collection before loading the default HKWT from the application. Note, however, that the HKWT resource is not automatically added to new collections: it must be explicitly copied into the collection when collection-level specificity is desired. (See Keyword Management Utility below.)
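The lookup order described above (collection first, then application) can be sketched as follows. This is an illustration only: resource forks are modeled as plain Python dictionaries, and the function name `load_separator_table` is hypothetical rather than anything Helix exposes.

```python
def load_separator_table(collection_resources, application_resources):
    """Return (table, source), preferring the collection's HKWT resource.

    Both arguments are dictionaries standing in for resource forks,
    keyed by resource type. This is an assumption made for illustration.
    """
    table = collection_resources.get("HKWT")
    if table is not None:
        return table, "collection"
    # No collection-level table: fall back to the application default.
    return application_resources["HKWT"], "application"


application = {"HKWT": "default table"}

# A new collection has no HKWT resource until one is copied in.
print(load_separator_table({}, application))

# Once a customized HKWT is copied into the collection, it takes precedence.
print(load_separator_table({"HKWT": "custom table"}, application))
```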
|Missing Characters Issue||
In Helix 6.0 and earlier, many ‘High ASCII’ characters that are used frequently in European languages were not treated as word characters. Consequently, words containing characters such as Å and Ø were excluded from keyword searches.
These omissions — including the fi & fl ligatures — have been corrected in Helix 6.1.
Non-Breaking Space: Curiously, even the non-breaking space (NBSP) character was not previously considered a word character, even though the very definition of the non-breaking space speaks for its inclusion. The non-breaking space (created by pressing option-space) is now treated as a word character.
Note: These changes first appeared in Helix Client/Server and Helix Engine 6.1, but not in Classic Helix RADE until release 6.1.1.
Note: The German Eszett: ß (ASCII character 0xA7) was not added to the Word Character list until Helix 6.1.3 (all products).
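For readers who want to relate the characters named in this section to byte values, the following Python sketch prints their codes in the Mac OS Roman encoding, using Python's built-in `mac_roman` codec. The 0xA7 value quoted above for ß matches that encoding. That the HKWT table is organized by Mac OS Roman code is an assumption here, not something this technote states, and the helper name `mac_roman_code` is hypothetical.

```python
# Characters mentioned in this section, keyed by a descriptive label.
CHARACTERS = {
    "A with ring": "\u00c5",
    "O with stroke": "\u00d8",
    "fi ligature": "\ufb01",
    "fl ligature": "\ufb02",
    "non-breaking space": "\u00a0",
    "eszett": "\u00df",
}


def mac_roman_code(ch):
    """Return the single-byte Mac OS Roman code for one character."""
    return ch.encode("mac_roman")[0]


for label, ch in CHARACTERS.items():
    print(f"{label}: 0x{mac_roman_code(ch):02X}")
```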
|Inaccurate Documentation Issue||
It was also discovered that previously published information, such as that found in Appendix A of The Helix Reference, is inaccurate. For example, É is listed as a word character, but in the actual HKWT resource it was being treated as a separator.
This technote corrects the documentation and represents the official published specification for the Helix Keyword Separator Table in Helix 6.1 and later.
|Important Note on Keyword Index Changes||
When the Keyword Separator Table is changed, Keyword Indexes in all affected collections must be rebuilt. Otherwise, pre-existing entries that were created while the old table was in effect remain in the index, and the index will be unreliable. Currently there is no code in Helix to detect this situation and automatically rebuild keyword indexes. Helix Utility includes a Break All Indexes command, but that also breaks regular field indexes, which are not affected by this change.
If the Keyword Index is not rebuilt, this problem can occur: when a keyword field is modified, the words in the field are added to the Keyword Index based on the new table. However, the old entries for that field, based on the old table, are not removed from the index. The keyword reindexing code searches for words to remove based on the current table, as it has no access to the table that was in effect when the data was previously entered. This can cause duplicate ‘hits’ when doing keyword searches.
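The failure mode described above can be reproduced with a toy model. The sketch below is not Helix's indexing code: the index is a dictionary mapping each keyword to a set of record numbers, and `split_keywords`, `update_index`, and `word_starts_with` are hypothetical helpers written for this illustration.

```python
import string


def split_keywords(text, word_chars):
    """Split text into maximal runs of word characters."""
    words, current = [], ""
    for ch in text:
        if ch in word_chars:
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def update_index(index, record_id, old_text, new_text, word_chars):
    """Reindex one field using only the table currently in effect."""
    for kw in split_keywords(old_text, word_chars):
        if kw in index:
            index[kw].discard(record_id)
            if not index[kw]:
                del index[kw]
    for kw in split_keywords(new_text, word_chars):
        index.setdefault(kw, set()).add(record_id)


def word_starts_with(index, prefix):
    """Collect one hit per matching index entry, as a naive search might."""
    hits = []
    for kw in sorted(index):
        if kw.startswith(prefix):
            hits.extend(sorted(index[kw]))
    return hits


old_table = set(string.ascii_letters)   # apostrophe is a separator
new_table = old_table | {"'"}           # apostrophe is now a word character

index = {}
update_index(index, 1, "", "O'Malley", old_table)  # stored as "O" and "Malley"

# The table changes, then the field is edited. Removal looks for "O'Malley",
# which was never indexed, so the old "O" and "Malley" entries stay behind.
update_index(index, 1, "O'Malley", "O'Malley Jr", new_table)
print(word_starts_with(index, "O"))     # record 1 appears twice

# Rebuilding the index from scratch under the new table removes the stale entries.
rebuilt = {}
update_index(rebuilt, 1, "", "O'Malley Jr", new_table)
print(word_starts_with(rebuilt, "O"))   # record 1 appears once
```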
In summary: when changing the Keyword Separator Table, be sure to rebuild all existing Keyword Indexes in any affected collections. Helix 6.2 (or later) users can easily rebuild just the Keyword Indexes in a collection with the Rebuild Keyword Indexes script available on the AppleScripts for Helix page.
|Default Keyword Separator Tables, with Revisions for Helix 6.1||
These are the official, accurate tables for various versions of Helix. Those found in older references should be discarded.
Note: The “Keyword Separator Table changes in Helix 6.1” page highlights the changes made in Helix 6.1 by specifically noting the changed characters.
|Keyword Management Utility||
A simple utility has been developed to help manage HKWT resources. The Keyword Management Utility is an AppleScript that can:
You can also edit the HKWT resource directly using a resource editor. See our Resource Editing page for information on resource editors.
|Changes In macOS||
Classic Helix lets you enter any search string when specifying a keyword-based restriction with the Word Starts With & Word Equals operators in Form and Power Queries, including logically impossible strings that contain separator characters; such a query can never produce results. macOS Helix checks the search term against the Keyword Separator Table and reports an error when separator characters are included.
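A check of this kind can be sketched in a few lines of Python. This only illustrates the idea and is not how macOS Helix implements it: the function names, the error text, and the letters-only table are all assumptions.

```python
import string

# Hypothetical table for illustration: only ASCII letters are word characters.
WORD_CHARS = set(string.ascii_letters)


def find_separators(search_term, word_chars):
    """Return the characters in search_term that act as keyword separators."""
    return [ch for ch in search_term if ch not in word_chars]


def validate_keyword_query(search_term, word_chars):
    """Reject a search term that could never match a single keyword."""
    bad = find_separators(search_term, word_chars)
    if bad:
        raise ValueError(
            f"search term {search_term!r} contains separator characters: {bad}"
        )
    return search_term


print(validate_keyword_query("Malley", WORD_CHARS))

try:
    validate_keyword_query("O'Malley", WORD_CHARS)
except ValueError as err:
    print("rejected:", err)
```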
Terms Used In This Technote:
|The Helix Reference||
Keywords are described in the following sections of The Helix Reference: