|R7708: Changes to Sorting and Equivalency in macOS (The HTAB/3000 Resource Reference)|
Specification Change (plus known problem)
|Equivalency Tests Can Change||
Because macOS sorts characters differently from Classic Mac OS, some tests that worked in Classic can fail in macOS. For example, the tile Name < ~ (a test to make sure the name does not start with a character lower than A in the ASCII chart) would return false in Classic. In macOS, however, the tilde (~) sorts higher than the regular alphabetic characters, so that test returns true for regular names.
The solution in cases such as this is to construct the test using the lowest (or highest) character included in the desired set, which makes the test independent of the surrounding character set. As Helix moves to Unicode, where the rules are always subject to revision, it becomes increasingly important to avoid solutions that assume rules not explicitly specified will never change.
In the case above, a better test would be A ≤ Extract 1 thru 1 from Name ≤ Z if the goal is to ensure that the text entered into the name field starts with an alphabetic character in the range of A–Z. (Drop that into a Not ▣ tile to find entries that do not meet the specification.)
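The difference between the two tests can be sketched in Python. The two orderings below are hypothetical stand-ins, not Helix's actual HTAB/3000 tables: one ranks `~` below the letters (as described for Classic), the other ranks it above them (as described for macOS). Names such as `name_below_tilde` and `starts_with_letter` are illustrative only.

```python
import string

# Hypothetical collation orders for a subset of printable ASCII. Comparisons
# are case-insensitive: each character is upper-cased before it is ranked.
# Characters outside an order raise ValueError in this sketch.
SYMBOLS = string.punctuation                 # includes "~"
CLASSIC_LIKE = " " + string.digits + SYMBOLS + string.ascii_uppercase
MACOS_LIKE = (" " + string.digits + SYMBOLS.replace("~", "")
              + string.ascii_uppercase + "~")


def sort_key(order: str, text: str) -> tuple:
    """Rank each character by its position in the given order."""
    return tuple(order.index(ch.upper()) for ch in text)


def name_below_tilde(order: str, name: str) -> bool:
    """The fragile test, Name < ~ : depends on where ~ happens to sort."""
    return sort_key(order, name) < sort_key(order, "~")


def starts_with_letter(order: str, name: str) -> bool:
    """The robust test, A <= Extract 1 thru 1 from Name <= Z."""
    first = name[:1]
    if not first:
        return False
    return sort_key(order, "A") <= sort_key(order, first) <= sort_key(order, "Z")


for order in (CLASSIC_LIKE, MACOS_LIKE):
    print(name_below_tilde(order, "Smith"), starts_with_letter(order, "Smith"))
# False True
# True True
```

Because A through Z keep the same relative positions in both orders, only the range test gives the same answer under each; the tilde test flips.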
|From Mac-Roman to Unicode||
macOS changes the sorting rules as compared to Classic Mac OS. In the example above, Classic sorts the tilde below the alphabetic characters, but macOS sorts it above them. Why the change?
macOS follows the Unicode Consortium’s sorting (‘collation’) rules, as specified in Unicode Technical Standard #10, the Unicode Collation Algorithm. That rather dense specification includes paragraphs such as this:
For scripts and characters not used in a particular language, explicit rules may not exist. For example, Swedish and French have clearly specified, distinct rules for sorting ä (either after z or as an accented character with a secondary difference from a), but neither defines the ordering of characters such as Ж, ש, ♫, ∞, ◊, or ⌂.
Most important is this paragraph:
Collation order is not fixed. Over time, collation order will vary: there may be fixes needed as more information becomes available about languages; there may be new government or industry standards for the language that require changes; and finally, new characters added to the Unicode Standard will interleave with the previously-defined ones. This means that collations must be carefully versioned.
The net result of this situation is that sorting rules in Helix have changed, and will most likely change again, as macOS implements new rules set forth by this independent body.
|Helix and the HTAB/3000 Resource||
When a collection is created, a permanent sort order and equivalency table is created and stored in a resource with a type of HTAB and an ID of 3000. (Hence: HTAB/3000). This resource controls a collection’s sort order and its character equivalencies. The sort order aspect is used to determine that “Ed” appears before “Fred” in a list. The equivalency aspect is used to determine that “Ed = ed” when you are doing a query on that data.
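To make the two aspects concrete, here is a minimal Python sketch. The `CollationTable` class, its fields, and `build_table` are hypothetical; the real HTAB/3000 resource layout is not described here. The sketch only models the behaviour the paragraph describes: one mapping orders characters, another decides which characters compare as equal.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class CollationTable:
    """Hypothetical model of an HTAB/3000-style table (not its real layout)."""
    sort_weight: dict   # character -> position used when sorting
    equiv_class: dict   # character -> class; same class compares equal

    def sort_key(self, text: str) -> tuple:
        return tuple(self.sort_weight[ch] for ch in text)

    def equal(self, a: str, b: str) -> bool:
        return (tuple(self.equiv_class[ch] for ch in a)
                == tuple(self.equiv_class[ch] for ch in b))


def build_table(letters: str = string.ascii_uppercase) -> CollationTable:
    """Upper case sorts just before lower case; both share one equivalency class."""
    sort_weight, equiv_class = {}, {}
    for i, upper in enumerate(letters):
        lower = upper.lower()
        sort_weight[upper], sort_weight[lower] = 2 * i, 2 * i + 1
        equiv_class[upper] = equiv_class[lower] = i
    return CollationTable(sort_weight, equiv_class)


table = build_table()
print(sorted(["Fred", "ed", "Ed"], key=table.sort_key))  # ['Ed', 'ed', 'Fred']
print(table.equal("Ed", "ed"))                           # True
```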
The information that goes into the HTAB/3000 table is taken from the operating system, so that a collection created on a Swedish system will sort according to Swedish sorting rules, which differ from German sorting rules. (To give one example: z < ö in Swedish, but ö < z in German.)
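The Swedish/German difference can be sketched with two simplified sort keys. The alphabets and the umlaut folding below are illustrative assumptions built around the z < ö and ö < z example; they are not the full rules either locale, or macOS, actually applies.

```python
# Simplified, hypothetical tailorings: real locale rules have several
# comparison levels and cover far more characters than these.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"    # å, ä, ö follow z
GERMAN = "abcdefghijklmnopqrstuvwxyz"
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u"}  # umlaut sorts with its base vowel


def swedish_key(word: str) -> list:
    # Characters outside SWEDISH raise ValueError in this sketch.
    return [SWEDISH.index(ch) for ch in word.lower()]


def german_key(word: str) -> tuple:
    lowered = word.lower()
    primary = [GERMAN.index(GERMAN_FOLD.get(ch, ch)) for ch in lowered]
    # Secondary level: the unaccented spelling wins a tie.
    secondary = [ch in GERMAN_FOLD for ch in lowered]
    return (primary, secondary)


words = ["zebra", "öl"]
print(sorted(words, key=swedish_key))  # ['zebra', 'öl']
print(sorted(words, key=german_key))   # ['öl', 'zebra']
```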
Because this table is built when a collection is created, it is critical that the system’s default language be set according to the desired sorting rules before creating a new collection. Once the collection is saved, the HTAB/3000 does not change.
The primary use of the HTAB/3000 table is in indexing text and styled text fields. A collection or relation window holds at most a few hundred icons and can be sorted on the fly, but a relation can contain millions of records, making it vital to index certain fields to achieve acceptable performance. HTAB/3000 provides an unchanging reference table that keeps indexes consistent regardless of changes at the operating system level.
If it does become necessary to change the sort order and equivalency of a collection, it is critical that Helix Utility be used first to break all indexes in the collection, so they can be rebuilt using the rules of the updated HTAB/3000. The most practical way to change the table is to create a new collection in the proper language, then use a resource editor to copy the HTAB/3000 from the new collection and paste it into the old, replacing the existing HTAB/3000. Please contact our tech support department if you need assistance with this task.
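Why the indexes must be broken first can be shown with a small Python sketch: an index can only be binary-searched if it is sorted under the same rules used to search it. The two rule functions below (case-sensitive and case-insensitive) are hypothetical stand-ins for an old and a new HTAB/3000, and `build_index` / `index_contains` are illustrative names, not Helix functions.

```python
import bisect
from typing import Callable, List

Rule = Callable[[str], str]


def build_index(values: List[str], rule: Rule) -> List[str]:
    """Store values in the order defined by the rules in force at build time."""
    return sorted(values, key=rule)


def index_contains(index: List[str], target: str, rule: Rule) -> bool:
    """Binary search that assumes `index` is sorted under `rule`."""
    keys = [rule(v) for v in index]          # compare using the current rules
    i = bisect.bisect_left(keys, rule(target))
    return i < len(keys) and keys[i] == rule(target)


def old_rule(s: str) -> str:
    return s                                  # hypothetical: case-sensitive order


def new_rule(s: str) -> str:
    return s.casefold()                       # hypothetical: case-insensitive order


records = ["apple", "Banana", "cherry"]
stale = build_index(records, old_rule)        # ['Banana', 'apple', 'cherry']
print(index_contains(stale, "Banana", new_rule))   # False: record is there but not found
fresh = build_index(records, new_rule)        # index rebuilt under the new rules
print(index_contains(fresh, "Banana", new_rule))   # True
```

Rebuilding the index after the table changes roughly corresponds to calling `build_index` again under the new rule.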
|Sorting and Equivalency, Before and After||
The charts at the right* show the HTAB/3000 resource, as created by various versions of Helix. Keep in mind that this table is never modified after a collection is created, so a collection that was created in (for example) Helix 4.5 will still have a ‘Prior to 6.1’ table, even after it has been updated to Helix 6.2 or later.
Prior to Helix 6.1 (that is, when Helix RADE was a Classic application with no understanding of macOS), sorting and equivalency follow a simple progression: non-printable characters first, then numbers, US-English punctuation and other ‘standard’ symbols in ascending ASCII value order. Next come the alphabetic characters, including diacritical variations, followed by more symbols, again in ascending ASCII value order. Characters with a code value of 128 or higher are referred to as ‘High-ASCII’ characters, and are replaced by their equivalent Unicode characters in Helix 7.0 and later.
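The pre-6.1 progression can be approximated as a two-part key: a character group, then the character's code value within that group. This is a hypothetical sketch for 7-bit ASCII only; the group boundaries are inferred from the description above, and accented letters (which the real table places among the alphabetic characters) simply fall into the last group here.

```python
def classic_like_key(text: str) -> list:
    """Hypothetical approximation of the pre-6.1 order for 7-bit ASCII."""
    key = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            group = 0          # non-printable characters
        elif ch.isascii() and ch.isalpha():
            group = 2          # letters; upper and lower case rank together
        elif ch.isascii():
            group = 1          # space, digits, punctuation and other symbols
        else:
            group = 3          # everything >= 128, lumped together here
        # Within a group, order by (case-folded) code value, then exact code.
        key.append((group, ord(ch.upper()) if group == 2 else code, code))
    return key


names = ["Zed", "~x", "apple", "42", "\tx"]
print(sorted(names, key=classic_like_key))
# ['\tx', '42', '~x', 'apple', 'Zed']
```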
Helix 6.1 is unique among all versions in the way it creates the HTAB/3000 resource. It largely preserves the order of prior versions, with two exceptions: the tab and return characters sort after all other non-printable characters, and tab, return, space and non-breaking space are treated as equivalent in text comparisons.
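The Helix 6.1 whitespace rule can be sketched as an equivalency key that maps tab, return and non-breaking space onto an ordinary space before comparing. The function names are hypothetical, and case folding is included only to reflect the general Ed = ed rule described earlier.

```python
# Hypothetical equivalency folding based on the Helix 6.1 description.
WHITESPACE = {"\t": " ", "\r": " ", "\u00a0": " "}  # tab, return, NBSP -> space


def equiv_key_61(text: str) -> str:
    """Fold the four whitespace characters together, then ignore case."""
    return "".join(WHITESPACE.get(ch, ch) for ch in text).casefold()


def equivalent(a: str, b: str) -> bool:
    return equiv_key_61(a) == equiv_key_61(b)


print(equivalent("Ed\tSmith", "ed Smith"))       # True
print(equivalent("Ed\u00a0Smith", "ED\rSMITH"))  # True
print(equivalent("EdSmith", "Ed Smith"))         # False
```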
Helix 6.2 and later use a significantly different (and more logical) sorting order, as specified by macOS. As always, non-printable characters come first, with tab, return, space and non-breaking space treated the same as in Helix 6.1. Next come the non-alphanumeric symbols, arranged in a fairly logical order (all currency symbols are grouped together, for example). Finally come the numbers and alphabetic characters, with a few changes, such as placing ™ between t and u.
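A rough Python approximation of the 6.2 ordering follows. It is neither Helix's table nor the full Unicode Collation Algorithm: it assumes a simple group order (whitespace, symbols, digits, letters) and relies on the fact that Unicode gives ™ a compatibility decomposition to the letters TM, which is enough to reproduce the t < ™ < u placement. The function name is hypothetical.

```python
import unicodedata


def macos_like_key(text: str) -> list:
    """Rough, hypothetical approximation of the Helix 6.2 ordering."""
    key = []
    for ch in unicodedata.normalize("NFKD", text):   # e.g. "™" -> "TM"
        if unicodedata.combining(ch):
            continue                  # ignore diacritical marks
        ch = ch.casefold()
        if ch.isspace() or not ch.isprintable():
            group = 0                 # whitespace and non-printable characters
        elif ch.isdigit():
            group = 2
        elif ch.isalpha():
            group = 3
        else:
            group = 1                 # punctuation and symbols, e.g. "$", "€"
        key.append((group, ch))
    return key


print(sorted(["u", "™", "t"], key=macos_like_key))  # ['t', '™', 'u']
print(sorted(["a", "5", "$"], key=macos_like_key))  # ['$', '5', 'a']
```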
* These tables represent the HTAB/3000 resource, as created on a US English system. Collections created on other systems may have different sorting and equivalency rules applied.
|Helix 6.2b1–b12 Bug||
There is one addition to the sorting and equivalency information presented above. Collections created in Helix RADE 6.2b1 (5698) through 6.2b12 (5820) contain an HTAB/3000 that was constructed using “case and diacritical sensitive” rules. This means that for every sort and equivalency test on text or styled text data, ed ≠ Ed. As a result, comparisons that the Helix specification considers equal fail, and lists sort with case and diacritic sensitivity applied.
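The effect of the 6.2b1–b12 tables can be illustrated by comparing two Python sort keys: the intended case- and diacritic-insensitive key, and a sensitive key that still groups words by letter but treats every case or accent difference as significant. Both are hypothetical approximations, not the actual table contents.

```python
import unicodedata


def insensitive_key(text: str) -> str:
    """Intended behaviour (sketch): ignore case and diacritical marks."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def sensitive_key(text: str) -> tuple:
    """Buggy-table behaviour (sketch): case and accents break ties and equality."""
    return (insensitive_key(text), unicodedata.normalize("NFC", text))


print(insensitive_key("Ed") == insensitive_key("ed"))  # True
print(sensitive_key("Ed") == sensitive_key("ed"))      # False
print(sensitive_key("été") == sensitive_key("ete"))    # False
print(sorted(["ed", "Ed", "Zed"], key=sensitive_key))  # ['Ed', 'ed', 'Zed']
```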
Only collections created in Helix RADE 6.2b1 (5698) through 6.2b12 (5820) are affected by this bug, which was fixed in Helix RADE 6.2b13 (5821).
The sorting and equivalency table created by these versions is shown at right. If you discover that your collection is affected by this bug, contact our tech support department immediately and we will guide you through the steps to correct it.
The HTAB/3000 resource was designed when Helix ran on systems that supported ASCII text encodings, and is an impractical solution for Unicode-based text handling. Although Helix 7.0 is the first version to support Unicode text entry, it still uses the HTAB/3000 resource, causing all Unicode characters (except Roman characters with diacritical marks) to sort together, in the order in which the records were created. A future version of Helix will fully support Unicode sort (collation) and equivalency rules.
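The effect described above can be sketched with a per-character weight function that only knows how to rank ASCII characters (with accented Roman letters folded onto their base letter) and gives every other character one shared weight. Because Python's `sorted` is stable, records that tie keep their creation order. The weight scheme and the `UNKNOWN` constant are assumptions for illustration, and equal-length names are used so that length does not affect the tie.

```python
import unicodedata

UNKNOWN = 10_000  # one shared weight for every character the table cannot rank


def table_weight(ch: str) -> int:
    """Hypothetical byte-table lookup: accented Roman letters fold to their base."""
    base = unicodedata.normalize("NFD", ch)[0]     # "é" -> "e"
    if base.isascii():
        return ord(base.casefold())
    return UNKNOWN                                  # e.g. Cyrillic or Hebrew letters


def table_key(text: str) -> list:
    return [table_weight(ch) for ch in text]


records_in_creation_order = ["Жук", "apple", "Кот", "été", "Бык"]
print(sorted(records_in_creation_order, key=table_key))
# ['apple', 'été', 'Жук', 'Кот', 'Бык']  (Cyrillic left in creation order)
```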
See R7134: Unicode Support in Helix 7.0 for more information on Unicode in Helix.