Class Collator

A class that implements a locale-sensitive comparator function for use with sorting functions. The comparator function assumes that the strings it is comparing contain Unicode characters encoded in UTF-16.

Collations usually depend only on the language, because most collation orders are shared between locales that speak the same language. There are, however, a number of instances where a locale collates differently than other locales that share the same language. There are also a number of instances where a locale collates differently based on the script used. This object can handle these cases automatically if a full locale is specified in the options rather than just a language code.

Options

The options parameter can contain any of the following properties:

Operation

The Collator constructor returns a collator object tailored with the above options. The object contains an internal compare() method which compares two strings according to those options. This method can be called directly on the collator to compare two strings, but it should not be passed on its own to the JavaScript sort function, because it is then detached from the collator object and does not have its collation data available. Instead, use the getComparator() method to retrieve a function that is bound to the collator object. (You could also bind it yourself using ilib.bind().) The bound function can be used with the standard JavaScript array sorting algorithm, or as a comparator with your own sorting algorithm.
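The failure of a detached method comes from ordinary JavaScript "this" binding rather than from anything specific to collation. The sketch below illustrates this with a stand-in object. StandInCollator and its order array are hypothetical and are not ilib's Collator or its data; they only mimic a compare() method that needs state stored on the object.

```javascript
// Hypothetical stand-in, NOT ilib's Collator: compare() needs state on `this`.
function StandInCollator(order) {
  this.order = order; // plays the role of the collation data
}

StandInCollator.prototype.compare = function (left, right) {
  return this.order.indexOf(left) - this.order.indexOf(right);
};

// Mirrors the idea of getComparator(): hand out compare() bound to this object.
StandInCollator.prototype.getComparator = function () {
  return this.compare.bind(this);
};

var standIn = new StandInCollator(["a", "ä", "b"]);

// Bound function: keeps access to standIn.order wherever it is called from.
var sortedWithBound = ["b", "ä", "a"].sort(standIn.getComparator());

// Detached method: `this` is no longer standIn, so this.order is missing
// and the call fails as soon as the sort invokes the comparator.
var detachedFailed = false;
try {
  ["b", "ä", "a"].sort(standIn.compare);
} catch (e) {
  detachedFailed = true;
}
```

Wrapping the call in a small function, such as function (a, b) { return standIn.compare(a, b); }, would also keep the method attached to its object.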

Example using the standard JavaScript array sorting call with the bound function:

var arr = ["ö", "oe", "ü", "o", "a", "ae", "u", "ß", "ä"];
var collator = new Collator({locale: 'de-DE', style: "dictionary"});
arr.sort(collator.getComparator());
console.log(JSON.stringify(arr));

This gives the output:

["a", "ae", "ä", "o", "oe", "ö", "ß", "u", "ü"]

When sorting an array of JavaScript objects according to one of the string properties of the objects, wrap the collator's comparator function in your own comparator function that knows the structure of the objects being sorted:

var collator = new Collator({locale: 'de-DE'});
var myComparator = function (collator) {
  // bound comparator that carries the collator's data with it
  var comparator = collator.getComparator();
  // left and right are your own objects
  return function (left, right) {
    return comparator(left.x.y.textProperty, right.x.y.textProperty);
  };
};
// arr is an array of your own objects
arr.sort(myComparator(collator));

Sort Keys

The collator class also has a method to retrieve the sort key for a string. The sort key is an array of values that represent how each character in the string should be collated according to the characteristics of the collation algorithm and the given options. Thus, sort keys can be compared directly value-for-value with other sort keys that were generated by the same collator, and the resulting ordering is guaranteed to be the same as if the original strings were compared by the collator. Sort keys generated by different collators are not guaranteed to give any reasonable results when compared together unless the two collators were constructed with exactly the same options and therefore end up representing the exact same collation sequence.

A good rule of thumb is to use sort keys if you have 10 or more items to sort or if your array might be re-sorted arbitrarily. For example, if your user interface displays a table with 100 rows, and each row has 4 sortable text columns which can be sorted in ascending or descending order, the recommended practice is to generate a sort key for each of the 4 sortable fields in each row and store it in the JavaScript representation of the table data. Then, when the user clicks on a column header to re-sort the table according to that column, the re-sorting is relatively quick because it only compares sort key values and does not recalculate the collation values for each character in each string for every comparison.
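The table scenario above might be sketched as follows. The helper names attachSortKeys and sortRowsByKey, the sortKeys property, and the row shape are assumptions made for illustration. The collator argument is any object offering the sortKey(str) method described in this document, and the sketch assumes that comparing the returned key strings with the < and > operators matches the direct comparison the keys are designed for.

```javascript
// Precompute one sort key per sortable field and store it on the row.
// `collator` is assumed to expose sortKey(str); nothing else is used.
function attachSortKeys(rows, fields, collator) {
  rows.forEach(function (row) {
    row.sortKeys = {};
    fields.forEach(function (field) {
      row.sortKeys[field] = collator.sortKey(row[field]);
    });
  });
  return rows;
}

// Re-sort in place by comparing the stored keys only; the collator is not
// consulted again, so repeated re-sorting stays cheap.
function sortRowsByKey(rows, field, descending) {
  var sign = descending ? -1 : 1;
  return rows.sort(function (left, right) {
    var a = left.sortKeys[field];
    var b = right.sortKeys[field];
    if (a < b) return -sign;
    if (a > b) return sign;
    return 0;
  });
}
```

A click handler on a column header would then only need to call sortRowsByKey(rows, columnName, descending) and redraw the table.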

For tables that are large, it is usually a better idea to do the sorting on the server side, especially if the table is the result of a database query. In this case, the table is usually a view of the cursor of a large result set, and only a few entries are sent to the front end at a time. To sort the whole set efficiently, the sorting should be done at the database level instead.

Data

Doing correct collation entails a huge amount of mapping data, much of which is not necessary when collating in one language with one script, which is the most common case. Thus, ilib implements a number of ways to include the data you need or leave out the data you don't need using the JS assembly tool:
  1. Full multilingual data - if you are sorting multilingual data and need to collate text written in multiple scripts, you can use the directive "!data collation/ducet" to load in the full collation data. This allows the collator to perform the entire Unicode Collation Algorithm (UCA) based on the Default Unicode Collation Element Table (DUCET). The data is very large, on the order of multiple megabytes, but sometimes it is necessary.
  2. A few scripts - if you are sorting text written in only a few scripts, you may want to include only the data for those scripts. Each ISO 15924 script code has its own data available in a separate file, so you can use the data directive to include only the data for the scripts you need. For example, use "!data collation/Latn" to retrieve the collation information for the Latin script. Because the "ducet" table mentioned in the previous point is a superset of the tables for all other scripts, you do not need to include explicitly the data for any particular script when using "ducet". That is, you either include "ducet" or you include a specific list of scripts.
  3. Only one script - if you are sorting text written only in one script, you can either include the data directly as in the previous point, or you can rely on the locale to include the correct data for you. In this case, you can use the directive "!data collate" to load in the locale's collation data for its most common script.

With any of the above ways of including the data, the collator will only perform the correct language-sensitive sorting for the given locale. All other scripts will be sorted in the default manner according to the UCA. For example, if you include the "ducet" data and pass in "de-DE" (German for Germany) as the locale spec, then only the Latin script (the default script for German) will be sorted according to German rules. All other scripts in the DUCET, such as Japanese or Arabic, will use the default UCA collation rules.

If this collator encounters a character for which it has no collation data, it will sort that character by pure Unicode value after all characters for which it does have collation data. For example, if you only loaded in the German collation data (i.e. the data for the Latin script tailored to German) to sort a list of person names, but that list happens to include the names of a few Japanese people written in Japanese characters, the Japanese names will sort at the end of the list after all German names, and will sort according to the Unicode values of the characters.
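The ordering described above for characters without collation data can be pictured with a small sketch. This is an illustration of the described behavior only, not ilib's implementation: makeFallbackComparator is a hypothetical name, and the weights object is a toy stand-in for loaded collation data that maps a character to a single number.

```javascript
// Illustration only: `weights` is a toy stand-in for loaded collation data.
function makeFallbackComparator(weights) {
  // Known characters rank in group 0 by their weight.
  // Unknown characters rank in group 1 by Unicode code point, so they
  // always land after every known character.
  function rank(ch) {
    return Object.prototype.hasOwnProperty.call(weights, ch)
      ? [0, weights[ch]]
      : [1, ch.codePointAt(0)];
  }
  return function (left, right) {
    var l = Array.from(left);  // split by code point, not by UTF-16 unit
    var r = Array.from(right);
    var n = Math.min(l.length, r.length);
    for (var i = 0; i < n; i++) {
      var a = rank(l[i]);
      var b = rank(r[i]);
      if (a[0] !== b[0]) return a[0] - b[0];
      if (a[1] !== b[1]) return a[1] - b[1];
    }
    return l.length - r.length; // a shorter prefix sorts first
  };
}
```

With weights covering only a few Latin letters, names written in Japanese characters end up after all the Latin names, ordered among themselves by code point.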
Defined in: ilib-full-dyn.js.

Class Summary

Collator(options)

Method Summary

compare(left, right)
Compare two strings together according to the rules of this collator instance.

<static> Collator.getAvailableScripts()
Retrieve the list of ISO 15924 script codes that are available in this copy of ilib.

<static> Collator.getAvailableStyles(locale)
Retrieve the list of collation style names that are available for the given locale.

getComparator()
Return a comparator function that can compare two strings together according to the rules of this collator instance.

sortKey(str)
Return a sort key string for the given string.

Class Detail
Collator(options)
Parameters:
{Object} options
options governing how the resulting comparator function will operate
Method Detail
{number} compare(left, right)
Compare two strings together according to the rules of this collator instance. Do not use this function directly with Array.sort, as it will not have its collation data available and therefore will not function properly. Use the function returned by getComparator() instead.
Parameters:
{string} left
the left string to compare
{string} right
the right string to compare
Returns:
{number} a negative number if left comes before right, a positive number if right comes before left, and zero if left and right are equivalent according to this collator

<static> Collator.getAvailableScripts()
Retrieve the list of ISO 15924 script codes that are available in this copy of ilib. This list varies depending on whether or not the data for various scripts was assembled into this copy of ilib. If the "ducet" data is assembled into this copy of ilib, this method will report the entire list of scripts as being available. If a collator instance is instantiated with a script code that is not on the list returned by this function, it will be ignored and text in that script will be sorted by numeric Unicode values of the characters.
Returns:
{Array.<string>} an array of ISO 15924 script codes that are available
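Because an unavailable script is silently ignored, it can be useful to check the list before relying on script-specific sorting. A minimal sketch, assuming the caller passes in the Collator class itself; the helper name findMissingScripts is hypothetical.

```javascript
// CollatorClass is expected to offer the static getAvailableScripts()
// described above. Returns the requested script codes that are NOT available.
function findMissingScripts(CollatorClass, neededScripts) {
  var available = CollatorClass.getAvailableScripts();
  return neededScripts.filter(function (script) {
    return available.indexOf(script) === -1;
  });
}
```

An empty result means every requested script has data assembled into this copy of ilib. Any script codes returned would fall back to sorting by numeric Unicode values.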

<static> Collator.getAvailableStyles(locale)
Retrieve the list of collation style names that are available for the given locale. This list varies depending on the locale, and depending on whether or not the data for that locale was assembled into this copy of ilib.
Parameters:
{Locale|string=} locale
The locale for which the available styles are being sought
Returns:
{Array.<string>} an array of style names that are available for the given locale
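One way to use this list is to request a style only when it is actually available for the locale. The sketch below is an assumption-laden example: buildCollatorOptions is a hypothetical helper, and the locale and style option names are taken from the constructor examples earlier in this document.

```javascript
// Build constructor options, adding `style` only if the locale supports it.
// CollatorClass is expected to offer the static getAvailableStyles(locale).
function buildCollatorOptions(CollatorClass, locale, preferredStyle) {
  var options = { locale: locale };
  var styles = CollatorClass.getAvailableStyles(locale) || [];
  if (styles.indexOf(preferredStyle) !== -1) {
    options.style = preferredStyle;
  }
  return options;
}
```

The result could then be passed to the constructor, for example new Collator(buildCollatorOptions(Collator, "de-DE", "dictionary")).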

{function(...)|undefined} getComparator()
Return a comparator function that can compare two strings together according to the rules of this collator instance. The function returns a negative number if the left string comes before right, a positive number if the right string comes before the left, and zero if left and right are equivalent. If the reverse property was given as true to the collator constructor, this function will switch the sign of those values to cause sorting to happen in the reverse order.
Returns:
{function(...)|undefined} a comparator function that can compare two strings together according to the rules of this collator instance
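Since the documented return type allows undefined, calling code may want to guard against that case before sorting. A minimal sketch, in which the helper name sortedCopy and the error message are illustrative assumptions:

```javascript
// Sort a copy of `strings` with the collator's bound comparator.
// `collator` is assumed to expose getComparator() as described above.
function sortedCopy(collator, strings) {
  var comparator = collator.getComparator();
  if (typeof comparator !== "function") {
    throw new Error("collator did not provide a comparator function");
  }
  // slice() leaves the caller's array untouched
  return strings.slice().sort(comparator);
}
```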

{string} sortKey(str)
Return a sort key string for the given string. The sort key string is a list of values that represent each character in the original string. The sort key values for any particular character consist of 3 numbers that encode the primary, secondary, and tertiary characteristics of that character. The values of each characteristic are modified according to the strength of this collator instance to give the correct collation order.

The idea is that this sort key string is directly comparable byte-for-byte to other sort key strings generated by this collator without any further knowledge of the collation rules for the locale. More formally, if a < b according to the rules of this collation, then it is guaranteed that sortKey(a) < sortKey(b) when compared byte-for-byte. The sort key string can therefore be used without the collator to sort an array of strings efficiently, because the work of determining the applicability of various collation rules is done once up-front when generating the sort key.

The sort key string can be treated as a regular, albeit somewhat odd-looking, string. That is, it can be passed to regular JavaScript functions without problems.

Parameters:
{string} str
the original string to generate the sort key for
Returns:
{string} a sort key string for the given string

Documentation generated by JsDoc Toolkit 2.4.0 on Tue Feb 02 2016 15:53:54 GMT-0800 (PST)