RuleBasedCollator

class RuleBasedCollator : Collator

kotlin.Any
↳	android.icu.text.Collator
	↳	android.icu.text.RuleBasedCollator

RuleBasedCollator is a concrete subclass of Collator. It allows customization of the Collator via user-specified rule sets. RuleBasedCollator is designed to be fully compliant with the Unicode Collation Algorithm (UCA) and conforms to ISO 14651.

A Collator is thread-safe only when frozen. See isFrozen() and android.icu.util.Freezable.

Users are strongly encouraged to read the User Guide for more information about the collation service before using this class.

Create a RuleBasedCollator from a locale by calling the getInstance(Locale) factory method in the base class Collator. Collator.getInstance(Locale) creates a RuleBasedCollator object based on the collation rules defined by the argument locale. If a customized collation ordering or customized attributes are required, use the RuleBasedCollator(String) constructor with the appropriate rules. The customized RuleBasedCollator bases its ordering on the CLDR root collation, then readjusts the attributes and the orders of the characters named in the specified rules accordingly.
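As a rough illustration of why a locale-based collator is used instead of plain String ordering, the sketch below sorts a few words both ways. It assumes a plain JVM, so it uses the JDK's java.text.Collator as a stand-in for android.icu.text.Collator (the ICU class is only available on Android). Both classes expose getInstance(Locale) and compare, but they are separate implementations whose orderings can differ in detail. LocaleSortSketch and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper comparing collator order with raw String order (JDK stand-in). */
final class LocaleSortSketch {
    private LocaleSortSketch() {}

    /** Sorts a copy of the words using the collation rules of the given locale. */
    static List<String> sortWithCollator(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        // Collator implements Comparator<Object>, so it can be passed to List.sort directly.
        sorted.sort(collator);
        return sorted;
    }

    /** Sorts a copy of the words by String.compareTo (UTF-16 code unit order). */
    static List<String> sortByCodeUnits(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }
}
```

With Locale.US the collator places "Zebra" after "banana", whereas String.compareTo puts every uppercase ASCII letter before every lowercase one.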

RuleBasedCollator provides correct collation orders for most locales supported in ICU. If specific data for a locale is not available, the ordering eventually falls back to the CLDR root sort order.

For information about the collation rule syntax and details about customization, please refer to the Collation customization section of the User Guide.

Note that there are some differences between the collation rule syntax used in Java and ICU4J:

- Modifier '!': according to the JDK documentation, '!' turns on Thai/Lao vowel-consonant swapping. If this rule is in force when a Thai vowel in the range U+0E40-U+0E44 precedes a Thai consonant in the range U+0E01-U+0E2E, or a Lao vowel in the range U+0EC0-U+0EC4 precedes a Lao consonant in the range U+0E81-U+0EAE, then the vowel is placed after the consonant for collation purposes. If a rule lacks the '!' modifier, the swapping is not turned on. ICU4J's RuleBasedCollator does not support turning off Thai/Lao vowel-consonant swapping, since the UCA clearly states that it must be supported to ensure a correct sorting order. If a '!' is encountered, it is ignored.
- As mentioned in the documentation of the base class Collator, compatibility decomposition mode is not supported.
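To make the JDK range conditions for the '!' modifier concrete, here is a minimal, hypothetical helper (ThaiLaoSwapSketch is not part of ICU or the JDK) that applies the quoted reordering to adjacent vowel-consonant pairs. It only illustrates the semantics described in the JDK documentation; it is not how either library implements collation, and ICU4J exposes no switch for this behavior.

```java
/** Hypothetical illustration of the JDK '!' rule: a Thai/Lao prevowel moves after its consonant. */
final class ThaiLaoSwapSketch {
    private ThaiLaoSwapSketch() {}

    static boolean isThaiPrevowel(char c)  { return c >= '\u0E40' && c <= '\u0E44'; }
    static boolean isThaiConsonant(char c) { return c >= '\u0E01' && c <= '\u0E2E'; }
    static boolean isLaoPrevowel(char c)   { return c >= '\u0EC0' && c <= '\u0EC4'; }
    static boolean isLaoConsonant(char c)  { return c >= '\u0E81' && c <= '\u0EAE'; }

    /** True if (vowel, consonant) matches one of the two ranges quoted from the JDK docs. */
    static boolean swaps(char vowel, char consonant) {
        return (isThaiPrevowel(vowel) && isThaiConsonant(consonant))
            || (isLaoPrevowel(vowel) && isLaoConsonant(consonant));
    }

    /** Returns the string with each matching vowel-consonant pair swapped. */
    static String reorderForCollation(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (i + 1 < s.length() && swaps(c, s.charAt(i + 1))) {
                // Emit the consonant first, then the vowel, and skip past both.
                out.append(s.charAt(i + 1)).append(c);
                i += 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```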

Examples

Creating Customized RuleBasedCollators:

// The RuleBasedCollator(String) constructor throws Exception if the rules cannot be parsed.
String simple = "& a < b < c < d";
RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);

String norwegian = "& a , A < b , B < c , C < d , D < e , E "
                   + "< f , F < g , G < h , H < i , I < j , "
                   + "J < k , K < l , L < m , M < n , N < "
                   + "o , O < p , P < q , Q < r , R < s , S < "
                   + "t , T < u , U < v , V < w , W < x , X "
                   + "< y , Y < z , Z < \u00E5 = a\u030A "
                   + ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
                   + ", \u00C6 < \u00F8 , \u00D8";
RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);

Concatenating rules to combine Collators:

// Create an en_US Collator object
RuleBasedCollator en_USCollator = (RuleBasedCollator)
    Collator.getInstance(new Locale("en", "US", ""));
// Create a da_DK Collator object
RuleBasedCollator da_DKCollator = (RuleBasedCollator)
    Collator.getInstance(new Locale("da", "DK", ""));
// Combine the two
// First, get the collation rules from en_USCollator
String en_USRules = en_USCollator.getRules();
// Second, get the collation rules from da_DKCollator
String da_DKRules = da_DKCollator.getRules();
RuleBasedCollator newCollator =
    new RuleBasedCollator(en_USRules + da_DKRules);
// newCollator has the combined rules

Making changes to an existing RuleBasedCollator to create a new Collator object, by appending changes to the existing rules:

// Create a new Collator object with additional rules
String addRules = "& C < ch, cH, Ch, CH";
RuleBasedCollator myCollator =
    new RuleBasedCollator(en_USCollator.getRules() + addRules);
// myCollator contains the new rules
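The rule-appending technique above can be tried on a plain JVM with the JDK's java.text.RuleBasedCollator; the "& C < ch, cH, Ch, CH" rule also appears in the JDK's own documentation of that class. This is a stand-in sketch, not the android.icu code path: the JDK constructor throws ParseException where the ICU constructor throws Exception, and the JDK's getRules() returns a full rule table, whereas the ICU getRules() returns only the tailoring rules. AppendRulesSketch is a hypothetical name.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

/** Hypothetical JDK stand-in for appending a tailoring rule to an existing collator. */
final class AppendRulesSketch {
    private AppendRulesSketch() {}

    /** Returns a US-English collator in which "ch" sorts as one letter between "c" and "d". */
    static RuleBasedCollator withChAfterC() {
        // As in the JDK documentation's examples, cast the locale collator to RuleBasedCollator.
        RuleBasedCollator base = (RuleBasedCollator) Collator.getInstance(Locale.US);
        String addRules = "& C < ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(base.getRules() + addRules);
        } catch (ParseException e) {
            // The JDK reports unparsable rules with a checked ParseException.
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }
}
```

Without the extra rule, "ch" sorts before "cz" because 'h' precedes 'z'; with it, the contraction "ch" gets its own primary weight after "c", so "cz" sorts first.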

How to change the order of non-spacing accents:

// old rule with main accents
String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
                + "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
                + "; \u0306 ; \u0307 ; \u0309 ; \u030A "
                + "; \u030B ; \u030C ; \u030D ; \u030E "
                + "; \u030F ; \u0310 ; \u0311 ; \u0312 "
                + "< a , A ; ae, AE ; \u00e6 , \u00c6 "
                + "< b , B < c, C < e, E & C < d , D";
// change the order of accent characters
String addOn = "& \u0300 ; \u0308 ; \u0302";
RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);

Putting in a new primary ordering before the default setting, for example to sort English characters before or after Japanese characters in the Japanese Collator:

// get en_US Collator rules
RuleBasedCollator en_USCollator
    = (RuleBasedCollator) Collator.getInstance(Locale.US);
// add a few Japanese characters to sort before English characters
// suppose the last character before the first base letter 'a' in
// the English collation rule is \u2212
String jaString = "& \u2212 <\u3041, \u3042 <\u3043, "
                  + "\u3044";
RuleBasedCollator myJapaneseCollator
    = new RuleBasedCollator(en_USCollator.getRules() + jaString);

This class is not subclassable.

Summary

Inherited constants

From class Collator

`Int`	`CANONICAL_DECOMPOSITION` Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation. CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.
`Int`	`FULL_DECOMPOSITION` [icu] Note: This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.
`Int`	`IDENTICAL` Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. See class documentation for more explanation. Note this value is different from the JDK's.
`Int`	`NO_DECOMPOSITION` Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator. Note this value is different from the JDK's.
`Int`	`PRIMARY` Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.
`Int`	`QUATERNARY` [icu] Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuation in the User Guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.
`Int`	`SECONDARY` Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.
`Int`	`TERTIARY` Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.
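To see the strength levels above in action, the following sketch compares a case pair and an accent pair at different strengths. It uses the JDK's java.text.Collator as a plain-JVM stand-in: the JDK defines PRIMARY, SECONDARY and TERTIARY with the same meaning, but it has no QUATERNARY and, as the table notes, its IDENTICAL value differs. It also checks that an out-of-range value passed to setStrength is rejected with IllegalArgumentException. StrengthSketch and its methods are hypothetical helpers.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical JDK stand-in showing how collation strength changes comparison results. */
final class StrengthSketch {
    private StrengthSketch() {}

    /** Compares a and b with a fresh US-English collator set to the given strength. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    /** Returns true if setStrength throws IllegalArgumentException for the value. */
    static boolean rejectsStrength(int value) {
        try {
            Collator.getInstance(Locale.US).setStrength(value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Case ("a" vs "A") is a tertiary difference, so it only shows up at TERTIARY; an accent ("a" vs "\u00E1") is a secondary difference, so it is ignored only at PRIMARY.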

Public constructors
`RuleBasedCollator(rules: String!)` Constructor that takes the argument rules for customization.

Public methods
Any	`clone()` Clones the RuleBasedCollator
RuleBasedCollator!	`cloneAsThawed()` Provides for the clone operation.
Int	`compare(source: String!, target: String!)` Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator.
Boolean	`equals(other: Any?)` Compares the equality of two Collator objects.
Collator!	`freeze()` Freezes the collator.
CollationElementIterator!	`getCollationElementIterator(source: UCharacterIterator!)` Return a CollationElementIterator for the given UCharacterIterator.
CollationElementIterator!	`getCollationElementIterator(source: String!)` Return a CollationElementIterator for the given String.
CollationElementIterator!	`getCollationElementIterator(source: CharacterIterator!)` Return a CollationElementIterator for the given CharacterIterator.
CollationKey!	`getCollationKey(source: String!)` Get a Collation key for the argument String source from this RuleBasedCollator.
Unit	`getContractionsAndExpansions(contractions: UnicodeSet!, expansions: UnicodeSet!, addPrefixes: Boolean)` Gets Unicode sets containing contractions and/or expansions of a collator.
Int	`getDecomposition()` Returns the decomposition mode of this Collator.
Int	`getMaxVariable()` [icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.
Boolean	`getNumericCollation()` Method to retrieve the numeric collation value.
IntArray!	`getReorderCodes()` Retrieves the reordering codes for this collator.
String!	`getRules()` Gets the collation tailoring rules for this RuleBasedCollator.
String!	`getRules(fullrules: Boolean)` Returns current rules.
Int	`getStrength()` Returns this Collator's strength attribute.
UnicodeSet!	`getTailoredSet()` Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
VersionInfo!	`getUCAVersion()` Get the UCA version of this collator object.
Int	`getVariableTop()` [icu] Gets the variable top value of a Collator.
VersionInfo!	`getVersion()` Get the version of this collator object.
Int	`hashCode()` Generates a hash code for this RuleBasedCollator.
Boolean	`isAlternateHandlingShifted()` Checks whether the alternate handling behavior is the UCA-defined SHIFTED (returns true) or NON_IGNORABLE (returns false).
Boolean	`isCaseLevel()` Checks if case level is set to true.
Boolean	`isFrenchCollation()` Checks if French Collation is set to true.
Boolean	`isFrozen()` Determines whether the object has been frozen or not.
Boolean	`isLowerCaseFirst()` Return true if a lowercase character is sorted before the corresponding uppercase character.
Boolean	`isUpperCaseFirst()` Return true if an uppercase character is sorted before the corresponding lowercase character.
Unit	`setAlternateHandlingDefault()` Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator.
Unit	`setAlternateHandlingShifted(shifted: Boolean)` Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable.
Unit	`setCaseFirstDefault()` Sets the case first mode to the initial mode set during construction of the RuleBasedCollator.
Unit	`setCaseLevel(flag: Boolean)` When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known as the case level.
Unit	`setCaseLevelDefault()` Sets the case level mode to the initial mode set during construction of the RuleBasedCollator.
Unit	`setDecomposition(decomposition: Int)` Sets the decomposition mode of this Collator.
Unit	`setDecompositionDefault()` Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator.
Unit	`setFrenchCollation(flag: Boolean)` Sets the mode for the direction of SECONDARY weights to be used in French collation.
Unit	`setFrenchCollationDefault()` Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator.
Unit	`setLowerCaseFirst(lowerfirst: Boolean)` Sets the orders of lowercase characters to sort before uppercase characters, in strength TERTIARY.
RuleBasedCollator!	`setMaxVariable(group: Int)` [icu] Sets the variable top to the top of the specified reordering group.
Unit	`setNumericCollation(flag: Boolean)` [icu] When numeric collation is turned on, this Collator makes substrings of digits sort according to their numeric values.
Unit	`setNumericCollationDefault()` Method to set numeric collation to its default value.
Unit	`setReorderCodes(vararg order: Int)` Sets the reordering codes for this collator.
Unit	`setStrength(newStrength: Int)` Sets this Collator's strength attribute.
Unit	`setStrengthDefault()` Sets the collation strength to the initial mode set during the construction of the RuleBasedCollator.
Unit	`setUpperCaseFirst(upperfirst: Boolean)` Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY.
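When the same strings are compared many times, as in sorting a large list, converting each string once with getCollationKey and comparing the keys can be cheaper than repeated compare calls. The sketch below assumes a plain JVM and therefore uses the JDK's java.text.Collator and java.text.CollationKey; android.icu.text.CollationKey offers the same compareTo and getSourceString methods. CollationKeySketch is a hypothetical name.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical JDK stand-in: sort by precomputed collation keys instead of repeated compare calls. */
final class CollationKeySketch {
    private CollationKeySketch() {}

    static String[] sortWithKeys(String[] words) {
        Collator collator = Collator.getInstance(Locale.US);
        // Build one key per string up front.
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        // CollationKey is Comparable, so the keys sort without consulting the collator again.
        Arrays.sort(keys);
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }
}
```

The key-based order should match sorting the same array directly with the collator as a Comparator.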

Inherited functions

From class Collator

`Int`	`compare(source: Any!, target: Any!)` Compares the source Object to the target Object.
`Boolean`	`equals(source: String!, target: String!)` Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. Convenience method.
`Array<Locale!>!`	`getAvailableLocales()` Returns the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.
`Array<ULocale!>!`	`getAvailableULocales()` [icu] Returns the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.
`String!`	`getDisplayName(objectLocale: ULocale!)` [icu] Returns the name of the collator for the objectLocale, localized for the default `DISPLAY` locale.
`String!`	`getDisplayName(objectLocale: ULocale!, displayLocale: ULocale!)` [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.
`String!`	`getDisplayName(objectLocale: Locale!)` [icu] Returns the name of the collator for the objectLocale, localized for the default `DISPLAY` locale.
`String!`	`getDisplayName(objectLocale: Locale!, displayLocale: Locale!)` [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.
`IntArray!`	`getEquivalentReorderCodes(reorderCode: Int)` Retrieves all the reorder codes that are grouped with the given reorder code. Some reorder codes are grouped and must reorder together. Beginning with ICU 55, scripts only reorder together if they are primary-equal, for example Hiragana and Katakana.
`ULocale!`	`getFunctionalEquivalent(keyword: String!, locID: ULocale!)` [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
`ULocale!`	`getFunctionalEquivalent(keyword: String!, locID: ULocale!, isAvailable: BooleanArray!)` [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.
`Collator!`	`getInstance()` Returns the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().
`Collator!`	`getInstance(locale: ULocale!)` [icu] Returns the Collator for the desired locale. For some languages, multiple collation types are available; for example, "de@collation=phonebook". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
`Collator!`	`getInstance(locale: Locale!)` Returns the Collator for the desired locale. For some languages, multiple collation types are available; for example, "de-u-co-phonebk". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper", only with `ULocale`) or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
`Array<String!>!`	`getKeywordValues(keyword: String!)` [icu] Given a keyword, returns an array of all values for that keyword that are currently in use.
`Array<String!>!`	`getKeywordValuesForLocale(key: String!, locale: ULocale!, commonlyUsed: Boolean)` [icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. These are all and only those values where the open (creation) of the service with the locale formed from the input locale plus input keyword and that value has different behavior than creation with the input locale alone.
`Array<String!>!`	`getKeywords()` [icu] Returns an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".

From class Comparator

`Comparator<T>!`	`reversed()` Returns a comparator that imposes the reverse ordering of this comparator.
`Comparator<T>!`	`thenComparing(other: Comparator<in T>!)` Returns a lexicographic-order comparator with another comparator. If this `Comparator` considers two elements equal, i.e. `compare(a, b) == 0`, `other` is used to determine the order. The returned comparator is serializable if the specified comparator is also serializable.
`Comparator<T>!`	`thenComparing(keyExtractor: Function<in T, out U>!)` Returns a lexicographic-order comparator with a function that extracts a `Comparable` sort key.
`Comparator<T>!`	`thenComparing(keyExtractor: Function<in T, out U>!, keyComparator: Comparator<in U>!)` Returns a lexicographic-order comparator with a function that extracts a key to be compared with the given `Comparator`.
`Comparator<T>!`	`thenComparingDouble(keyExtractor: ToDoubleFunction<in T>!)` Returns a lexicographic-order comparator with a function that extracts a `double` sort key.
`Comparator<T>!`	`thenComparingInt(keyExtractor: ToIntFunction<in T>!)` Returns a lexicographic-order comparator with a function that extracts an `int` sort key.
`Comparator<T>!`	`thenComparingLong(keyExtractor: ToLongFunction<in T>!)` Returns a lexicographic-order comparator with a function that extracts a `long` sort key.
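Because Collator implements java.util.Comparator, the inherited reversed() and thenComparing(...) defaults compose with it like any other comparator. The sketch below assumes a plain JVM and uses the JDK's java.text.Collator as a stand-in; ComparatorChainSketch is a hypothetical name. The second method shows one possible design: a PRIMARY-strength collator treats "apple" and "Apple" as equal, so a second comparator is chained to make the order deterministic.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical JDK stand-in combining a Collator with Comparator default methods. */
final class ComparatorChainSketch {
    private ComparatorChainSketch() {}

    /** Sorts a copy of the words in descending collation order. */
    static List<String> descending(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator.reversed());
        return sorted;
    }

    /** Sorts ignoring case and accents (PRIMARY), breaking ties by String.compareTo. */
    static List<String> primaryThenCodeUnits(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        // Resolves to Collator.compare(String, String).
        Comparator<String> byCollator = collator::compare;
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(byCollator.thenComparing(Comparator.naturalOrder()));
        return sorted;
    }
}
```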

Parameters of compare(source, target)
`source`	String!: the source text String.
`target`	String!: the target text String.

Parameters of getContractionsAndExpansions(contractions, expansions, addPrefixes)
`contractions`	UnicodeSet!: if not null, set to contain contractions
`expansions`	UnicodeSet!: if not null, set to contain expansions
`addPrefixes`	Boolean: add the prefix contextual elements to contractions

Exceptions of setStrength(newStrength)
`java.lang.IllegalArgumentException`	If the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
