5.7.3 Alphabets

Applications often need to manipulate sets of characters, such as the set of alphabetic characters or the set of whitespace characters. The alphabet abstraction provides an efficient implementation of sets of Unicode scalar values.

— procedure: alphabet? object

Returns #t if object is a Unicode alphabet, otherwise returns #f.

— procedure: alphabet unicode-char ...

Returns a Unicode alphabet containing the Unicode characters passed as arguments.

— procedure: scalar-values->alphabet items

Returns a Unicode alphabet containing the scalar values described by items. Items must satisfy well-formed-scalar-values-list?.

— procedure: alphabet->scalar-values alphabet

Returns a well-formed scalar-values list that describes the scalar values represented by alphabet.

— procedure: well-formed-scalar-values-list? object

Returns #t if object is a well-formed scalar-values list, otherwise returns #f. A well-formed scalar-values list is a proper list, each element of which is either a Unicode scalar value or a pair of Unicode scalar values. A pair of scalar values represents a contiguous range of scalar values. The car of the pair is the lower limit, and the cdr is the upper limit. Both limits are inclusive, and the lower limit must be less than or equal to the upper limit.
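
The following is a minimal sketch, in portable R7RS Scheme, of the check described above. The names scalar-value?, svl-element? and svl-well-formed? are hypothetical and are not among the procedures documented here. scalar-value? follows the Unicode standard's definition (0 through #x10FFFF, excluding the surrogates #xD800 through #xDFFF); the built-in predicate may differ in details. Only the two limits of a range are checked, so a range that spans the surrogate block is accepted by this sketch.

```scheme
(import (scheme base))

;; Hypothetical re-implementation of the rule above; not the built-in
;; well-formed-scalar-values-list?.

;; Unicode scalar value: an exact integer in 0..#x10FFFF that is not a
;; surrogate code point (#xD800..#xDFFF).
(define (scalar-value? x)
  (and (exact-integer? x)
       (<= 0 x #x10FFFF)
       (not (<= #xD800 x #xDFFF))))

;; One element: a scalar value, or a pair (lower . upper) of scalar
;; values with lower <= upper.  Only the two limits are checked.
(define (svl-element? x)
  (or (scalar-value? x)
      (and (pair? x)
           (scalar-value? (car x))
           (scalar-value? (cdr x))
           (<= (car x) (cdr x)))))

;; The object must be a proper list whose elements all pass svl-element?.
(define (svl-well-formed? object)
  (and (list? object)
       (let loop ((items object))
         (or (null? items)
             (and (svl-element? (car items))
                  (loop (cdr items)))))))
```

For example, (svl-well-formed? '(97 (100 . 102))) returns #t, while a reversed range such as '((102 . 100)) is rejected.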

— procedure: char-in-alphabet? char alphabet

Returns #t if char is a member of alphabet, otherwise returns #f.
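
To illustrate what membership means, here is a sketch that stands in for an alphabet with the kind of scalar-values list alphabet->scalar-values returns, and scans it linearly. The procedure svl-contains? is hypothetical; the real alphabet representation is not specified in this section and is presumably more efficient than a linear scan.

```scheme
(import (scheme base))

;; Hypothetical model: an alphabet represented by a well-formed
;; scalar-values list.  Not the built-in char-in-alphabet?.
(define (svl-contains? char items)
  (let ((code (char->integer char)))
    (let loop ((items items))
      (cond ((null? items) #f)
            ((pair? (car items))
             ;; A range (lower . upper); both limits are inclusive.
             (or (<= (car (car items)) code (cdr (car items)))
                 (loop (cdr items))))
            (else
             ;; A single scalar value.
             (or (= (car items) code)
                 (loop (cdr items))))))))
```

For example, (svl-contains? #\b '((97 . 122))) returns #t, because the code of #\b, 98, lies in the range 97 through 122.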

Character sets and alphabets can be converted to one another without loss, provided that the alphabet contains only 8-bit scalar values. This works because the 8-bit scalar values, U+0000 through U+00FF, correspond one-to-one to the ISO-8859-1 (Latin-1) characters, which are what character sets contain.

— procedure: char-set->alphabet char-set

Returns a Unicode alphabet containing the scalar values that correspond to characters that are members of char-set.

— procedure: alphabet->char-set alphabet

Returns a character set containing the characters that correspond to 8-bit scalar values that are members of alphabet. (Scalar values outside the 8-bit range are ignored.)
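
To make the 8-bit restriction concrete, the sketch below keeps only the part of a scalar-values list that lies in 0 through #xFF: values above #xFF are dropped and a range that straddles #xFF is clipped. The helper svl-clip-8-bit is hypothetical and works on a scalar-values list rather than on a real alphabet; it shows which information survives alphabet->char-set, not how that procedure is implemented.

```scheme
(import (scheme base))

;; Hypothetical helper: restrict a well-formed scalar-values list to
;; the 8-bit range 0..#xFF, preserving the order of the elements.
(define (svl-clip-8-bit items)
  (let loop ((items items) (acc '()))
    (if (null? items)
        (reverse acc)
        (let* ((item (car items))
               (lower (if (pair? item) (car item) item))
               (upper (if (pair? item) (cdr item) item)))
          (cond ((> lower #xFF)
                 ;; Entirely outside the 8-bit range: dropped.
                 (loop (cdr items) acc))
                ((pair? item)
                 ;; Keep the range, clipping its upper limit to #xFF.
                 (loop (cdr items)
                       (cons (cons lower (min upper #xFF)) acc)))
                (else
                 ;; A single 8-bit value is kept as is.
                 (loop (cdr items) (cons item acc))))))))
```

For example, (svl-clip-8-bit '(65 (200 . 300) #x3BB)) returns (65 (200 . 255)).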

— procedure: string->alphabet string

Returns a Unicode alphabet containing the scalar values corresponding to the characters in string. Equivalent to

          (char-set->alphabet (string->char-set string))

— procedure: alphabet->string alphabet

Returns a newly-allocated string containing the characters corresponding to the 8-bit scalar values in alphabet. (Scalar values outside the 8-bit range are ignored.)
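
A similar sketch for strings: the hypothetical procedure svl->8-bit-string takes a scalar-values list in place of an alphabet, expands each range, drops values above #xFF, and returns the resulting characters in list order. The order of the characters in the string produced by alphabet->string is not specified above, so this sketch should not be read as describing it.

```scheme
(import (scheme base))

;; Characters for the codes lower..upper inclusive; empty if lower > upper.
(define (range-chars lower upper)
  (let loop ((code upper) (acc '()))
    (if (< code lower)
        acc
        (loop (- code 1) (cons (integer->char code) acc)))))

;; Hypothetical helper: a string of the 8-bit characters described by a
;; well-formed scalar-values list, in the order the list gives them.
(define (svl->8-bit-string items)
  (list->string
   (apply append
          (map (lambda (item)
                 (let ((lower (if (pair? item) (car item) item))
                       (upper (if (pair? item) (cdr item) item)))
                   ;; Clip to the 8-bit range; values above #xFF yield nothing.
                   (range-chars lower (min upper #xFF))))
               items))))
```

For example, (svl->8-bit-string '((97 . 99) #x3BB 120)) returns "abcx"; the non-8-bit value #x3BB is ignored.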

— procedure: 8-bit-alphabet? alphabet

Returns #t if alphabet contains only 8-bit scalar values, otherwise returns #f.
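
Because every range in a well-formed scalar-values list has its lower limit at or below its upper limit, the 8-bit condition can be decided by looking only at upper limits. The sketch below applies that observation to a scalar-values list; svl-8-bit? is a hypothetical name, not the built-in 8-bit-alphabet?.

```scheme
(import (scheme base))

;; Hypothetical helper: #t when every element of a well-formed
;; scalar-values list lies in 0..#xFF, otherwise #f.
(define (svl-8-bit? items)
  (let loop ((items items))
    (or (null? items)
        (let* ((item (car items))
               ;; For a range, lower <= upper, so the upper limit decides.
               (upper (if (pair? item) (cdr item) item)))
          (and (<= upper #xFF)
               (loop (cdr items)))))))
```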

— procedure: alphabet+ alphabet ...

Returns a Unicode alphabet that contains each scalar value that is a member of any of the alphabet arguments.
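
The sketch below models alphabet+ on scalar-values lists. The hypothetical procedure svl-union converts every element to a (lower . upper) range, sorts the ranges by lower limit with an insertion sort, and merges ranges that overlap or touch. The result is still a well-formed scalar-values list, although single values come back as one-element ranges such as (120 . 120). This is an illustration of the set operation, not the library's implementation.

```scheme
(import (scheme base))

;; Turn every element of a scalar-values list into a (lower . upper) range.
(define (svl->ranges items)
  (map (lambda (x) (if (pair? x) x (cons x x))) items))

;; Insert a range into a list of ranges sorted by lower limit.
(define (insert-range r sorted)
  (cond ((null? sorted) (list r))
        ((<= (car r) (car (car sorted))) (cons r sorted))
        (else (cons (car sorted) (insert-range r (cdr sorted))))))

;; Merge neighbouring sorted ranges that overlap or are adjacent
;; (for example 97..99 and 100..102 become 97..102).
(define (merge-sorted ranges)
  (if (or (null? ranges) (null? (cdr ranges)))
      ranges
      (let ((a (car ranges)) (b (cadr ranges)))
        (if (<= (car b) (+ (cdr a) 1))
            (merge-sorted (cons (cons (car a) (max (cdr a) (cdr b)))
                                (cddr ranges)))
            (cons a (merge-sorted (cdr ranges)))))))

;; Hypothetical model of alphabet+: union of any number of lists.
(define (svl-union . lists)
  (let ((all (apply append (map svl->ranges lists))))
    (merge-sorted
     (let loop ((rs all) (sorted '()))
       (if (null? rs)
           sorted
           (loop (cdr rs) (insert-range (car rs) sorted)))))))
```

For example, (svl-union '(97 (100 . 102)) '((98 . 99) 120)) returns ((97 . 102) (120 . 120)). Simply appending the argument lists would also describe the union, but sorting and merging keeps the result compact.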

— procedure: alphabet- alphabet1 alphabet2

Returns a Unicode alphabet that contains each scalar value that is a member of alphabet1 and is not a member of alphabet2.
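
Set difference can be sketched on scalar-values lists by subtracting each range of the second list from every piece that remains of the first; removing one range from another leaves zero, one or two pieces. The names range-minus and svl-difference are hypothetical. The sketch ignores the surrogate gap, so a clipped limit can land on a surrogate code point, which a strict well-formedness check would reject.

```scheme
(import (scheme base))

;; Normalize an element to a (lower . upper) range.
(define (->range x) (if (pair? x) x (cons x x)))

;; Remove range b from range a; returns a list of 0, 1 or 2 ranges.
(define (range-minus a b)
  (let ((alo (car a)) (ahi (cdr a)) (blo (car b)) (bhi (cdr b)))
    (if (or (< bhi alo) (> blo ahi))
        (list a)                        ; no overlap: a survives intact
        (append (if (< alo blo) (list (cons alo (- blo 1))) '())
                (if (> ahi bhi) (list (cons (+ bhi 1) ahi)) '())))))

;; Hypothetical model of alphabet-: values in items1 but not in items2.
(define (svl-difference items1 items2)
  (let loop ((pieces (map ->range items1))
             (subtrahends (map ->range items2)))
    (if (null? subtrahends)
        pieces
        (loop (apply append
                     (map (lambda (p) (range-minus p (car subtrahends)))
                          pieces))
              (cdr subtrahends)))))
```

For example, (svl-difference '((97 . 122)) '(101 (110 . 112))) returns ((97 . 100) (102 . 109) (113 . 122)).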