Utility functions to operate on strings.
Some of them are portable versions of, or replacements for, useful GNU and POSIX extensions.
String handling | Utility functions to operate on strings. |
Functions | |
nacore_char_utf8_encode() | Encodes a Unicode code point into a UTF-8 character. |
nacore_char_utf8_decode() | Decodes the Unicode code point associated with a UTF-8 character. |
nacore_char_utf16le_encode() | Encodes a Unicode code point into a UTF-16LE character. |
nacore_char_utf16le_decode() | Decodes the Unicode code point associated with a UTF-16LE character. |
nacore_string_utf8_to_utf16le() | Converts a UTF-8 encoded string to UTF-16LE. |
nacore_string_utf16le_to_utf8() | Converts a UTF-16LE encoded string to UTF-8. |
nacore_string_get_size() | Returns the number of bytes making up a string including the terminating null character. |
nacore_strnlen() | Gets the number of bytes in a string, not including the terminating null character, up to a certain length. |
nacore_strgraphemes() | Calculates the number of graphemes in a string. |
nacore_strngraphemes() | Calculates the number of graphemes in a string, up to a certain number. |
nacore_astrcpy() | Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character. |
nacore_asprintf() | Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character. |
nacore_vasprintf() | Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character. |
nacore_string_split() | Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string. |
_NACORE_DEF size_t nacore_char_utf8_encode( char * utf8c, uint32_t cp )
Encodes a Unicode code point into a UTF-8 character.
If utf8c is NULL, the function only calculates the length in bytes of the UTF-8 character corresponding to the given code point.
utf8c | Pointer to a large enough buffer to contain the resulting UTF-8 character (worst case: 4 bytes wide) or NULL. |
cp | Code point. |
Length in bytes of the resulting UTF-8 character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
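The contract above (NULL buffer means "length only", 0 means "invalid code point") can be illustrated with a minimal sketch. It assumes standard UTF-8 bit packing and the valid range documented here; `sketch_cp_is_valid()` and `sketch_utf8_encode()` are hypothetical names, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the valid range stated in the documentation. */
static int sketch_cp_is_valid(uint32_t cp)
{
    if (cp > 0x10ffff) return 0;
    if (cp >= 0xd800 && cp <= 0xdfff) return 0;   /* surrogates */
    if (cp == 0xfeff || cp == 0xfffe) return 0;   /* excluded by the docs */
    return 1;
}

/* Hypothetical stand-in for the documented encoder contract. */
size_t sketch_utf8_encode(char *utf8c, uint32_t cp)
{
    size_t len;

    if (!sketch_cp_is_valid(cp)) return 0;
    len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (utf8c == NULL) return len;                /* length query only */

    switch (len) {
    case 1:
        utf8c[0] = (char)cp;
        break;
    case 2:
        utf8c[0] = (char)(0xc0 | (cp >> 6));
        utf8c[1] = (char)(0x80 | (cp & 0x3f));
        break;
    case 3:
        utf8c[0] = (char)(0xe0 | (cp >> 12));
        utf8c[1] = (char)(0x80 | ((cp >> 6) & 0x3f));
        utf8c[2] = (char)(0x80 | (cp & 0x3f));
        break;
    default:
        utf8c[0] = (char)(0xf0 | (cp >> 18));
        utf8c[1] = (char)(0x80 | ((cp >> 12) & 0x3f));
        utf8c[2] = (char)(0x80 | ((cp >> 6) & 0x3f));
        utf8c[3] = (char)(0x80 | (cp & 0x3f));
        break;
    }
    return len;
}
```

A caller would typically pass NULL first to size a buffer, or simply reserve 4 bytes, the documented worst case.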
_NACORE_DEF size_t nacore_char_utf8_decode( const char * utf8c, uint32_t * cp )
Decodes the Unicode code point associated with a UTF-8 character.
If cp is not NULL and the encoding is valid, the code point is stored into *cp.
utf8c | Pointer to the buffer containing the UTF-8 character to be decoded. |
cp | Pointer to the memory location where to put the code point value or NULL. |
Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written in *cp.
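As an illustration of the decoding contract (length on success, 0 on invalid input, *cp untouched on failure), here is a minimal sketch. It assumes the decoder rejects overlong forms and the same code points the encoder rejects, which the documentation does not state explicitly; `sketch_utf8_decode()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the documented decoder contract. */
size_t sketch_utf8_decode(const char *utf8c, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)utf8c;
    uint32_t v, min;
    size_t len, i;

    /* The lead byte determines the sequence length and the smallest
     * code point that may legally use that length. */
    if (p[0] < 0x80)                { v = p[0];        len = 1; min = 0; }
    else if ((p[0] & 0xe0) == 0xc0) { v = p[0] & 0x1f; len = 2; min = 0x80; }
    else if ((p[0] & 0xf0) == 0xe0) { v = p[0] & 0x0f; len = 3; min = 0x800; }
    else if ((p[0] & 0xf8) == 0xf0) { v = p[0] & 0x07; len = 4; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        /* A null terminator is not a continuation byte, so a truncated
         * sequence fails here without reading past the end of the string. */
        if ((p[i] & 0xc0) != 0x80) return 0;
        v = (v << 6) | (p[i] & 0x3f);
    }

    if (v < min || v > 0x10ffff) return 0;        /* overlong or out of range */
    if (v >= 0xd800 && v <= 0xdfff) return 0;     /* surrogates */
    if (v == 0xfeff || v == 0xfffe) return 0;     /* assumption: same as encoder */

    if (cp != NULL) *cp = v;
    return len;
}
```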
_NACORE_DEF size_t nacore_char_utf16le_encode( char * utf16lec, uint32_t cp )
Encodes a Unicode code point into a UTF-16LE character.
If utf16lec is NULL, the function only calculates the length in bytes of the UTF-16LE character corresponding to the given code point.
utf16lec | Pointer to a large enough buffer to contain the resulting UTF-16LE character (worst case: 4 bytes wide) or NULL. |
cp | Code point. |
Length in bytes of the resulting UTF-16LE character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
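The following sketch shows how a code point maps to 2 or 4 little-endian bytes, using the standard surrogate pair construction and the valid range documented above. `sketch_utf16le_encode()` is a hypothetical name, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the documented encoder contract. */
size_t sketch_utf16le_encode(char *utf16lec, uint32_t cp)
{
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)
        || cp == 0xfeff || cp == 0xfffe)
        return 0;

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one 16-bit unit, low byte first. */
        if (utf16lec != NULL) {
            utf16lec[0] = (char)(cp & 0xff);
            utf16lec[1] = (char)(cp >> 8);
        }
        return 2;
    }

    /* Supplementary planes: high surrogate followed by low surrogate. */
    if (utf16lec != NULL) {
        uint32_t v = cp - 0x10000;
        uint32_t hi = 0xd800 | (v >> 10);
        uint32_t lo = 0xdc00 | (v & 0x3ff);

        utf16lec[0] = (char)(hi & 0xff);
        utf16lec[1] = (char)(hi >> 8);
        utf16lec[2] = (char)(lo & 0xff);
        utf16lec[3] = (char)(lo >> 8);
    }
    return 4;
}
```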
_NACORE_DEF size_t nacore_char_utf16le_decode( const char * utf16lec, uint32_t * cp )
Decodes the Unicode code point associated with a UTF-16LE character.
If cp is not NULL and the encoding is valid, the code point is stored into *cp.
utf16lec | Pointer to the buffer containing the UTF-16LE character to be decoded. |
cp | Pointer to the memory location where to put the code point value or NULL. |
Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written in *cp.
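A minimal sketch of the reverse operation, assuming standard surrogate pair rules and the same excluded code points as the encoder (the documentation only says "if the encoding is not valid"). `sketch_utf16le_decode()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the documented decoder contract. */
size_t sketch_utf16le_decode(const char *utf16lec, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)utf16lec;
    uint32_t u1 = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    uint32_t u2, v;
    size_t len;

    if (u1 >= 0xdc00 && u1 <= 0xdfff)
        return 0;                         /* lone low surrogate */

    if (u1 >= 0xd800 && u1 <= 0xdbff) {
        /* High surrogate: a low surrogate must follow. */
        u2 = (uint32_t)p[2] | ((uint32_t)p[3] << 8);
        if (u2 < 0xdc00 || u2 > 0xdfff)
            return 0;
        v = 0x10000 + (((u1 & 0x3ff) << 10) | (u2 & 0x3ff));
        len = 4;
    } else {
        v = u1;
        len = 2;
    }

    if (v == 0xfeff || v == 0xfffe)       /* assumption: same as encoder */
        return 0;

    if (cp != NULL) *cp = v;
    return len;
}
```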
_NACORE_DEF char * nacore_string_utf8_to_utf16le( const char * str_utf8 )
Converts a UTF-8 encoded string to UTF-16LE.
It attempts to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.
str_utf8 | UTF-8 encoded string to be converted. |
A malloc()-allocated, UTF-16LE encoded string or NULL if there was not enough memory. The caller is responsible for free()ing the returned string.
_NACORE_DEF char * nacore_string_utf16le_to_utf8( const char * str_utf16le )
Converts a UTF-16LE encoded string to UTF-8.
It attempts to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.
str_utf16le | UTF-16LE encoded string to be converted. |
A malloc()-allocated, UTF-8 encoded string or NULL if there was not enough memory. The caller is responsible for free()ing the returned string.
_NACORE_DEF size_t nacore_string_get_size( const char * s, void * unused )
Returns the number of bytes making up a string including the terminating null character.
Can be safely cast to the nacore_get_size_cb type.
s | The string. |
unused | Unused, set to NULL. |
Number of bytes.
_NACORE_DEF size_t nacore_strnlen( const char * s, size_t maxlen )
Gets the number of bytes in a string, not including the terminating null character, up to a certain length.
In doing this, the function looks only at the first maxlen bytes at s and never beyond s + maxlen.
s | String to be examined. |
maxlen | Maximum number of bytes. |
strlen(s) if that is less than maxlen, or maxlen if there is no null character among the first maxlen characters pointed to by s.
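The bounded scan described above can be sketched in a few lines of portable C. `sketch_strnlen()` is a hypothetical name used for illustration; it never reads beyond s + maxlen.

```c
#include <stddef.h>

/* Hypothetical portable equivalent of the documented behavior. */
size_t sketch_strnlen(const char *s, size_t maxlen)
{
    size_t n = 0;

    /* Stop at the first null character or after maxlen bytes,
     * whichever comes first. */
    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}
```

This is useful when s may point to a fixed-size buffer that is not guaranteed to be null-terminated.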
_NACORE_DEF size_t nacore_strgraphemes( const char * s )
Calculates the number of graphemes in a string.
It operates according to the default Unicode 6.0 rules for extended grapheme clusters (see UAX #29: Unicode Text Segmentation, Grapheme Break Chart, GraphemeBreakProperty-6.0.0.txt).
s is assumed to be well encoded.
s | String to be examined. |
Number of graphemes.
_NACORE_DEF size_t nacore_strngraphemes( const char * s, size_t max )
Calculates the number of graphemes in a string, up to a certain number.
It operates according to the default Unicode 6.0 rules for extended grapheme clusters (see UAX #29: Unicode Text Segmentation, Grapheme Break Chart, GraphemeBreakProperty-6.0.0.txt).
The function won’t look beyond the max’th grapheme in the string.
s is assumed to be well encoded.
s | String to be examined. |
max | Maximum number of graphemes. |
Number of graphemes.
_NACORE_DEF char * nacore_astrcpy( const char * s, void * unused )
Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character.
The allocated storage should be free()d when it is no longer needed.
Can be safely cast to the nacore_to_string_cb type.
s | The string to copy. |
unused | Unused, set to NULL. |
A malloc()-allocated string copy or NULL if there was not enough memory.
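An allocating copy with this contract can be sketched as follows. `sketch_astrcpy()` is a hypothetical name; the second parameter exists only so that the signature lines up with a two-argument callback type, as the documentation describes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: returns a malloc()-allocated copy of s,
 * or NULL if there was not enough memory. */
char *sketch_astrcpy(const char *s, void *unused)
{
    size_t size = strlen(s) + 1;      /* include the terminating null */
    char *copy = malloc(size);

    (void)unused;                     /* present only for callback compatibility */
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}
```

The caller owns the result and releases it with free().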
_NACORE_DEF NACORE_FORMAT_PRINTF( 2, 3 ) int nacore_asprintf(char **strp, const char *fmt, ...)
Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.
If strp is not NULL, *strp is set to a pointer to the malloc()-allocated string. This string should be free()d when it is no longer needed.
The function is not affected by locale settings (it acts as if the C locale is used).
It supports all C99 conversion specifications, except the %lc and %ls kinds.
Length and precision modifiers for %s conversion specifications are expressed in bytes.
strp | Memory location where to put a pointer to the allocated string or NULL. |
fmt | printf()-like format string. |
... | printf()-like extra arguments. |
Length of the output string in bytes excluding the terminating null character. If there was not enough memory and strp is not NULL, *strp is set to NULL.
_NACORE_DEF NACORE_FORMAT_VPRINTF( 2 ) int nacore_vasprintf(char **strp, const char *fmt, va_list ap)
Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.
If strp is not NULL, *strp is set to a pointer to the malloc()-allocated string. This string should be free()d when it is no longer needed.
The function is not affected by locale settings (it acts as if the C locale is used).
It supports all C99 conversion specifications, except the %lc and %ls kinds.
Length and precision modifiers for %s conversion specifications are expressed in bytes.
va_end() is not called on ap inside the function, hence its value is undefined after the call.
strp | Memory location where to put a pointer to the allocated string or NULL. |
fmt | vprintf()-like format string. |
ap | vprintf()-like va_list. |
Length of the output string in bytes excluding the terminating null character. If there was not enough memory and strp is not NULL, *strp is set to NULL.
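The measure-then-allocate pattern behind this kind of function can be sketched on top of C99 vsnprintf(). This is only an approximation of the documented behavior: unlike the function described above, vsnprintf() is affected by locale settings and accepts %lc and %ls. `sketch_vasprintf()` and `sketch_asprintf()` are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in built on vsnprintf(); not locale-independent. */
int sketch_vasprintf(char **strp, const char *fmt, va_list ap)
{
    va_list ap2;
    int len;
    char *buf;

    /* First pass: compute the output length on a copy of ap. */
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);

    if (strp == NULL)
        return len;                   /* length query only */
    *strp = NULL;
    if (len < 0)
        return len;                   /* formatting error */

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return len;                   /* *strp stays NULL on allocation failure */

    /* Second pass: write the output; ap is consumed here and, as in the
     * documented function, va_end() is left to the caller. */
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    *strp = buf;
    return len;
}

/* Variadic front end, mirroring the asprintf()/vasprintf() pairing. */
int sketch_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = sketch_vasprintf(strp, fmt, ap);
    va_end(ap);
    return len;
}
```

Because the return value is the length even when allocation fails, callers should check *strp against NULL rather than rely on the return value alone.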
_NACORE_DEF nacore_list nacore_string_split( const char * s, const char * sep, nacore_filter_cb filter_cb, void * filter_opaque )
Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string.
The list will use nacore_string_get_size() as data size callback.
If filter_cb is not NULL, it will be called along with filter_opaque for each substring; if it is to be filtered out (i.e., filter_cb returns 0) the substring will not be included in the resulting list.
s | Input string. |
sep | Separator string. |
filter_cb | Value filtering callback. |
filter_opaque | Extra opaque data to be passed to filter_cb or NULL. |
Auto-allocating list of strings or NULL if some error occurred, in which case errno is set to EAGAIN if the system lacked the necessary resources (other than memory), ENOMEM if there was not enough memory, EPERM if the caller does not have the privilege to perform the operation, or NACORE_EUNKNOWN if another kind of error happened.
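The split-and-filter logic can be illustrated without the list type, which is not described in this section. The sketch below returns a NULL-terminated array of malloc()-allocated strings instead of a nacore_list. The callback signature `sketch_filter_cb`, the handling of empty substrings (kept unless filtered) and the treatment of an empty separator (no split) are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter type: return 0 to drop the substring. */
typedef int (*sketch_filter_cb)(const char *value, void *opaque);

/* Hypothetical stand-in: returns a NULL-terminated array of substrings,
 * or NULL if there was not enough memory. */
char **sketch_split(const char *s, const char *sep,
                    sketch_filter_cb filter_cb, void *filter_opaque)
{
    size_t sep_len = strlen(sep);
    size_t count = 0, cap = 4;
    char **out = malloc(cap * sizeof *out);

    if (out == NULL)
        return NULL;

    for (;;) {
        /* Find the next separator; an empty separator never matches. */
        const char *hit = sep_len > 0 ? strstr(s, sep) : NULL;
        size_t len = hit != NULL ? (size_t)(hit - s) : strlen(s);
        char *piece = malloc(len + 1);

        if (piece == NULL)
            goto fail;
        memcpy(piece, s, len);
        piece[len] = '\0';

        if (filter_cb != NULL && filter_cb(piece, filter_opaque) == 0) {
            free(piece);              /* filtered out */
        } else {
            if (count + 1 == cap) {   /* keep one slot for the NULL terminator */
                char **tmp = realloc(out, cap * 2 * sizeof *out);
                if (tmp == NULL) {
                    free(piece);
                    goto fail;
                }
                out = tmp;
                cap *= 2;
            }
            out[count++] = piece;
        }

        if (hit == NULL)
            break;                    /* last substring handled */
        s = hit + sep_len;
    }
    out[count] = NULL;
    return out;

fail:
    while (count > 0)
        free(out[--count]);
    free(out);
    return NULL;
}

/* Example filter: drops empty substrings. */
int sketch_keep_non_empty(const char *value, void *opaque)
{
    (void)opaque;
    return value[0] != '\0';
}
```

With this sketch, splitting "a,,b" on "," yields three substrings without a filter and two when `sketch_keep_non_empty()` is passed as the filter.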
A function that returns the size of some value.
typedef size_t ( * nacore_get_size_cb )(const void *value, void *opaque)
A function that returns a textual description of some value.
typedef char * ( * nacore_to_string_cb )(const void *value, void *opaque)