String handling

Utility functions to operate on strings.

Some of them are portable versions of, or replacements for, useful GNU and POSIX extensions.

Summary
String handling  Utility functions to operate on strings.
Functions
nacore_char_utf8_encode()  Encodes a Unicode code point into a UTF-8 character.
nacore_char_utf8_decode()  Decodes the Unicode code point associated with a UTF-8 character.
nacore_char_utf16le_encode()  Encodes a Unicode code point into a UTF-16LE character.
nacore_char_utf16le_decode()  Decodes the Unicode code point associated with a UTF-16LE character.
nacore_string_utf8_to_utf16le()  Converts a UTF-8 encoded string to UTF-16LE.
nacore_string_utf16le_to_utf8()  Converts a UTF-16LE encoded string to UTF-8.
nacore_string_get_size()  Returns the number of bytes making up a string including the terminating null character.
nacore_strnlen()  Gets the number of bytes in a string, not including the terminating null character, up to a certain length.
nacore_strgraphemes()  Calculates the number of graphemes in a string.
nacore_strngraphemes()  Calculates the number of graphemes in a string, up to a certain number.
nacore_astrcpy()  Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character.
nacore_asprintf()  Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_vasprintf()  Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_string_split()  Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string.

Functions

nacore_char_utf8_encode()

_NACORE_DEF size_t nacore_char_utf8_encode(char *utf8c,
uint32_t cp)

Encodes a Unicode code point into a UTF-8 character.

If utf8c is NULL, the function only calculates the length in bytes of the UTF-8 character corresponding to the given code point.

Parameters

utf8c  Pointer to a buffer large enough to contain the resulting UTF-8 character (worst case: 4 bytes wide) or NULL.
cp  Code point.

Returns

Length in bytes of the resulting UTF-8 character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
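
The contract above can be illustrated with a minimal sketch. sketch_utf8_encode() is a hypothetical stand-in written for this page, not the library's implementation; it only assumes standard UTF-8 bit layouts plus the rejected values listed in the Returns paragraph.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in that mirrors the documented contract of
 * nacore_char_utf8_encode(); it is NOT the library's own code. */
static size_t sketch_utf8_encode(char *utf8c, uint32_t cp)
{
    unsigned char *out = (unsigned char *)utf8c;

    /* Values rejected according to the Returns paragraph above. */
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)
        || cp == 0xfeff || cp == 0xfffe)
        return 0;

    if (cp < 0x80) {
        if (out != NULL)
            out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        if (out != NULL) {
            out[0] = (unsigned char)(0xc0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        }
        return 2;
    }
    if (cp < 0x10000) {
        if (out != NULL) {
            out[0] = (unsigned char)(0xe0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
            out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        }
        return 3;
    }
    if (out != NULL) {
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    }
    return 4;
}
```

Calling the sketch with a NULL buffer first gives the size needed, so a caller can measure before writing; a 4-byte buffer is always sufficient.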

nacore_char_utf8_decode()

_NACORE_DEF size_t nacore_char_utf8_decode(const char *utf8c,
uint32_t *cp)

Decodes the Unicode code point associated with a UTF-8 character.

If cp is not NULL and the encoding is valid, the code point is stored into *cp.

Parameters

utf8c  Pointer to the buffer containing the UTF-8 character to be decoded.
cp  Pointer to the memory location where the code point value is to be stored, or NULL.

Returns

Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written in *cp.
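
A minimal sketch of a decoder with the same shape follows. sketch_utf8_decode() is a hypothetical name, and the validity rules it applies (no overlong forms, no surrogates, nothing above 0x10ffff) are an assumption about what "valid encoding" means here; the library may be stricter or looser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the documented contract; the exact validity
 * checks are an assumption, not taken from the library. */
static size_t sketch_utf8_decode(const char *utf8c, uint32_t *cp)
{
    const unsigned char *in = (const unsigned char *)utf8c;
    uint32_t value, min;
    size_t len, i;

    /* The lead byte determines the sequence length. */
    if (in[0] < 0x80) {
        value = in[0];        len = 1; min = 0;
    } else if ((in[0] & 0xe0) == 0xc0) {
        value = in[0] & 0x1f; len = 2; min = 0x80;
    } else if ((in[0] & 0xf0) == 0xe0) {
        value = in[0] & 0x0f; len = 3; min = 0x800;
    } else if ((in[0] & 0xf8) == 0xf0) {
        value = in[0] & 0x07; len = 4; min = 0x10000;
    } else {
        return 0;             /* stray continuation or invalid lead byte */
    }

    /* Each following byte must be 10xxxxxx; a premature NUL fails here. */
    for (i = 1; i < len; i++) {
        if ((in[i] & 0xc0) != 0x80)
            return 0;
        value = (value << 6) | (in[i] & 0x3f);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (value < min || value > 0x10ffff
        || (value >= 0xd800 && value <= 0xdfff))
        return 0;

    if (cp != NULL)
        *cp = value;          /* written only on success */
    return len;
}
```

Because the return value is the number of bytes consumed, a caller can advance through a string by adding it to the pointer, and treat 0 as the signal to stop or resynchronize.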

nacore_char_utf16le_encode()

_NACORE_DEF size_t nacore_char_utf16le_encode(char *utf16lec,
uint32_t cp)

Encodes a Unicode code point into a UTF-16LE character.

If utf16lec is NULL, the function only calculates the length in bytes of the UTF-16LE character corresponding to the given code point.

Parameters

utf16lec  Pointer to a buffer large enough to contain the resulting UTF-16LE character (worst case: 4 bytes wide) or NULL.
cp  Code point.

Returns

Length in bytes of the resulting UTF-16LE character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
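
As an illustration of the byte layout, here is a minimal sketch under the same contract. sketch_utf16le_encode() is a hypothetical stand-in, not the library's code; it emits one 16-bit unit for code points below 0x10000 and a surrogate pair otherwise, least significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the documented contract of
 * nacore_char_utf16le_encode(); it is NOT the library's own code. */
static size_t sketch_utf16le_encode(char *utf16lec, uint32_t cp)
{
    unsigned char *out = (unsigned char *)utf16lec;
    uint32_t hi, lo;

    /* Values rejected according to the Returns paragraph above. */
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)
        || cp == 0xfeff || cp == 0xfffe)
        return 0;

    if (cp < 0x10000) {
        /* Single 16-bit unit, low byte first. */
        if (out != NULL) {
            out[0] = (unsigned char)(cp & 0xff);
            out[1] = (unsigned char)(cp >> 8);
        }
        return 2;
    }

    /* Surrogate pair: 10 high bits, then 10 low bits of cp - 0x10000. */
    cp -= 0x10000;
    hi = 0xd800 | (cp >> 10);
    lo = 0xdc00 | (cp & 0x3ff);
    if (out != NULL) {
        out[0] = (unsigned char)(hi & 0xff);
        out[1] = (unsigned char)(hi >> 8);
        out[2] = (unsigned char)(lo & 0xff);
        out[3] = (unsigned char)(lo >> 8);
    }
    return 4;
}
```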

nacore_char_utf16le_decode()

_NACORE_DEF size_t nacore_char_utf16le_decode(const char *utf16lec,
uint32_t *cp)

Decodes the Unicode code point associated with a UTF-16LE character.

If cp is not NULL and the encoding is valid, the code point is stored into *cp.

Parameters

utf16lec  Pointer to the buffer containing the UTF-16LE character to be decoded.
cp  Pointer to the memory location where the code point value is to be stored, or NULL.

Returns

Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written in *cp.
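
The reverse direction can be sketched in the same way. sketch_utf16le_decode() is a hypothetical name; treating a lone or misordered surrogate as the only invalid case is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the documented contract; which inputs count
 * as invalid is an assumption, not taken from the library. */
static size_t sketch_utf16le_decode(const char *utf16lec, uint32_t *cp)
{
    const unsigned char *in = (const unsigned char *)utf16lec;
    uint32_t u1, u2;

    /* First 16-bit unit, stored low byte first. */
    u1 = (uint32_t)in[0] | ((uint32_t)in[1] << 8);

    if (u1 < 0xd800 || u1 > 0xdfff) {
        if (cp != NULL)
            *cp = u1;
        return 2;
    }

    /* A low surrogate cannot start a character. */
    if (u1 >= 0xdc00)
        return 0;

    /* A high surrogate must be followed by a low surrogate. */
    u2 = (uint32_t)in[2] | ((uint32_t)in[3] << 8);
    if (u2 < 0xdc00 || u2 > 0xdfff)
        return 0;

    if (cp != NULL)
        *cp = 0x10000 + ((u1 - 0xd800) << 10) + (u2 - 0xdc00);
    return 4;
}
```

The second unit is read only after a high surrogate, so the sketch relies on the buffer holding at least one more 16-bit unit (for instance the string terminator) at that point.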

nacore_string_utf8_to_utf16le()

_NACORE_DEF char * nacore_string_utf8_to_utf16le(const char *str_utf8)

Converts a UTF-8 encoded string to UTF-16LE.

The function tries to skip badly encoded data, but it is only guessing, so avoid relying on this behavior.

Parameters

str_utf8  UTF-8 encoded string to be converted.

Returns

A malloc()-allocated, UTF-16LE encoded string or NULL if there was not enough memory.  The caller is responsible for free()ing the returned string.

nacore_string_utf16le_to_utf8()

_NACORE_DEF char * nacore_string_utf16le_to_utf8(const char *str_utf16le)

Converts a UTF-16LE encoded string to UTF-8.

The function tries to skip badly encoded data, but it is only guessing, so avoid relying on this behavior.

Parameters

str_utf16le  UTF-16LE encoded string to be converted.

Returns

A malloc()-allocated, UTF-8 encoded string or NULL if there was not enough memory.  The caller is responsible for free()ing the returned string.

nacore_string_get_size()

_NACORE_DEF size_t nacore_string_get_size(const char *s,
void *unused)

Returns the number of bytes making up a string including the terminating null character.

Can be safely cast to the nacore_get_size_cb type.

Parameters

s  The string.
unused  Unused, set to NULL.

Returns

Number of bytes.

nacore_strnlen()

_NACORE_DEF size_t nacore_strnlen(const char *s,
size_t maxlen)

Gets the number of bytes in a string, not including the terminating null character, up to a certain length.

In doing this, the function looks only at the first maxlen bytes at s and never beyond s + maxlen.

Parameters

s  String to be examined.
maxlen  Maximum number of bytes.

Returns

strlen(s) if that is less than maxlen, or maxlen if there is no null character among the first maxlen bytes pointed to by s.
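
The bounded scan described above amounts to a short loop. sketch_strnlen() is a hypothetical stand-in used only to illustrate the documented behavior.

```c
#include <stddef.h>

/* Hypothetical stand-in for the documented behavior: never reads at or
 * beyond s + maxlen, and stops at the first null character. */
static size_t sketch_strnlen(const char *s, size_t maxlen)
{
    size_t n = 0;

    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}
```

This makes the function usable on fixed-size buffers that may lack a terminator: a result equal to maxlen means no null character was found within the examined bytes.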

nacore_strgraphemes()

_NACORE_DEF size_t nacore_strgraphemes(const char *s)

Calculates the number of graphemes in a string.

It operates according to the default Unicode 6.0 rules for extended grapheme clusters (see UAX #29: Unicode Text Segmentation, Grapheme Break Chart, GraphemeBreakProperty-6.0.0.txt).

s is assumed to be well encoded.

Parameters

s  String to be examined.

Returns

Number of graphemes.

nacore_strngraphemes()

_NACORE_DEF size_t nacore_strngraphemes(const char *s,
size_t max)

Calculates the number of graphemes in a string, up to a certain number.

It operates according to the default Unicode 6.0 rules for extended grapheme clusters (see UAX #29: Unicode Text Segmentation, Grapheme Break Chart, GraphemeBreakProperty-6.0.0.txt).

The function won’t look beyond the max’th grapheme in the string.

s is assumed to be well encoded.

Parameters

s  String to be examined.
max  Maximum number of graphemes.

Returns

Number of graphemes.

nacore_astrcpy()

_NACORE_DEF char * nacore_astrcpy(const char *s,
void *unused)

Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character.

The allocated storage should be free()d when it is no longer needed.

Can be safely cast to the nacore_to_string_cb type.

Parameters

s  The string to copy.
unused  Unused, set to NULL.

Returns

A malloc()-allocated string copy or NULL if there was not enough memory.
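
A minimal sketch of the allocate-and-copy pattern follows. sketch_astrcpy() is a hypothetical stand-in with the same parameter list, not the library's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: allocate room for the string plus its
 * terminating null character, then copy everything in one go. */
static char *sketch_astrcpy(const char *s, void *unused)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);

    (void)unused;             /* present only to match the callback shape */

    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;              /* NULL if there was not enough memory */
}
```

The caller owns the result and releases it with free() once it is no longer needed.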

nacore_asprintf()

_NACORE_DEF NACORE_FORMAT_PRINTF(2, 3) int nacore_asprintf(char **strp,
const char *fmt,
...)

Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, it is set to a pointer to the malloc()-allocated string.  The string should be free()d when it is no longer needed.

The function is not affected by locale settings (it acts as if the C locale is used).

It supports all C99 conversion specifications except %lc and %ls.

For %s conversion specifications, length and precision modifiers are expressed in bytes.

Parameters

strp  Memory location where a pointer to the allocated string is to be stored, or NULL.
fmt  printf()-like format string.
...  printf()-like extra arguments.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.

nacore_vasprintf()

_NACORE_DEF NACORE_FORMAT_VPRINTF(2) int nacore_vasprintf(char **strp,
const char *fmt,
va_list ap)

Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, it is set to a pointer to the malloc()-allocated string.  The string should be free()d when it is no longer needed.

The function is not affected by locale settings (it acts as if the C locale is used).

It supports all C99 conversion specifications except %lc and %ls.

For %s conversion specifications, length and precision modifiers are expressed in bytes.

va_end() is not called on ap inside the function, hence its value is undefined after the call.

Parameters

strp  Memory location where a pointer to the allocated string is to be stored.
fmt  vprintf()-like format string.
ap  vprintf()-like va_list.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.
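
The measure-then-allocate pattern behind this kind of function can be sketched with the standard vsnprintf(). sketch_vasprintf() and sketch_asprintf() are hypothetical stand-ins; unlike the documented functions, they inherit the C library's locale handling and its %lc/%ls support, so they illustrate only the allocation and va_list contract.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in built on vsnprintf(); NOT locale-independent. */
static int sketch_vasprintf(char **strp, const char *fmt, va_list ap)
{
    va_list measure;
    char *buf;
    int len;

    if (strp != NULL)
        *strp = NULL;

    /* First pass on a copy of ap: only measure the output. */
    va_copy(measure, ap);
    len = vsnprintf(NULL, 0, fmt, measure);
    va_end(measure);
    if (len < 0 || strp == NULL)
        return len;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return len;           /* *strp stays NULL on allocation failure */

    /* Second pass consumes ap itself; va_end() is left to the caller. */
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    *strp = buf;
    return len;
}

/* Variadic wrapper: owns va_start()/va_end() around the call. */
static int sketch_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = sketch_vasprintf(strp, fmt, ap);
    va_end(ap);
    return len;
}
```

Since the returned length is the same whether or not allocation succeeded, a caller that passed a non-NULL strp should check *strp rather than the return value to detect an out-of-memory condition.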

nacore_string_split()

_NACORE_DEF nacore_list nacore_string_split(const char *s,
const char *sep,
nacore_filter_cb filter_cb,
void *filter_opaque)

Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string.

The list will use nacore_string_get_size() as data size callback.

If filter_cb is not NULL, it is called with filter_opaque for each substring; a substring for which filter_cb returns 0 is filtered out and not included in the resulting list.

Parameters

s  Input string.
sep  Separator string.
filter_cb  Value filtering callback or NULL.
filter_opaque  Extra opaque data to be passed to filter_cb or NULL.

Returns

Auto-allocating list of strings or NULL if some error occurred, in which case errno is set to one of the following: EAGAIN if the system lacked the necessary resources (other than memory), ENOMEM if there was not enough memory, EPERM if the caller does not have the privilege to perform the operation, or NACORE_EUNKNOWN if another kind of error happened.
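
The split-and-filter behavior can be sketched without the list type. Everything below is hypothetical: sketch_split() returns a NULL-terminated array instead of a nacore_list, the sketch_filter_cb signature is assumed, and keeping empty substrings or treating an empty separator as "no split" are choices made for the example, not documented behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed callback shape: return 0 to drop the substring. */
typedef int (*sketch_filter_cb)(const void *value, void *opaque);

/* Hypothetical stand-in: returns a NULL-terminated, malloc()-allocated
 * array of malloc()-allocated substrings, or NULL if out of memory. */
static char **sketch_split(const char *s, const char *sep,
                           sketch_filter_cb filter_cb, void *filter_opaque)
{
    size_t seplen = strlen(sep), count = 0, cap = 4;
    char **list = malloc(cap * sizeof *list);

    if (list == NULL)
        return NULL;

    for (;;) {
        /* Next boundary, or the end of the string if there is none. */
        const char *end = seplen > 0 ? strstr(s, sep) : NULL;
        size_t len = end != NULL ? (size_t)(end - s) : strlen(s);
        char *piece = malloc(len + 1);

        if (piece == NULL)
            goto fail;
        memcpy(piece, s, len);
        piece[len] = '\0';

        if (filter_cb != NULL && filter_cb(piece, filter_opaque) == 0) {
            free(piece);                  /* filtered out */
        } else {
            if (count + 1 == cap) {       /* keep room for the final NULL */
                char **tmp = realloc(list, 2 * cap * sizeof *list);
                if (tmp == NULL) {
                    free(piece);
                    goto fail;
                }
                list = tmp;
                cap *= 2;
            }
            list[count++] = piece;
        }

        if (end == NULL)
            break;
        s = end + seplen;
    }

    list[count] = NULL;
    return list;

fail:
    while (count > 0)
        free(list[--count]);
    free(list);
    return NULL;
}

/* Example filter: keep only non-empty substrings. */
static int sketch_drop_empty(const void *value, void *opaque)
{
    (void)opaque;
    return *(const char *)value != '\0';
}
```

With this sketch, splitting "a,,b" on "," yields three substrings when no filter is given and two when sketch_drop_empty is passed as the filter.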
