ogonek::text is intended as a Unicode-based string class: not a glorified container of characters, like std::basic_string, but an actual piece of Unicode text.
text is not about storage, so it delegates that to another container. That container can be customized, yielding a range of performance characteristics suited to different situations. One could have a Unicode text array, similar to std::basic_string, or a Unicode text deque, or even a rope.
text is to be seen as a sequence of codepoints, not as a sequence of code units. The encoding is not fixed either; the user can customize it. So one can have a Unicode rope on UTF-16, or a Unicode deque on UTF-8.
text has strong validity invariants. Attempting to construct an instance from an invalid sequence of code units is an error unless a replacement strategy is provided.
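As a rough illustration of that rule, the sketch below validates UTF-32 code units and either fails or substitutes U+FFFD, depending on the strategy passed in. The names validated, throw_on_error and use_replacement_character are hypothetical and are not part of ogonek; UTF-32 is used only because its validity check fits in one line.

```cpp
#include <stdexcept>
#include <vector>

// A UTF-32 code unit is valid when it is a Unicode scalar value:
// at most U+10FFFF and not a surrogate (U+D800..U+DFFF).
inline bool is_scalar_value(char32_t u) {
    return u <= 0x10FFFF && !(u >= 0xD800 && u <= 0xDFFF);
}

// Hypothetical default strategy: invalid input is an error.
struct throw_on_error {
    char32_t operator()(char32_t) const {
        throw std::invalid_argument("invalid UTF-32 code unit");
    }
};

// Hypothetical replacement strategy: substitute U+FFFD.
struct use_replacement_character {
    char32_t operator()(char32_t) const { return char32_t(0xFFFD); }
};

// Returns the code units with every invalid unit passed through the strategy.
// A text constructor could run a check like this before storing the data.
template <typename Strategy = throw_on_error>
std::vector<char32_t> validated(std::vector<char32_t> units,
                                Strategy on_error = Strategy{}) {
    for (char32_t& u : units) {
        if (!is_scalar_value(u)) u = on_error(u);
    }
    return units;
}
```

With no strategy argument, a lone surrogate makes validated throw; with use_replacement_character, the same input comes back with U+FFFD in its place.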
text is a range of codepoints whose functionality depends on the underlying container and encoding. It supports at least forward iteration, and can support stronger iteration categories given the right encoding and container: utf32 with a std::vector would give random-access iteration; utf16 with a std::deque would give bidirectional iteration; and utf7 with a std::deque would only give forward iteration.
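One way to picture the three combinations listed here is to let each encoding declare the strongest traversal it allows and then take the weaker of that and the container's own iterator category. This is a compile-time sketch under that assumption; the max_traversal member and the weaker_of and text_traversal aliases are hypothetical names, not ogonek's.

```cpp
#include <deque>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical encoding tags. max_traversal is an assumed member naming the
// strongest iteration category the encoding itself permits.
struct utf32 { using max_traversal = std::random_access_iterator_tag; };
struct utf16 { using max_traversal = std::bidirectional_iterator_tag; };
struct utf7  { using max_traversal = std::forward_iterator_tag; };

// Picks the weaker of two standard iterator category tags. Stronger tags
// derive from weaker ones, so a tag that is a base of the other is the weaker.
template <typename A, typename B>
using weaker_of =
    typename std::conditional<std::is_base_of<A, B>::value, A, B>::type;

// Traversal category of a text over the given encoding and container.
template <typename Encoding, typename Container>
using text_traversal = weaker_of<
    typename Encoding::max_traversal,
    typename std::iterator_traits<
        typename Container::iterator>::iterator_category>;

static_assert(std::is_same<text_traversal<utf32, std::vector<char32_t>>,
                           std::random_access_iterator_tag>::value,
              "utf32 on std::vector: random access");
static_assert(std::is_same<text_traversal<utf16, std::deque<char16_t>>,
                           std::bidirectional_iterator_tag>::value,
              "utf16 on std::deque: bidirectional");
static_assert(std::is_same<text_traversal<utf7, std::deque<char>>,
                           std::forward_iterator_tag>::value,
              "utf7 on std::deque: forward only");
```

std::deque itself offers random-access iterators, so in the utf16 and utf7 cases it is the encoding that caps the category.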
The customization points (container and encoding) are to be template parameters; whether type-erased alternatives should also be provided remains an open question.
There's a platform-dependent preferred_host_encoding alias for the encoding that is preferable on the host (utf16 on Windows, and utf8 on Linux), and aliases for the text variants using that encoding.
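A minimal sketch of how such aliases might be selected at compile time follows. Only the preferred_host_encoding name comes from the text above; the code_unit member, the empty text template and the host_string_text alias are placeholders, and treating every non-Windows host like Linux is an assumption of this sketch.

```cpp
#include <string>
#include <type_traits>

// Placeholder encoding tags; code_unit is an assumed member type.
struct utf8  { using code_unit = char; };
struct utf16 { using code_unit = char16_t; };

#if defined(_WIN32)
using preferred_host_encoding = utf16;  // Windows
#else
using preferred_host_encoding = utf8;   // Linux; other hosts by assumption
#endif

// Stand-in for the real class template; the interface is omitted.
template <typename Encoding, typename Container>
class text {};

// Hypothetical alias: a text on the host encoding, stored in a
// std::basic_string of the matching code unit type.
using host_string_text =
    text<preferred_host_encoding,
         std::basic_string<preferred_host_encoding::code_unit>>;
```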
text provides only controlled access to the underlying code units. Users who want to manipulate code units directly can work on a container instead: they can move the underlying container out of a text instance, manipulate it, and then move it back in or create a new text instance from it. This last operation can enforce the validity invariants by rechecking the data.
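The round trip described here could look roughly like the following. text32, release_storage and append_raw are hypothetical names chosen for the sketch, and UTF-32 in a std::vector is assumed only to keep the validity check short.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical UTF-32 text over std::vector; not ogonek's actual interface.
class text32 {
public:
    // Takes ownership of the code units and rechecks the validity invariant.
    explicit text32(std::vector<char32_t> units) : storage_(std::move(units)) {
        for (char32_t u : storage_) {
            bool surrogate = u >= 0xD800 && u <= 0xDFFF;
            if (u > 0x10FFFF || surrogate)
                throw std::invalid_argument("invalid UTF-32 code unit");
        }
    }

    // Read-only view of the code units.
    const std::vector<char32_t>& storage() const { return storage_; }

    // Moves the container out; the text is left empty, which is still valid.
    std::vector<char32_t> release_storage() {
        std::vector<char32_t> out = std::move(storage_);
        storage_.clear();
        return out;
    }

private:
    std::vector<char32_t> storage_;
};

// Edits raw code units outside the class, then builds a new text from them.
// The constructor rechecks the data, so an invalid edit is caught here.
text32 append_raw(text32 t, char32_t unit) {
    std::vector<char32_t> raw = t.release_storage();
    raw.push_back(unit);
    return text32(std::move(raw));
}
```

Appending a valid unit yields a new text one unit longer, while appending a lone surrogate such as 0xD800 throws from the constructor.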
Interoperation with APIs operating on null-terminated arrays of code units can be done using a container that stores such a null-terminated array, like std::basic_string.
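For instance, a text stored in a std::string can hand its buffer to a C function through c_str(). In this sketch utf8_text and c_api_length are hypothetical names, validation is left out for brevity, and std::strlen stands in for any API that expects a null-terminated array.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical UTF-8 text stored in a std::string (validation omitted).
class utf8_text {
public:
    explicit utf8_text(std::string units) : storage_(std::move(units)) {}

    // Read-only access to the underlying container.
    const std::string& storage() const { return storage_; }

private:
    std::string storage_;
};

// Passes the code units to a C API that expects a null-terminated array.
// std::strlen counts code units (bytes here), not codepoints.
std::size_t c_api_length(const utf8_text& t) {
    return std::strlen(t.storage().c_str());
}
```

A text containing U+0000 would appear truncated to such an API, since the first zero code unit ends the array from its point of view.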
Interoperation between ogonek::text and ICU's UnicodeString is intended, but requires further study.