The VSS library is designed to provide advanced string and text processing capabilities. The concept behind the new string library is to offer a convenient and robust API that allows developers to work with Unicode text, regardless of its internal representation. In this article, we will introduce you to the library and explain its purpose, highlighting its usefulness for developers working in this area.
Although Ada offers several standard string types, and there are several libraries developed by the Ada community, each one has its own drawbacks or limitations.
The String, Wide_String, and Wide_Wide_String types are indefinite, which can be inconvenient when storing string values in an object or container. The Unbounded_String, Unbounded_Wide_String, and Unbounded_Wide_Wide_String types are definite, but their set of provided operations is limited, and dot notation is not available for them.
Furthermore, each type is restricted to a specific character set, necessitating the conversion of the character set when reading, writing, or interacting with external sources. String and Unbounded_String types only support Latin-1, while wide types use 2 or 4 bytes per character, even for ASCII.
Unfortunately, the most commonly used encoding, UTF-8, is not natively supported by any of these types. The UTF8_String type attempts to fill this gap, but it breaks the user's expectation that each element is a character and places the burden and complexity of working with the encoding on the user.
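To make the last point concrete, here is a minimal sketch that uses only the standard library. It assumes the standard Ada.Strings.UTF_Encoding packages, where the type is spelled UTF_8_String; the procedure name UTF8_Length_Demo is made up for illustration. The text is built from code points so that the example does not depend on the encoding of the source file.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure UTF8_Length_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Two Greek letters: U+03B1 (alpha) and U+03B2 (beta).
   Text : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#03B1#),
      Wide_Wide_Character'Val (16#03B2#));

   --  The same text encoded as UTF-8; each element is one code unit.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     UTF.Encode (Text);
begin
   Ada.Text_IO.Put_Line
     ("Characters:" & Natural'Image (Text'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 elements:" & Natural'Image (Encoded'Length));
end UTF8_Length_Demo;
```

The program reports 2 characters but 4 UTF-8 elements, so indexing or slicing the encoded string by element can land in the middle of a character.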
In modern times, text is not merely a sequence of characters but rather
consists of grapheme clusters, words, and lines, as defined by the Unicode
standard. As a result, tasks such as comparing and sorting strings (collation)
and case conversion cannot be performed solely at the level of individual
characters. For example, To_Upper ("ß") = "SS".
The standard library does not provide support for this.
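As a small illustration of that gap, the sketch below calls the standard Ada.Characters.Handling.To_Upper on a Latin-1 string that contains only ß. The procedure name Sharp_S_Demo is hypothetical. Latin-1 has no upper case form of ß, so the standard function can only return the character unchanged.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Sharp_S_Demo is
   --  LATIN SMALL LETTER SHARP S is at position 223 in Latin-1.
   Sharp_S : constant Character := Character'Val (223);

   --  Character-by-character conversion: the result always has the
   --  same length as the input.
   Upper : constant String :=
     Ada.Characters.Handling.To_Upper (String'(1 => Sharp_S));
begin
   Ada.Text_IO.Put_Line
     ("Length:" & Natural'Image (Upper'Length));
   Ada.Text_IO.Put_Line
     ("Unchanged: " & Boolean'Image (Upper (Upper'First) = Sharp_S));
end Sharp_S_Demo;
```

The result has length 1 and is still ß, whereas the Unicode full case mapping would produce the two-character string "SS".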
To overcome these issues, the VSS library:
- provides a definite type to represent a Unicode character string with a convenient set of operations, along with a dedicated string vector type with an efficient implementation.
- provides an encoding-agnostic API that allows efficient implementations tailored to the platform or application.
- offers a comprehensive range of string and string vector operations, comparable to those found in other programming languages.
- takes advantage of more modern language features and technologies, offering improved performance, memory usage, or other benefits.
The library can be found on GitHub and is distributed under the Apache 2.0 license. It can be built using an Ada 2022 compliant compiler. Additionally, it is possible to use Alire to build the library.
git clone https://github.com/AdaCore/VSS.git
cd VSS
alr build
The VSS library is divided into multiple projects:
vss_text.gpr
- base string library with:
  - Unicode string, string vector, and byte vector types
  - input/output text streams to read/write files, memory, and stdin/stdout
  - iterators for characters, grapheme clusters, words, and lines
  - encoders and decoders for several of the most popular text encodings
vss_regexp.gpr
- a regular expression engine
vss_json.gpr
- a JSON streaming API that allows for efficient parsing and composing of JSON content on the fly
vss_xml.gpr
- an XML streaming API implemented over the XMLAda or Matreshka libraries
vss_xml_templates.gpr
- an XML template engine inspired by Zope Page Templates
How about giving the VSS string library a try?
We start by creating a sample Alire crate and adding VSS as a dependency:
alr init --bin vss_test
cd vss_test
alr pin vss --use=PATH_TO_VSS_FOLDER
# or you can use a Git repository link:
alr pin vss --use=https://github.com/AdaCore/VSS.git --branch=master
Then we change vss_test.adb to the following code:
pragma Wide_Character_Encoding (UTF8);
with VSS.Strings;
with VSS.Strings.Conversions;
with Ada.Wide_Wide_Text_IO;
procedure Vss_Test is
   Text : VSS.Strings.Virtual_String := "𝛼−𝛽";
begin
   Ada.Wide_Wide_Text_IO.Put_Line
     (VSS.Strings.Conversions.To_Wide_Wide_String (Text));
end Vss_Test;
The first line tells GNAT that the source code is encoded in UTF-8.
Then we add the VSS library units and the Wide_Wide_Text_IO package.
The initialization of the Text variable leverages the Ada 2022 syntax
for user-defined literals: it hides a call to
VSS.Strings.To_Virtual_String for the string literal.
Converting back to a standard string, on the other hand, requires an
explicit call.
To build and execute this code just run:
alr run
With Text in hand, we can:
- check whether it is empty: Text.Is_Empty
- get its length in characters: Text.Character_Length
- compute its hash: Text.Hash
- check whether it starts (or ends) with another string: Text.Starts_With ("𝛼")
- change character case: Text.To_Uppercase
- etc.
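A minimal sketch that puts a few of these queries together follows. It only uses the operations named in the list; the procedure name Query_Demo and the sample strings are made up for illustration, and it is an assumption that Character_Length returns the type VSS.Strings.Character_Count.

```ada
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Wide_Text_IO;
with VSS.Strings;
with VSS.Strings.Conversions;

procedure Query_Demo is
   use Ada.Wide_Wide_Text_IO;

   --  Both objects are initialized through user-defined literals.
   Text   : constant VSS.Strings.Virtual_String := "straße";
   Prefix : constant VSS.Strings.Virtual_String := "str";
begin
   Put_Line ("Empty: " & Boolean'Wide_Wide_Image (Text.Is_Empty));

   --  Assumption: the result type is VSS.Strings.Character_Count.
   Put_Line
     ("Length:"
      & VSS.Strings.Character_Count'Wide_Wide_Image
          (Text.Character_Length));

   Put_Line
     ("Has prefix: "
      & Boolean'Wide_Wide_Image (Text.Starts_With (Prefix)));

   --  Case conversion works on the whole string, not per character.
   Put_Line
     (VSS.Strings.Conversions.To_Wide_Wide_String (Text.To_Uppercase));
end Query_Demo;
```

All of these calls use dot notation on the Text object, which is not available for the standard unbounded string types.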
We can modify Text by:
- appending a string or character: Text.Append ('.');
- prepending a string or character: Text.Prepend (">>>");
- erasing its content: Text.Clear;
- etc.
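The following sketch applies these modifications in sequence and prints the intermediate values. It reuses the calls shown in the list as they are; the procedure name Modify_Demo and the local helper Show are hypothetical.

```ada
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Wide_Text_IO;
with VSS.Strings;
with VSS.Strings.Conversions;

procedure Modify_Demo is
   Text : VSS.Strings.Virtual_String := "𝛼−𝛽";

   --  Hypothetical helper: print the current value of Text.
   procedure Show is
   begin
      Ada.Wide_Wide_Text_IO.Put_Line
        (VSS.Strings.Conversions.To_Wide_Wide_String (Text));
   end Show;
begin
   Text.Append ('.');     --  add a character at the end
   Show;

   Text.Prepend (">>>");  --  add a string at the beginning
   Show;

   Text.Clear;            --  remove all content
   Ada.Wide_Wide_Text_IO.Put_Line
     ("Empty: " & Boolean'Wide_Wide_Image (Text.Is_Empty));
end Modify_Demo;
```

Because Virtual_String is a definite type, the same variable can hold values of different lengths without being redeclared.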
We can split Text into a string vector (defined in VSS.String_Vectors):
declare
   List : VSS.String_Vectors.Virtual_String_Vector := Text.Split ('−');
begin
   for Item of List loop
      Ada.Wide_Wide_Text_IO.Put_Line
        (VSS.Strings.Conversions.To_Wide_Wide_String (Item));
   end loop;
end;
A dedicated function, Text.Split_Lines, splits the text into a string
vector using the specified line separators.
Conversely, the vector type offers the Join and Join_Lines functions
for the opposite operations.
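A round trip through these operations might look like the sketch below. The exact parameter profiles are assumptions and should be checked against the VSS sources: it assumes that Join_Lines takes a VSS.Strings.Line_Terminator value such as VSS.Strings.LF, and that Split_Lines can be called with its default parameters. The procedure name Lines_Demo is made up for illustration.

```ada
with Ada.Wide_Wide_Text_IO;
with VSS.String_Vectors;
with VSS.Strings;
with VSS.Strings.Conversions;

procedure Lines_Demo is
   Text  : constant VSS.Strings.Virtual_String := "one,two,three";

   --  Split on a separator character, as in the example above.
   Parts : constant VSS.String_Vectors.Virtual_String_Vector :=
     Text.Split (',');

   --  Assumption: the terminator is a VSS.Strings.Line_Terminator.
   Joined : constant VSS.Strings.Virtual_String :=
     Parts.Join_Lines (VSS.Strings.LF);

   --  Assumption: the default parameters recognize LF as a separator.
   Lines : constant VSS.String_Vectors.Virtual_String_Vector :=
     Joined.Split_Lines;
begin
   for Item of Lines loop
      Ada.Wide_Wide_Text_IO.Put_Line
        (VSS.Strings.Conversions.To_Wide_Wide_String (Item));
   end loop;
end Lines_Demo;
```

If these assumptions hold, the loop prints each of the original parts on its own line.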
In conclusion, the VSS library provides advanced string and text processing capabilities. It offers an API that allows developers to work with Unicode text, regardless of its internal representation. The library overcomes the limitations of Ada's standard string types and other community-developed string libraries. It provides a definite type to represent a Unicode character string with a comprehensive range of operations comparable to those found in other programming languages. Additionally, it is encoding-agnostic, allowing efficient implementations tailored to the platform or application. The library is divided into multiple projects, with each project catering to a specific need. The VSS library is distributed under the Apache 2.0 license, and it can be built using an Ada 2022 compliant compiler. With its efficient implementation, modern language features and technologies, and support for tasks such as comparing and sorting strings, the VSS library is a useful tool for developers working with strings and text processing.
In the subsequent articles, we will explore more advanced concepts such as cursors, streams, encoders/decoders, and so on. Stay tuned!