Utilities for text parsing. More...
#include <ParsingToolkit.h>
Classes | |
struct | CCTypeAdapter |
struct | Error |
struct | Params_t |
All parsing parameters. More... | |
struct | SplitView_t |
Record of a split token: pre-separator, separator and post-separator. More... | |
Public Types | |
using | QuotSpec_t = std::pair< std::string, std::string > |
Specification of quotation: opening and closing. More... | |
Public Member Functions | |
ParsingToolkit () | |
Default parsing parameters. More... | |
ParsingToolkit (Params_t params) | |
Creates a parser with the specified parsing parameters. More... | |
Params_t const & | params () const noexcept |
Returns the current parameters of parsing. More... | |
template<typename BIter , typename EIter > | |
std::string_view | findFirstUnquoted (std::string_view sv, BIter beginKey, EIter endKey) const |
Finds the first of the specified keys in the unquoted part of sv . More... | |
template<typename Words > | |
std::vector< std::string > | removeEscapes (Words const &words) const |
Returns a copy of words with all escape characters removed. More... | |
template<typename Words > | |
std::vector< std::string > | removeQuotations (Words const &words) const |
Returns a copy of words with no quotation starts and ends. More... | |
template<typename Iter > | |
bool | isCharacterEscaped (Iter begin, Iter itCh) const |
Input | |
std::pair< std::string, unsigned int > | readMultiline (std::istream &in) const |
Returns a single line of text from the input stream. More... | |
Tokenization | |
template<typename Delim > | |
std::vector< std::string_view > | splitWords (std::string const &s, Delim isDelimiter) const |
Splits a string into words. More... | |
std::vector< std::string_view > | splitWords (std::string const &s) const |
Helper version of splitWords(std::string const&, Delim) . More... | |
template<typename Iter > | |
Iter | findCommentWord (Iter beginWord, Iter endWord) const |
Finds the first word starting with a comment marker. More... | |
template<typename WordType > | |
void | removeCommentLine (std::vector< WordType > &words) const |
Removes all the words from the one starting with a comment marker. More... | |
std::pair< std::string_view, QuotSpec_t const * > | findQuotationStart (std::string_view sv) const |
Finds the start of the next quotation in sv . More... | |
std::string_view | findQuotationEnd (std::string_view sv, std::string const &quotEnd) const |
Finds the quotation end in sv . More... | |
bool | isQuotationUnclosed (std::string_view sv) const |
Returns whether the sequence sv has an unclosed quotation at its end. More... | |
template<typename BIter , typename EIter > | |
std::string_view | findFirstUnescaped (std::string_view sv, BIter beginKey, EIter endKey) const |
Finds the first of the specified keys in sv . More... | |
template<typename Keys > | |
std::string_view | findFirstUnescaped (std::string_view sv, Keys const &keys) const |
Finds the first of the specified keys in sv . More... | |
template<typename Key > | |
std::string_view | findFirstUnescaped (std::string_view sv, std::initializer_list< Key > keys) const |
template<typename Keys > | |
std::string_view | findFirstUnquoted (std::string_view sv, Keys const &keys) const |
Finds the first of the specified keys in the unquoted part of sv . More... | |
template<typename Key > | |
std::string_view | findFirstUnquoted (std::string_view sv, std::initializer_list< Key > keys) const |
Characters | |
bool | isEscape (char ch) const |
Returns whether ch is an escape character. More... | |
template<typename BIter > | |
bool | isCharacterEscaped (BIter begin, BIter itCh) const |
Returns whether the character pointed by itCh is escaped or not. More... | |
template<typename Sel > | |
std::string_view::const_iterator | findNextCharacter (std::string_view s, Sel select) const |
Finds the next character satisfying the specified criterion. More... | |
std::string_view::const_iterator | findNextBlank (std::string_view s) const |
Helper function for findNextCharacter(std::string_view, Sel) . More... | |
template<typename CType > | |
std::string_view | removeTrailingCharacters (std::string_view s, CType charType) const |
Consumes the blank characters at the beginning of s . More... | |
std::string_view | removeTrailingBlanks (std::string_view s) const |
Consumes the blank characters at the beginning of s . More... | |
std::string | removeWordEscapes (std::string &&w) const |
Returns a copy of w with all escape characters removed. More... | |
std::string | removeWordEscapes (std::string_view w) const |
std::string | removeWordEscapes (const char *w) const |
std::string | removeWordQuotations (std::string &&w) const |
Returns a copy of w with no quotation starts and ends. More... | |
std::string | removeWordQuotations (std::string_view w) const |
std::string | removeWordQuotations (const char *w) const |
Static Public Member Functions | |
static SplitView_t | splitOn (std::string_view sv, std::string_view sep) |
Splits the view sv in three: before sep , sep and after sep . More... | |
static std::string_view | make_view (std::string const &s) |
Creates a std::string_view from an entire string s . More... | |
template<typename BIter , typename EIter > | |
static std::string_view | make_view (BIter b, EIter e) |
Creates a std::string_view from two string iterators b and e . More... | |
Static Public Attributes | |
static constexpr CCTypeAdapter <&std::isblank > | isBlank {} |
Adapter for determining if a character is a blank (see std::isblank() ). More... | |
static Params_t const | DefaultParameters |
Private Member Functions | |
void | adoptParams (Params_t params) |
Initializes the parameters and caches. More... | |
Private Attributes | |
Params_t | fParams |
Parsing parameters. More... | |
std::string | fQuoteStarts |
Start characters of all supported quotations. More... | |
Utilities for text parsing.
This "class" is a glorified namespace with some configuration inside.
A quoted string is the content between an opening quoting sequence and the matching closing sequence. Each sequence may be any string, including but not limited to a single-character string. Escaping the first character of an opening or closing quotation sequence turns it into ordinary string data that carries no quotation meaning.
Any single character following the escape character is "escaped". Escaped characters lose their standard function and are replaced by a substitute character. For example, escaping the first character of an opening quotation makes it a standard character. An escaped escape character is always replaced by the character itself, without its escape function.
Definition at line 54 of file ParsingToolkit.h.
using icarus::ParsingToolkit::QuotSpec_t = std::pair<std::string, std::string> |
Specification of quotation: opening and closing.
Definition at line 63 of file ParsingToolkit.h.
|
inline |
Default parsing parameters.
Creates a parser with the default parsing parameters.
Definition at line 100 of file ParsingToolkit.h.
|
inline |
Creates a parser with the specified parsing parameters.
Definition at line 103 of file ParsingToolkit.h.
|
private |
Initializes the parameters and caches.
Definition at line 220 of file ParsingToolkit.cxx.
Iter icarus::ParsingToolkit::findCommentWord | ( | Iter | beginWord, |
Iter | endWord | ||
) | const |
Finds the first word starting with a comment marker.
Iter | type of iterator to the words |
beginWord | iterator to the first word to consider |
endWord | iterator past the last word to consider |
Returns an iterator to the comment word, or endWord if not found.
The original list is modified: the word starting with a comment marker and all the following ones are removed.
Definition at line 748 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::findFirstUnescaped | ( | std::string_view | sv, |
BIter | beginKey, | ||
EIter | endKey | ||
) | const |
Finds the first of the specified keys in sv.
BIter | type of iterator to the keys |
EIter | type of key end-iterator |
sv | string to be parsed |
beginKey | iterator to the first key |
endKey | iterator past the last key |
Returns a view of the first key found in sv, empty if none.
The keys are required to be sorted, longest first, since they are tested in order and the first match is kept (e.g. if the first key is = and the second is ==, the second key is never matched since the first one matches first). The first character of the key must not be escaped. Escaped characters in the key are not supported.
If no key is found, the returned view is zero-length and pointing to the end of sv.
The quoting in sv is ignored.
Definition at line 634 of file ParsingToolkit.h.
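The key ordering requirement can be illustrated with a minimal, self-contained sketch. It does not use ParsingToolkit: firstKeyMatch is a hypothetical helper that reproduces only the "test the keys in order, keep the first match" rule described above, and it ignores escaping altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of ParsingToolkit): returns a view of the
// first key found in `sv`, testing the keys in the given order at each
// position; returns a zero-length view at the end of `sv` if none matches.
// Escaping is deliberately ignored in this sketch.
std::string_view firstKeyMatch
  (std::string_view sv, std::vector<std::string_view> const& keys)
{
  for (std::size_t pos = 0; pos < sv.size(); ++pos) {
    for (std::string_view key: keys) {
      if (!key.empty() && sv.substr(pos, key.size()) == key)
        return sv.substr(pos, key.size());
    }
  }
  return sv.substr(sv.size()); // zero-length view at the end of `sv`
}

// Usage: the order of the keys decides which key is reported.
void keyOrderExample() {
  std::string_view const line = "a == b";
  assert(firstKeyMatch(line, { "==", "=" }) == "==");    // longest first
  assert(firstKeyMatch(line, { "=", "==" }) == "=");     // "==" is shadowed
  assert(firstKeyMatch("no key", { "==", "=" }).empty()); // no match
}
```

With the shorter key listed first, the longer key is never reported, which is why the keys should be sorted longest first.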
std::string_view icarus::ParsingToolkit::findFirstUnescaped | ( | std::string_view | sv, |
Keys const & | keys | ||
) | const |
Finds the first of the specified keys in sv.
BIter | type of iterator to the keys |
EIter | type of key end-iterator |
sv | string to be parsed |
beginKey | iterator to the first key |
endKey | iterator past the last key |
Returns a view of the first key found in sv, empty if none.
The keys are required to be sorted, longest first, since they are tested in order and the first match is kept (e.g. if the first key is = and the second is ==, the second key is never matched since the first one matches first). The first character of the key must not be escaped. Escaped characters in the key are not supported.
If no key is found, the returned view is zero-length and pointing to the end of sv.
The quoting in sv is ignored.
Definition at line 667 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::findFirstUnescaped | ( | std::string_view | sv, |
std::initializer_list< Key > | keys | ||
) | const |
Definition at line 677 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::findFirstUnquoted | ( | std::string_view | sv, |
BIter | beginKey, | ||
EIter | endKey | ||
) | const |
Finds the first of the specified keys in the unquoted part of sv.
BIter | type of iterator to the keys |
EIter | type of key end-iterator |
sv | string to be parsed |
beginKey | iterator to the first key |
endKey | iterator past the last key |
Returns a view of the first key found in sv, or an empty view at its end if none.
The keys are required to be sorted, longest first, since they are tested in order and the first match is kept (e.g. if the first key is = and the second is ==, the second key is never matched since the first one matches first).
If no key is found, the returned view is zero-length and pointing to the end of sv.
Definition at line 684 of file ParsingToolkit.h.
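The difference from findFirstUnescaped() is that keys inside quotations are skipped. The following sketch shows the idea under simplifying assumptions: firstUnquotedSketch is a hypothetical function, it supports a single non-empty key, it recognizes only the double quote character as both quotation start and end, and it ignores escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch (not the ParsingToolkit implementation): finds `key`
// in the part of `sv` that is not within double quotes. Returns a view of
// the match, or a zero-length view at the end of `sv` if there is none.
std::string_view firstUnquotedSketch
  (std::string_view sv, std::string_view key)
{
  bool quoted = false;
  for (std::size_t pos = 0; pos < sv.size(); ++pos) {
    if (sv[pos] == '"') { quoted = !quoted; continue; } // toggle quotation
    if (!quoted && !key.empty() && sv.substr(pos, key.size()) == key)
      return sv.substr(pos, key.size());
  }
  return sv.substr(sv.size());
}

void unquotedExample() {
  std::string_view const sv = R"("a=b" = c)";
  std::string_view const found = firstUnquotedSketch(sv, "=");
  assert(found == "=");
  assert(found.data() - sv.data() == 6); // the "=" within quotes is skipped
  std::string_view const allQuoted = R"("a=b")";
  assert(firstUnquotedSketch(allQuoted, "=").empty());
}
```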
std::string_view icarus::ParsingToolkit::findFirstUnquoted | ( | std::string_view | sv, |
Keys const & | keys | ||
) | const |
Finds the first of the specified keys in the unquoted part of sv.
BIter | type of iterator to the keys |
EIter | type of key end-iterator |
sv | string to be parsed |
beginKey | iterator to the first key |
endKey | iterator past the last key |
Returns a view of the first key found in sv, or an empty view at its end if none.
The keys are required to be sorted, longest first, since they are tested in order and the first match is kept (e.g. if the first key is = and the second is ==, the second key is never matched since the first one matches first).
If no key is found, the returned view is zero-length and pointing to the end of sv.
Definition at line 732 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::findFirstUnquoted | ( | std::string_view | sv, |
std::initializer_list< Key > | keys | ||
) | const |
Definition at line 742 of file ParsingToolkit.h.
|
inline |
Helper function for findNextCharacter(std::string_view, Sel).
Definition at line 405 of file ParsingToolkit.h.
std::string_view::const_iterator icarus::ParsingToolkit::findNextCharacter | ( | std::string_view | s, |
Sel | select | ||
) | const |
Finds the next character satisfying the specified criterion.
Sel | type of functor determining which character to consider blank |
s | view of the string to be parsed |
select | functor determining which character(s) to look for |
Returns an iterator to the first character satisfying the criterion, s.end() if none.
By default, the selected character is a blank character ch, for which std::isblank(ch) is true.
Definition at line 778 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::findQuotationEnd | ( | std::string_view | sv, |
std::string const & | quotEnd | ||
) | const |
Finds the quotation end in sv.
sv | the buffer in which to look for the quotation end |
quotEnd | the quotation end to be searched |
Returns a view of the remainder of sv from the quotation end, included, empty if not found.
Note that sv should not include the quotation start.
Definition at line 107 of file ParsingToolkit.cxx.
auto icarus::ParsingToolkit::findQuotationStart | ( | std::string_view | sv | ) | const |
Finds the start of the next quotation in sv.
sv | the buffer in which to look for the quotation start |
Returns a subview of sv starting from the quotation found, empty if none.
Definition at line 66 of file ParsingToolkit.cxx.
bool icarus::ParsingToolkit::isCharacterEscaped | ( | BIter | begin, |
BIter | itCh | ||
) | const |
Returns whether the character pointed to by itCh is escaped or not.
BIter | iterator type |
begin | iterator to the beginning of the string |
itCh | iterator to the character to be investigated |
Returns whether the character pointed to by itCh is escaped.
Note that itCh may be an end iterator (for an empty string, the result is false).
bool icarus::ParsingToolkit::isCharacterEscaped | ( | Iter | begin, |
Iter | itCh | ||
) | const |
Definition at line 760 of file ParsingToolkit.h.
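A minimal sketch of such a check follows. It is an assumption about how the rule can be implemented, not the actual ParsingToolkit code: isEscapedSketch is a hypothetical function that takes a single escape character (backslash by default) and counts the consecutive escape characters right before the position. An odd count means the character is escaped, since each escaped escape character loses its escape function.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the check: a character is escaped when it is
// preceded by an odd number of consecutive escape characters.
template <typename BIter>
bool isEscapedSketch(BIter begin, BIter itCh, char escape = '\\') {
  unsigned int nEscapes = 0;
  while (itCh != begin && *--itCh == escape) ++nEscapes;
  return (nEscapes % 2) == 1;
}

void escapedExample() {
  // The characters of s are:  a \ " b \ \ " c
  std::string_view const s = R"(a\"b\\"c)";
  assert( isEscapedSketch(s.begin(), s.begin() + 2)); // first quote: escaped
  assert(!isEscapedSketch(s.begin(), s.begin() + 6)); // second quote: not
  assert(!isEscapedSketch(s.begin(), s.end()));       // end iterator is fine
  assert(!isEscapedSketch(s.begin(), s.begin()));     // nothing precedes it
}
```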
|
inline |
Returns whether ch is an escape character.
Definition at line 375 of file ParsingToolkit.h.
bool icarus::ParsingToolkit::isQuotationUnclosed | ( | std::string_view | sv | ) | const |
Returns whether the sequence sv has an unclosed quotation at its end.
Definition at line 128 of file ParsingToolkit.cxx.
|
inlinestatic |
Creates a std::string_view from an entire string s.
Definition at line 510 of file ParsingToolkit.h.
|
inlinestatic |
Creates a std::string_view from two string iterators b and e.
Definition at line 515 of file ParsingToolkit.h.
|
inlinenoexcept |
std::pair< std::string, unsigned int > icarus::ParsingToolkit::readMultiline | ( | std::istream & | in | ) | const |
Returns a single line of text from the input stream.
in | the input stream |
Error | on fatal parsing errors |
This function reads entire lines from in, where a line is defined as in std::getline(). If the line ends with an unescaped escape character, another line is read and appended (the escape character is dropped). The return value is the merged string with no end-of-line characters, and the number of lines read. If there is no string to be read, it returns an empty string and 0U.
Definition at line 27 of file ParsingToolkit.cxx.
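The line-joining behavior described above can be sketched with std::getline() alone. readMultilineSketch is a hypothetical re-implementation that assumes a backslash as escape character; it never throws, and what the real function does when the input ends right after a continuation is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch: reads lines until one does not end with an unescaped
// escape character; returns the merged text and the number of lines read.
std::pair<std::string, unsigned int> readMultilineSketch(std::istream& in) {
  std::string merged;
  unsigned int nLines = 0;
  std::string line;
  while (std::getline(in, line)) {
    ++nLines;
    // count the escape characters at the end of the line
    std::size_t nEscapes = 0;
    while (nEscapes < line.size()
      && line[line.size() - 1 - nEscapes] == '\\') ++nEscapes;
    bool const continued = (nEscapes % 2) == 1; // last escape is unescaped
    if (continued) line.pop_back(); // drop the trailing escape character
    merged += line;
    if (!continued) break;
  }
  return { merged, nLines };
}

void multilineExample() {
  std::istringstream in("first \\\nsecond\nthird\n");
  auto const [text, nLines] = readMultilineSketch(in);
  assert(text == "first second");
  assert(nLines == 2U);
  assert(readMultilineSketch(in).first == "third");
  assert(readMultilineSketch(in).second == 0U); // nothing left to read
}
```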
|
inline |
Removes all the words from the one starting with a comment marker.
words | list of words |
The original list is modified, the word starting with a comment marker and all the following ones are removed.
Definition at line 216 of file ParsingToolkit.h.
std::vector< std::string > icarus::ParsingToolkit::removeEscapes | ( | Words const & | words | ) | const |
Returns a copy of words with all escape characters removed.
Words | type of list of words |
words | the list of words to change |
See also removeEscapes(std::string)
The escaping is removed from each of the words in the list, which are treated as independent. See removeEscapes(std::string) for the details.
Definition at line 810 of file ParsingToolkit.h.
std::vector< std::string > icarus::ParsingToolkit::removeQuotations | ( | Words const & | words | ) | const |
Returns a copy of words with no quotation starts and ends.
Words | type of list of words |
words | the list of words to change |
See also removeQuotations(std::string)
The substitution is applied on each of the words in the list, which are treated as independent. See removeQuotations(std::string) for the details.
Definition at line 823 of file ParsingToolkit.h.
|
inline |
Consumes the blank characters at the beginning of s.
See also removeTrailingCharacters()
Definition at line 422 of file ParsingToolkit.h.
std::string_view icarus::ParsingToolkit::removeTrailingCharacters | ( | std::string_view | s, |
CType | charType | ||
) | const |
Consumes the blank characters at the beginning of s.
CType | type of functor determining which type of character to remove |
s | view of the string to be parsed |
charType | functor determining which characters to remove |
Returns a view of s starting after its trailing charType characters.
See also removeTrailingBlanks()
Definition at line 794 of file ParsingToolkit.h.
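Despite the name, the description states that the characters are consumed from the beginning of s. A minimal sketch of that behavior, assuming nothing about the actual implementation, can rely on std::find_if_not(). The names skipLeadingSketch and isBlankSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical character classifier, equivalent in spirit to isBlank.
bool isBlankSketch(char ch)
  { return std::isblank(static_cast<unsigned char>(ch)) != 0; }

// Hypothetical sketch: returns the part of `s` that starts at the first
// character for which `charType` is false.
template <typename CType>
std::string_view skipLeadingSketch(std::string_view s, CType charType) {
  auto const it = std::find_if_not(s.begin(), s.end(), charType);
  return s.substr(static_cast<std::size_t>(it - s.begin()));
}

void skipLeadingExample() {
  assert(skipLeadingSketch("  \tabc ", isBlankSketch) == "abc ");
  assert(skipLeadingSketch("abc", isBlankSketch) == "abc");
  assert(skipLeadingSketch("   ", isBlankSketch).empty());
}
```

Only the view is narrowed; the underlying string is not copied or modified.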
std::string icarus::ParsingToolkit::removeWordEscapes | ( | std::string && | w | ) | const |
Returns a copy of w with all escape characters removed.
w | the string to change |
Returns a copy of w without escaping.
See also removeEscapes(Word const&)
The escaping scheme that is applied is just to remove the escape character (no replacement table supported here). An unescaped escape character at the end of the string will not be removed.
It is recommended that this be done as the last step of the parsing, since it changes the meaning of the parsing elements like quotations, comments etc.
Note that applying removeEscapes() more than once will keep removing characters that in the earlier passes were not considered escapes (for example, four escape characters become two in the first pass, one in the second and disappear in the following passes).
Definition at line 158 of file ParsingToolkit.cxx.
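The effect of repeated passes can be reproduced with a small sketch of the scheme described above. removeEscapesSketch is a hypothetical function that assumes a backslash as escape character: each unescaped escape is dropped, the character after it is copied verbatim, and a lone escape at the end of the string is kept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch of the escape removal (not the ParsingToolkit code).
std::string removeEscapesSketch(std::string_view w) {
  std::string out;
  out.reserve(w.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    if (w[i] == '\\' && i + 1 < w.size()) ++i; // skip escape, keep next char
    out += w[i];
  }
  return out;
}

void escapePassesExample() {
  std::string const pass1 = removeEscapesSketch(R"(\\\\x)"); // four escapes
  std::string const pass2 = removeEscapesSketch(pass1);
  std::string const pass3 = removeEscapesSketch(pass2);
  assert(pass1 == "\\\\x"); // two escape characters left
  assert(pass2 == "\\x");   // one left
  assert(pass3 == "x");     // none left
  assert(removeEscapesSketch("a\\") == "a\\"); // trailing escape is kept
}
```

This is why the removal is best applied exactly once, as the last parsing step.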
|
inline |
Definition at line 447 of file ParsingToolkit.h.
|
inline |
Definition at line 449 of file ParsingToolkit.h.
std::string icarus::ParsingToolkit::removeWordQuotations | ( | std::string && | w | ) | const |
Returns a copy of w with no quotation starts and ends.
w | the string to change |
See also removeQuotations(Words const&)
Escaping is still honored (if present).
Note that applying removeQuotations more than once will keep removing quotation markings that in the earlier passes were not considered such (for example, `a1 << "b1 << 'c1 << " or " << c2' << b2" << a2` will become first `a1 << b1 << 'c1 << or << c2' << b2 << a2`, and eventually `a1 << b1 << c1 << or << c2 << b2 << a2`).
Definition at line 176 of file ParsingToolkit.cxx.
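A simplified sketch shows how a second pass strips the quotations that the first pass treated as plain content. removeQuotationsSketch is hypothetical: it assumes a backslash escape and only the one-character quotations " and ', and it silently drops the start of an unclosed quotation, which may differ from the real behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: removes unescaped quotation starts and ends, leaving
// escape characters in place. Quotations do not nest: inside a quotation,
// the other quotation character is plain content.
std::string removeQuotationsSketch(std::string_view w) {
  std::string out;
  char quote = '\0'; // closing character of the open quotation, if any
  for (std::size_t i = 0; i < w.size(); ++i) {
    char const ch = w[i];
    if (ch == '\\' && i + 1 < w.size()) { // keep escape and escaped char
      out += ch;
      out += w[++i];
      continue;
    }
    if (quote != '\0') {
      if (ch == quote) quote = '\0'; else out += ch;
      continue;
    }
    if (ch == '"' || ch == '\'') { quote = ch; continue; }
    out += ch;
  }
  return out;
}

void quotationExample() {
  std::string const once = removeQuotationsSketch(R"(a "b 'c' d" e)");
  assert(once == "a b 'c' d e");                       // inner quotes survive
  assert(removeQuotationsSketch(once) == "a b c d e"); // second pass
  std::string const expected = R"(a \"b\")";
  assert(removeQuotationsSketch(expected) == expected); // escaped quotes kept
}
```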
|
inline |
Definition at line 485 of file ParsingToolkit.h.
|
inline |
Definition at line 487 of file ParsingToolkit.h.
|
static |
Splits the view sv in three: before sep, sep and after sep.
sv | view of the string to split |
sep | a subview of sv to split at |
Returns a SplitView_t object with the three parts split, empty if needed.
The view sep is required to be a subview of sv: it's not enough for it to have as content a substring of sv. For example, splitOn("a:1", ":") will not work, because the string "a:1" does not share data in memory with ":".
Even if sep is empty, it's still required to point with both begin() and end() within sv, and sv will be split according to that point.
Definition at line 149 of file ParsingToolkit.cxx.
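The subview requirement follows naturally if the split point is computed from memory addresses. The sketch below illustrates this under stated assumptions: SplitSketch and splitOnSketch are hypothetical, and the member names pre, sep and post are assumed, not taken from SplitView_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical equivalent of SplitView_t, with assumed member names.
struct SplitSketch { std::string_view pre, sep, post; };

// Hypothetical sketch: splits `sv` around `sep`, which must point into the
// memory of `sv`, because the offset is computed from the two addresses.
SplitSketch splitOnSketch(std::string_view sv, std::string_view sep) {
  std::size_t const iSep = static_cast<std::size_t>(sep.data() - sv.data());
  return { sv.substr(0, iSep), sep, sv.substr(iSep + sep.size()) };
}

void splitOnExample() {
  std::string_view const sv = "a:1";
  std::string_view const sep = sv.substr(sv.find(':'), 1); // subview of sv
  SplitSketch const parts = splitOnSketch(sv, sep);
  assert(parts.pre == "a");
  assert(parts.sep == ":");
  assert(parts.post == "1");
  // an empty separator still marks a split point within sv
  SplitSketch const atTwo = splitOnSketch(sv, sv.substr(2, 0));
  assert(atTwo.pre == "a:");
  assert(atTwo.sep.empty());
  assert(atTwo.post == "1");
}
```

A typical way to obtain a valid sep is to use the view returned by a search on sv itself, as done with substr() here.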
std::vector< std::string_view > icarus::ParsingToolkit::splitWords | ( | std::string const & | s, |
Delim | isDelimiter | ||
) | const |
Splits a string into words.
Delim | type of delimiter functor |
s | the string to be split |
isDelimiter | (default: isblank() ) determines if a character is a word delimiter |
The splitter algorithm defines a word separator as a sequence of one or more unescaped, unquoted delimiter characters, where a delimiter is a character ch for which isDelimiter(ch) is true.
Note that this function does not change the content of the data, and in particular it does not remove escaping nor quoting (although it interprets both).
A character used as delimiter can appear in a word only if escaped or within quotation. Contiguous non-delimiter elements of a string, including quoted strings, belong to the same word (for example, a" and "b is a single word when delimitation is by blank characters). An empty word can be introduced only in quotations (e.g. "").
The Delim type is a functor so that isDelimiter(ch) returns something convertible to bool, true if the ch character should be considered a delimiter. Note that no context is provided for the answer, so the use of each character as delimiter is fixed, and modified only by the hard-coded quotation and escaping rules.
The first characters of quotation starts and the escape characters must not be classified as delimiters, or the algorithm will give wrong results.
Definition at line 545 of file ParsingToolkit.h.
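The splitting rules can be sketched in a simplified form. splitWordsSketch is a hypothetical function, not the ParsingToolkit algorithm: it assumes a backslash as escape character and supports only the one-character quotations " and '. Like the function described above, it returns views into s and leaves escapes and quotes in place.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical simplified splitter: words are separated by runs of
// unescaped, unquoted characters for which `isDelimiter` is true.
template <typename Delim>
std::vector<std::string_view> splitWordsSketch
  (std::string const& s, Delim isDelimiter)
{
  std::vector<std::string_view> words;
  std::string_view const sv { s };
  std::size_t i = 0;
  while (i < sv.size()) {
    if (isDelimiter(sv[i])) { ++i; continue; } // skip the word separator
    std::size_t const start = i;
    char quote = '\0'; // closing character of the open quotation, if any
    for (; i < sv.size(); ++i) {
      char const ch = sv[i];
      if (ch == '\\') { // the escaped character is never special
        if (i + 1 < sv.size()) ++i;
        continue;
      }
      if (quote != '\0') { if (ch == quote) quote = '\0'; continue; }
      if (ch == '"' || ch == '\'') { quote = ch; continue; }
      if (isDelimiter(ch)) break; // unescaped, unquoted delimiter
    }
    words.push_back(sv.substr(start, i - start));
  }
  return words;
}

void splitWordsExample() {
  auto const isBlank = [](char ch)
    { return std::isblank(static_cast<unsigned char>(ch)) != 0; };
  std::string const s = R"(a" and "b  "" c\ d)";
  auto const words = splitWordsSketch(s, isBlank);
  assert(words.size() == 3);
  assert(words[0] == "a\" and \"b"); // quoted blanks do not split
  assert(words[1] == "\"\"");        // empty word, from a quotation
  assert(words[2] == "c\\ d");       // escaped blank does not split
}
```

Since the words are views, the string s must outlive the returned vector.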
|
inline |
Helper version of splitWords(std::string const&, Delim).
Definition at line 190 of file ParsingToolkit.h.
|
static |
Definition at line 97 of file ParsingToolkit.h.
|
private |
Parsing parameters.
Definition at line 519 of file ParsingToolkit.h.
|
private |
Start characters of all supported quotations.
Definition at line 524 of file ParsingToolkit.h.
|
static |
Adapter for determining if a character is a blank (see std::isblank()).
Definition at line 92 of file ParsingToolkit.h.