icarus::ParsingToolkit Struct Reference

Utilities for text parsing.

#include <ParsingToolkit.h>

Classes

struct  CCTypeAdapter
 
struct  Error
 
struct  Params_t
 All parsing parameters.
 
struct  SplitView_t
 Record of a split token: pre-separator, separator and post-separator.
 

Public Types

using QuotSpec_t = std::pair< std::string, std::string >
 Specification of quotation: opening and closing.
 

Public Member Functions

 ParsingToolkit ()
 Creates a parser with the default parsing parameters.
 
 ParsingToolkit (Params_t params)
 Creates a parser with the specified parsing parameters.
 
Params_t const & params () const noexcept
 Returns the current parameters of parsing.
 
template<typename BIter , typename EIter >
std::string_view findFirstUnquoted (std::string_view sv, BIter beginKey, EIter endKey) const
 Finds the first of the specified keys in the unquoted part of sv.
 
template<typename Words >
std::vector< std::string > removeEscapes (Words const &words) const
 Returns a copy of words with all escape characters removed.
 
template<typename Words >
std::vector< std::string > removeQuotations (Words const &words) const
 Returns a copy of words with no quotation starts and ends.
 
template<typename Iter >
bool isCharacterEscaped (Iter begin, Iter itCh) const
 
Input
std::pair< std::string, unsigned int > readMultiline (std::istream &in) const
 Returns a single logical line of text from the input stream, merging continued lines.
 
Tokenization
template<typename Delim >
std::vector< std::string_view > splitWords (std::string const &s, Delim isDelimiter) const
 Splits a string into words.
 
std::vector< std::string_view > splitWords (std::string const &s) const
 Helper version of splitWords(std::string const&, Delim).
 
template<typename Iter >
Iter findCommentWord (Iter beginWord, Iter endWord) const
 Finds the first word starting with a comment marker.
 
template<typename WordType >
void removeCommentLine (std::vector< WordType > &words) const
 Removes all the words from the one starting with a comment marker.
 
std::pair< std::string_view, QuotSpec_t const * > findQuotationStart (std::string_view sv) const
 Finds the start of the next quotation in sv.
 
std::string_view findQuotationEnd (std::string_view sv, std::string const &quotEnd) const
 Finds the quotation end in sv.
 
bool isQuotationUnclosed (std::string_view sv) const
 Returns whether the sequence sv has an unclosed quotation at its end.
 
template<typename BIter , typename EIter >
std::string_view findFirstUnescaped (std::string_view sv, BIter beginKey, EIter endKey) const
 Finds the first of the specified keys in sv.
 
template<typename Keys >
std::string_view findFirstUnescaped (std::string_view sv, Keys const &keys) const
 Finds the first of the specified keys in sv.
 
template<typename Key >
std::string_view findFirstUnescaped (std::string_view sv, std::initializer_list< Key > keys) const
 
template<typename Keys >
std::string_view findFirstUnquoted (std::string_view sv, Keys const &keys) const
 Finds the first of the specified keys in the unquoted part of sv.
 
template<typename Key >
std::string_view findFirstUnquoted (std::string_view sv, std::initializer_list< Key > keys) const
 
Characters
bool isEscape (char ch) const
 Returns whether ch is an escape character.
 
template<typename BIter >
bool isCharacterEscaped (BIter begin, BIter itCh) const
 Returns whether the character pointed to by itCh is escaped.
 
template<typename Sel >
std::string_view::const_iterator findNextCharacter (std::string_view s, Sel select) const
 Finds the next character satisfying the specified criterion.
 
std::string_view::const_iterator findNextBlank (std::string_view s) const
 Finds the next unescaped blank character in s; helper for findNextCharacter(std::string_view, Sel).
 
template<typename CType >
std::string_view removeTrailingCharacters (std::string_view s, CType charType) const
 Consumes the characters of type charType at the beginning of s.
 
std::string_view removeTrailingBlanks (std::string_view s) const
 Consumes the blank characters at the beginning of s.
 
std::string removeWordEscapes (std::string &&w) const
 Returns a copy of w with all escape characters removed.
 
std::string removeWordEscapes (std::string_view w) const
 
std::string removeWordEscapes (const char *w) const
 
std::string removeWordQuotations (std::string &&w) const
 Returns a copy of w with no quotation starts and ends.
 
std::string removeWordQuotations (std::string_view w) const
 
std::string removeWordQuotations (const char *w) const
 

Static Public Member Functions

static SplitView_t splitOn (std::string_view sv, std::string_view sep)
 Splits the view sv in three: before sep, sep and after sep.
 
static std::string_view make_view (std::string const &s)
 Creates a std::string_view from an entire string s.
 
template<typename BIter , typename EIter >
static std::string_view make_view (BIter b, EIter e)
 Creates a std::string_view from two string iterators b and e.
 

Static Public Attributes

static constexpr CCTypeAdapter<&std::isblank > isBlank {}
 Adapter for determining if a character is a blank (see std::isblank()).
 
static Params_t const DefaultParameters
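CCTypeAdapter itself is not documented on this page. As a rough sketch only, an adapter of this kind could turn a <cctype> classifier into a char predicate usable with std::find_if(), assuming the classifier is passed as a non-type template parameter (as the isBlank declaration suggests) and that the character is cast to unsigned char first, which <cctype> functions require. CCTypeAdapterSketch, blankClassifier and isBlankSketch are hypothetical names; the sketch wraps std::isblank() in its own function because taking the address of a standard library function is not guaranteed to be portable.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical adapter: wraps a <cctype>-style classifier int(int)
// into a predicate on char.
template <int (*Classify)(int)>
struct CCTypeAdapterSketch {
  bool operator()(char ch) const
    {
      // <cctype> functions need a value representable as unsigned char
      return Classify(static_cast<unsigned char>(ch)) != 0;
    }
};

// Own wrapper, so that no standard library function address is taken.
inline int blankClassifier(int ch) { return std::isblank(ch); }

inline constexpr CCTypeAdapterSketch<&blankClassifier> isBlankSketch{};
```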
 

Private Member Functions

void adoptParams (Params_t params)
 Initializes the parameters and caches.
 

Private Attributes

Params_t fParams
 Parsing parameters.
 
std::string fQuoteStarts
 Start characters of all supported quotations.
 

Detailed Description

Utilities for text parsing.

This "class" is a glorified namespace with some configuration inside.

Quotation

A quoted string is the content between an opening quotation sequence and the matching closing sequence. Each sequence may be any string, not necessarily a single character. Escaping the first character of an opening or closing quotation sequence turns it into ordinary string data with no quotation meaning.

Escaping rules

Any single character following an unescaped escape character is "escaped". Escaped characters lose their standard function and are replaced by a substitute character. For example, escaping the first character of an opening quotation makes it an ordinary character. An escaped escape character is always replaced by the character itself, without its escape function.
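A minimal sketch of these escaping rules applied to a single word, assuming a backslash as escape character and assuming that the substitute of an escaped character is the character itself (the only case the text spells out is the escaped escape). removeEscapesSketch is a hypothetical helper, not the toolkit's removeWordEscapes(), whose full documentation is not part of this section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Drops each escape character and keeps the character following it
// verbatim, without any special meaning. A trailing escape with nothing
// after it is kept as is (an assumption of this sketch).
std::string removeEscapesSketch(std::string_view w, char escape = '\\')
{
  std::string out;
  out.reserve(w.size());
  for (std::size_t i = 0; i < w.size(); ++i) {
    if (w[i] == escape && i + 1 < w.size()) {
      out += w[++i]; // keep the escaped character itself
      continue;
    }
    out += w[i];
  }
  return out;
}
```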

Definition at line 54 of file ParsingToolkit.h.

Member Typedef Documentation

using icarus::ParsingToolkit::QuotSpec_t = std::pair<std::string, std::string>

Specification of quotation: opening and closing.

Definition at line 63 of file ParsingToolkit.h.

Constructor & Destructor Documentation

icarus::ParsingToolkit::ParsingToolkit ( )
inline

Creates a parser with the default parsing parameters.

Definition at line 100 of file ParsingToolkit.h.

icarus::ParsingToolkit::ParsingToolkit ( Params_t  params)
inline

Creates a parser with the specified parsing parameters.

Definition at line 103 of file ParsingToolkit.h.

{ adoptParams(std::move(params)); }

Member Function Documentation

void icarus::ParsingToolkit::adoptParams ( Params_t  params)
private

Initializes the parameters and caches.

Definition at line 220 of file ParsingToolkit.cxx.

{

  fParams = std::move(params);

  // sort the quotations by length
  auto const byOpeningLength = [](QuotSpec_t const& a, QuotSpec_t const& b)
    {
      std::size_t const al = a.first.length(), bl = b.first.length();
      return (al != bl)? (al > bl): (a < b);
    };
  std::sort(fParams.quotes.begin(), fParams.quotes.end(), byOpeningLength);

  // collect the first character of each of the opening quotes
  // (sorted and with no duplicates)
  for (QuotSpec_t const& quotSpec: fParams.quotes)
    fQuoteStarts += quotSpec.first.front();
  std::sort(fQuoteStarts.begin(), fQuoteStarts.end());
  fQuoteStarts.erase
    (std::unique(fQuoteStarts.begin(), fQuoteStarts.end()), fQuoteStarts.end());

} // icarus::ParsingToolkit::adoptParams()
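The caching performed by adoptParams() can be reproduced stand-alone as below, assuming quotations are stored as a std::vector of QuotSpec_t pairs. Sorting by decreasing opening length means, for example, that ''' is tried before ', and the deduplicated first characters can then serve as a cheap std::string_view::find_first_of() prefilter. sortQuotesAndCollectStarts is a hypothetical helper; unlike the listing, it skips empty openings instead of calling front() on them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using QuotSpec_t = std::pair<std::string, std::string>; // opening, closing

// Sorts `quotes` by decreasing opening length (ties broken by pair order)
// and returns the sorted, deduplicated first characters of the openings.
std::string sortQuotesAndCollectStarts(std::vector<QuotSpec_t>& quotes)
{
  std::sort(quotes.begin(), quotes.end(),
    [](QuotSpec_t const& a, QuotSpec_t const& b)
      {
        std::size_t const al = a.first.length(), bl = b.first.length();
        return (al != bl)? (al > bl): (a < b);
      });

  std::string starts;
  for (QuotSpec_t const& q: quotes)
    if (!q.first.empty()) starts += q.first.front();
  std::sort(starts.begin(), starts.end());
  starts.erase(std::unique(starts.begin(), starts.end()), starts.end());
  return starts;
}
```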
template<typename Iter >
Iter icarus::ParsingToolkit::findCommentWord ( Iter  beginWord,
Iter  endWord 
) const

Finds the first word starting with a comment marker.

Template Parameters
Iter: type of iterator to the words
Parameters
beginWord: iterator to the first word to consider
endWord: iterator past the last word to consider
Returns
an iterator to the comment word, or endWord if not found

The words themselves are not modified; removeCommentLine() uses the returned iterator to erase the comment.

Definition at line 748 of file ParsingToolkit.h.

{
  for (auto it = beginWord; it != endWord; ++it) {
    if (std::equal(fParams.comment.begin(), fParams.comment.end(), begin(*it)))
      return it;
  } // for
  return endWord;
} // icarus::ParsingToolkit::findCommentWord()
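A self-contained sketch of the same search, assuming words stored as std::string and a comment marker such as #. The prefix test goes through std::string_view::substr(), which clamps to the word length. The three-argument std::equal() in the listing instead reads as many characters from the word as the marker has, which is only safe when no word is shorter than the marker. findCommentWordSketch is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Returns an iterator to the first word starting with `comment`,
// or `endWord` if there is none.
template <typename Iter>
Iter findCommentWordSketch(Iter beginWord, Iter endWord, std::string_view comment)
{
  for (auto it = beginWord; it != endWord; ++it) {
    std::string_view const word{ *it };
    // prefix test that never reads beyond the end of a short word
    if (word.substr(0, comment.size()) == comment) return it;
  }
  return endWord;
}
```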
template<typename BIter , typename EIter >
std::string_view icarus::ParsingToolkit::findFirstUnescaped ( std::string_view  sv,
BIter  beginKey,
EIter  endKey 
) const

Finds the first of the specified keys in sv.

Template Parameters
BIter: type of iterator to the keys
EIter: type of key end-iterator
Parameters
sv: string to be parsed
beginKey: iterator to the first key
endKey: iterator past the last key
Returns
a view of the key found within sv, empty if none

The keys are required to be sorted longest first: the key found earliest in sv wins, and among keys found at the same position the first one in the list is kept (e.g. if the first key is = and the second is ==, the second key is never matched, since wherever == is found, = is found at the same position and comes first). The first character of the key must not be escaped. Escaped characters in the key are not supported.

If no key is found, the returned view is zero-length and pointing to the end of sv.

The quoting in sv is ignored.

Definition at line 634 of file ParsingToolkit.h.

{

  typename std::iterator_traits<BIter>::value_type const* key = nullptr;
  std::size_t keyPos = std::string_view::npos;

  for (auto iKey = beginKey; iKey != endKey; ++iKey) {
    // find where this key is (unescaped)
    std::size_t pos = 0;
    while (pos < sv.length()) {
      pos = sv.find(*iKey, pos);
      if (pos == std::string_view::npos) break; // key not present at all
      if (!isCharacterEscaped(sv.begin(), sv.begin() + pos)) break;
      ++pos;
    }
    // is this the first among the keys?
    if (pos >= std::min(keyPos, sv.length())) continue;
    key = &*iKey;
    keyPos = pos;
  } // for keys

  // return a substring of sv, not key
  if (key) {
    using std::begin, std::end;
    std::size_t const keyLength = make_view(*key).length();
    return { sv.data() + keyPos, keyLength };
  }
  else return { sv.data() + sv.length(), 0 };
} // icarus::ParsingToolkit::findFirstUnescaped()
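A stand-alone sketch of the search rules described above, assuming a backslash as escape character and keys given as std::string_view: each key is located at its first unescaped occurrence, the earliest occurrence wins, and ties go to the key listed first. isEscapedAt and findFirstUnescapedSketch are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string_view>

// An odd number of consecutive escapes right before `pos` means escaped.
bool isEscapedAt(std::string_view s, std::size_t pos, char escape)
{
  std::size_t n = 0;
  while (pos > 0 && s[pos - 1] == escape) { --pos; ++n; }
  return (n % 2) == 1;
}

// Returns a view of the earliest unescaped key in `sv`; among keys found at
// the same position the first in the list wins. Empty view at the end if none.
std::string_view findFirstUnescapedSketch(std::string_view sv,
  std::initializer_list<std::string_view> keys, char escape = '\\')
{
  std::string_view const* best = nullptr;
  std::size_t bestPos = sv.size();
  for (std::string_view const& key: keys) {
    std::size_t pos = sv.find(key);
    while (pos != std::string_view::npos && isEscapedAt(sv, pos, escape))
      pos = sv.find(key, pos + 1);
    if (pos == std::string_view::npos || pos >= bestPos) continue;
    best = &key;
    bestPos = pos;
  }
  if (!best) return sv.substr(sv.size()); // empty, at the end of sv
  return sv.substr(bestPos, best->size());
}
```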
template<typename Keys >
std::string_view icarus::ParsingToolkit::findFirstUnescaped ( std::string_view  sv,
Keys const &  keys 
) const

Finds the first of the specified keys in sv.

Template Parameters
Keys: type of the collection of keys
Parameters
sv: string to be parsed
keys: collection of the keys to look for, longest first
Returns
a view of the key found within sv, empty if none

The keys are required to be sorted longest first: the key found earliest in sv wins, and among keys found at the same position the first one in the list is kept (e.g. if the first key is = and the second is ==, the second key is never matched, since wherever == is found, = is found at the same position and comes first). The first character of the key must not be escaped. Escaped characters in the key are not supported.

If no key is found, the returned view is zero-length and pointing to the end of sv.

The quoting in sv is ignored.

Definition at line 667 of file ParsingToolkit.h.

{
  using std::begin, std::end;
  return findFirstUnescaped(sv, begin(keys), end(keys));
} // icarus::ParsingToolkit::findFirstUnescaped(Keys)
template<typename Key >
std::string_view icarus::ParsingToolkit::findFirstUnescaped ( std::string_view  sv,
std::initializer_list< Key >  keys 
) const

Definition at line 677 of file ParsingToolkit.h.

{ return findFirstUnescaped(sv, keys.begin(), keys.end()); }
template<typename BIter , typename EIter >
std::string_view icarus::ParsingToolkit::findFirstUnquoted ( std::string_view  sv,
BIter  beginKey,
EIter  endKey 
) const

Finds the first of the specified keys in the unquoted part of sv.

Template Parameters
BIter: type of iterator to the keys
EIter: type of key end-iterator
Parameters
sv: string to be parsed
beginKey: iterator to the first key
endKey: iterator past the last key
Returns
a view of the key found in the unquoted part of sv, or an empty view at the end of sv if none

The keys are required to be sorted longest first: the key found earliest in sv wins, and among keys found at the same position the first one in the list is kept (e.g. if the first key is = and the second is ==, the second key is never matched, since wherever == is found, = is found at the same position and comes first).

If no key is found, the returned view is zero-length and pointing to the end of sv.

Definition at line 684 of file ParsingToolkit.h.

{

  // returns a view of the first unescaped key found between `b` and `e`
  // (an empty view if none)
  auto findKey = [this,beginKey,endKey]
    (std::string_view::const_iterator b, std::string_view::const_iterator e)
    { return findFirstUnescaped(make_view(b, e), beginKey, endKey); };

  char const* const svEnd = sv.data() + sv.length(); // end of the whole input
  std::string_view key{ svEnd, 0 };
  while (!sv.empty()) {

    // find the next quotation
    auto const [ fromQ, qptr ] = findQuotationStart(sv);

    // search in the unquoted part
    key = findKey(sv.begin(), fromQ.begin());
    if (!key.empty()) break;

    // skip the quotation; if there is no quotation, we are done
    if (!qptr) break;

    sv = fromQ;
    sv.remove_prefix(qptr->first.length()); // skip the quotation start

    // find the end of quotation
    std::string_view const afterQ = findQuotationEnd(sv, qptr->second);

    if (afterQ.empty()) { // begin of quotation, but no end: no good
      // so we don't consider this as quotation: search in the "quoted" part
      key = findKey(fromQ.begin(), fromQ.end());
      break;
    } // if

    // skip the quoted material, and the quotation end too
    sv = afterQ;
    sv.remove_prefix(qptr->second.length());

  } // while

  // no key found: point to the end of the whole input, as documented
  if (key.empty()) key = std::string_view{ svEnd, 0 };
  return key;

} // icarus::ParsingToolkit::findFirstUnquoted(Iter)
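The loop above alternates between searching the unquoted stretch and skipping a quotation. A simplified sketch of that strategy, assuming a single quotation style delimited by double quotes on both ends, a backslash as escape character and a single key; as in the listing, an opening quote without a matching close is treated as ordinary text. isEscapedAt, findUnescaped and findFirstUnquotedSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

bool isEscapedAt(std::string_view s, std::size_t pos, char escape = '\\')
{
  std::size_t n = 0;
  while (pos > 0 && s[pos - 1] == escape) { --pos; ++n; }
  return (n % 2) == 1;
}

// Position of the first unescaped `what` in `s` at or after `from` (npos if none).
std::size_t findUnescaped(std::string_view s, std::string_view what, std::size_t from)
{
  std::size_t pos = s.find(what, from);
  while (pos != std::string_view::npos && isEscapedAt(s, pos))
    pos = s.find(what, pos + 1);
  return pos;
}

// Finds `key` outside "..." quotations; an unclosed quotation is not a
// quotation. Returns an empty view at the end of `sv` if not found.
std::string_view findFirstUnquotedSketch(std::string_view sv, std::string_view key)
{
  constexpr std::string_view quote = "\"";
  std::size_t from = 0;
  while (from < sv.size()) {
    std::size_t const qStart = findUnescaped(sv, quote, from);
    std::size_t const qEnd = (qStart == std::string_view::npos)
      ? std::string_view::npos: findUnescaped(sv, quote, qStart + 1);
    // end of the stretch where the key counts as unquoted
    std::size_t const stop = (qEnd == std::string_view::npos)? sv.size(): qStart;
    std::size_t const k = findUnescaped(sv, key, from);
    if (k != std::string_view::npos && k + key.size() <= stop)
      return sv.substr(k, key.size());
    if (qEnd == std::string_view::npos) break;
    from = qEnd + 1; // skip the quoted material and the closing quote
  }
  return sv.substr(sv.size());
}
```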
template<typename Keys >
std::string_view icarus::ParsingToolkit::findFirstUnquoted ( std::string_view  sv,
Keys const &  keys 
) const

Finds the first of the specified keys in the unquoted part of sv.

Template Parameters
Keys: type of the collection of keys
Parameters
sv: string to be parsed
keys: collection of the keys to look for, longest first
Returns
a view of the key found in the unquoted part of sv, or an empty view at the end of sv if none

The keys are required to be sorted longest first: the key found earliest in sv wins, and among keys found at the same position the first one in the list is kept (e.g. if the first key is = and the second is ==, the second key is never matched, since wherever == is found, = is found at the same position and comes first).

If no key is found, the returned view is zero-length and pointing to the end of sv.

Definition at line 732 of file ParsingToolkit.h.

{
  using std::begin, std::end;
  return findFirstUnquoted(sv, begin(keys), end(keys));
} // icarus::ParsingToolkit::findFirstUnquoted(Keys)
template<typename Key >
std::string_view icarus::ParsingToolkit::findFirstUnquoted ( std::string_view  sv,
std::initializer_list< Key >  keys 
) const

Definition at line 742 of file ParsingToolkit.h.

{ return findFirstUnquoted(sv, keys.begin(), keys.end()); }
std::string_view::const_iterator icarus::ParsingToolkit::findNextBlank ( std::string_view  s) const
inline

Finds the next unescaped blank character in s; helper for findNextCharacter(std::string_view, Sel).

Definition at line 405 of file ParsingToolkit.h.

{ return findNextCharacter(s, isBlank); }
template<typename Sel >
std::string_view::const_iterator icarus::ParsingToolkit::findNextCharacter ( std::string_view  s,
Sel  select 
) const

Finds the next character satisfying the specified criterion.

Template Parameters
Sel: type of functor selecting the characters to look for
Parameters
s: view of the string to be parsed
select: functor determining which character(s) to look for
Returns
an iterator to the first unescaped character satisfying select, s.end() if none

Escaped characters are skipped. The helper findNextBlank() looks for blank characters, i.e. those ch for which std::isblank(ch) is true.

Definition at line 778 of file ParsingToolkit.h.

{
  auto const sbegin = s.begin(), send = s.end();
  auto it = sbegin;
  while (it != send) {
    it = std::find_if(it, send, select);
    // stop if nothing was found, or if the character is not escaped
    if ((it == send) || !isCharacterEscaped(sbegin, it)) return it;
    ++it; // skip the escaped character and move on
  } // while
  return send;
} // icarus::ParsingToolkit::findNextCharacter()
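A stand-alone sketch of the same scan with the escape counting inlined, assuming a backslash as escape character; findNextUnescaped is a hypothetical name. The explicit check for std::find_if() returning the end iterator keeps the loop from stepping past the end when nothing matches and the string ends with an escape character.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Returns an iterator to the first character of `s` satisfying `select`
// and not escaped by an odd number of preceding escape characters.
template <typename Sel>
std::string_view::const_iterator findNextUnescaped
  (std::string_view s, Sel select, char escape = '\\')
{
  auto const sbegin = s.begin(), send = s.end();
  auto it = sbegin;
  while (it != send) {
    it = std::find_if(it, send, select);
    if (it == send) break; // nothing found
    // count the escape characters right before `it`
    auto back = it;
    unsigned int nEscapes = 0U;
    while (back != sbegin && *(back - 1) == escape) { --back; ++nEscapes; }
    if ((nEscapes & 1U) == 0U) return it; // not escaped: found
    ++it; // escaped: skip it and keep looking
  }
  return send;
}
```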
std::string_view icarus::ParsingToolkit::findQuotationEnd ( std::string_view  sv,
std::string const &  quotEnd 
) const

Finds the quotation end in sv.

Parameters
sv: the buffer to search for the quotation end
quotEnd: the quotation end to be searched for
Returns
a view of sv starting at the first unescaped quotation end (included), or an empty view if not found

Note that sv should not include the quotation start.

Definition at line 107 of file ParsingToolkit.cxx.

{
  while (!sv.empty()) {

    std::size_t const pos = sv.find(quotEnd);
    if (pos == std::string_view::npos) break;

    if (!isCharacterEscaped(sv.begin(), sv.begin() + pos)) {
      sv.remove_prefix(pos);
      return sv;
    }

    sv.remove_prefix(pos + 1);

  } // while

  return make_view(sv.end(), sv.end());
} // icarus::ParsingToolkit::findQuotationEnd()
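A compact sketch of the same search written with positions instead of repeated prefix removal, assuming a backslash as escape character; isEscapedAt and findQuotationEndSketch are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

bool isEscapedAt(std::string_view s, std::size_t pos, char escape = '\\')
{
  std::size_t n = 0;
  while (pos > 0 && s[pos - 1] == escape) { --pos; ++n; }
  return (n % 2) == 1;
}

// Returns the view of `sv` starting at the first unescaped `quotEnd`
// (included), or an empty view at the end of `sv` if there is none.
// `sv` is expected to start right after the opening quotation.
std::string_view findQuotationEndSketch(std::string_view sv, std::string_view quotEnd)
{
  std::size_t pos = sv.find(quotEnd);
  while (pos != std::string_view::npos && isEscapedAt(sv, pos))
    pos = sv.find(quotEnd, pos + 1);
  return (pos == std::string_view::npos)? sv.substr(sv.size()): sv.substr(pos);
}
```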
auto icarus::ParsingToolkit::findQuotationStart ( std::string_view  sv) const

Finds the start of the next quotation in sv.

Parameters
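sv: the buffer to search for the quotation start
Returns
a pair: a subview of sv starting at the first unescaped quotation opening (empty if none), and a pointer to the matching quotation specification (nullptr if none)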

Definition at line 66 of file ParsingToolkit.cxx.

{

  while (!sv.empty()) {

    // look for a character that could start a quotation opening
    std::size_t const startPos = sv.find_first_of(fQuoteStarts);

    // no such character found:
    if (startPos == std::string_view::npos) break;

    // if the character is escaped, this is not a quotation opening:
    if (isCharacterEscaped(sv.begin(), sv.begin() + startPos)) {
      sv.remove_prefix(std::min(startPos + 1, sv.length()));
      continue;
    }

    sv.remove_prefix(std::min(startPos, sv.length()));

    // try all the opening quotes
    // (may be optimized by grouping them by first character)
    for (auto const& qSpec: fParams.quotes) {

//      if (sv.starts_with(qSpec.first)) return { sv, &qSpec }; // C++20
      if (sv.compare(0, qSpec.first.length(), qSpec.first) == 0)
        return { sv, &qSpec };

    } // for quotes

    // nope, just a character; remove it and keep looking
    sv.remove_prefix(1);

  } // while sv

  return { make_view(sv.end(), sv.end()), nullptr };
} // icarus::ParsingToolkit::findQuotationStart()
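A simplified sketch of the same search, assuming a backslash as escape character and a list of specifications already sorted longest opening first (as adoptParams() arranges). It tests every unescaped position instead of using the fQuoteStarts prefilter, trading speed for brevity. isEscapedAt and findQuotationStartSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using QuotSpec_t = std::pair<std::string, std::string>; // opening, closing

bool isEscapedAt(std::string_view s, std::size_t pos, char escape = '\\')
{
  std::size_t n = 0;
  while (pos > 0 && s[pos - 1] == escape) { --pos; ++n; }
  return (n % 2) == 1;
}

// Returns the subview of `sv` starting at the first unescaped quotation
// opening and the matching specification (nullptr if none found).
// `quotes` must be sorted longest opening first.
std::pair<std::string_view, QuotSpec_t const*> findQuotationStartSketch
  (std::string_view sv, std::vector<QuotSpec_t> const& quotes)
{
  for (std::size_t pos = 0; pos < sv.size(); ++pos) {
    if (isEscapedAt(sv, pos)) continue;
    std::string_view const rest = sv.substr(pos);
    for (QuotSpec_t const& q: quotes) {
      if (rest.compare(0, q.first.size(), q.first) == 0) return { rest, &q };
    }
  }
  return { sv.substr(sv.size()), nullptr };
}
```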
template<typename BIter >
bool icarus::ParsingToolkit::isCharacterEscaped ( BIter  begin,
BIter  itCh 
) const

Returns whether the character pointed to by itCh is escaped.

Template Parameters
BIter: iterator type
Parameters
begin: iterator to the beginning of the string
itCh: iterator to the character to be investigated
Returns
whether the character right before itCh is an unescaped escape character, i.e. whether an odd number of consecutive escape characters precede itCh

Note that itCh may be an end iterator (for an empty string, the result is false).

template<typename Iter >
bool icarus::ParsingToolkit::isCharacterEscaped ( Iter  begin,
Iter  itCh 
) const

Definition at line 760 of file ParsingToolkit.h.

{
  unsigned int nEscapes = 0U;
  while (itCh-- != begin) {

    if (!isEscape(*itCh)) break;
    ++nEscapes;

  } // while

  return (nEscapes & 1) == 1; // odd number of escapes means escaped

} // icarus::ParsingToolkit::isCharacterEscaped()
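A sketch of the parity rule, assuming a backslash as escape character; isCharacterEscapedSketch is a hypothetical name. It decrements itCh only while itCh is past begin, so, unlike the itCh-- != begin test in the listing, it never forms an iterator before the start of the string.

```cpp
#include <cassert>
#include <string_view>

// Returns whether *itCh is escaped: true if an odd number of consecutive
// escape characters immediately precede it. `itCh` may be the end iterator.
template <typename BIter>
bool isCharacterEscapedSketch(BIter begin, BIter itCh, char escape = '\\')
{
  unsigned int nEscapes = 0U;
  while (itCh != begin) {
    --itCh; // step back only while still past `begin`
    if (*itCh != escape) break;
    ++nEscapes;
  }
  return (nEscapes & 1U) == 1U;
}
```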
bool icarus::ParsingToolkit::isEscape ( char  ch) const
inline

Returns whether ch is an escape character.

Definition at line 375 of file ParsingToolkit.h.

{ return ch == fParams.escape; }
bool icarus::ParsingToolkit::isQuotationUnclosed ( std::string_view  sv) const

Returns whether the sequence sv has an unclosed quotation at its end.

Definition at line 128 of file ParsingToolkit.cxx.

{

  while (!sv.empty()) {

    auto [ qsv, qptr ] = findQuotationStart(sv);
    if (!qptr) return false;

    qsv.remove_prefix(qptr->first.length()); // remove the opening quote

    qsv = findQuotationEnd(qsv, qptr->second);
    if (qsv.empty()) return true;

    qsv.remove_prefix(qptr->second.length());
    sv = qsv;
  } // while

  return false;
} // icarus::ParsingToolkit::isQuotationUnclosed()
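A simplified sketch, assuming a single quotation style that opens and closes with a double quote and a backslash as escape character. Because opening and closing marks coincide, each unescaped quote toggles the state, and the sequence is unclosed if the state is still open at the end. isEscapedAt and isQuotationUnclosedSketch are hypothetical names; with several quotation styles, the open/close search of the listing is needed instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

bool isEscapedAt(std::string_view s, std::size_t pos, char escape = '\\')
{
  std::size_t n = 0;
  while (pos > 0 && s[pos - 1] == escape) { --pos; ++n; }
  return (n % 2) == 1;
}

// Returns whether `sv` ends inside an unclosed "..." quotation.
bool isQuotationUnclosedSketch(std::string_view sv)
{
  bool inQuote = false;
  for (std::size_t pos = 0; pos < sv.size(); ++pos) {
    if (sv[pos] == '"' && !isEscapedAt(sv, pos)) inQuote = !inQuote;
  }
  return inQuote;
}
```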
static std::string_view icarus::ParsingToolkit::make_view ( std::string const &  s)
inlinestatic

Creates a std::string_view from an entire string s.

Definition at line 510 of file ParsingToolkit.h.

{ return make_view(s.begin(), s.end()); }
template<typename BIter , typename EIter >
static std::string_view icarus::ParsingToolkit::make_view ( BIter  b,
EIter  e 
)
inlinestatic

Creates a std::string_view from two string iterators b and e.

Definition at line 515 of file ParsingToolkit.h.

{ return { &*b, static_cast<std::size_t>(std::distance(b, e)) }; }
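This overload takes the address of *b, and make_view(sv.end(), sv.end()) is used elsewhere on this page; dereferencing an end iterator is not guaranteed to be valid (checked-iterator implementations may reject it). A sketch of an alternative that derives the pointer from the owning view instead, assuming both iterators belong to `whole`; makeViewSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Builds a view of the range [b, e) inside `whole` without dereferencing
// either iterator, so `b` may legitimately be `whole.end()`.
std::string_view makeViewSketch(std::string_view whole,
  std::string_view::const_iterator b, std::string_view::const_iterator e)
{
  auto const offset = static_cast<std::size_t>(std::distance(whole.begin(), b));
  auto const length = static_cast<std::size_t>(std::distance(b, e));
  return whole.substr(offset, length);
}
```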
Params_t const& icarus::ParsingToolkit::params ( ) const
inlinenoexcept

Returns the current parameters of parsing.

Definition at line 111 of file ParsingToolkit.h.

{ return fParams; }
std::pair< std::string, unsigned int > icarus::ParsingToolkit::readMultiline ( std::istream &  in) const

Returns a single line of text from the input stream.

Parameters
in: the input stream
Returns
the string read, and the number of lines read
Exceptions
Error: on fatal parsing errors

This function reads entire lines from in, where a line is defined as in std::getline(). If the line ends with an unescaped escape character, another line is read and appended (the escape character is dropped). The return value is the merged string with no end-of-line characters, and the number of lines read. If there is no string to be read, it returns an empty string and 0U.

Special behaviour

  • If the line ends while a quotation is still open, the next line is also merged, and the line break is kept. To merge quoted lines without preserving the line break character, close the quotation at the end of the first line, escape the line break right after it, and open a new quotation at the very start of the next line.
  • If the line ends while a quotation is still open, escaping the line break is a parsing error (an Error exception is thrown).
  • If the file ends while a quotation is still open, the line is preserved as such.

Definition at line 27 of file ParsingToolkit.cxx.

{

  std::string fullLine;
  std::string openQuoteLine;
  unsigned int nLines = 0U;
  while (in) {

    std::string line;
    std::getline(in, line, in.widen(fParams.EOL));
    bool const isEOF = in.eof();
    if (!isEOF || !line.empty()) ++nLines;
    openQuoteLine.append(line);

    if (isQuotationUnclosed(make_view(openQuoteLine))) {
      if (isCharacterEscaped(line.begin(), line.end())) {
        fullLine.append(openQuoteLine);
        throw Error{ "Parser error: escaped end-of-line inside a quotation:\n"
          + fullLine + "\n" };
      }
      // if the newline is quoted, it's preserved
      if (!isEOF) openQuoteLine += fParams.EOL;
      continue;
    }
    fullLine.append(openQuoteLine);
    openQuoteLine.clear();

    if (!isCharacterEscaped(fullLine.begin(), fullLine.end())) break;

    fullLine.pop_back(); // remove the escape character

  } // while
  fullLine.append(openQuoteLine); // usually empty

  return { std::move(fullLine), nLines };
} // icarus::ParsingToolkit::readMultiline()
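A reduced sketch of the line-continuation rule only, assuming '\n' as end-of-line and a backslash as escape character; the quotation handling described under Special behaviour is deliberately left out. readContinuedLine is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Reads one logical line: physical lines ending with an unescaped escape
// character are joined (the escape is dropped). Returns the merged text and
// the number of physical lines read. Quotations are NOT handled here.
std::pair<std::string, unsigned int> readContinuedLine
  (std::istream& in, char escape = '\\')
{
  std::string fullLine;
  unsigned int nLines = 0U;
  std::string line;
  while (std::getline(in, line)) {
    ++nLines;
    fullLine += line;
    // count trailing escapes: an odd number escapes the line break
    std::size_t nEscapes = 0;
    for (auto it = fullLine.rbegin(); it != fullLine.rend() && *it == escape; ++it)
      ++nEscapes;
    if (nEscapes % 2 == 0) break;
    fullLine.pop_back(); // drop the escape and read the next line
  }
  return { std::move(fullLine), nLines };
}
```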
template<typename WordType >
void icarus::ParsingToolkit::removeCommentLine ( std::vector< WordType > &  words) const
inline

Removes all the words from the one starting with a comment marker.

Parameters
words: list of words

The original list is modified, the word starting with a comment marker and all the following ones are removed.

Definition at line 216 of file ParsingToolkit.h.

{ words.erase(findCommentWord(words.begin(), words.end()), words.end()); }
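The same erase-to-end idiom, sketched stand-alone for a vector of words of any type convertible to std::string_view, with a length-safe prefix test for the comment marker; removeCommentLineSketch is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Erases from `words` the first word starting with `comment` and all the
// words after it.
template <typename WordType>
void removeCommentLineSketch(std::vector<WordType>& words, std::string_view comment)
{
  auto it = words.begin();
  while (it != words.end()
    && std::string_view{ *it }.substr(0, comment.size()) != comment)
    ++it;
  words.erase(it, words.end());
}
```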
template<typename Words >
std::vector< std::string > icarus::ParsingToolkit::removeEscapes ( Words const &  words) const

Returns a copy of words with all escape characters removed.

Template Parameters
Words: type of list of words
Parameters
words: the list of words to change
Returns
the list of words without escaping
See Also
removeWordEscapes()

The escaping is removed from each of the words in the list, which are treated as independent. See removeWordEscapes() for the details.

Definition at line 810 of file ParsingToolkit.h.

{
  using std::size;
  std::vector<std::string> nv;
  nv.reserve(size(words));
  for (auto const& word: words) nv.push_back(removeWordEscapes(word));
  return nv;
} // icarus::ParsingToolkit::removeEscapes()
template<typename Words >
std::vector< std::string > icarus::ParsingToolkit::removeQuotations ( Words const &  words) const

Returns a copy of words with no quotation starts and ends.

Template Parameters
Words: type of list of words
Parameters
words: the list of words to change
Returns
the list of words without quotations
See Also
removeWordQuotations()

The substitution is applied to each of the words in the list, which are treated as independent. See removeWordQuotations() for the details.

Definition at line 823 of file ParsingToolkit.h.

{
  using std::size;
  std::vector<std::string> nv;
  nv.reserve(size(words));
  for (auto const& word: words) nv.push_back(removeWordQuotations(word));
  return nv;
} // icarus::ParsingToolkit::removeQuotations()
std::string_view icarus::ParsingToolkit::removeTrailingBlanks ( std::string_view  s) const
inline

Consumes the blank characters at the beginning of s.

See Also
removeTrailingCharacters()

Definition at line 422 of file ParsingToolkit.h.

423  { return removeTrailingCharacters(s, isBlank); }
template<typename CType >
std::string_view icarus::ParsingToolkit::removeTrailingCharacters ( std::string_view  s,
CType  charType 
) const

Consumes the characters selected by charType at the beginning of s.

Template Parameters
CType: type of the functor determining which characters to remove
Parameters
s: view of the string to be parsed
charType: functor determining which characters to remove
Returns
a view of s starting after its leading run of charType characters
See Also
removeTrailingBlanks()

Definition at line 794 of file ParsingToolkit.h.

795 {
796  // REQUIREMENT: escape character must not be classified as delimiter
797  assert(!charType(fParams.escape));
798 
799  while (!s.empty()) {
800  if (!charType(s.front())) break; // escape character triggers this too
801  s.remove_prefix(1U);
802  } // while
803  return s;
804 } // icarus::ParsingToolkit::removeTrailingCharacters()
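Despite its name, removeTrailingCharacters() trims characters from the start of the view. A self-contained sketch of the same loop, with the hypothetical names `skipLeading()` and `isBlankChar()` and a backslash assumed as escape character (the toolkit uses fParams.escape). The cast to `unsigned char` is there because passing a negative `char` value to std::isblank() is undefined behaviour:

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical free-function counterpart of removeTrailingCharacters():
// drops the characters at the front of s for which charType(ch) is true.
// `escape` stands in for fParams.escape (a backslash is assumed here).
template <typename CType>
std::string_view skipLeading(std::string_view s, CType charType, char escape = '\\')
{
  // same requirement as the toolkit: the escape character is never skipped
  assert(!charType(escape));
  while (!s.empty() && charType(s.front())) s.remove_prefix(1U);
  return s;
}

// Blank classifier: std::isblank() needs a value representable as
// unsigned char, so the char is converted first.
inline bool isBlankChar(char ch)
{ return std::isblank(static_cast<unsigned char>(ch)) != 0; }
```

For example, `skipLeading("  \tword rest", isBlankChar)` returns a view on `word rest`, and a leading escape character stops the trimming.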
std::string icarus::ParsingToolkit::removeWordEscapes ( std::string &&  w) const

Returns a copy of w with all escape characters removed.

Parameters
w: the string to change
Returns
a copy of w without escaping
See Also
removeEscapes(Words const&)

The escaping scheme simply removes each escape character and keeps the character following it verbatim (no replacement table is supported here). An unescaped escape character at the end of the string is not removed.

It is recommended to do this as the last step of the parsing, since it changes the meaning of parsing elements like quotations, comments, etc.

Note that applying removeEscapes() more than once will keep removing characters that earlier passes did not consider escapes: for example, four escape characters become two in the first pass and one in the second; that last one is kept by further passes if it ends the string, and is removed otherwise.

Definition at line 158 of file ParsingToolkit.cxx.

158  {
159 
160  // replace in place
161  std::string::const_iterator iSrc = s.begin(), send = s.end();
162  std::string::iterator iDest = s.begin();
163 
164  // if the last character is an escape, it's kept
165  while (iSrc != send) {
166  char const ch = *iSrc++;
167  *iDest++ = (isEscape(ch) && (iSrc != send))? *iSrc++: ch;
168  } // while
169 
170  s.erase(iDest, send);
171  return std::move(s);
172 } // icarus::ParsingToolkit::removeWordEscapes()
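The two-iterator, in-place copy above can be tried outside the class. In this sketch `stripEscapes()` is a hypothetical name and a backslash is assumed as escape character (the toolkit takes it from fParams.escape). The read position is always level with or ahead of the write position, so overwriting the same buffer is safe:

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone version of removeWordEscapes(). Assumption: the
// escape character is a backslash. Each escape character is dropped and the
// character after it is copied verbatim; a lone escape at the end is kept.
inline std::string stripEscapes(std::string s, char escape = '\\')
{
  std::string::const_iterator iSrc = s.cbegin();
  std::string::const_iterator const send = s.cend();
  std::string::iterator iDest = s.begin();
  while (iSrc != send) {
    // in-place copy is safe: the write position never passes the read one
    assert((iDest - s.begin()) <= (iSrc - s.cbegin()));
    char const ch = *iSrc++;
    *iDest++ = ((ch == escape) && (iSrc != send))? *iSrc++: ch;
  }
  s.erase(iDest, s.end()); // drop the leftover tail
  return s;
}
```

With four backslashes as input, successive calls return two, then one, and then keep that last one because it ends the string; three backslashes followed by `x` instead become `\x` and then `x`.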
std::string icarus::ParsingToolkit::removeWordEscapes ( std::string_view  w) const
inline

Definition at line 447 of file ParsingToolkit.h.

448  { return removeWordEscapes(std::string{ w }); }
std::string icarus::ParsingToolkit::removeWordEscapes ( const char *  w) const
inline

Definition at line 449 of file ParsingToolkit.h.

450  { return removeWordEscapes(std::string{ w }); }
std::string icarus::ParsingToolkit::removeWordQuotations ( std::string &&  w) const

Returns a copy of w with no quotation starts and ends.

Parameters
w: the string to change
Returns
the word without quotations
See Also
removeQuotations(Words const&)

Escaping is still honored (if present).

Note that applying removeQuotations() more than once will keep removing quotation markers that were not considered as such in the earlier passes (for example, `a1 << "b1 << 'c1 << " or " << c2' << b2" << a2` will first become `a1 << b1 << 'c1 << or << c2' << b2 << a2`, and eventually `a1 << b1 << c1 << or << c2 << b2 << a2`).

Definition at line 176 of file ParsingToolkit.cxx.

177 {
178  std::string_view sv = make_view(s);
179  std::string::iterator iDest = s.begin();
180 
181  while (!sv.empty()) {
182 
183  // find the next quotation
184  auto const [ fromQ, qptr ] = findQuotationStart(sv);
185 
186  // copy the material until the next quotation
187  iDest = std::copy(sv.begin(), fromQ.begin(), iDest);
188  sv = fromQ;
189 
190  if (!qptr) break; // if there is no quotation, we are done
191 
192  sv.remove_prefix(qptr->first.length()); // skip the quotation start
193 
194  // find the end of quotation
195  std::string_view const afterQ = findQuotationEnd(sv, qptr->second);
196 
197  if (afterQ.empty()) { // begin of quotation, but no end: no good
198  // leave the "begin of quotation" as is
199  iDest = std::copy(fromQ.begin(), fromQ.end(), iDest);
200  sv.remove_prefix(sv.length()); // note: quote start was already removed
201  break;
202  }
203 
204  // copy the quoted material
205  iDest = std::copy(sv.begin(), afterQ.begin(), iDest);
206  sv = afterQ;
207 
208  sv.remove_prefix(qptr->second.length()); // skip the quotation end
209 
210  } // while
211 
212  assert(sv.empty());
213 
214  s.erase(iDest, s.end());
215  return std::move(s);
216 } // icarus::ParsingToolkit::removeWordQuotations()
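A simplified sketch of the quotation removal, assuming double and single quotes as the only quotations and ignoring escape characters (the toolkit does honor them, and reads its quotations from the parsing parameters). `QuotSpec` and `stripQuotations()` are hypothetical names. As in the listing above, an unterminated quotation is copied unchanged:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical quotation specification: opening and closing markers
// (same role as ParsingToolkit::QuotSpec_t).
using QuotSpec = std::pair<std::string, std::string>;

// Hypothetical standalone version of removeWordQuotations(). Simplifications:
// escape characters are NOT honored, and the quotations are passed by the
// caller (double and single quotes by default).
inline std::string stripQuotations(
  std::string const& s,
  std::vector<QuotSpec> const& quotes = { { "\"", "\"" }, { "'", "'" } }
) {
  std::string out;
  std::size_t pos = 0;
  while (pos < s.size()) {
    // find the earliest quotation start among all the specifications
    std::size_t qStart = std::string::npos;
    QuotSpec const* qptr = nullptr;
    for (QuotSpec const& q: quotes) {
      assert(!q.first.empty() && !q.second.empty());
      std::size_t const p = s.find(q.first, pos);
      if (p < qStart) { qStart = p; qptr = &q; }
    }
    if (!qptr) { out.append(s, pos, std::string::npos); break; } // no quotation left

    out.append(s, pos, qStart - pos);             // material before the quotation
    std::size_t const from = qStart + qptr->first.size();
    std::size_t const qEnd = s.find(qptr->second, from);
    if (qEnd == std::string::npos) {              // start without end:
      out.append(s, qStart, std::string::npos);   // keep the rest, marker included
      break;
    }
    out.append(s, from, qEnd - from);             // quoted material
    pos = qEnd + qptr->second.size();             // skip the quotation end
  }
  return out;
}
```

Applied to the example in the description, the first call removes the double quotes and the second the single quotes. Whitespace is copied verbatim, so the spaces on both sides of the removed quotes around `or` are kept (and appear doubled).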
std::string icarus::ParsingToolkit::removeWordQuotations ( std::string_view  w) const
inline

Definition at line 485 of file ParsingToolkit.h.

486  { return removeWordQuotations(std::string{ w }); }
std::string icarus::ParsingToolkit::removeWordQuotations ( const char *  w) const
inline

Definition at line 487 of file ParsingToolkit.h.

488  { return removeWordQuotations(std::string{ w }); }
auto icarus::ParsingToolkit::splitOn ( std::string_view  sv,
std::string_view  sep 
)
static

Splits the view sv in three: before sep, sep and after sep.

Parameters
sv: view of the string to split
sep: a subview of sv to split at
Returns
a SplitView_t object with the three parts, each of which may be empty

The view sep is required to be a subview of sv: it is not enough for its content to match a substring of sv. For example, splitOn("a:1", ":") will not work, because the string "a:1" does not share its memory with ":".

Even if sep is empty, both its begin() and end() must still point within sv, and sv is split at that point.

Definition at line 149 of file ParsingToolkit.cxx.

151 {
152  return
153  { make_view(sv.begin(), sep.begin()), sep, make_view(sep.end(), sv.end()) };
154 } // icarus::ParsingToolkit::splitOn()
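One way to satisfy the subview requirement is to obtain sep by searching within sv itself (for example with std::string_view::find() and substr()), so that it shares sv's memory. A sketch with pointer arithmetic in place of the toolkit's make_view(); `SplitView` and `splitAt()` are hypothetical names mirroring SplitView_t and splitOn(), and the `assert` checks the subview precondition:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>   // std::less_equal
#include <string_view>

// Hypothetical counterpart of SplitView_t: before, separator, after.
struct SplitView {
  std::string_view beforeSep, sep, afterSep;
};

// Hypothetical counterpart of splitOn(): sep must be a subview of sv, i.e.
// point into the same memory, not just hold equal content.
inline SplitView splitAt(std::string_view sv, std::string_view sep)
{
  char const* const b = sv.data();
  char const* const e = b + sv.size();
  char const* const sepEnd = sep.data() + sep.size();
  std::less_equal<char const*> const le{}; // total order, valid for any pointers
  assert(le(b, sep.data()) && le(sepEnd, e));
  return {
    std::string_view{ b, static_cast<std::size_t>(sep.data() - b) },
    sep,
    std::string_view{ sepEnd, static_cast<std::size_t>(e - sepEnd) }
  };
}
```

With `sv = "key:value"`, `splitAt(sv, sv.substr(sv.find(':'), 1))` yields `key`, `:` and `value`; an empty separator taken at `sv.size()` puts the whole of sv before it and nothing after.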
template<typename Delim >
std::vector< std::string_view > icarus::ParsingToolkit::splitWords ( std::string const &  s,
Delim  isDelimiter 
) const

Splits a string into words.

Template Parameters
Delim: type of the delimiter functor
Parameters
s: the string to be split
isDelimiter: (default: isblank()) determines if a character is a word delimiter
Returns
a sequence of views, one per word

The splitter algorithm defines a word separator as a sequence of one or more unescaped, unquoted delimiter characters, where a delimiter is a character ch for which isDelimiter(ch) is true.

Note that this function does not change the content of the data, and in particular it removes neither escaping nor quoting (although it interprets both).

A character used as delimiter can appear in a word only if escaped or within quotation. Contiguous non-delimiter elements of a string, including quoted strings, belong to the same word (for example, `a" and "b` is a single word when delimitation is by blank characters). An empty word can be introduced only in quotations (e.g. `""`).

The Delim type is a functor so that isDelimiter(ch) returns something convertible to bool, true if the ch character should be considered a delimiter. Note that no context is provided for the answer, so the use of each character as delimiter is fixed, and modified only by the hard-coded quotation and escaping rules.

The first characters of quotation starts and the escape characters must not be classified as delimiters, or the algorithm will give wrong results.

Definition at line 545 of file ParsingToolkit.h.

546 {
547  // REQUIREMENT: escape character must not be classified as delimiter
548  assert(!isDelimiter(fParams.escape));
549  // REQUIREMENT: the first character of a quotation start must not be
550  // classified as delimiter
551  assert(
552  std::count_if(fQuoteStarts.cbegin(), fQuoteStarts.cend(), isDelimiter) == 0
553  );
554 
555 
556  // helper class:
557  // stores the word as collected so far, updates `sv` and starts new words
558  class WordTracker {
559  ParsingToolkit const& tk;
560  Delim const& isDelimiter;
561  std::string_view& sv;
562  std::vector<std::string_view> words;
563  std::string_view::const_iterator wStart;
564  public:
565  WordTracker(ParsingToolkit const& tk, Delim const& d, std::string_view& sv)
566  : tk{ tk }, isDelimiter{ d }, sv{ consumeDelim(sv) }, wStart{ sv.begin() }
567  {}
568  void startNew()
569  {
570  words.push_back(make_view(wStart, sv.begin()));
571  wStart = consumeDelim().begin();
572  }
573  void moveEndTo(std::string_view::const_iterator it)
574  { moveEndBy(it - sv.begin()); }
575  void moveEndBy(std::size_t n) { sv.remove_prefix(n); }
576  std::vector<std::string_view> finish()
577  { if (wStart != sv.begin()) startNew(); return std::move(words); }
578  std::string_view& consumeDelim(std::string_view& s) const
579  { return s = tk.removeTrailingCharacters(s, isDelimiter); }
580  std::string_view& consumeDelim() { return consumeDelim(sv); }
581  }; // WordTracker
582 
583  std::string_view sv = make_view(s);
584  WordTracker words { *this, isDelimiter, sv }; // shares sv management
585 
586  // sv.begin() is kept updated to the candidate end of word;
587  // the beginning of the current word is always cached as words.wStart
588  while (!sv.empty()) {
589 
590  // process up to the next quotation
591  auto const [ qsv, qptr ] = findQuotationStart(sv);
592 
593  // parse and split until the quotation start:
594  auto const qstart = qsv.begin();
595  while(true) {
596 
597  // find next space;
598  // if next space is past the quotation, stop to the quotation instead
599  words.moveEndTo
600  (findNextCharacter(make_view(sv.begin(), qstart), isDelimiter));
601 
602  if (sv.begin() == qstart) break;
603 
604  // not the quote? it's a delimiter! new word found:
605  words.startNew();
606 
607  } // while(true)
608 
609  // handle the quoted part
610  if (qptr) {
611  assert(sv.substr(0, qptr->first.length()) == qptr->first);
612 
613  words.moveEndBy(qptr->first.length());
614 
615  // find the end of the quote, and swallow it into the current word
616  std::string_view const quotEnd = findQuotationEnd(sv, qptr->second);
617  words.moveEndTo(quotEnd.begin());
618 
619  // if we have found an end of quote, swallow it too (otherwise it's over)
620  if (!quotEnd.empty()) words.moveEndBy(qptr->second.length());
621 
622  } // if quotation found
623 
624  } // while
625 
626  return words.finish();
627 
628 } // icarus::ParsingToolkit::splitWords()
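To illustrate the word-splitting rules (not the toolkit's actual implementation, which goes through findQuotationStart(), findNextCharacter() and the WordTracker helper), here is a simplified sketch. Assumptions: the only quotations are `"..."` and `'...'`, the escape character is a backslash, and the hypothetical `splitWordsSketch()` works on a std::string_view. As described above, quotes and escapes are interpreted but left in the words, and an unterminated quotation extends to the end of the string:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, simplified splitWords(): quotations are "..." and '...'
// (the same character opens and closes), the escape character is '\\'.
// Quotes and escapes are interpreted but NOT removed from the words.
template <typename Delim>
std::vector<std::string_view> splitWordsSketch(std::string_view s, Delim isDelimiter)
{
  char const escape = '\\';
  // same requirements as the toolkit: escape and quote starts are not delimiters
  assert(!isDelimiter(escape) && !isDelimiter('"') && !isDelimiter('\''));

  std::vector<std::string_view> words;
  std::size_t i = 0;
  while (true) {
    while ((i < s.size()) && isDelimiter(s[i])) ++i; // skip the separator
    if (i == s.size()) break;                        // no more words
    std::size_t const wStart = i;
    char quote = '\0';                               // '\0' means "not quoted"
    while (i < s.size()) {
      char const ch = s[i];
      if (ch == escape) {                            // skip the escaped character too
        i += (i + 1 < s.size())? 2: 1;
        continue;
      }
      if (quote != '\0') { if (ch == quote) quote = '\0'; } // inside a quotation
      else if ((ch == '"') || (ch == '\'')) quote = ch;     // quotation starts
      else if (isDelimiter(ch)) break;                      // word ends here
      ++i;
    }
    words.push_back(s.substr(wStart, i - wStart));
  }
  return words;
}
```

With blank delimiters, the text `  one a" and "b ''  esc\ aped ` splits into four words: `one`, `a" and "b`, `''` (an empty quoted word) and `esc\ aped`.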
std::vector<std::string_view> icarus::ParsingToolkit::splitWords ( std::string const &  s) const
inline

Convenience overload of splitWords(std::string const&, Delim) using blank characters (isBlank) as delimiters.

Definition at line 190 of file ParsingToolkit.h.

191  { return splitWords(s, isBlank); }

Member Data Documentation

icarus::ParsingToolkit::Params_t const icarus::ParsingToolkit::DefaultParameters
static

Definition at line 97 of file ParsingToolkit.h.

Params_t icarus::ParsingToolkit::fParams
private

Parsing parameters.

Definition at line 519 of file ParsingToolkit.h.

std::string icarus::ParsingToolkit::fQuoteStarts
private

Start characters of all supported quotations.

Definition at line 524 of file ParsingToolkit.h.

constexpr CCTypeAdapter<&std::isblank> icarus::ParsingToolkit::isBlank {}
static

Adapter for determining if a character is a blank (see std::isblank()).

Definition at line 92 of file ParsingToolkit.h.
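The definition of CCTypeAdapter is not reproduced on this page. A plausible minimal shape, assuming it wraps a `<cctype>` classifier passed as a non-type template parameter (`CharClassAdapter` and `isBlankAdapter` are hypothetical names):

```cpp
#include <cctype>

// Hypothetical adapter: wraps a <cctype> classifier with signature int(int)
// (such as std::isblank) into a functor taking char and returning bool.
template <int (*CharClass)(int)>
struct CharClassAdapter {
  bool operator()(char ch) const
    { return CharClass(static_cast<unsigned char>(ch)) != 0; } // no UB on negative char
};

// Hypothetical analogue of ParsingToolkit::isBlank.
inline constexpr CharClassAdapter<&std::isblank> isBlankAdapter{};
```

Strictly speaking, taking the address of a standard library function is not guaranteed to be portable (C++20 makes it unspecified for functions not designated addressable), although common implementations accept it and the toolkit relies on the same idiom.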


The documentation for this struct was generated from the following files: