<!--
Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
-->
# String parsing with `strparse`
The functions take input via a pointer to a pointer, which allows them to
advance the pointer on success. This in turn allows "chaining" of functions,
as in this example that gets a word, a space and then a second word:
~~~c
if(Curl_str_word(&line, &word1, MAX) ||
   Curl_str_singlespace(&line) ||
   Curl_str_word(&line, &word2, MAX))
  fprintf(stderr, "ERROR\n");
~~~
The input pointer **must** point to a null terminated buffer area or these
functions risk continuing "off the edge".
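
As a rough illustration of why the pointer-to-pointer input makes chaining
work, here is a minimal, self-contained sketch. The names `demo_str`,
`demo_word`, `demo_singlespace` and `demo_two_words` are hypothetical stand-ins
written for this example. They are not the curl implementations and only mimic
the behavior described here. The sketch also assumes a `const char **` input so
that string literals can be passed directly.

~~~c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* Stand-in word parser: reads bytes up to a space or the end of the
   string. Returns 0 on success, non-zero on error. */
static int demo_word(const char **linep, struct demo_str *out, size_t max)
{
  const char *s;
  size_t len = 0;
  assert(linep && *linep && out);
  s = *linep;
  out->str = NULL;
  out->len = 0;
  while(s[len] && s[len] != ' ') {
    if(++len > max)
      return 1; /* word longer than max */
  }
  if(!len)
    return 1; /* a word must be at least one byte */
  out->str = s;
  out->len = len;
  *linep = s + len; /* advance the caller's pointer only on success */
  return 0;
}

/* Stand-in for stepping over exactly one ASCII space */
static int demo_singlespace(const char **linep)
{
  assert(linep && *linep);
  if(**linep != ' ')
    return 1;
  (*linep)++;
  return 0;
}

/* Chains the parsers: word, space, word. Returns 0 if all steps succeed. */
static int demo_two_words(const char *line, struct demo_str *w1,
                          struct demo_str *w2)
{
  if(demo_word(&line, w1, 64) ||
     demo_singlespace(&line) ||
     demo_word(&line, w2, 64))
    return 1;
  return 0;
}
~~~

Since a failing step returns non-zero without moving the pointer, the `||`
chain stops at the first error and later steps never run.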
## Strings
The functions that return string information do so by populating a
`struct Curl_str`:
~~~c
struct Curl_str {
  char *str;
  size_t len;
};
~~~
Access the struct fields with `Curl_str()` for the pointer and `Curl_strlen()`
for the length rather than using the struct fields directly.
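
A pointer plus a length suggests that the string points into the parsed input
and is not null terminated on its own. Under that assumption, the sketch below
shows one way to turn such a string into a regular C string. `demo_str`,
`demo_strptr`, `demo_strlen` and `demo_copy` are hypothetical names for this
example only, standing in for the struct and its accessors.

~~~c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* Hypothetical accessors, so callers never touch the fields directly */
#define demo_strptr(x) ((x)->str)
#define demo_strlen(x) ((x)->len)

/* Copies the string into buf and null terminates it.
   Returns 0 on success, non-zero if buf is too small. */
static int demo_copy(const struct demo_str *s, char *buf, size_t buflen)
{
  assert(s && buf);
  if(demo_strlen(s) >= buflen)
    return 1; /* no room for the bytes plus the terminator */
  if(demo_strlen(s))
    memcpy(buf, demo_strptr(s), demo_strlen(s));
  buf[demo_strlen(s)] = '\0';
  return 0;
}
~~~

Code that only needs to print the string can instead pass the length and
pointer to `printf` with the `%.*s` conversion, casting the length to `int`.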
## `Curl_str_init`
~~~c
void Curl_str_init(struct Curl_str *out);
~~~
This initializes a string struct. The parser functions that store info in
strings always init the string themselves, so this stand-alone use is often
not necessary.
## `Curl_str_assign`
~~~c
void Curl_str_assign(struct Curl_str *out, const char *str, size_t len);
~~~
Set a pointer and associated length in the string struct.
## `Curl_str_word`
~~~c
int Curl_str_word(char **linep, struct Curl_str *out, const size_t max);
~~~
Get a sequence of bytes until the first space or the end of the string. Return
non-zero on error. There is no way to include a space in the word, no sort of
escaping. The word must be at least one byte, otherwise it is considered an
error.
`max` is the longest accepted word length. A longer word makes the function
return an error.
On a successful return, `linep` is updated to point to the byte immediately
following the parsed word.
## `Curl_str_until`
~~~c
int Curl_str_until(char **linep, struct Curl_str *out, const size_t max,
                   char delim);
~~~
Like `Curl_str_word` but instead of parsing to a space, it parses to a given
custom delimiter, the non-zero byte `delim`.
`max` is the longest accepted word length. A longer word makes the function
return an error.
The parsed word must be at least one byte, otherwise it is considered an
error.
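
The following sketch shows how a delimiter-based parser could be combined with
a single-byte step to split a `key=value;` style input. All names
(`demo_str`, `demo_until`, `demo_single`, `demo_keyvalue`) are hypothetical
stand-ins that follow the descriptions in this document, not the curl
functions themselves. It assumes that the pointer is left on the delimiter
after a successful parse, so the delimiter needs to be stepped over
separately.

~~~c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* Stand-in parser: reads bytes up to 'delim' or the end of the string.
   Returns 0 on success, non-zero on error. */
static int demo_until(const char **linep, struct demo_str *out,
                      size_t max, char delim)
{
  const char *s;
  size_t len = 0;
  assert(linep && *linep && out && delim);
  s = *linep;
  out->str = NULL;
  out->len = 0;
  while(s[len] && s[len] != delim) {
    if(++len > max)
      return 1; /* longer than max */
  }
  if(!len)
    return 1; /* must be at least one byte */
  out->str = s;
  out->len = len;
  *linep = s + len; /* now points at the delimiter or the terminator */
  return 0;
}

/* Stand-in for stepping over one specific byte */
static int demo_single(const char **linep, char byte)
{
  assert(linep && *linep);
  if(**linep != byte)
    return 1;
  (*linep)++;
  return 0;
}

/* Splits "key=value" where the value ends at ';' or the end of input */
static int demo_keyvalue(const char *line, struct demo_str *key,
                         struct demo_str *value)
{
  if(demo_until(&line, key, 32, '=') ||
     demo_single(&line, '=') ||
     demo_until(&line, value, 128, ';'))
    return 1;
  return 0;
}
~~~

An input without `=` fails at the `demo_single` step, and an empty key fails
in the first `demo_until` call because of the one-byte minimum.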
## `Curl_str_untilnl`
~~~c
int Curl_str_untilnl(char **linep, struct Curl_str *out, const size_t max);
~~~
Like `Curl_str_until` but instead parses until it finds a "newline byte".
That means either a CR (ASCII 13) or an LF (ASCII 10) octet.
`max` is the longest accepted word length. A longer word makes the function
return an error.
The parsed word must be at least one byte, otherwise it is considered an
error.
## `Curl_str_cspn`
~~~c
int Curl_str_cspn(const char **linep, struct Curl_str *out, const char *cspn);
~~~
Get a sequence of characters until one of the bytes in the `cspn` string
matches. Similar to the `strcspn` function.
## `Curl_str_quotedword`
~~~c
int Curl_str_quotedword(char **linep, struct Curl_str *out, const size_t max);
~~~
Get a "quoted" word. This means everything that is provided within a leading
and an ending double quote character. No escaping possible.
`max` is the longest accepted word length. A longer word makes the function
return an error.
The parsed word must be at least one byte, otherwise it is considered an
error.
## `Curl_str_single`
~~~c
int Curl_str_single(char **linep, char byte);
~~~
Advance over a single character provided in `byte`. Return non-zero on error.
## `Curl_str_singlespace`
~~~c
int Curl_str_singlespace(char **linep);
~~~
Advance over a single ASCII space. Return non-zero on error.
## `Curl_str_passblanks`
~~~c
void Curl_str_passblanks(char **linep);
~~~
Advance over all spaces and tabs.
## `Curl_str_trimblanks`
~~~c
void Curl_str_trimblanks(struct Curl_str *out);
~~~
Trim off blanks (spaces and tabs) from the start and the end of the given
string.
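
Because the string is a pointer plus a length, trimming can be done without
copying or writing to the buffer. The sketch below is one way this could look.
`demo_str`, `demo_isblank` and `demo_trimblanks` are hypothetical names for
this example and not the curl implementation.

~~~c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* A blank is a space or a tab */
static int demo_isblank(char c)
{
  return (c == ' ') || (c == '\t');
}

/* Trims blanks from both ends by adjusting pointer and length only */
static void demo_trimblanks(struct demo_str *s)
{
  assert(s);
  /* leading blanks: move the start forward */
  while(s->len && demo_isblank(*s->str)) {
    s->str++;
    s->len--;
  }
  /* trailing blanks: shorten the length */
  while(s->len && demo_isblank(s->str[s->len - 1]))
    s->len--;
}
~~~

A string consisting only of blanks ends up with length zero in this sketch.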
## `Curl_str_number`
~~~c
int Curl_str_number(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned decimal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.
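
The `max` limit can be enforced while parsing, without ever overflowing the
accumulator. The sketch below shows one way to do that check. `demo_number` is
a hypothetical stand-in that follows the description above and uses plain
`size_t` for the number, which is an assumption made for this example.

~~~c
#include <assert.h>
#include <stddef.h>

/* Stand-in decimal parser. Returns 0 on success, non-zero if there is no
   digit or if the number would be larger than max. */
static int demo_number(const char **linep, size_t *nump, size_t max)
{
  const char *s;
  size_t num = 0;
  assert(linep && *linep && nump);
  s = *linep;
  *nump = 0;
  if(*s < '0' || *s > '9')
    return 1; /* not a single digit */
  while(*s >= '0' && *s <= '9') {
    size_t digit = (size_t)(*s - '0');
    /* check before multiplying so that num * 10 + digit cannot exceed
       max and therefore cannot overflow */
    if(digit > max || num > (max - digit) / 10)
      return 1;
    num = num * 10 + digit;
    s++;
  }
  *nump = num;
  *linep = s; /* points at the first byte after the digits */
  return 0;
}
~~~

Leading zeroes need no special handling here: they keep the accumulator at
zero until the first non-zero digit arrives.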
## `Curl_str_numblanks`
~~~c
int Curl_str_numblanks(char **linep, curl_size_t *nump);
~~~
Get an unsigned 63-bit decimal number. Leading blanks and zeroes are skipped.
Returns non-zero on error. Returns error if there was not a single digit.
## `Curl_str_hex`
~~~c
int Curl_str_hex(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned hexadecimal number not larger than `max`. Leading zeroes are
just swallowed. Return non-zero on error. Returns error if there was not a
single digit. Does *not* handle a `0x` prefix.
## `Curl_str_octal`
~~~c
int Curl_str_octal(char **linep, curl_size_t *nump, size_t max);
~~~
Get an unsigned octal number not larger than `max`. Leading zeroes are just
swallowed. Return non-zero on error. Returns error if there was not a single
digit.
## `Curl_str_newline`
~~~c
int Curl_str_newline(char **linep);
~~~
Check for a single CR or LF. Return non-zero on error.
## `Curl_str_casecompare`
~~~c
int Curl_str_casecompare(struct Curl_str *str, const char *check);
~~~
Returns true if the provided string in the `str` argument matches the `check`
string case insensitively.
## `Curl_str_cmp`
~~~c
int Curl_str_cmp(struct Curl_str *str, const char *check);
~~~
Returns true if the provided string in the `str` argument matches the `check`
string case sensitively. This is *not* the same return code as `strcmp`.
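
To make the difference from `strcmp` concrete, here is a small sketch of a
comparison that returns true on a match. `demo_str` and `demo_cmp` are
hypothetical names for this example. The sketch assumes the parsed string is
not null terminated, so the lengths are compared before the bytes.

~~~c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* Returns 1 if the string equals 'check' byte for byte, otherwise 0.
   Note that strcmp() would return 0 for equal strings. */
static int demo_cmp(const struct demo_str *s, const char *check)
{
  size_t clen;
  assert(s && check);
  clen = strlen(check);
  if(s->len != clen)
    return 0; /* different lengths can never match */
  return !clen || !memcmp(s->str, check, clen);
}
~~~

This lets the result be used directly in a condition, as in
`if(demo_cmp(&word, "GET"))`, without the `== 0` that `strcmp` would need.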
## `Curl_str_nudge`
~~~c
int Curl_str_nudge(struct Curl_str *str, size_t num);
~~~
Removes `num` bytes from the beginning (left) of the string kept in `str`. If
`num` is larger than the string, it instead returns an error.
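
Nudging amounts to moving the pointer forward and shrinking the length by the
same amount. A minimal sketch, using the hypothetical names `demo_str` and
`demo_nudge` rather than the curl implementation:

~~~c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed string: pointer plus length */
struct demo_str {
  const char *str;
  size_t len;
};

/* Drops 'num' bytes from the start of the string. Returns 0 on success,
   non-zero (leaving the string untouched) if num exceeds the length. */
static int demo_nudge(struct demo_str *s, size_t num)
{
  assert(s);
  if(num > s->len)
    return 1;
  s->str += num;
  s->len -= num;
  return 0;
}
~~~

In this sketch, nudging by exactly the string length is allowed and leaves an
empty string.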