Last Modified:
Nov 17, 2012

Parsing



This page documents the objects and functions that in some way deal with parsing or otherwise manipulating text. Everything here follows the same conventions as the rest of the library.



base64



This object encodes and decodes data to and from the Base64 Content-Transfer-Encoding defined in section 6.8 of RFC 2045.

#include <dlib/base64.h>
Detailed Documentation
C++ Example Programs: file_to_code_ex.cpp
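
To make the encoding itself concrete, here is a minimal standard-library sketch of Base64 encoding. It does not use dlib's base64 object; the function name base64_encode_sketch is hypothetical, and the sketch assumes the whole input is in memory and omits the line breaks that RFC 2045 requires for long output.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of dlib: encodes raw bytes with the
// RFC 2045 Base64 alphabet. No line breaks are inserted.
std::string base64_encode_sketch(const std::string& in)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 3)
    {
        const std::size_t avail = in.size() - i;   // bytes left, at least 1
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        std::uint32_t group = static_cast<unsigned char>(in[i]) << 16;
        if (avail > 1) group |= static_cast<unsigned char>(in[i + 1]) << 8;
        if (avail > 2) group |= static_cast<unsigned char>(in[i + 2]);
        // Emit four 6-bit symbols, replacing unused ones with '=' padding.
        out += alphabet[(group >> 18) & 63];
        out += alphabet[(group >> 12) & 63];
        out += avail > 1 ? alphabet[(group >> 6) & 63] : '=';
        out += avail > 2 ? alphabet[group & 63] : '=';
    }
    return out;
}
```

For instance, base64_encode_sketch("Man") yields "TWFu", and a one-byte input such as "M" yields "TQ==".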


basic_utf8_ifstream



This object represents an input file stream much like the normal std::ifstream, except that it knows how to read UTF-8 data. When you read characters out of this stream, it automatically converts them from the UTF-8 multibyte encoding into a fixed-width wide character encoding.

There are also two typedefs of this object. The first is utf8_wifstream, which reads into wchar_t characters. The second is utf8_uifstream, which reads into unichar characters instead.



#include <dlib/unicode.h>
Detailed Documentation


cast_to_string



cast_to_string is a templated function that makes it easy to convert arbitrary objects to std::string. It supports any type that can be written to a std::ostream via operator<<.

#include <dlib/string.h>
Detailed Documentation
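
The idea of converting through operator<< can be sketched with the standard library alone. The function below is a hypothetical stand-in named to_string_sketch, not dlib's cast_to_string, and it makes no claim about how dlib handles formatting flags or errors.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in, not dlib's code: converts any type that can be
// written to a std::ostream via operator<< into a std::string.
template <typename T>
std::string to_string_sketch(const T& item)
{
    std::ostringstream sout;
    sout << item;          // relies on the type's own operator<<
    return sout.str();
}
```

With this sketch, to_string_sketch(42) yields "42". Any user-defined type works too, as long as it provides an operator<< for std::ostream.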


cast_to_wstring



cast_to_wstring is a templated function that makes it easy to convert arbitrary objects to std::wstring. It supports any type that can be written to a std::wostream via operator<<.

#include <dlib/string.h>
Detailed Documentation


cmd_line_parser



This object allows you to easily parse a command line. Note that the documentation for the cmd_line_parser_option (the object returned by the parser's .option() function) is in a separate file.

Note also that there are standard typedefs for the ASCII and wide character versions of the cmd_line_parser template. These are the command_line_parser and wcommand_line_parser types respectively.



#include <dlib/cmd_line_parser.h>
Detailed Documentation
C++ Example Programs: compress_stream_ex.cpp, train_object_detector.cpp


Extensions to cmd_line_parser

get_option

This extension provides a convenience function for accessing the options to a command line argument or a config_reader. It is automatically #included when using the command line parser or config reader.

Detailed Documentation

config_reader



This object reads text configuration files.

#include <dlib/config_reader.h>
Detailed Documentation
C++ Example Programs: config_reader_ex.cpp


Extensions to config_reader

config_reader_thread_safe

This object extends a normal config_reader by simply wrapping all its member functions inside mutex locks to make it safe to use in a threaded program.

Detailed Documentation

convert_utf8_to_utf32



This is a global function that converts UTF-8 strings into strings of 32-bit unichar characters.

#include <dlib/unicode.h>
Detailed Documentation
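
As an illustration of what such a conversion involves, here is a minimal UTF-8 decoder built on the standard library only. It is not dlib's implementation: the name utf8_to_utf32_sketch is hypothetical, std::uint32_t stands in for unichar, and the sketch does not reject overlong encodings or surrogate code points.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not dlib's code: decodes UTF-8 bytes into 32-bit
// code points. Overlong forms and surrogates are not rejected here.
std::vector<std::uint32_t> utf8_to_utf32_sketch(const std::string& in)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;   // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the three bytes "\xE2\x82\xAC" decode to the single code point 0x20AC (the euro sign).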


cpp_pretty_printer



This object represents an HTML pretty printer for C++ source code.

#include <dlib/cpp_pretty_printer.h>
Detailed Documentation

Implementations:
cpp_pretty_printer_kernel_1:
This is implemented by using the cpp_tokenizer object. This is the pretty printer I use on all the source in this library. It applies a color scheme, turns include directives such as #include "file.h" into links to file.h.html, and puts HTML anchor points on function and class declarations. It also looks for comments starting with /*!A and puts an anchor before the comment, using the word following the A as the name of the anchor.
kernel_1a
is a typedef for cpp_pretty_printer_kernel_1
cpp_pretty_printer_kernel_2:
This is implemented by using the cpp_tokenizer object. It applies a black and white color scheme suitable for printing on a black and white printer. It also places the document title prominently at the top of the pretty printed source file.
kernel_2a
is a typedef for cpp_pretty_printer_kernel_2

cpp_tokenizer



This object represents a simple tokenizer for C++ source code.

#include <dlib/cpp_tokenizer.h>
Detailed Documentation

Implementations:
cpp_tokenizer_kernel_1:
This is implemented by using the tokenizer object in the obvious way.
kernel_1a
is a typedef for cpp_tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

is_combining_char



This is a global function that can tell you if a character is a Unicode combining character or not.

#include <dlib/unicode.h>
Detailed Documentation


left_substr



This is a function to return the part of a string to the left of a user supplied delimiter.

#include <dlib/string.h>
Detailed Documentation


lpad



This function pads the leftmost end of a string with whitespace (or user-specified characters).

#include <dlib/string.h>
Detailed Documentation


ltrim



This function removes whitespace (or user-specified characters) from the leftmost end of a string.

#include <dlib/string.h>
Detailed Documentation


narrow



This is a function for converting a string of type std::string or std::wstring to a plain std::string.

#include <dlib/string.h>
Detailed Documentation


pad



This is a function to pad whitespace (or user specified characters) onto the ends of a string.

#include <dlib/string.h>
Detailed Documentation
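
The padding functions on this page (lpad, rpad, and pad) can be pictured with the following standard-library sketch. The names lpad_sketch, rpad_sketch, and pad_sketch are hypothetical and these are not dlib's signatures. The sketch assumes that the caller gives a target width, that a single fill character is used, and that padding both ends puts any odd leftover character on the right.

```cpp
#include <string>

// Hypothetical helpers, not dlib's code. Each pads `s` up to `width`
// characters and returns `s` unchanged if it is already that long.
std::string lpad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    return s.size() >= width ? s : std::string(width - s.size(), fill) + s;
}

std::string rpad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    return s.size() >= width ? s : s + std::string(width - s.size(), fill);
}

// Pads both ends; an odd leftover character goes on the right (an assumption).
std::string pad_sketch(const std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() >= width) return s;
    const std::size_t left = (width - s.size()) / 2;
    return rpad_sketch(lpad_sketch(s, s.size() + left, fill), width, fill);
}
```

For example, pad_sketch("ab", 5, '*') yields "*ab**".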


pad_int_with_zeros



Converts an integer into a string and pads it with leading zeros.

#include <dlib/string.h>
Detailed Documentation
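
Zero padding of this kind can be done with stream manipulators from the standard library. The sketch below uses a hypothetical name, zero_pad_sketch, and a hypothetical width parameter. How it places the sign of a negative number is an assumption of the sketch, not a statement about dlib.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper, not dlib's code: formats `value` with leading zeros
// so the result is at least `width` characters wide.
std::string zero_pad_sketch(int value, int width)
{
    std::ostringstream sout;
    // std::internal puts the fill characters between the sign and the digits.
    sout << std::internal << std::setfill('0') << std::setw(width) << value;
    return sout.str();
}
```

For example, zero_pad_sketch(42, 6) yields "000042". A value that is already wider than the requested width is returned without truncation.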


right_substr



This is a function to return the part of a string to the right of a user supplied delimiter.

#include <dlib/string.h>
Detailed Documentation
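
The behavior described for left_substr and right_substr can be sketched as follows using std::string alone. The names left_of_sketch and right_of_sketch are hypothetical. Treating the delimiter argument as a set of characters, and the results returned when no delimiter is found, are assumptions of this sketch.

```cpp
#include <string>

// Hypothetical helpers, not dlib's code. Both treat `delim` as a set of
// delimiter characters.
std::string left_of_sketch(const std::string& str, const std::string& delim)
{
    // Everything before the first delimiter; the whole string if none is found.
    return str.substr(0, str.find_first_of(delim));
}

std::string right_of_sketch(const std::string& str, const std::string& delim)
{
    // Everything after the last delimiter; empty if none is found (an assumption).
    const std::string::size_type pos = str.find_last_of(delim);
    return pos == std::string::npos ? std::string() : str.substr(pos + 1);
}
```

For example, with the input "key=value" and the delimiter "=", the left part is "key" and the right part is "value".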


rpad



This function pads the rightmost end of a string with whitespace (or user-specified characters).

#include <dlib/string.h>
Detailed Documentation


rtrim



This function removes whitespace (or user-specified characters) from the rightmost end of a string.

#include <dlib/string.h>
Detailed Documentation


split



Breaks a string into a sequence of substrings delimited by a user specified set of characters.

#include <dlib/string.h>
Detailed Documentation
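
Splitting on a set of delimiter characters can be sketched with std::string search functions. The function name split_sketch is hypothetical and this is not dlib's implementation. The default delimiter set, and skipping the empty substrings between consecutive delimiters, are assumptions of the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not dlib's code: splits `str` on any character in
// `delims`, skipping empty substrings (an assumption of this sketch).
std::vector<std::string> split_sketch(const std::string& str,
                                      const std::string& delims = " \n\r\t")
{
    std::vector<std::string> out;
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        const std::string::size_type end = str.find_first_of(delims, start);
        // When end is npos, substr simply takes the rest of the string.
        out.push_back(str.substr(start, end - start));
        start = str.find_first_not_of(delims, end);
    }
    return out;
}
```

For example, split_sketch("a, b,,c", ", ") produces the three substrings "a", "b", and "c".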


strings_equal_ignore_case



This is a pair of functions to do a case insensitive comparison between strings.

#include <dlib/string.h>
Detailed Documentation
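
A case-insensitive comparison of two whole strings might look like the sketch below. The name equal_ignore_case_sketch is hypothetical, only the whole-string comparison is shown, and the sketch relies on std::tolower, so it is only meaningful for single-byte characters in the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not dlib's code: compares two strings while
// ignoring case, one character at a time via std::tolower.
bool equal_ignore_case_sketch(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               // Cast through unsigned char to keep std::tolower well defined.
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}
```

Checking the sizes first means std::equal never reads past the end of the shorter string.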


string_assign



string_assign is an object which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects. Since string_assign is a simple stateless object there is a global instance of it called dlib::sa.

#include <dlib/string.h>
Detailed Documentation
C++ Example Programs: config_reader_ex.cpp


string_cast



string_cast is a templated function which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects.

#include <dlib/string.h>
Detailed Documentation
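
Conversion through operator>> can be sketched with std::istringstream. The function from_string_sketch below is a hypothetical stand-in, not dlib's string_cast. The exception type, and the decision to reject trailing non-whitespace characters, are assumptions of the sketch.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in, not dlib's code: parses `str` into a T using the
// type's operator>> and throws if the text cannot be fully consumed.
template <typename T>
T from_string_sketch(const std::string& str)
{
    std::istringstream sin(str);
    T value;
    if (!(sin >> value))
        throw std::invalid_argument("cannot parse: " + str);
    // Reject leftover non-whitespace characters such as the "abc" in "12abc".
    char extra;
    if (sin >> extra)
        throw std::invalid_argument("trailing characters in: " + str);
    return value;
}
```

For example, from_string_sketch<int>("42") returns 42, while from_string_sketch<int>("12abc") throws.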


tokenizer



This object represents a simple tokenizer for textual data.

#include <dlib/tokenizer.h>
Detailed Documentation

Implementations:
tokenizer_kernel_1:
This is implemented in the obvious way.
kernel_1a
is a typedef for tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

tolower



This is a function to convert a string to all lowercase.

#include <dlib/string.h>
Detailed Documentation


toupper



This is a function to convert a string to all uppercase.

#include <dlib/string.h>
Detailed Documentation
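
Case conversion of a whole string, as tolower and toupper are described here, can be sketched with std::transform. The names to_lower_sketch and to_upper_sketch are hypothetical, and the sketch assumes single-byte characters in the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helpers, not dlib's code: return a copy of `s` with every
// character converted by std::tolower or std::toupper.
std::string to_lower_sketch(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

std::string to_upper_sketch(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```

Characters without a case, such as digits and spaces, pass through unchanged.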


trim



This is a function to remove the whitespace (or user specified characters) from the ends of a string.

#include <dlib/string.h>
Detailed Documentation
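
The trimming functions on this page (ltrim, rtrim, and trim) can be sketched with std::string search functions. The names below are hypothetical and the default set of characters to strip is an assumption, not dlib's documented default.

```cpp
#include <string>

// Hypothetical helpers, not dlib's code. `chars` is the set of characters
// to strip; the default set used here is an assumption.
std::string ltrim_sketch(const std::string& s, const std::string& chars = " \t\r\n")
{
    const std::string::size_type first = s.find_first_not_of(chars);
    return first == std::string::npos ? std::string() : s.substr(first);
}

std::string rtrim_sketch(const std::string& s, const std::string& chars = " \t\r\n")
{
    const std::string::size_type last = s.find_last_not_of(chars);
    return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Trimming both ends is just the two one-sided operations combined.
std::string trim_sketch(const std::string& s, const std::string& chars = " \t\r\n")
{
    return ltrim_sketch(rtrim_sketch(s, chars), chars);
}
```

For example, trim_sketch("  hi \n") yields "hi", and rtrim_sketch("xxhixx", "x") yields "xxhi".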


unichar



This is a typedef for an unsigned 32-bit integer which we use to store Unicode values.

#include <dlib/unicode.h>
Detailed Documentation


ustring



This is a typedef for a std::basic_string<unichar>. That is, it is a typedef for a string object that stores unichar Unicode characters.

#include <dlib/unicode.h>
Detailed Documentation


wrap_string



wrap_string is a function that takes a string and breaks it into a number of lines of a given length. You can use this, for example, to make a string fit nicely into a command prompt window.

#include <dlib/string.h>
Detailed Documentation
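
Line wrapping of this kind is commonly done with a greedy word wrap, sketched below with the standard library. This is not dlib's wrap_string: the name wrap_sketch and its max_width parameter are hypothetical, runs of whitespace are collapsed to single spaces, and a word longer than max_width is left on a line of its own rather than broken.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not dlib's code: greedy word wrap that starts a new
// line whenever the next word would push the line past `max_width`.
std::string wrap_sketch(const std::string& text, std::size_t max_width)
{
    std::istringstream words(text);
    std::string word, line, out;
    while (words >> word)
    {
        // Flush the current line if adding " word" would make it too long.
        if (!line.empty() && line.size() + 1 + word.size() > max_width)
        {
            out += line + '\n';
            line.clear();
        }
        if (!line.empty()) line += ' ';
        line += word;
    }
    return out + line;   // the last line carries no trailing newline
}
```

For example, wrap_sketch("the quick brown fox", 10) yields "the quick\nbrown fox".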


xml_parser



This object represents a simple SAX-style, event-driven XML parser. It takes its input from an input stream object and sends events to all registered document_handler and error_handler objects.

document_handler and error_handler are interface classes. You pass instances of your own subclasses to the xml_parser, which generates events while parsing and sends them to the appropriate handler.

#include <dlib/xml_parser.h>
Detailed Documentation
C++ Example Programs: xml_parser_ex.cpp