Dlib is principally a C++ library; however, you can use a number of its tools from Python applications. This page documents the Python API for working with these dlib tools. If you haven’t done so already, you should look at the Python example programs before consulting this reference. These example programs are short tutorials for using dlib from Python. They are listed on the left of the main dlib web page.
This object represents a 1D array of floating point numbers. Moreover, it binds directly to the C++ type std::vector<double>.
cost.nr() == cost.nc() (i.e. the input must be a square matrix)
Interprets cost as a cost assignment matrix. That is, cost[i][j] represents the cost of assigning i to j.
Interprets assignment as a particular set of assignments. That is, i is assigned to assignment[i].
returns the cost of the given assignment. That is, returns a number which is:
sum over i: cost[i][assignment[i]]
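The sum above can be written out in a few lines. The following is a minimal pure-Python sketch of the computation described here, not dlib's implementation; the helper name `assignment_cost_sketch` and the nested-list representation of `cost` are assumptions made for illustration.

```python
def assignment_cost_sketch(cost, assignment):
    """Return sum over i of cost[i][assignment[i]] (illustrative helper, not dlib's code)."""
    n = len(cost)
    # The cost matrix must be square, mirroring cost.nr() == cost.nc().
    if any(len(row) != n for row in cost):
        raise ValueError("cost must be a square matrix")
    return sum(cost[i][j] for i, j in enumerate(assignment))


cost = [
    [1, 2, 6],
    [5, 3, 6],
    [4, 5, 0],
]
# Item 0 is assigned to 1, item 1 to 0, item 2 to 2.
total = assignment_cost_sketch(cost, [1, 0, 2])
print(total)  # 2 + 5 + 0 = 7
```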
This function performs a canonical correlation analysis between the vectors in L and R. That is, it finds two transformation matrices, Ltrans and Rtrans, such that row vectors in the transformed matrices L*Ltrans and R*Rtrans are as correlated as possible (note that in this notation we interpret L as a matrix with the input vectors in its rows). Note also that this function tries to find transformations which produce num_correlations dimensional output vectors.
Note that you can easily apply the transformation to a vector using apply_cca_transform(). For example:
- apply_cca_transform(Ltrans, some_sparse_vector)
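To make the row-vector convention concrete, the sketch below multiplies a sparse row vector by a transformation matrix in pure Python. It illustrates only the arithmetic x*Ltrans; the helper name `apply_transform_sketch`, the (index, value) pair representation of the sparse vector, and the nested-list matrix are assumptions, and dlib's own apply_cca_transform() may differ in its details.

```python
def apply_transform_sketch(trans, sparse_vec):
    """Compute x * trans for a sparse row vector x given as (index, value) pairs."""
    num_outputs = len(trans[0])
    out = [0.0] * num_outputs
    for index, value in sparse_vec:
        # Only the rows of trans matching nonzero entries of x contribute.
        for j in range(num_outputs):
            out[j] += value * trans[index][j]
    return out


# Hypothetical 3x2 transform: maps 3-dimensional inputs to 2 output dimensions.
Ltrans = [
    [1.0, 0.0],
    [0.0, 2.0],
    [0.5, 0.5],
]
some_sparse_vector = [(0, 2.0), (2, 4.0)]  # x = [2, 0, 4]
result = apply_transform_sketch(Ltrans, some_sparse_vector)
print(result)  # [4.0, 2.0]
```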
returns a structure containing the Ltrans and Rtrans transformation matrices as well as the estimated correlations between elements of the transformed vectors.
This function assumes the data vectors in L and R have already been centered (i.e. we assume the vectors have zero means). In many cases it is fine to use uncentered data with cca(), but if centering matters for your problem then you should center your data before passing it to cca().
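Centering means subtracting the per-dimension mean from every vector. A minimal sketch, assuming the data is held as a list of equal-length Python lists (the helper name `center_rows` is hypothetical and not part of dlib):

```python
def center_rows(rows):
    """Subtract the column-wise mean from each row so every dimension has zero mean."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in rows]


L = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
L_centered = center_rows(L)
print(L_centered)  # [[-2.0, -10.0], [0.0, 0.0], [2.0, 10.0]]
```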
This function works with reduced rank approximations of the L and R matrices. This makes it fast when working with large matrices. In particular, we use the dlib::svd_fast() routine to find reduced rank representations of the input matrices by calling it as follows: svd_fast(L, U,D,V, num_correlations+extra_rank, q) and similarly for R. This means that you can use the extra_rank and q arguments to cca() to influence the accuracy of the reduced rank approximation. However, the default values should work fine for most problems.
The dimensions of the output vectors produced by L*#Ltrans or R*#Rtrans are ordered such that the dimensions with the highest correlations come first. That is, after applying the transforms produced by cca() to a set of vectors you will find that dimension 0 has the highest correlation, then dimension 1 has the next highest, and so on. This also means that the list of estimated correlations returned from cca() will always be listed in decreasing order.
This function performs the ridge regression version of Canonical Correlation Analysis when regularization is set to a value > 0. In particular, larger values indicate the solution should be more heavily regularized. This can be useful when the dimensionality of the data is larger than the number of samples.
A good discussion of CCA can be found in the paper “Canonical Correlation Analysis” by David Weenink. In particular, this function is implemented using equations 29 and 30 from his paper. We also use the idea of doing CCA on a reduced rank approximation of L and R as suggested by Paramveer S. Dhillon in his paper “Two Step CCA: A new spectral method for estimating vector models of words”.
This is a tool for tracking moving objects in a video stream. You give it the bounding box of an object in the first frame and it attempts to track the object in the box from frame to frame. This tool is an implementation of the method described in the following paper:
Danelljan, Martin, et al. ‘Accurate scale estimation for robust visual tracking.’ Proceedings of the British Machine Vision Conference BMVC. 2014.
returns the predicted position of the object under track.
- requires
- image is a numpy ndarray containing either an 8-bit grayscale or RGB image.
- bounding_box.is_empty() == false
- ensures
- This object will start tracking the thing inside the bounding box in the given image. That is, if you call update() with subsequent video frames then it will try to keep track of the position of the object inside bounding_box.
- #get_position() == bounding_box
- requires
- image is a numpy ndarray containing either an 8-bit grayscale or RGB image.
- get_position().is_empty() == false (i.e. you must have started tracking by calling start_track())
- ensures
- performs: return update(image, get_position())
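The call order implied by these contracts is: call start_track() once on the first frame, then update() on each later frame, reading get_position() after each call. The sketch below captures that order as a helper that accepts any object exposing those three methods. The helper `track_sequence` and the `FakeTracker` stand-in are hypothetical and exist only so the example runs without a video; with dlib you would pass a real tracker object and numpy image frames instead.

```python
def track_sequence(tracker, frames, bounding_box):
    """Start tracking on the first frame, then update on each later frame.

    Returns the tracked position after every frame.
    """
    frames = iter(frames)
    first = next(frames)
    tracker.start_track(first, bounding_box)
    positions = [tracker.get_position()]
    for frame in frames:
        tracker.update(frame)
        positions.append(tracker.get_position())
    return positions


class FakeTracker:
    """Stand-in for a real tracker: shifts the box one pixel right per update."""

    def start_track(self, image, bounding_box):
        self._box = bounding_box  # (left, top, right, bottom)

    def update(self, image):
        left, top, right, bottom = self._box
        self._box = (left + 1, top, right + 1, bottom)

    def get_position(self):
        return self._box


positions = track_sequence(FakeTracker(), ["frame0", "frame1", "frame2"], (10, 20, 50, 60))
print(positions)
```

Keeping the loop in a helper that depends only on the three method names makes it easy to exercise the control flow with a stand-in before wiring in real frames.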
cross_validate_ranking_trainer( (svm_rank_trainer_sparse)trainer, (sparse_ranking_pairs)samples, (int)folds) -> _ranking_test
cross_validate_sequence_segmenter( (sparse_vectorss)samples, (rangess)segments, (int)folds [, (segmenter_params)params=<BIO,highFeats,signed,win=5,threads=4,eps=0.1,cache=40,non-verbose,C=100>]) -> segmenter_test
cross_validate_trainer( (svm_c_trainer_sparse_radial_basis)trainer, (sparse_vectors)x, (array)y, (int)folds) -> _binary_test
cross_validate_trainer( (svm_c_trainer_histogram_intersection)trainer, (vectors)x, (array)y, (int)folds) -> _binary_test
cross_validate_trainer( (svm_c_trainer_sparse_histogram_intersection)trainer, (sparse_vectors)x, (array)y, (int)folds) -> _binary_test
cross_validate_trainer( (svm_c_trainer_linear)trainer, (vectors)x, (array)y, (int)folds) -> _binary_test
cross_validate_trainer( (svm_c_trainer_sparse_linear)trainer, (sparse_vectors)x, (array)y, (int)folds) -> _binary_test
cross_validate_trainer_threaded( (svm_c_trainer_sparse_radial_basis)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) -> _binary_test
cross_validate_trainer_threaded( (svm_c_trainer_histogram_intersection)trainer, (vectors)x, (array)y, (int)folds, (int)num_threads) -> _binary_test
cross_validate_trainer_threaded( (svm_c_trainer_sparse_histogram_intersection)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) -> _binary_test
cross_validate_trainer_threaded( (svm_c_trainer_linear)trainer, (vectors)x, (array)y, (int)folds, (int)num_threads) -> _binary_test
cross_validate_trainer_threaded( (svm_c_trainer_sparse_linear)trainer, (sparse_vectors)x, (array)y, (int)folds, (int)num_threads) -> _binary_test
Compute the dot product between two dense column vectors.
This object represents a rectangular area of an image with floating point coordinates.
contains( (drectangle)arg1, (int)x, (int)y) -> bool
contains( (drectangle)arg1, (drectangle)rectangle) -> bool
This object represents a sliding window histogram-of-oriented-gradients based object detector.
Save a simple_object_detector to the provided path.
Returns found candidate objects.
- requires
- image == an image object which is a numpy ndarray
- len(kvals) == 3
- kvals should be a tuple that specifies the range of k values to use. In particular, it should take the form (start, end, num) where num > 0.
This function takes an input image and generates a set of candidate rectangles which are expected to bound any objects in the image. It does this by running a version of the segment_image() routine on the image and then reports rectangles containing each of the segments as well as rectangles containing unions of adjacent segments. The basic idea is described in the paper:
Segmentation as Selective Search for Object Recognition by Koen E. A. van de Sande, et al.
Note that this function deviates from what is described in the paper slightly. See the code for details.
The basic segmentation is performed kvals[2] times, each time with the k parameter (see segment_image() and the Felzenszwalb paper for details on k) set to a different value from the range of numbers linearly spaced between kvals[0] and kvals[1].
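As a concrete illustration of how kvals expands into k values, the sketch below computes num linearly spaced values from start to end. It assumes both endpoints are included, which is an assumption about the spacing rather than something stated above; the helper name `linearly_spaced` is hypothetical.

```python
def linearly_spaced(start, end, num):
    """Return num values evenly spaced from start to end, endpoints included."""
    if num <= 0:
        raise ValueError("num must be > 0")
    if num == 1:
        # Assumption: a single value collapses to one endpoint; this sketch picks `end`.
        return [float(end)]
    step = (end - start) / (num - 1)
    return [start + i * step for i in range(num)]


kvals = (50, 200, 3)  # (start, end, num)
k_values = linearly_spaced(*kvals)
print(k_values)  # [50.0, 125.0, 200.0]
```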
When doing the basic segmentations prior to any box merging, we discard all rectangles that have an area < min_size. Therefore, all outputs and subsequent merged rectangles are built out of rectangles that contain at least min_size pixels. Note that setting min_size to a smaller value than you might otherwise be interested in using can be useful since it allows a larger number of possible merged boxes to be created.
There are max_merging_iterations rounds of neighboring blob merging. Therefore, this parameter has some effect on the number of output rectangles you get, with larger values of the parameter giving more output rectangles.
This function appends the output rectangles into #rects. This means that any rectangles in rects before this function was called will still be in there after it terminates. Note further that #rects will not contain any duplicate rectangles. That is, for all valid i and j where i != j it will be true that:
- #rects[i] != #rects[j]
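The postcondition above (append to rects, never introduce duplicates) can be pictured with the following sketch. It assumes rectangles are represented as hashable (left, top, right, bottom) tuples and that rects holds no duplicates on entry; the helper name `append_unique` is hypothetical. It illustrates the contract, not dlib's internal procedure.

```python
def append_unique(rects, candidates):
    """Append each candidate to rects unless an equal rectangle is already present."""
    seen = set(rects)
    for rect in candidates:
        if rect not in seen:
            rects.append(rect)
            seen.add(rect)


rects = [(0, 0, 10, 10)]  # already present before the call
append_unique(rects, [(0, 0, 10, 10), (5, 5, 20, 20), (5, 5, 20, 20)])
print(rects)  # [(0, 0, 10, 10), (5, 5, 20, 20)]
```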
This object represents the location of an object in an image along with the positions of each of its constituent parts.
The number of parts of the object.
A single part of the object as a dlib point.
A vector of dlib points representing all of the parts.
The bounding box of the parts.
Returns the default face detector
Asks the user to hit enter to continue and pauses until they do so.
This is a GUI window capable of showing images on the screen.