2013 Keynote: Optimizing the Emergent Structures of C++ -- Chandler Carruth

42:45:

You don't need output parameters, we have value semantics in C++ [...]. Anytime you see someone arguing that "nonono I'm not going to return by value because copy would cost too much", someone working on an optimizer says they're wrong. All right? I have never yet seen a piece of code where that argument was correct. [...] People don't realize how important value semantics are to the optimizer because it completely clarifies the aliasing scenarios.

import re
for i in range(1000000):
    re.split(' +', 'a b c')

The previous Python code runs about twice as fast as the following C++ code (which uses libc++'s regex implementation):

#include <regex>
#include <vector>
#include <string>

using namespace std;

vector<string> split(const string &s, const regex &r) {
    return {
        sregex_token_iterator(begin(s), end(s), r, -1),
        sregex_token_iterator()
    };
}

int main() {
    const regex r(" +");
    for(int i = 0; i < 1000000; ++i)
       split("a b c", r);
    return 0;
}

It's easy to spot the performance problem here: the C++ version allocates a string for every token and fills a vector that is itself allocated and destroyed on each iteration of the loop, while the Python version presumably just does the right thing: building string slices over the single original string.

The trivial, naive way to solve the issue in the C++ code is to pass the output as a parameter by reference:

#include <regex>
#include <vector>
#include <string>

using namespace std;

void split(const string &s, const regex &r, vector<string> &v) {
    auto rit = cregex_token_iterator(s.data(), s.data() + s.length(), r, -1);
    auto rend = cregex_token_iterator();
    v.clear();
    while(rit != rend) {
        v.push_back(*rit);
        ++rit;
    }
}

int main() {
    const regex r(" +");
    vector<string> v;
    for(int i = 0; i < 1000000; ++i)
       split("a b c", r, v);
    return 0;
}

This avoids all the wasteful string allocations and destructions: the vector grows its memory just once. Still, we could be using string slices over the original string instead of allocating memory for each token substring, and we are left with an awkward interface; it's not as natural as the Python version of split, which uses the return value as output.

How do we solve this?

Here's a sample solution built upon three key components: a string_view implementation, ranges, and an iterator adaptor:

#include <string>
#include <string_view>
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace boost;

string_view stringfier(const cregex_token_iterator::value_type &match) {
    return {match.first, static_cast<size_t>(match.length())};
}

using string_view_iterator =
    transform_iterator<decltype(&stringfier), cregex_token_iterator>;

iterator_range<string_view_iterator> split(string_view s, const regex &r) {
    return {
        string_view_iterator(
            cregex_token_iterator(s.data(), s.data() + s.size(), r, -1),
            stringfier
        ),
        string_view_iterator()
    };
}

int main() {
    const regex r(" +");
    for (int i = 0; i < 1000000; ++i) {
        split("a b c", r);
    }
}

A transform_iterator adapts another base iterator, changing its behavior only on dereference: dereferencing it yields the result of calling a given function on the base iterator's dereferenced value. Its use here is to adapt a cregex_token_iterator so that dereferencing produces string_views instead of match objects or full-fledged strings. With our special string_view_iterator operating through the stringfier function, we can initialize an iterator_range, which builds a range over a pair of iterators. We end up with a range of string_views that are lazy matches, since the matching actually happens only while iterating over the range. This version is much faster (even if we fill a vector from the returned range) and doesn't lose to the output-parameter version in performance.

In the context of Chandler's talk, this solution shouldn't make a difference in helping the optimizer clarify the aliasing scenarios, but it achieves performance on par with or better than output through a parameter passed by reference.

The final result is faster than the Python version, as expected. I'm not sure I can call this value semantics, since there are references all over the place; the original inefficient solution returning a vector of strings seems closer to the semantics of values.

Notice also that Boost.Regex is currently about twice as fast as libc++'s regex implementation.

More information can be found in the following references, particularly the std::split proposal, which is similar to the solution presented here.

References