on value semantics
2013 Keynote: Optimizing the Emergent Structures of C++ -- Chandler Carruth
42:45:
You don't need output parameters, we have value semantics in C++ [...]. Anytime you see someone arguing that "nonono I'm not going to return by value because copy would cost too much", someone working on an optimizer says they're wrong. All right? I have never yet seen a piece of code where that argument was correct. [...] People don't realize how important value semantics are to the optimizer because it completely clarifies the aliasing scenarios.
import re

for i in range(1000000):
    re.split(' +', 'a b c')
The previous Python code is twice as fast as the following C++ code (which uses libc++'s regex implementation):
#include <regex>
#include <vector>
#include <string>

using namespace std;

vector<string> split(const string &s, const regex &r) {
    return {
        sregex_token_iterator(begin(s), end(s), r, -1),
        sregex_token_iterator()
    };
}

int main() {
    const regex r(" +");
    for(int i = 0; i < 1000000; ++i)
        split("a b c", r);
    return 0;
}
It's easy to spot the performance problem here: the C++ version allocates strings to fill a vector, which is itself allocated and destroyed on every iteration of the loop, while the Python version just does the right thing: it builds string slices over the single original string.
The trivial, naive way to solve the issue with the C++ code is to use a by-reference output parameter:
#include <regex>
#include <vector>
#include <string>

using namespace std;

void split(const string &s, const regex &r, vector<string> &v) {
    auto rit = cregex_token_iterator(s.data(), s.data() + s.length(), r, -1);
    auto rend = cregex_token_iterator();
    v.clear();
    while(rit != rend) {
        v.push_back(*rit);
        ++rit;
    }
}

int main() {
    const regex r(" +");
    vector<string> v;
    for(int i = 0; i < 1000000; ++i)
        split("a b c", r, v);
    return 0;
}
This avoids all the abusive string allocations and destructions: the vector
grows its memory just once. Still, we could be building string slices over the
original string instead of allocating memory for each token substring, and we
are now stuck with an awkward interface; it's not as natural as the Python
version of split, which uses a return value as output.
How do we solve this?
Here's a sample solution built upon three key components: a string_view
implementation, ranges, and an iterator adaptor:
#include <string>
#include <string_view>

#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace boost;

string_view stringfier(const cregex_token_iterator::value_type &match) {
    return {match.first, static_cast<size_t>(match.length())};
}

using string_view_iterator =
    transform_iterator<decltype(&stringfier), cregex_token_iterator>;

iterator_range<string_view_iterator> split(string_view s, const regex &r) {
    return {
        string_view_iterator(
            cregex_token_iterator(s.data(), s.data() + s.size(), r, -1),
            stringfier
        ),
        string_view_iterator()
    };
}

int main() {
    const regex r(" +");
    for (int i = 0; i < 1000000; ++i) {
        split("a b c", r);
    }
}
A transform_iterator adapts another, base iterator, changing its behavior
only on dereference: dereferencing it yields the result of applying a given
function to the value the base iterator dereferences to. Its use here is to
adapt a cregex_token_iterator so that dereferencing produces string_views
instead of match objects or full-fledged strings.
With our special string_view_iterator operating through the stringfier
function, we can initialize an iterator_range, which builds a range over a
pair of iterators. We end up with a range of string_views that are lazy
matches: the matching actually happens only while iterating over the range.
This version is much faster, and it wouldn't lose to the output-parameter
version in performance even if we filled a vector from the returned range.
In the context of Chandler's talk, this solution shouldn't make a difference in helping the optimizer clarify the aliasing scenarios, but it achieves performance on par with, or better than, output through a by-reference parameter.
The final result is faster than Python, as expected. I'm not sure I can call this value semantics, though, since there are references all over the place; the original, inefficient solution returning a vector of strings seems closer to the semantics of values.
Notice also that Boost.Regex is currently twice as fast as libc++'s regex implementation.
More information in the following references, particularly the std::split
proposal, which is similar to the solution presented here.