[Note] Effective STL

Only you know enough about the software you’re writing, the environment in which it will run, and the context in which it’s being created to determine whether it’s reasonable to violate the guidelines I present. Most of the time, it won’t be, and the discussions that accompany each Item explain why. In a few cases, it will. Slavish devotion to the guidelines isn’t appropriate, but neither is cavalier disregard. Before venturing off on your own, you should make sure you have a good reason.

Item 1. Choose your containers with care.

  • An overview of containers:
    • Sequence containers: vector, string, deque, and list.
    • Associative containers: set, multiset, map and multimap.
    • Nonstandard sequence containers: slist and rope.
    • Nonstandard associative containers: hash_set, hash_multiset, hash_map, and hash_multimap.
    • vector<char> as a replacement for string.
    • vector as a replacement for the standard associative containers.
    • Standard non-STL containers: arrays, bitset, valarray, stack, queue, and priority_queue.
  • A way of categorizing the STL containers:
    • Contiguous-memory containers (also known as array-based containers): vector, string, deque, and nonstandard rope.
    • Node-based containers: list, slist, all the standard associative containers and the nonstandard hashed containers.
  • Some of the questions most relevant when choosing among containers:
    • Do you need to be able to insert a new element at an arbitrary position in the container? If so, you need a sequence container: associative containers won't do.
    • Do you care how elements are ordered in the container? If not, a hashed container becomes a viable choice. Otherwise, you'll want to avoid hashed containers.
    • Must the container be part of standard C++? If so, that eliminates hashed containers, slist, and rope.
    • What category of iterators do you require? If they must be random access iterators, you're technically limited to vector, deque, and string, but you'd probably want to consider rope, too. If bidirectional iterators are required, you must avoid slist as well as one common implementation of the hashed containers.
    • Is it important to avoid movement of existing container elements when insertions or erasures take place? If so, you'll need to stay away from contiguous-memory containers.
    • Does the data in the container need to be layout-compatible with C? If so, you're limited to vectors.
    • Is lookup speed a critical consideration? If so, you'll want to look at hashed containers, sorted vectors, and the standard associative containers — probably in that order.
    • Do you mind if the underlying container uses reference counting? If so, you'll want to steer clear of string, because many string implementations are reference-counted. You'll need to avoid rope, too, because the definitive rope implementation is based on reference counting. You have to represent your strings somehow, of course, so you'll want to consider vector<char>.
    • Do you need transactional semantics for insertions and erasures? That is, do you require the ability to reliably roll back insertions and erasures? If so, you'll want to use a node-based container. If you need transactional semantics for multiple-element insertions (e.g., the range form), you'll want to choose list, because list is the only standard container that offers transactional semantics for multiple-element insertions. Transactional semantics are particularly important for programmers interested in writing exception-safe code.
    • Do you need to minimize iterator, pointer, and reference invalidation? If so, you'll want to use node-based containers, because insertions and erasures on such containers never invalidate iterators, pointers, or references (unless they point to an element you are erasing). In general, insertions or erasures on contiguous-memory containers may invalidate all iterators, pointers, and references into the container.
    • Would it be helpful to have a sequence container with random access iterators where pointers and references to the data are not invalidated as long as nothing is erased and insertions take place only at the ends of the container? This is a very special case, but if it's your case, deque is the container of your dreams. (Interestingly, deque's iterators may be invalidated when insertions are made only at the ends of the container, deque is the only standard STL container whose iterators may be invalidated without also invalidating its pointers and references.)

Item 2. Beware the illusion of container-independent code.

  • The different containers are different, and they have strengths and weaknesses that vary in significant ways. They're not designed to be interchangeable, and there's little you can do to paper that over.
  • When you change container types:
    • You need to examine all the code using the container to see what needs to be changed in light of the new container's performance characteristics and rules for invalidation of iterators, pointers, and references.
    • If you switch from a vector to something else, you'll also have to make sure you're no longer relying on vector's C-compatible memory layout.
    • If you switch to a vector, you'll have to ensure that you're not using it to store bools.
  • Given the inevitability of having to change container types from time to time, you can facilitate such changes in the usual manner: by encapsulating. One of the easiest ways to do this is through the liberal use of typedefs for container and iterator types.
  • To limit the code that may require modification if you replace one container type with another, hide the container in a class, and limit the amount of container-specific information visible through the class interface.

Item 3. Make copying cheap and correct for objects in containers.

  • An easy way to make copying efficient, correct, and immune to the slicing problem is to create containers of pointers instead of containers of objects.
  • Compared to arrays, STL containers are much more civilized. They create (by copying) only as many objects as you ask for, they do it only when you direct them to, and they use a default constructor only when you say they should.

Item 4. Call empty instead of checking size() against zero.

  • You should prefer the construct using empty, and the reason is simple: empty is a constant-time operation for all standard containers, but for some list implementations, size takes linear time.
  • List can't offer a constant-time size due to its unique splicing functions. If splice is made constant-time, then size becomes a linear-time operation. One or the other can be a constant-time operation, but not both.

Item 5. Prefer range member functions to their single-element counterparts.

  • Whenever you have to completely replace the contents of a container, you should think of assignment. If you're just copying one container to another of the same type, operator= is the assignment function of choice, but assign is available for the times when you want to give a container a completely new set of values, but operator= won't do what you want.
  • A range member function is a member function that, like STL algorithms, uses two iterator parameters to specify a range of elements over which something should be done.
  • Reasons to prefer range member functions to their single-element counterparts:
    • It's generally less work to write the code using the range member functions.
    • Range member functions tend to lead to code that is clearer and more straightforward.
    • When dealing with the standard sequence containers, application of single-element member functions makes more demands on memory allocators, copies objects more frequently, and/or performs redundant operations compared to range member functions that achieve the same end.
  • Almost all uses of copy where the destination range is specified using an insert iterator should be replaced with calls to range member functions.
  • Member functions that support ranges:
    • Range construction.
    • Range insertion.
    • Range erasure.
    • Range assignment.

Item 6. Be alert for C++'s most vexing parse.

  • Suppose you have a file of ints and you'd like to copy those ints into a list:

    1
    2
    ifstream dataFile("ints.dat");
    list<int> data(istream_iterator<int>(dataFile), istream_iterator<int>());

  • This declares a function, data, whose return type is list<int>. The function data takes two parameters:

    • The first parameter is named dataFile. It's type is istream_iterator<int>. The parentheses around dataFile are superfluous and are ignored.
    • The second parameter has no name. Its type is pointer to function taking nothing and returning an istream_iterator<int>.
  • By adding a pair of parentheses, we force compilers to see things our way:

    1
    2
    // note new parens around first argument to list's constructor
    list<int> data((istream_iterator<int>(dataFile)), istream_iterator<int>());

  • A better solution is to step back from the trendy use of anonymous istream_iterator objects in data's declaration and simply give those iterators names:

    1
    2
    3
    4
    ifstream dataFile("ints.dat");
    istream_iterator<int> dataBegin(dataFile);
    istream_iterator<int> dataEnd;
    list<int> data(dataBegin, dataEnd);

Item 7. When using containers of newed pointers, remember to delete the pointers before the container is destroyed.

  • To avoid resource leaks when you have containers of pointers that should be deleted, you must either replace the pointers with smart reference-counting pointer objects (such as Boost's shared_ptr) or you must manually delete each pointer in the container before the container is destroyed.

Item 8. Never create containers of auto_ptrs.

  • Containers of auto_ptr (COAPs) aren't portable. The Standard for C++ forbids them, and better STL platforms already enforce this.
  • When you copy an auto_ptr, ownership of the object pointed to by the auto_ptr is transferred to the copying auto_ptr, and the copied auto_ptr is set to NULL.
  • One or more of the auto_ptrs in the container may have been set to NULL during quicksort.

Item 9. Choose carefully among erasing options.

  • To eliminate all objects in a container that have a particular value:
    • If the container is a vector, string, or deque, use the erase-remove idiom.
    • If the container is a list, use list::remove.
    • If the container is a standard associative container, use its erase member function.
  • To eliminate all objects in a container that satisfy a particular predicate:
    • If the container is a vector, string, or deque, use the erase-remove_if idiom.
    • If the container is a list, use list::remove_if.
    • If the container is a standard associative container, use remove_copy_if and swap, or write a loop to walk the container elements, being sure to postincrement your iterator when you pass it to erase.
  • To do something inside the loop (in addition to erasing objects):
    • If the container is a standard sequence container, write a loop to walk the container elements, being sure to update your iterator with erase's return value each time you call it.
    • If the container is a standard associative container, write a loop to walk the container elements, being sure to postincrement your iterator when you pass it to erase.

Item 10. Be aware of allocator conventions and restrictions.

  • Make your allocator a template, with the template parameter T representing the type of objects for which you are allocating memory.
  • Provide the typedefs pointer and reference, but always have pointer be T* and reference be T&.
  • Never give your allocators per-object state. In general, allocators should have no nonstatic data members.
  • Remember that an allocator's allocate member functions are passed the number of objects for which memory is required, not the number of bytes needed. Also remember that these functions return T* pointers via the pointer typedef, even though no T objects have yet been constructed.
  • Be sure to provide the nested rebind template on which standard containers depend.

Item 11. Understand the legitimate uses of custom allocators.

  • You've benchmarked, profiled, and experimented your way to the conclusion that the default STL memory manager (i.e., allocator<T>) is too slow, wastes memory, or suffers excessive fragmentation for your STL needs, and you're certain you can do a better job yourself.
  • You discover that allocator<T> takes precautions to be thread-safe, but you're interested only in single-threaded execution and you don't want to pay for the synchronization overhead you don't need.
  • You know that objects in certain containers are typically used together, so you'd like to place them near one another in a special heap to maximize locality of reference.
  • You'd like to set up a unique heap that corresponds to shared memory, then put one or more containers in that memory so they can be shared by other processes.

Item 12. Have realistic expectations about the thread safety of STL containers.

  • The gold standard in support for multithreading in STL containers:

    • Multiple readers are safe. Multiple threads may simultaneously read the contents of a single container, and this will work correctly. Naturally, there must not be any writers acting on the container during the reads.
    • Multiple writers to different containers are safe. Multiple threads may simultaneously write to different containers.

    This is what you can hope for, not what you can expect. Some implementations offer these guarantees, but some do not.

  • You can't hope for the library to eliminate the need for manual concurrency control, and you can't rely on any thread support at all.

Item 13. Prefer vector and string to dynamically allocated arrays.

  • Any time you find yourself getting ready to dynamically allocate an array (i.e.. plotting to write "new T[...]"), you should consider using a vector or a string instead.
  • Only one legitimate cause for concern in replacing dynamically allocated arrays with vectors or strings, and it applies only to strings: many string implementations employ reference counting behind the scenes. If you're using reference-counted strings in a multithreaded environment, then, it makes sense to keep an eye out for performance problems arising from their support for thread safety.
  • If so, you have at least three reasonable choices:
    • Check to see if your library implementation is one that makes it possible to disable reference counting.
    • Find or develop an alternative string implementation (or partial implementation) that doesn't use reference counting.
    • Consider using a vector<char> instead of a string.

Item 14. Use reserve to avoid unnecessary reallocations.

  • Four interrelated member functions that are sometimes confused:
    • size() tells you how many elements are in the container.
    • capacity() tells you how many elements the container can hold in the memory it has already allocated.
    • resize(size_t n) forces the container to change to n the number of elements it holds.
    • reserve(size_t n) forces the container to change its capacity to at least n, provided n is no less than the current size.
  • The reserve member function allows you to minimize the number of reallocations that must be performed, thus avoiding the costs of reallocation and iterator/pointer/reference invalidation.
  • There are two common ways to use reserve to avoid unneeded reallocations:
    • The first is applicable when you know exactly or approximately how many elements will ultimately end up in your container. In that case, you simply reserve the appropriate amount of space in advance.
    • The second way is to reserve the maximum space you could ever need, then, once you've added all your data, trim off any excess capacity. (See Item 17 for the trimming part)

Item 15. Be aware of variations in string implementations.

  • Virtually every string implementation holds the following information:
    • The size of the string, i.e., the number of characters it contains.
    • The capacity of the memory holding the string's characters.
    • The value of the string, i.e., the characters making up the string.
    • A copy of its allocator (optional).
    • The reference count for the value. (For string implementations that depend on reference counting)
  • If you expect to have lots of short strings, either (1) your release environment has very little memory or (2) you are concerned about locality of reference and want to cluster strings on as few pages as possible.
  • Let's summarize the things that vary:
    • string values may or may not be reference counted. By default, many implementations do use reference counting, but they usually offer a way to turn it off, often via a preprocessor macro.
    • string objects may range in size from one to at least seven times the size of char* pointers.
    • Creation of a new string value may require zero, one, or two dynamic allocations.
    • string objects may or may not share information on the string's size and capacity.
    • strings may or may not support per-object allocators.
    • Different implementations have different policies regarding minimum allocations for character buffers.

Item 16. Know how to pass vector and string data to legacy APIs.

  • If you have a vector v and you need to get a pointer to the data in v that can be viewed as an array, just use &v[0]. For a string s, the corresponding incantation is simply s.c_str().

  • The approach to getting a pointer to container data that works for vectors isn't reliable for strings, because (1) the data for strings are not guaranteed to be stored in contiguous memory, and (2) the internal representation of a string is not guaranteed to end with a null character.

  • string objects don't care if they contain null characters, but char*-based C APIs do.

  • For a string, you can only read its data (not modify it), because there is no guarantee that c_str yields a pointer to the internal representation of the string data: it could return a pointer to an unmodifiable copy of the string's data, one that's correctly formatted for a C API.

  • For a vector, you have a little more flexibility. If you pass it to a C API that modifies its elements, that's typically okay, but the called routine must not attempt to change the number of elements in the vector. However, some vectors have extra constraints on their data, and if you pass a vector to an API that modifies the vector's data, you must ensure that the additional constraints continue to be satisfied. (e.g. sorted vectors)

  • If you have a vector that you'd like to initialize with elements from a C API, you can take advantage of the underlying layout compatibility of vectors and arrays by passing to the API the storage for the vector's elements:

    1
    2
    3
    4
    5
    6
    // C API: this function takes a pointer to an array of at most arraySize
    // doubles and writes data to it. It returns the number of doubles written,
    // which is never more than maxNumDoubles.
    size_t fillArray(double *pArray, size_t arraySize);
    vector<double> vd(maxNumDoubles);
    vd.resize(fillArray(&vd[0], vd.size()));

    This technique works only for vectors, because only vectors are guaranteed to have the same underlying memory layout as arrays.

  • If you want to initialize a string with data from a C API, however, you can have the API put the data into a vector<char>, then copy the data from the vector to the string:

    1
    2
    3
    4
    size_t fillString(char *pArray, size_t arraySize);
    vector<char> vc(maxNumChars);
    size_t charsWritten = fillString(&vc[0], vc.size());
    string s(vc.begin(), vc.begin() + charsWritten);

  • In fact, the idea of having a C API put data into a vector and then copying the data into the STL container you really want it in always works:

    1
    2
    3
    4
    5
    6
    7
    size_t fillArray(double *pArray, size_t arraySize);
    vector<double> vd(maxNumDoubles);
    vd.resize(fillArray(&vd[0], vd.size()));

    deque<double> d(vd.begin(), vd.end());
    list<double> l(vd.begin(), vd.end());
    set<double> s(vd.begin(), vd.end());

  • Furthermore, this hints at how STL containers other than vector or string can pass their data to C APIs. Just copy each container's data into a vector, then pass it to the API:

    1
    2
    3
    4
    5
    void doSomething(const int* pints, size_t numlnts);
    set<int> intSet;
    ...
    vector<int> v(intSet.begin(), intSet.end());
    if (!v.empty()) doSomething(&v[0], v.size());

Item 17. Use "the swap trick" to trim excess capacity.

  • Calling a range form of erase does a fine job of reducing the size of the vector, but it does nothing to reduce its capacity.

  • "The swap trick" to trim excess capacity (shrink-to-fit):

    1
    2
    3
    4
    class Contestant {...};
    vector<Contestant> contestants;
    ...
    vector<Contestant>(contestants).swap(contestants);

    1
    2
    3
    string s;
    ...
    string(s).swap(s);

  • The language police makes no guarantee that this technique will truly eliminate excess capacity. Implementers are free to give vectors and strings excess capacity if they want to, and sometimes they want to. This approach to "shrink-to-fit," then, doesn't really mean "make the capacity as small as possible". It means "make the capacity as small as this implementation is willing to make it given the current size of the container."

  • A variant of the swap trick can be used both to clear a container and to reduce its capacity to the minimum your implementation offers. You simply do the swap with a temporary vector or string that is default-constructed:

    1
    2
    3
    4
    5
    vector<Contestant> v;
    string s;
    ...
    vector<Contestant>().swap(v);
    string().swap(s);

  • Swapping the contents of two containers also swaps their iterators, pointers, and references. Iterators, pointers, and references that used to point to elements in one container remain valid and point to the same elements — but in the other container— after the swap.

Item 18. Avoid using vector<bool>.

  • vector<bool> is not an STL container, and it doesn't hold bools.
  • vector<bool> is a pseudo-container that contains not actual bools, but a packed representation of bools that is designed to save space. In a typical implementation, each "bool" stored in the "vector" occupies a single bit, and an eight-bit byte holds eight "bools." Internally, vector<bool> uses the moral equivalent of bitfields to represent the bools it pretends to be holding.
  • You may create a pointer to a real bool, but pointers (and references) to individual bits are forbidden.
  • vector<bool>::operator[] returns an object that acts like a reference to a bit, a so-called proxy object.
  • What should you use when you need a vector<bool>? Two alternatives suffice almost all the time:
    • deque<bool>, which is an STL container that really contains bools.
    • bitset, which isn't an STL container, but is part of the standard C++ library.

Item 19. Understand the difference between equality and equivalence.

  • The non-member find algorithm's definition of "the same" is equality, which is based on operator==. set::find or set::insert's definition of "the same" is equivalence, which is usually based on operators.
  • If the expression "x == y" returns true, x and y have equal values, otherwise they don't. Just because x and y have equal values does not necessarily imply that all of their data members have equal values.
  • Equivalence is based on the relative ordering of object values in a sorted range. Two objects x and y have equivalent values with respect to the sort order used by an associative container c if neither precedes the other in c's sort order.
  • For sorted associative containers, member and non-member find may return different results.
  • There are two common designs for nonstandard associative containers based on hash tables. One design is based on equality, while the other is based on equivalence.

Item 20. Specify comparison types for associative containers of pointers.

  • A standard associative container of pointers will be sorted by the values of the pointers. Only rarely will this be what you want, so you'll almost always want to create your own functor class to serve as a comparison type.

  • Most of the time, your comparison type will just dereference the pointers and compare the pointed-to objects. That being the case, you might as well keep a template for such comparison functors close at hand.

    1
    2
    3
    4
    5
    6
    7
    8
    struct DereferenceLess {
    template <typename PtrType>
    bool operator()(PtrType pT1, PtrType pT2) const
    {
    return *pT1 < *pT2;
    }
    };
    set<string*, DereferenceLess> ssp;

  • This Item is about associative containers of pointers, but it applies equally well to containers of objects that act like pointers, e.g., smart pointers and iterators.

Item 21. Always have comparison functions return false for equal values.

  • The return value of a comparison function indicates whether one value precedes another in the sort order defined by that function. Equal values never precede one another, so comparison functions should always return false for equal values, regardless of whether the containers are allowed to store duplicates.
  • Technically speaking, comparison functions used to sort associative containers must define a “strict weak ordering” over the objects they compare. Any function defining a strict weak ordering must return false if it's passed two copies of the same value. (Comparison functions passed to algorithms like sort are similarly constrained.)

Item 22. Avoid in-place key modification in set and multiset.

  • In-place key modification is impossible for map and multimap (unless you use a cast), but it may be possible for set and multiset.

  • If portability is not a concern, you want to change the value of an element in a set or multiset, and your STL implementation will let you get away with it, go ahead and do it. Just be sure not to change a key part of the element, i.e., a part of the element that could affect the sortedness of the container.

  • If you value portability, assume that the elements in sets and multisets cannot be modified, at least not without a cast.

  • To change a non-key portion of an element in a set or a multiset correctly and portably, you must cast to a reference to avoid creating a new object.

    1
    2
    3
    4
    5
    6
    7
    8
    typedef set<Employee, IDNumberLess> EmplDSet;
    EmplDSet se;
    Employee selectedID;
    ...
    EmplDSet::iterator i = se.find(selectedID);
    if (i != se.end()) {
    const_cast<Employee&>(*i).setTitle("Corporate Deity");
    }

  • A map<K, V> or a multimap<K, V> contains elements of type pair<const K, V>. That const means that the first component of the pair is defined to be const, and that means that attempts to cast away its constness are undefined. If you're a stickler for following the rules laid down by the Standard, you'll never try to cast away the constness of a map or multimap key.

  • If you want to change an element in a set, multiset, map, or multimap in a way that always works and is always safe, do it in five simple steps:

    1. Locate the container element you want to change.
    2. Make a copy of the element to be modified. In the case of a map or multimap, be sure not to declare the first component of the copy const. After all, you want to change it!
    3. Remove the element from the container, typically via a call to erase.
    4. Modify the copy so it has the value you want to be in the container.
    5. Insert the new value into the container. If the location of the new element in the container's sort order is likely to be the same or adjacent to that of the removed element, use the "hint" form of insert to improve the efficiency of the insertion from logarithmic-time to constant-time. Use the iterator you got from Step 1 as the hint.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    EmplDSet se;
    Employee selectedID;
    ...
    EmplDSet::iterator i = se.find(selectedID);
    if (i != se.end()) {
    Employee e(*i);
    se.erase(i++);
    e.setTitle("Corporate Deity");
    se.insert(i, e);
    }

Item 23. Consider replacing associative containers with sorted vectors.

  • The standard associative containers are typically implemented as balanced binary search trees. A balanced binary search tree is a data structure that is optimized for a mixed combination of insertions, erasures, and lookups.

  • Many applications use their data structures in a less chaotic manner, which falls into three distinct phases:

    1. Setup
    2. Lookup
    3. Reorganize

    where lookups are almost never mixed with insertions and erasures. For these applications, a sorted vector working with lookup algorithms is likely to offer better performance (in both time and space) than an associative container.

  • Storing data in a sorted vector is likely to consume less memory than storing the same data in a standard associative container, and searching a sorted vector via binary search is likely to be faster than searching a standard associative container when page faults are taken into account.

  • When using a vector to emulate a map<K, V>, then:

    • The type of the data stored in the vector will be pair<K, V>, not pair<const K, V>, because when you sort the vector, the values of its elements will get moved around via assignment, and that means that both components of the pair must be assignable.
    • maps and multimaps keep their elements in sorted order only by the key part of the element (the first component of the pair), and you must do the same when sorting a vector. You'll need to write a custom comparison function for your pairs, because pair's operator< looks at both components of the pair.
    • You'll need a second comparison function for performing lookups, which must take an object of the key type (the value being searched for) and a pair (one of the pairs stored in the vector) — two different types. As an additional twist, you can't know whether the key value or the pair will be passed as the first argument, so you really need two comparison functions for lookups: one where the key value is passed first and one where the pair is passed first.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    typedef pair<string, int> Data;
    class DataCompare {
    public:
    bool operator()(const Data& lhs, const Data& rhs) const
    {
    return keyLess(lhs.first, rhs.first);
    }
    bool operator()(const Data& Ihs, const Data::first_type& k) const
    {
    return keyLess(lhs.first, k);
    }
    bool operator()(const Data::first_type& k, const Data& rhs) const
    {
    return keyLess(k, rhs.first);
    }
    private:
    bool keyLess(const Data::first_type& k1, const Data::first_type& k2) const
    {
    return k1 < k2;
    }
    };

Item 24. Choose carefully between map::operator[] and map-insert when efficiency is important.

  • map::operator[] is designed to facilitate "add or update" functionality.

  • When an "add" is performed, insert is more efficient than operator[].

  • When an "update" is performed, the situation is reversed.

  • A function that offers efficient add-or-update functionality in a syntactically attractive package:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    template<typename MapType, typename KeyArgType, typename ValueArgtype>
    typename MapType::iterator
    efficientAddOrUpdate(MapType& m, const KeyArgType& k, const ValueArgtype& v)
    {
    typename MapType::iterator lb = m.lower_bound(k);
    if (lb != m.end() && !(m.key_comp()(k, lb->first))) {
    lb->second = v;
    return lb;
    } else {
    typedef typename MapType::value_type MVT;
    return m.insert(lb, MVT(k, v)); // insert with hint, in constant time
    }
    }

Item 25. Familiarize yourself with the nonstandard hashed containers.

  • Though the STL itself lacks hashed containers. STL-compatible hashed containers (with varying interfaces, capabilities, and behavioral trade-offs) are not difficult to come by. (e.g. SGI and Dinkumware)

Item 26. Prefer iterator to const iterator, reverse_iterator, and const_reverse_iterator.

  • This diagram displays the conversions that exist among iterator types:

  • Why it often makes sense to prefer iterators to const and reverse iterators:

    • Some versions of insert and erase require iterators. If you want to call those functions, you're going to have to produce iterators, const and reverse iterators won't do.
    • It's not possible to implicitly convert a const iterator to an iterator, and the technique for generating an iterator from a const_iterator is neither universally applicable nor guaranteed to be efficient.
    • Conversion from a reverse_iterator to an iterator may require iterator adjustment after the conversion.
  • Stay away from const_iterators to avoid potential implementation shortcomings when mixing iterators and const_iterators (or reverse_iterators and const_reverse_iterators) in the same expression.

Item 27. Use distance and advance to convert a container's const_iterators to iterators.

  • For deque, list, set, multiset, map, multimap and hashed containers, const_iterators can't be cast to iterators because they are different classes.

  • The cast might compile if the iterators' container were a vector or a string, because it is common for implementations of these containers to use pointers as iterators. However, its portability is doubtful, so casting const iterators to iterators is still ill-advised even for vector and string.

  • If you have access to the container a const_iterator came from, there is a safe, portable way to get its corresponding iterator, and it involves no circumvention of the type system: create a new iterator at the beginning of the container, then move it forward it until it's as far from the beginning of the container as the const_iterator is.

    1
    2
    3
    4
    5
    6
    7
    8
    typedef deque<int> IntDeque;
    typedef IntDeque::iterator Iter;
    typedef IntDeque::const_iterator Constlter;
    IntDeque d;
    Constlter ci;
    ... // make ci point into d
    Iter i(d.begin());
    advance(i, distance<ConstIter>(i, ci)); // explicitly specify the type parameter

  • This technique is as efficient as the iterators allow it to be. For random access iterators (such as those sported by vector, string, and deque), it's a constant-time operation. For bidirectional iterators (i.e., those for all other standard containers and for some implementations of the hashed containers), it's a linear-time operation.

Item 28. Understand how to use a reverse_iterator's base iterator.

  • This picture displays the characteristic offset of a reverse_iterator and its corresponding base iterator that mimics the offset of rbegin() and rend() with respect to begin() and end():

  • To emulate insertion at a position specified by a reverse_iterator ri, insert at the position ri.base() instead. For purposes of insertion, ri and ri.base() are equivalent, and ri.base() is truly the iterator corresponding to ri.

  • To emulate erasure at a position specified by a reverse_iterator ri, erase at the position preceding ri.base() instead. For purposes of erasure, ri and ri.base() are nor equivalent, and ri.base() is nor the iterator corresponding to ri.

    1
    2
    3
    4
    5
    vector<int> v;
    ... // put 1 -5 in v
    vector<int>::reverse_iterator ri = // make ri point to the 3
    find(v.rbegin(), v.rend(), 3);
    v.erase((++ri).base()); // erase the element pointed to by ri

Item 29. Consider istreambuf_iterators for character-by-character input.

  • istream_iterators use operator<< functions to do the actual reading, which skip whitespace by default, and must undertake a fair amount of work to perform formatted input.

  • If you need to read the characters in a stream one by one, you don't need the power of formatted input, and you care about how long it takes to read the stream, typing three extra characters per iterator is a small price to pay for what is often a significant increase in performance. For unformatted character-by-character input, you should always consider istreambuf_iterators.

    1
    2
    ifstream inputFile("interestingData.txt");
    string fileData((istreambuf_iterator<char>(inputFile)), istreambuf_iterator<char>());

  • You should also consider ostreambuf_iterators for the corresponding unformatted character-by-character output operations. They avoid the overhead (and flexibility) of their ostream_iterator cousins, so they generally outperform them, too.

Item 30. Make sure destination ranges are big enough.

  • Internally, the iterator returned by back_inserter causes push_back to be called, so you may use back_inserter with any container offering push_back (i.e., any of the standard sequence containers: vector, string, deque, and list).

  • If you'd prefer to have an algorithm insert things at the front of a container you can use front_inserter. Internally, front_inserter makes use of push_front, so front_inserter works only for the containers offering that member function (i.e. deque and list).

  • If you want transform to put its output at the front of results, but you also want the output to be in the same order as the corresponding objects in values, just iterate over values in reverse order:

    1
    2
    3
    4
    5
    6
    7
    int transmogrify(int x);
    vector<int> values;
    ...
    list<int> results;
    transform(values.rbegin(), values.rend(),
    front_inserter(results),
    transmogrify);

  • inserter allows you to force algorithms to insert their results into containers at arbitrary locations.

  • Regardless of whether you use back_inserter, front_inserter, or inserter, each insertion into the destination range is done one object at a time. Item 5 explains that this can be expensive for contiguous-memory containers (vector, string, and deque), but Item 5's suggested solution (using range member functions) can't be applied when it's an algorithm doing the inserting.

  • When the container into which you're inserting is a vector or a string, you can minimize the expense by following the advice of Item 14 and calling reserve in advance.

  • Using reserve without also using an insert iterator leads to undefined behavior inside algorithms as well as to corrupted containers.

  • If you want to overwrite the values of existing container elements instead of inserting new ones, you don't need an insert iterator, but you still need to follow the advice of this Item and make sure your destination range is big enough.

Item 31. Know your sorting options.

  • If you need to perform a full sort on a vector, string, deque, or array, you can use sort or stable_sort.
  • If you have a vector, string, deque, or array and you need to put only the top n elements in order, partial_sort is available.
  • If you have a vector, string, deque, or array and you need to identify the element at position n or you need to identify the top n elements without putting them in order, nth_element is at your beck and call.
  • If you need to separate the elements of a standard sequence container or an array into those that do and do not satisfy some criterion, you're probably looking for partition or stable_partition.
  • If your data is in a list, you can use partition and stable_partition directly, and you can use list-sort in place of sort and stable_sort. If you need the effects offered by partial_sort or nth_element, you'll have to approach the task indirectly, but there are a number of options:
    • Copy the elements into a container with random access iterators, then apply the desired algorithm to that.
    • Create a container of list::iterators, use the algorithm on that container, then access the list elements via the iterators.
    • Use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in.
  • In addition, you can keep things sorted at all times by storing your data in a standard associative container. You might also consider the standard non-STL container priority_queue, which also keeps its elements ordered all the time.
  • Broadly speaking, algorithms that do more work take longer to do it, and algorithms that must sort stably take longer than algorithms that can ignore stability. We can order the algorithms we've discussed in this Item as follows, with algorithms that tend to use fewer resources (time and space) listed before those that require more:
    1. partition
    2. stable_partition
    3. nth_element
    4. partial_sort
    5. sort
    6. stable_sort
  • Make your selection based on what you need to accomplish, not on performance considerations. If you choose an algorithm that does only what you need to do (e.g., a partition instead of a full sort), you're likely to end up with code that's not only the clearest expression of what you want to do, it's also the most efficient way to accomplish it using the STL.

Item 32. Follow remove-like algorithms by erase if you really want to remove something.

  • Because the only way to eliminate an element from a container is to invoke a member function on that container, and because remove cannot know the container holding the elements on which it is operating, it is not possible for remove to eliminate elements from a container. Removeing elements from a container never changes the number of elements in the container.

  • remove moves elements in the range it's given until all the "unremoved" elements are at the front of the range (in the same relative order they were in originally). It returns an iterator pointing one past the last "unremoved" element.

  • remove doesn't change the order of the elements in a range so that all the "removed" ones are at the end, it arranges for all the "unremoved" values to be at the beginning. Though the Standard doesn't require it, the elements beyond the new logical end of the range typically retain their old values.

  • You should follow remove by erase if you really want to remove something.

    1
    2
    vector<int> v;
    v.erase(remove(v.begin(), v.end(), 99), v.end());

  • remove and erase are so closely allied, the two are merged in the list member function remove. This is the only function in the STL named remove that eliminates elements from a container.

  • There are two other "remove-like" algorithms: remove_if and unique.

Item 33. Be wary of remove-like algorithms on containers of pointers.

  • You should try to avoid using remove and similar algorithms (i.e., remove_if and unique) on containers of dynamically allocated pointers, because it will lead to resource leak. In many cases, you'll find that the partition algorithm is a reasonable alternative.

  • If you can't avoid using remove on such containers, one way to eliminate this problem is to delete the pointers and set them to null prior to applying the erase-remove idiom, then eliminate all the null pointers in the container:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    void delAndNullifyUncertified(Widget*& pWidget)
    {
    if (!pWidget->isCertified()) {
    delete pWidget;
    pWidget = 0;
    }
    }

    for_each(v.begin(), v.end(), delAndNullifyUncertified);
    // 0 must be cast to a ptr so C++ correctly deduces the type of remove's 3rd param
    v.erase(remove(v.begin(), v.end(), static_cast<Widget*>(0)), v.end());

  • If you're willing to replace the container of pointers with a container of smart pointers that perform reference counting, the remove-related difficulties wash away, and you can use the erase-remove idiom directly.

Item 34. Note which algorithms expect sorted ranges.

  • Algorithms that require the data on which they operate to be sorted:

    • binary_search, lower_bound, upper_bound, equal_range: promise logarithmic-time lookup only when they are passed random access iterators.
    • set_union, set_intersection, set_difference, set_symmetric_difference: offer linear-time performance of the set-theoretical operations their names suggest.
    • merge, inplace_merge: perform a single pass of the mergesort algorithm in linear time.
    • includes: determines whether all the objects in one range are also in another range in linear time.
  • Algorithms that are typically used with sorted ranges though don't require them:

    • unique, unique_copy: eliminates all but the first element from every consecutive group of equal elements.

      In practice, unique is usually employed to eliminate all duplicate values from a range, so you'll almost always want to make sure that the range you pass unique (or unique_copy) is sorted.

  • Because the STL allows you to specify comparison functions to be used during sorting, different ranges may be sorted in different ways. If you pass a sorted range to an algorithm that also takes a comparison function, be sure that the comparison function you pass behaves the same as the one you used to sort the range.

    1
    2
    3
    4
    5
    vector<int> v;
    ...
    sort(v.begin(), v.end(), greater<int>());
    ...
    bool a5Exists = binary_search(v.begin(), v.end(), 5, greater<int>());

  • All the algorithms that require sorted ranges determine whether two values are "the same" by using equivalence, just like the standard associative containers. In contrast, the default way in which unique and unique_copy determine whether two objects are "the same" is by using equality, though you can override this default by passing these algorithms a predicate defining an alternative definition of "the same."

Item 35. Implement simple case-insensitive string comparisons via mismatch or lexicographical compare.

  • Programmers desiring case-insensitive string comparisons often need two different calling interfaces, one similar to strcmp (which returns a negative number, zero, or a positive number), the other akin to operators (which returns true or false).

  • First, we need a way to determine whether two characters are the same, except for their case.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    int ciCharCompare(char c1, char c2)
    {
    // cast to ensure the value representable as an unsigned char
    int lc1 = tolower(static_cast<unsigned char>(c1));
    int lc2 = tolower(static_cast<unsigned char>(c2));

    if (lc1 < lc2) return -1;
    if (lc1 > lc2) return 1;
    return 0;
    }

  • The first case-insensitive string comparison function offering a strcmp-like interface, which is built around the mismatch algorithm:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    int ciStringCompareImpl(const string& si, const strings s2)
    {
    typedef pair<string::const_iterator, string::const_iterator> PSCI;
    PSCI p = mismatch(
    s1.begin(), s1.end(),
    s2.begin(),
    not2(ptr_fun(ciCharCompare))); // return true when the characters match

    if (p.first == s1.end()) {
    if (p.second == s2.end()) return 0;
    else return -1;
    }
    return ciCharCompare(*p.first, *p.second);
    }

    int ciStringCompare(const string& s1, const string& s2)
    {
    if (s1.size() <= s2.size())
    return ciStringCompareImpl(s1, s2);
    else
    return -ciStringCompareImpl(s2, s1);
    }

  • The second approach to ciStringCompare yields a conventional STL predicate, which could be used as a comparison function in associative containers:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    bool ciCharLess(char c1, char c2)
    {
    return tolower(static_cast<unsigned char>(c1)) <
    tolower(static_cast<unsigned char>(c2));
    }

    bool ciStringCompare(const string& s1, const string& s2)
    {
    return lexicographical_compare(s1.begin(), s1.end(),
    s2.begin(), s2.end(),
    ciCharLess);
    }

  • If you're willing to sacrifice some portability, you know that your strings never contain embedded nulls, and you don't care about internationalization, you may find that the easiest way to implement a case-insensitive string comparison doesn't use the STL at all. Instead, it converts both strings to const char* pointers and then uses stricmp or strcmpi on the pointers:

    1
    2
    3
    4
    int ciStringCompare(const string& s1, const string& s2)
    {
    return stricmp(s1.c_str(), s2.c_str());
    }

    stricmp/strcmpi, being optimized to do exactly one thing, typically run much faster on long strings than do the general-purpose algorithms mismatch and lexicographical_compare.

Item 36. Understand the proper implementation of copy_if.

  • There is no copy_if in the STL. If you simply want to copy the elements of a range that satisfy a predicate, you're on your own.

  • If copy_if existed, you could simply do this:

    1
    2
    3
    4
    5
    6
    7
    bool isDefective(const Widget& w);

    vector<Widget> widgets;
    ...
    copy_if(widgets.begin(), widgets.end(),
    ostream_iterator<Widget>(cerr, "\n"),
    isDefective);

  • A trivial implementation of copy_if:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    template<typename Inputlterator, typename Outputlterator, typename Predicate>
    Outputlterator copy_if(Inputlterator begin,
    Inputlterator end,
    Outputlterator destBegin,
    Predicate p)
    {
    while (begin != end) {
    if (p(*begin)) *destBegin++ = *begin;
    ++begin;
    }
    return destBegin;
    }

Item 37. Use accumulate or for_each to summarize ranges.

  • Accumulate exists in two forms. The form taking a pair of iterators and an initial value returns the initial value plus the sum of the values in the range demarcated by the iterators:

    1
    2
    3
    list<double> ld;
    ...
    double sum = accumulate(ld.begin(), ld.end(), 0.0); // initial value must be 0.0, not 0

    1
    2
    3
    4
    cout << "The sum of the ints on the standard input is"
    << accumulate(istream_iterator<int>(cin),
    istream_iterator<int>(),
    0);

  • When accumulate is used in its alternate form, one taking an initial summary value and an arbitrary summarization function, it becomes much more general:

    1
    2
    3
    4
    5
    6
    7
    8
    string::size_type stringLengthSum(string::size_type sumSoFar, const string& s)
    {
    return sumSoFar + s.size();
    }

    set<string> ss;
    ...
    string::size_type lengthSum = accumulate(ss.begin(), ss.end(), 0, stringLengthSum);

    1
    2
    3
    vector<float> vf;
    ...
    float product = accumulate(vf.begin(), vf.end(), 1.0, multiplies<float>());

  • Like accumulate, for_each takes a range and a function (typically a function object) to invoke on each element of the range, but the function passed to for_each receives only a single argument (the current range element), and for_each returns its function when it's done. Significantly, the function passed to (and later returned from) for_each may have side effects.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    struct Point {
    Point(double initX, double initY) : x(initX), y(initY) {}
    double x, y;
    };

    class PointAverage : public unary_function<Point, void> {
    public:
    PointAverage() : xSum(0), ySum(0), numPoints(0) {}

    void operator()(const Point& p)
    {
    ++numPoints;
    xSum += p.x;
    ySum += p.y;
    }

    Point result() const
    {
    return Point(xSum/numPoints, ySum/numPoints);
    }
    private:
    size_t numPoints;
    double xSum;
    double ySum;
    };

    list<Point> lp;
    ...
    Point avg = for_each(lp.begin(), lp.end(), PointAverage()).result();

  • For summarizing, accumulate most clearly expresses what is going on, but for_each works, too, and the issue of side effects doesn't dog for_each as it does accumulate. Both algorithms can be used to summarize ranges. Use the one that suits you best.

Item 38. Design functor classes for pass-by-value.

  • The standard libraries for both C and C++ follow the rule that function pointers are passed by value. STL function objects are modeled after function pointers, so the convention in the STL is that function objects, too, are passed by value (i.e.. copied) when passed to and from functions.
  • Make sure that your function objects behave well when passed and returned by value:
    • First, your function objects need to be small. Otherwise they will be too expensive to copy.
    • Second, your function objects must be monomorphic (i.e., not polymorphic) — they must not use virtual functions.
  • There's a way to let function objects be big and/or polymorphic: take the data and/or the polymorphism you'd like to put in your functor class, and move it into a different class, then give your functor class a pointer to this new class.
  • Functor classes using this technique must support copying in a reasonable fashion. Perhaps the simplest reasonable thing would be to reference count it, using something like Boost's shared_ptr.

Item 39. Make predicates pure functions.

  • Declare your operator() functions const in predicate classes. If you do that, your compilers won't let you change any class data members. This is necessary, but not sufficient.
  • Even const member functions may access mutable data members, non-const local static objects, non-const class static objects, non-const objects at namespace scope, and non-const global objects. A well-designed predicate class ensures that its operator() functions are independent of those kinds of objects, too.
  • Functions in predicate classes should be pure functions. This restriction extends to predicate functions, too.

Item 40. Make functor classes adaptable.

  • The only thing ptr_fun does is make some typedefs available, which are required by not1, and that's why applying not1 to ptr_fun works, but applying not1 to isInteresting directly doesn't work. Being a lowly function pointer, isInteresting lacks the typedefs that not1 demands.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    list<Widget*> widgetPtrs;
    bool isInteresting(const Widget *pw);

    list<Widget*>::iterator i =
    find_if(widgetPtrs.begin(), widgetPtrs.end(),
    not1(ptr_func(isInteresting)));
    if (i != widgetPtrs.end()) {
    ...
    }

  • Each of the four standard function adapters (not1, not2, bind1st, and bind2nd) requires the existence of certain typedefs, as do any non-standard STL-compatible adapters written by others.

  • Function objects that provide the necessary typedefs are said to be adaptable, while function objects lacking these typedefs are not adaptable. Adaptable function objects can be used in more contexts than can function objects that are not adaptable, so you should make your function objects adaptable whenever you can.

  • The typedefs in question are argument_type, first_argument_type, second_argument_type, and result_type. The conventional way to provide them is to inherit them from a base struct.

    • For functor classes whose operator() takes one argument, the struct to inherit from is std::unary_function.
    • For functor classes whose operator() takes two arguments, the struct to inherit from is std::binary_function.
  • unary_function and binary_function are templates, you must inherit from structs they generate.

    • For unary_function, specify the type of parameter taken by your functor class's operator(), as well as its return type.
    • For binary_function, specify the types of your operator's first and second parameters, and your operator's return type.
  • In general, non-pointer types passed to unary_function or binary_function have consts and references stripped off. For functor classes taking or returning pointers, pass to unary_function or binary_function whatever types operator() takes or returns.

  • Sometimes it makes sense to give a functor class multiple invocation forms (thereby abandoning adaptability). Such functor classes are the exception, however, not the rule. Adaptability is important, and you should strive to facilitate it each time you write a functor class.

Item 41. Understand the reasons for ptr_fun, mem_fun, and mem_fun_ref.

  • If I have a function f and an object x, I wish to invoke f on x, and I'm outside x's member functions. C++ gives me three different syntaxes for making the call:

    1
    2
    3
    f(x);	// Syntax #1: when f is a non-member function
    x.f(); // Syntax #2: when f is a member function and x is an object or a reference to an object
    p->f(); // syntax #3: when f is a member function and p is a pointer to x

  • A universal convention in the STL: functions and function objects are always invoked using the syntactic form for non-member functions.

  • mem_fun and mem_fun_ref arrange for member functions (which must ordinarily be called using Syntax #2 or #3) to be called using Syntax #1. They also provide important typedefs, just like the objects produced by ptr_fun.

  • If you get confused about when to use ptr_fun and when not to, consider using it every time you pass a function to an STL component. The STL won't care, and there is no runtime penalty. An alternative strategy with respect to ptr_fun is to use it only when you're forced to. If you omit it when the typedefs are necessary, your compilers will balk at your code. Then you'll have go back and add it.

  • The situation with mem_fun and mem_fun_ref is fundamentally different. You must employ them whenever you pass a member function to an STL component, because, in addition to adding typedefs (which may or may not be necessary), they adapt the calling syntaxes from the ones normally used with member functions to the one used everywhere in the STL.

Item 42. Make sure less<T> means operator<

  • Don't mislead all those programmers by playing games with the definition of less. If you use less (explicitly or implicitly), make sure it means operator<. If you want to sort objects using some other criterion, create a special functor class that's not called less.

Item 43. Prefer algorithm calls to hand-written loops.

  • Calling an algorithm is usually preferable to any hand-written loop. There are three reasons:
    • Efficiency: Algorithms are often more efficient than the loops programmers produce.
      • Algorithms eliminate redundant computations.
      • Library implementers can take advantage of their knowledge of container implementations to optimize traversals in a way that no library user ever could.
      • All but the most trivial STL algorithms use computer science algorithms that are more sophisticated — sometimes much more sophisticated — than anything the average C++ programmer will be able to come up with.
    • Correctness: Writing loops is more subject to errors than is calling algorithms.
    • Maintainability: Algorithm calls often yield code that is clearer and more straightforward than the corresponding explicit loops.
  • If you need to do something an algorithm already does, or if you need to do something very similar to what an algorithm does, the algorithm call is clearer.
  • If you need a loop that does something fairly simple, but would require a confusing tangle of binders and adapters or would require a separate functor class if you were to use an algorithm, you're probably better off just writing the loop.
  • Finally, if you need to do something fairly long and complex inside the loop, the scales tilt back toward algorithms, because long, complex computations should generally be moved into separate functions, anyway.

Item 44. Prefer member functions to algorithms with the same names.

  • Most of the time, you'll want to use the member functions instead of the algorithms. There are two reasons for this:
    • First, the member functions are faster.
    • Second, they integrate better with the containers (especially the associative containers) than do the algorithms.
  • For the standard associative containers, choosing member functions over algorithms with the same names offers several benefits:
    • First, you get logarithmic-time instead of linear-time performance.
    • Second, you determine whether two values are "the same" using equivalence, which is the natural definition for associative containers.
    • Third, when working with maps and multimaps, you automatically deal only with key values instead of with (key, value) pairs.
  • For list:
    • Each of the algorithms that list specializes (remove, remove_if, unique, sort, merge, and reverse) copies objects, but list-specific versions copy nothing: they simply manipulate the pointers connecting list nodes. The algorithmic complexity of the algorithms and the member functions is the same, but, under the assumption that manipulating pointers is less expensive than copying objects, list's versions of these functions should offer better performance.
    • Calls to the algorithms remove, remove_if, and unique must be followed by calls to erase if you really want to eliminate objects from a container, but list's remove, remove_if, and unique member functions honestly get rid of elements: no subsequent call to erase is necessary.
    • A significant difference between the sort algorithm and list's sort function is that the former can't be applied to lists.
    • The merge algorithm isn't permitted to modify its source ranges, but list-merge always modifies the lists it works on.

Item 45. Distinguish among count, find, binary search, lower_bound, upper_bound, and equal_range.

  • With lower_bound and upper_bound, it's too easy to fall back on equality tests, but with equal_range, testing only for equivalence is the natural thing to do.
  • For the multi containers, find is not guaranteed to identify the first element in the container with a given value if more than one is present: its charter is only to identify one of those elements. If you really need to find the first object with a given value, you'll want to employ lower_bound, and you'll have to manually perform the second half of the equivalence test. (You could avoid the manual equivalence test by using equal_range, but calling equal_range is more expensive than calling lower_bound.)

Item 46. Consider function objects instead of functions as algorithm parameters.

  • Passing STL function objects to algorithms typically yields code that is more efficient than passing real functions.
    • If a function object's operator() function has been declared inline (either explicitly via inline or implicitly by defining it in its class definition), the body of that function is available to compilers, and most compilers will happily inline that function during template instantiation of the called algorithm.
    • When we try to pass a function as a parameter, compilers silently convert the function into a pointer to that function. Most compilers won't try to inline calls to functions that are invoked through function pointers, even if such functions have been declared inline.
  • The fact that function pointer parameters inhibit inlining explains an observation that long-time C programmers often find hard to believe: C++'s sort virtually always embarrasses C's qsort when it comes to speed.
  • It's not uncommon for STL platforms to reject perfectly valid code, either through shortcomings in the compiler or the library or both. For example, a particular STL platform has a bug in its handling of const member functions (such as string::size). A workaround is to use a function object instead.
  • Another reason to prefer function objects to functions is that they can help you avoid subtle language pitfalls. There are situations, for example, when the name of an instantiation of a function template is not equivalent to the name of a function, thus being rejected by compilers.

Item 47. Avoid producing write-only code.

  • As you write the code, it seems straightforward, because it's a natural outgrowth of some basic ideas. Readers, however, have great difficulty in decomposing the final product back into the ideas on which it is based. That's the calling card of write-only code: it's easy to write, but it's hard to read and understand.
  • It's a software engineering truism that code is read more often than it is written. Equally well established is that software spends far more time in maintenance than it does in development. Software that cannot be read and understood cannot be maintained, and software that cannot be maintained is hardly worth having. The more you work with the STL, the more comfortable you'll become with it, and the more you'll feel the pull to nest function calls and create function objects on the fly. There's nothing wrong with that, but always bear in mind that the code you write today will be read by somebody — possibly you — someday in the future. Prepare for that day.

Item 48. Always #include the proper headers.

  • Any time you refer to elements of namespace std, you are responsible for having #included the appropriate headers. If you omit them, your code might compile anyway, but you'll still be missing necessary headers, and other STL platforms may justly reject your code.
  • To help you remember what's required when, here's a quick summary of what's in each standard STL-related header:
    • Almost all the containers are declared in headers of the same name, i.e., vector is declared in <vector>, list is declared in <list>, etc. The exceptions are <set> and <map>. <set> declares both set and multiset, and <map> declares both map and multimap.
    • All but four algorithms are declared in <algorithm>. The exceptions are accumulate, inner_product, adjacent_difference, and partial_sum. Those algorithms are declared in <numeric>.
    • Special kinds of iterators, including istream_iterators and istreambuf_iterators, are declared in <iterator>.
    • Standard functors (e.g., less<T>) and functor adapters (e.g., not1, bind2nd) are declared in <functional>.

A few hints that should help you make sense of STL-related compiler messages:

  • You can almost always reduce compiler diagnostics to something comprehensible by replacing lengthy template-based type names with shorter mnemonics. In many cases, all you have to do is replace typedef expansions with typedef names you're already using.
  • For vector and string, iterators are usually pointers, so compiler diagnostics will likely refer to pointer types if you've made a mistake with an iterator.
  • Messages mentioning back_insert_iterator, front_insert_iterator, or insert_iterator almost always mean you've made a mistake calling back_inserter, front_inserter, or inserter, respectively. If you didn't call these functions, some function you called (directly or indirectly) did.
  • Similarly, if you get a message mentioning binder1st or binder2nd, you've probably made a mistake using bind1st or bind2nd.
  • Output iterators (e.g., ostream_iterators, ostreambuf_iterators, and the iterators returned from back_inserter, front_inserter, and inserter) do their outputting or inserting work inside assignment operators, so if you've made a mistake with one of these iterator types, you're likely to get a message complaining about something inside an assignment operator you've never heard of.
  • If you get an error message originating from inside the implementation of an STL algorithm, there's probably something wrong with the types you're trying to use with that algorithm. For example, you may be passing iterators of the wrong category.
  • If you're using a common STL component like vector, string, or the for_each algorithm, and a compiler says it has no idea what you're talking about, you've probably failed to #include a required header file.
  • The SGI STL site, http://www.sgi.com/tech/stl/.
  • The STLport site, http://www.stlport.org/.
  • The Boost site, http://www.boost.org/.