Clear C++ Avoids Human Inlining

Sean Parent (Adobe) has given a number of talks that I’ve enjoyed on techniques for writing clearer code. In this post, I’ll discuss one of his points from the talk “C++ Seasonings” given at Going Native 2013: no raw loops. (talk on YouTube)

Before getting into the details, it’s important to know that achieving clear C++ is more about having a set of heuristics for evolving code rather than a set of hard-and-fast rules about C++. What I mean by “set of heuristics” is that there’s a set of characteristics about code which rate an implementation as better (or worse) than another implementation. Thus we can apply a process of evolving existing code into better code, guided by suggestions (the heuristics) for what “better code” looks like. For those interested in a GREAT set of heuristics in general, check out chapter 17 of the book Clean Code by Robert Martin.

Back to the topic at hand: what is a “raw loop” exactly? Sean Parent defines it as “a loop which exists in a function where the function is doing more than the job of the loop” (slightly paraphrased). I think of this in terms of an implicit assumption I make about code: EVERY loop is doing a named algorithm. The heuristic I use to avoid raw loops is to avoid human inlining of code.

Thinking of the problem in terms of “inlining” has been helpful for me to understand the remedy better. Compilers are really good at taking function definitions it can see at compile time and placing them at the place of the caller, allowing me as a programmer to keep a logical separation in my code for me, while the compiler gets to smash my code together so optimization passes can be more fruitful. Thus if I find a code snippet inside a function which is solving a sub-problem to the function’s implementation, I should extract it and, if possible, make sure it can get inlined into where I just extracted it.

I’ll pick on myself with an example. The code below comes from code I wrote to do some research on multi-hit ray tracing (paper here if curious), where all tests were implemented as components in OSPRay. This was implemented as a callback function where this code is called whenever the ray hits something.

void multiHitFilter(MultiHitInfo *mhi, RTCRay &ray)
{
  if (ray.geomID < 0 || mhi->numHits >= MAX_HITS_PER_TRACE) return;

  // Record hit point
  MHTKHit hit;
  hit.primID = ray.primID;
  hit.geomID = ray.geomID;
  hit.t      = ray.t;
  hit.Ng.x   = ray.Ng.x;
  hit.Ng.y   = ray.Ng.y;
  hit.Ng.z   = ray.Ng.z;

  int i = reduce_max(mhi->numHits)-1;
  int j = i;

  /* Bubble up the hit point into the correctly sorted pos */
  for (; i >= 0; --i, --j)
  {
    if (i < mhi->numHits)
    {
      if (mhi->hitArray[i].t > hit.t)
        mhi->hitArray[i+1] = mhi->hitArray[i];
      else
        break;
    }
  }

  mhi->hitArray[j+1] = hit;
}

I hope you read that code and felt that it could have been much clearer, as the code itself would require some study to understand without the comments. Even with the comments, there’s no guarantee that they are up to day, though here it’s probably not too diffucult to verify that they make sense. Right away, I see a loop which has been “inlined” by a human being (me), which is better done by the compiler.

To be fair, this code is actually ISPC, which is basically just C with SPMD constructs added to the language (a post on ISPC may come one day!). To prevent extra confusion, I removed the ISPC specific keyword “uniform” which was sprinkled in various places as it isn’t important to my point in the example. There are a couple of things that lead me to writing this code in the first place: no STL available in ISPC and almost no C++ features to work with. Regardless, the “manual inlining” heuristic triggers in my head, creating a desire for a better version of the code…I see a raw loop!

Even though I don’t have std::sort() available to me, the remedy is all the same: instead of having a loop which bubbles up the hit point into the right position in the hit array, I should encapsulate the problem into a function which solves only that problem. Thus an improved version might look something like this:

void insert_sorted(MultiHitInfo *mhi, MHTKHit hit)
{
  int i = reduce_max(mhi->numHits)-1;
  int j = i;

  for (; i >= 0; --i, --j)
  {
    if (i < mhi->numHits)
    {
      if (mhi->hitArray[i].t > hit.t)
        mhi->hitArray[i+1] = mhi->hitArray[i];
      else
        break;
    }
  }

  mhi->hitArray[j+1] = hit;
}

void multiHitFilter(MultiHitInfo *mhi, RTCRay &ray)
{
  if (ray.geomID < 0 || mhi->numHits >= MAX_HITS_PER_TRACE) return;

  // Record hit point
  MHTKHit hit;
  hit.primID = ray.primID;
  hit.geomID = ray.geomID;
  hit.t      = ray.t;
  hit.Ng.x   = ray.Ng.x;
  hit.Ng.y   = ray.Ng.y;
  hit.Ng.z   = ray.Ng.z;

  insert_sorted(mhi, hit);
}

Aside from making this code much easier to read and comprehend, there’s a secondary benefit to the new structure: the implementation of insert_sorted() can be easily substituted with different versions which use different insertion/sorting techniques, making the code easier to update for new experiements (in this case, a good outcome for research).

Interestingly, the process doesn’t stop here as the recording of the hit point also looks “human inlined” despite not being a loop. In ISPC, this can be further reduced to:

void insert_sorted(MultiHitInfo *mhi, MHTKHit hit)
{
  int i = reduce_max(mhi->numHits)-1;
  int j = i;

  for (; i >= 0; --i, --j)
  {
    if (i < mhi->numHits)
    {
      if (mhi->hitArray[i].t > hit.t)
        mhi->hitArray[i+1] = mhi->hitArray[i];
      else
        break;
    }
  }

  mhi->hitArray[j+1] = hit;
}

MHTKHit extract_hit_info(RTCRay &ray)
{
  MHTKHit hit;

  hit.primID = ray.primID;
  hit.geomID = ray.geomID;
  hit.t      = ray.t;
  hit.Ng.x   = ray.Ng.x;
  hit.Ng.y   = ray.Ng.y;
  hit.Ng.z   = ray.Ng.z;
  
  return hit;
}

void multiHitFilter(MultiHitInfo *mhi, RTCRay &ray)
{
  if (ray.geomID < 0 || mhi->numHits >= MAX_HITS_PER_TRACE) return;

  MHTKHit hit = extract_hit_info(ray);
  insert_sorted(mhi, hit);
}

Which if done in C++ where constructors were available, would end up looking more like:

// insert_sorted() from above code listings
// ...

void multiHitFilter(MultiHitInfo *mhi, RTCRay &ray)
{
  if (ray.geomID < 0 || mhi->numHits >= MAX_HITS_PER_TRACE) return;

  insert_sorted(mhi, MHTKHit{ray});
}

Wow, what a difference between the first version and the last one! That one line can be understood almost instantaneously, where examining the rest of the callback function those lines are in is now MUCH less overhead for the reader. Furthermore, extracting smaller sub-problems like this will promote less copy-paste issues as you will be able to reuse the code you’ve extracted in other places in your code.

I hope Sean Parent’s advice makes sense and that now you have a different lens for seeing code. While the above example may feel contrived to some of you, the problem gets complicated VERY quickly if you don’t actively look for ways to keep code complexity managed. If you didn’t already, I’d encourage you to watch at least the “no raw loops” section of the video I linked to at the beginning of the post. In that talk you will see a MUCH better example of the power of this code improvement heuristic…don’t just take my word for it.

Simply put: write code for human beings.

Until next time, happy coding!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s