Scientifically Tested Code Reading Skills
I’ve run across a paper [pdf] researching how we read code. The article describes an eye tracking device that can identify what people were focusing on when they were given a piece of code. Each subject was given six snippets of code, each with a built-in error, and was asked to analyze the code and find the error. Here’s a sample from the article:
On the left, a small snippet of code that should sum numbers for 1 to a given maximum. On the right, the eye movement of the subject as he reads the code. Notice how the eye first scans the function headers, then briefly scans the function’s body and lastly focuses on where the problem is most likely to appear in real code (the loop).
Reading Code is Like Reading the Talmud
The following Talmud-reading tactics are, I think, also useful for code-reading:
- Work in pairs, thinking out loud to one another.
- Argue. If your partner says “this means X”, and you either don’t understand why or you have another opinion, demand an explanation.
- Sometimes, when dealing with a chunk of text, it’s easier to figure out the middle *after* you understand what’s on both ends. Therefore, if a fragment of text has you stumped, try skipping over it and seeing if you can come back to it later. (But you still have to come back to it eventually.)
- Read the text both “inside” and “outside”. An inside reading translates the text into English (or whatever your native language is) phrase-by-phrase; an outside reading translates a larger chunk into an idiomatic paragraph. If you only read inside, you can miss the forest for the trees; if you only read outside, you can fool yourself by making broad guesses and not verifying them with details.
Seth describes two types of code. Some code is easier to read and understand first, and is the basis to the rest of the program (Type I). The second type of code is dependent of other code, and so is easier to understand in context (Type II). This fits with what I look for when I need to learn a new piece of code:
- Class and interface dependency: look for the base classes, which depend on no other class. These, usually, are small and simple objects, with the purpose of holding a little bit of state. They may do some calculations, but are mostly built around an internal storage and a bunch of get-set functions. These are classes of Type I. Skim through them now.
After you are familiar with the basic building blocks, you can look for the manager classes. Most of the time there will be a bunch of very big, hairy classes that are the core engine of the software. These are classes of Type II. Make sure you understand, from the function names, what each of these managers is supposed to do. After that, you can delve in. Manager objects usually only make sense when they interact with each other, so you have to have an idea about how all of them are designed before going into any of them too deeply. (Tip: The class called Manager is not always the manager. The class called from the main() usually is).
- Assume a lot: Assume that function and variable names matter. If you would have written the code, what would the GetNextIndex do? Assume it does exactly that. If a function has a name that sounds obvious enough, skip it. You’ll get back to it if you ever need to use it. Why waste your time on something you don’t need right now? There’s only this much detail you can keep in mind at this point. If you dig too deep, you’ll start forgetting things you’ve already read. Focus on purpose and structure, not on implementation.
- Learn by debugging: Unit tests matter, but real code matters more. If you really want to understand how the code works, the fastest way is to use it. Fire up your IDE and write a quick and dirty demo of what you wanted to use the code for. Chances are your code is bust. I’ll be surprised if you can get it to work on the first go. That’s OK. Now, start debugging. You’ll learn the inner workings of the code (and its bugs) much faster, and you’ll only focus on the parts that matter to your task.
Writing Readable Code Matters
- Function names matter: we’ve known this for a long time, but now we know that we scan function names first. A good name, and you don’t need to focus on the implementation all that much.
- Variable names matter: again, nothing new. Notice, though, how the eye goes back to the variable definition and assignments. We constantly look for reminders of the state of the variable.
- Loops are the source of all evil: well, that’s not entirely true, but they are worse than all other statements. Notice how figuring out the loop is the most time consuming bit of the code. It’s where most pitfalls are (Oh, the index should be smaller OR EQUAL. Right. I forgot). Commenting can help some, but just keeping the conditions simple and readable is key. Break complexity down around the loop.
- Short code is easier to process, so get your refactoring tools out, everybody, it’s time to split these functions.
My favorite list of tips for readable code is uncle Bob’s guide to Clean Code: A Handbook of Agile Software Craftsmanship. Even if on some occasions it gets tedious and overzealous, it is still a great read for the practical coder, dealing with many issues of writing readable code and refactoring.
The research certainly doesn’t prove much about the way we read code, as it was only tested on 5 (that’s right – five) people. To me it seems like more could have been done.
I mean, going through the lengths of creating a really awesome eye tracking device, carefully calibrated so the machine can tell which line of code is being inspected at any moment, and then test it only on 5 people? Is getting more people into the test room so much hard work? I’d love to be a test subject and learn about my code reading habits. Next time, pick me.