
Writing Readable code

by Ramarao Kanneganti last modified 2006-01-01 15:36

A set of common-sense guidelines for writing readable code.


Best Practices in Writing Readable Code

Ramarao Kanneganti
Oct 26, 2004

About
Best practices in writing readable code: format, structure, modularization and conventions
Audience
Suitable for programmers with 0-3 years of experience.
Purpose
At the end, the audience will understand the principles behind writing readable code. They will also learn the tools and techniques to practice those principles.

Revision History

  1. Oct 26, 2004: Document is created.

Introduction

Ultimately, the only way a piece of software can impact the world is by doing something useful. People learn that fact in college and try to make working software. For them, the important aspects of code become, in order of priority, getting the task done and getting it done correctly. As they gain experience, they realize that there are other important aspects of software:

  1. It costs more to maintain and modify software than to develop it.
  2. Long-lived software is read more often than it is written.
  3. Usually, the correctness of the code is correlated with the readability of the code.

All in all, we feel that the primary purpose of the code is to communicate with other developers within the confines of a compiler-specified medium. Even though readability is an auxiliary goal, aiming for it lets us get to the primary goal: working code that is correct.

Normally, as soon as people start a project, they agree on a set of coding guidelines. These coding guidelines spell out, in elaborate detail, what the syntactic conventions are: how to use underscores, camelHump style, and how to indent. While these guidelines are well intentioned, they end up being ignored for several reasons: they do not explain why the rules make sense; they do not say which rules are more important; they do not equip the coder with the rationale behind the rules.

In this document, I am going to lay out the general principles of coding guidelines. I will provide concrete examples that illustrate these principles and can readily be used. In addition, wherever possible, I will provide the tools and mechanisms to enforce these principles.

Writing Readable Code: Format of the Code

Most code is read on a computer these days. People use their editor or their IDE to read and modify the code. Sometimes, they use a browser to read the code as well. People who write good code understand that. They make it easy to read code on the computer.

In fact, most coding guidelines are aimed at providing good readability. They capture the principles of readability in the form of specific rules for formatting, indenting, and naming. Naturally, such excessive specifications are violated. Or, even if people conform to the rules, the intent behind the rules is violated. That is, the underlying principles of consistency and clarity that help readability are violated.

Thankfully, these problems are easy to solve. These days we have tools that make it easy to follow formatting guidelines. We are going to provide a configuration in the appendix. We will also explain the intent and the logic behind the rules so that writers can make sure the final goals are met.

Visual elements of the code

One aspect of formatting is the visual appeal of the code. It applies to spacing, grouping, and indenting of information. This issue is important enough that some languages attach meaning to indentation (example: Python). The principle behind the visual appeal of code is this: looking at the code on the screen, can you find what you are looking for? Can you open a file and figure out if this is the file you need to look at? All our guidelines revolve around that principle. We are going to enumerate them here.

80 columns per line

These days people rarely print out code. They read the code in their editor or IDE. Even if the screen resolution is very good, people may have several windows open, so the window showing the code may be narrower than 80 columns.

Aha, you say. What about using the wordwrap feature?

Two problems. One is that the reader may want to print the code out, and some printers cut off the characters beyond the specified width. The other is that some viewers may not wrap the lines; for example, a browser does not wrap preformatted text. Also, the wordwrap feature cannot do the right alignment: a structured program becomes a flattened prosaic statement, and the readability of the code is reduced.

Then, what about the lines exceeding 80 characters?

More often than not, such code is suspect. Either the nesting depth of the code is too much, in which case you are better off making it more modular, or the indirection is too much, in which case you are better off introducing temporary, meaningful variables. This is a heuristic, but a darn good one.
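As a rough illustration of the second remedy, here is a minimal sketch in C. The types and the function order_total_for_zip are invented for this example; they are assumptions, not part of any real code base. The long expression in the comment is replaced by named temporaries that keep every line short and say what each intermediate value means.

```c
#include <stddef.h>

/* Hypothetical types, invented for this illustration. */
struct address { int zip_code; };
struct customer { struct address *billing; };
struct order { struct customer *buyer; double subtotal; double tax_rate; };

/* Before: one long line with deep indirection (well past 80 columns):
 *   return orders[i].buyer->billing->zip_code == zip ? orders[i].subtotal * (1.0 + orders[i].tax_rate) : 0.0;
 */

/* After: named temporaries keep each line short and meaningful. */
double order_total_for_zip(const struct order *orders, size_t i, int zip)
{
    const struct order *order = &orders[i];
    int billing_zip = order->buyer->billing->zip_code;
    double total_with_tax = order->subtotal * (1.0 + order->tax_rate);

    if (billing_zip != zip)
        return 0.0;
    return total_with_tax;
}
```

The behavior is the same as the one-line version; only the reader's effort changes.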

In the rare cases where a long line is needed, you can go to the next line and align the argument or operator appropriately. When in doubt, run the Unix program indent, and it will produce the indented program. It has a gazillion options for you to tinker with to produce the kind of output you want. For Java, there is Jalopy, which does all this and much more.

So, how would you limit the line width? Of course, in Emacs, you do the following:

  1. Put (setq-default fill-column 72) in your .emacs, or evaluate (set-fill-column 72) in the current buffer via esc-:.
  2. To know which column you are at, you can turn on column-number-mode with esc-x column-number-mode.

In other editors, there must be a way. If there isn't, switch to Emacs (http://www.xemacs.org/). The following customizations work in Emacs for developing C/C++ code along the guidelines I wrote.

;; Customizations for all of c-mode, c++-mode, and objc-mode
(defun my-c-mode-common-hook ()
  ;; other customizations
  (setq tab-width 4
        ;; this will make sure spaces are used instead of tabs
        indent-tabs-mode nil)
  ;; we like auto-newline and hungry-delete
  (c-toggle-auto-hungry-state 1)
  ;; keybindings for all supported languages. We can put these in
  ;; c-mode-base-map because c-mode-map, c++-mode-map, objc-mode-map,
  ;; java-mode-map, and idl-mode-map inherit from it.
  (define-key c-mode-base-map "\C-m" 'newline-and-indent))
(add-hook 'c-mode-common-hook 'my-c-mode-common-hook)

Try it. You will be amazed. When you press semicolon or brace or newline, it will do the newline automatically! Better yet, when you press backspace, it erases all that it inserted and takes you back where you were!

As usual, all this Emacs stuff works on NT, Unix, and other operating systems as well. Oh, it costs exactly $0 too!

Number of statements per line

Leaving blank lines helps to focus the reader. For example, you can club all the related statements in a visual block. On the flip side, leaving blank lines where they are not needed is a bad idea. Excessive blank lines do not make code more readable. If anything, the linguistic affinity between the elements gets lost. Compare the following:

With blank lines:

for (i = n; i > 0; i --)
{

  sq_n += 2*i - 1;

  if (sq_n > MAX_VAL)
  {
    printf("Exceeded size");

    exit(1);
  }

}

Without excessive blank lines:

int sq_n = 0;
// Calculate the square of n.
for (i = n; i > 0; i --) {
  sq_n += 2*i - 1;
  if (sq_n > MAX_VAL) {
    printf("Size Exceeded");
    exit(1);
  }
}

Blank lines create the effect of paragraphs in the code; thus they should be used to group all the statements that belong in a logical unit. If you can write a comment on what the next group is doing and why, then perhaps that group of statements deserves to be together.

Excessive blank lines rob the needed breaks of their legitimacy. In addition, they take up valuable real estate on the screen.

Along with the "zero" statement lines, we also need to be concerned with lines that hold more than one statement. Conventional wisdom states that the number of statements on a line should be restricted to one. The exceptions are simple initializations and simple statements. For example, int i=10; int j=20; can be on one line. In fact, leaving those statements on one line focuses the reader better. However, if the initializations need explanations, place them on separate lines.

Indentation

There are many religious wars about indentation. Ultimately, there are two main properties you look for in any rules about indentation style: consistency and readability. Within these parameters, the following suggestions can be made:

  1. Do not use tab characters in your code. That is, do not use tabs to indent your files. Fortunately, these days most editors offer an option to convert tabs into spaces as you type. Check your editor on how to turn that option on.

    Why is this important? Tabs expand to a different number of spaces in different editors. For example, you can set the tab length to 4 chars in one editor. When you open the file in another editor, it may show 8 spaces, causing misalignment.

  2. Align the new line with the beginning of the expression at the same level on the previous line. That is, keep related statements together in a block. Braces are useful for compilers, but humans still rely on indentation and blank lines to group statements together. Use an appropriate number of spaces for indentation: I personally use 2 spaces. It means that I can keep my lines to 80 columns without resorting to continuation lines. Anything less than two spaces does not provide the visual cues needed for grouping statements.

These formatting issues can best be explained by an indenting program. Please look in the appendix for such configuration files. If you follow these guidelines, you get these benefits:

  1. The reader can read most code on modern terminals and on paper.
  2. The reader will see the code as you intended (tabs to spaces).
  3. The reader can see the logic through the concrete structure of the code.

Naming the entities

For a compiler, as long as we are sticking to the language rules, it does not matter what kind of names we use for the entities in our programs: files, classes, objects, and variables. However, for readers, it matters how the variables are named. They are used to communicate the concepts at a domain level. Here are some of the do's and don'ts.

  1. It is OK to name loop variables i and j. It is a long-standing convention to treat i, j, and k as loop variables.
  2. For common suffixes and prefixes, agree on a standard. For example, for number, people use "no" or "num", and use it as either a prefix or a suffix. The number of books can be referred to as "booknum", "numbooks", and some other combinations. Establishing a common notation and common stems helps you come up with names that are readily understandable.
  3. Avoid complicated Hungarian notation. You are not a type checker. That is the job of a compiler. Use the language that is closest to what people speak. For reference, Hungarian notation encodes the type information of a variable in its name. For example, it uses "p" as a prefix for the variable name if the variable is a pointer, or "s" for a string. I do not believe "ppiName" is a good variable name, as the type information is more important for a compiler than for the reader of a program.
  4. Avoid using generic names such as user or customer. Use specific names, just as in writing prose. If a system has several kinds of users, use the appropriate name, as in administrator or clerk and so on. If a function computes averages, name it "computeAverages" instead of "doTask". If a boolean variable tests if the cart is empty, call it "isCartEmpty" instead of "flag".
  5. As for camelHump notation (popular in the Java world, where people use capital letters to denote a word break in the name) or word_underscore notation (popular in the C world), use whichever is the standard in that language. I personally find underscores make the word break real and visible, but I am not going to fight the establishment on that one.

The real test is always this: if you can read the code over the phone so that the listener can comprehend it, then the naming has achieved its objective. For example, naming a procedure as a verb, like "setupChannel", reads better. A function call such as a = average(1,2,3) reads better than a = computeAverage(1,2,3).
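The sketch below applies a few of these naming suggestions in C, using the underscore style common in the C world. The cart type and both functions are hypothetical, made up only to show how the names read; treat it as one possible rendering rather than a prescribed style.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cart type, invented for this illustration. */
struct cart { size_t item_count; };

/* Reads as a question: "is cart empty?" */
bool is_cart_empty(const struct cart *cart)
{
    return cart->item_count == 0;
}

/* Named for what it returns, so "a = average(values, n)" reads aloud well. */
double average(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;   /* i is fine for a loop index */

    if (count == 0)
        return 0.0;   /* assumption: an empty list averages to 0 */
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / (double)count;
}
```

Read aloud, "if is cart empty" and "a equals average of values" need no further explanation, which is the phone test at work.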

Writing Code: Structure of the program

The preceding section described the ways in which you can use formatting to communicate the program better to the readers. Programming languages have mechanisms such as functions to create groups of statements. These functional and object decompositions serve several purposes, including reuse and ease of understanding.

The physical structure of a large program is exposed through files. Each file is in turn structured through functions (even in the OO world). So, to properly structure the program we need to structure the files. Here, I am going to describe only the physical structure of the files. In a later section I will describe their logical structure.

Breaking the program into files

How do we break a program into files? In most OO languages, each class gets a file. In non-OO languages, each cohesive set of functions gets a file. In either case, we need to follow these guidelines.

  1. File length should not exceed a few hundred lines. The normal recommendation is 200 or so. However, if logic demands that the file be larger, try using auxiliary classes or auxiliary source files.
  2. Make sure each file contains a reasonable amount of content. Excessive fragmentation of a program into a large number of files can make it difficult to read.
  3. If a group of developers is working on the project, make sure that each file is under the control of one developer. That is, understanding and working with that file should not take knowledge from disparate areas of the domain or of programming. For example, if a file contains knowledge about SQL, parsing, and numerical computing, there are not many people who know enough about all these areas to own the file.

In addition to these reasons, a proper decomposition of the code into files helps version control in groups. Most version control systems treat the file as the unit and provide versioning only at the file level.

Comments in the code

Several good books have been written about this subject, of which I would recommend "Code Complete". The fundamental principle is that, in general, code should convey all the meaning to the readers. However, syntactic constraints may make that impossible. Moreover, unlike a book, which we can recast for different audiences, we cannot write the code for different audiences. In such cases, you can add comments as an extension to the code.

Writing comments should not provide an excuse to write poorly structured code. Comments can only add information at a different level. Well-written code is its own documentation.

  • Comments should explain the domain portion, not the code.
  • Comments should tell the reader what you are doing and why, rather than how.
  • In case the code is tricky, explain the how, and tell them explicitly why it is tricky.
  • Use commenting style that is easy to maintain.

Comments should not explain Code

The worst kind of comments are the ones that explain the code. E.g.: a = a + 1 // Increment a by 1. Well-written code is self-explanatory. However, code cannot explain why something is being done. Comments can explain that part. For example, the earlier statement may become: a = a + 1 // Add a bonus point to the customer if he makes it here. Thus, comments can explain the intent, the domain portion.

In some cases, we may have to explain the code. For example, if you are implementing a complex search algorithm, you may have to explain in the comments what you are doing. If your code contains unexpected or unseen effects, you should use comments to explain that to the reader. E.g.:

isCartEmpty = true;
if (!isCartEmpty) { .... } // Since other threads may be accessing
                           // isCartEmpty, we need to do this test

Btw, the same principle can be stated as: comments should not repeat information. If they do, there is every chance of the code and the comments getting out of sync, confusing the reader.
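To make the contrast concrete, here is a small sketch in C. The loyalty rule, the constant, and the function name are all hypothetical, chosen only to mirror the bonus point example; the point is where the comment spends its words.

```c
/* Hypothetical loyalty rule, invented for this illustration. */
enum { CHECKOUT_BONUS_POINTS = 1 };

/* Poor: the comment repeats the code.
 *     points = points + 1;   // Increment points by 1
 */

/* Better: the comment carries the domain reason the code cannot express. */
int points_after_checkout(int points)
{
    /* Customers who complete checkout earn a loyalty bonus. */
    return points + CHECKOUT_BONUS_POINTS;
}
```

If the bonus later changes to 5 points, the second comment is still true, while the first would silently go out of sync with the code.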

Comments should be maintainable

We often see comments that are formatted beautifully, often by hand. For example, see this:

/***************************
 * Author: Rama            *
 ***************************/

If the name of the author changes, then the alignment can get messed up. So follow a simple style of comments, using blank lines and spaces to communicate the ideas. Do not depend on complex formatting. You can correct the previous code with:

// Author: Rama

Comments should be brief

Since reading any text takes time, we should make comments as brief as possible. You can refer to other portions of the code to simplify the comments. As much as possible, use the comments to explain the domain relating to the requirements and use cases.

Use Comment markers to help the readers

Several experienced coders use standard markers in the code to denote certain standard concepts in the code. These markers can easily be picked out by programs such as "grep" so that they can be acted upon. Most popular markers are:

  1. TODO marker: It identifies tasks yet to be done. In fact, some IDEs like Eclipse can even show a list of TODO tasks. It is an excellent way to keep track of things to be done in the code. Make sure that each TODO fits on one line, so that it can be "grepped".

    It is also good practice not to write too many TODOs clubbed in one place. For example, the following is not preferred.

    //TODO: Add Street name
    //TODO: Add second phone number
    //TODO: Add mobile phone number

    Instead, the following works well. It is one line and it clubs all of them into one TODO.

    //TODO: Add second address field to the customer class.

  2. FIXME marker: In several places in the code, you may take shortcuts knowing full well that they need to be fixed later. In such cases, you can use the FIXME marker to flag them. While TODO denotes unfinished activity, FIXME denotes code that needs fixing later on. For example, consider the following:

    //FIXME: Used only 50 chars for the name. Make it dynamic string.
    char name[50];

  3. XXX marker: This is a private marker. It serves as a note from the developer to himself, and it should be deleted by the time he is done.

Modularity

There are several ways the application can be made modular. Files, classes, and other structural elements help to organize the program.

One important purpose in organizing the program is to isolate the moving parts. That is, any program contains mature code and the code that is undergoing rapid transformation. It contains fixed entities and the entities that need customizing. A good coding style separates these so that customizing, modifying, and extending programs becomes easy.

The following are some of the guidelines for developing flexible code.

Do not use literals in the code

That is, unless they are cosmic constants (say pi or e), people will want to change the parameters in the code. So, do not hard-code literal values. For example, the following is bad.

char streetName[50];

We can correct it by:

#define FIFTY 50
char streetName[FIFTY];

Still bad. What if we use the length in multiple places and we want to change it to 75? It would read #define FIFTY 75. We can make it more readable by:

#define arrayLen 50
char streetName[arrayLen];

Even this can be made better by referring to the constant by its semantic significance. For example, it can be the street name length, in which case it becomes:

#define STREETNAMELENGTH 50
char streetName[STREETNAMELENGTH];

As the next step, you collect all these constants into a separate file and use the #include mechanism to bring them into other parts of the code. Of course, you should use language facilities to define constants instead of #define. So the next version will be:

const int STREETNAMELENGTH = 50; // To be placed in a central file.
char streetName[STREETNAMELENGTH]; // Declare it where used.

We can make it even better by using namespaces etc.
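One way the final step might look in plain C is sketched below. C has no namespaces, and a const int is not a constant expression there, so this sketch assumes an enum with a common ADDRESS_ prefix instead. The header name address_limits.h, the struct, and the helper function are hypothetical, introduced only for illustration.

```c
#include <string.h>

/* In a real project this enum would live in a shared header,
 * e.g. a hypothetical "address_limits.h". The ADDRESS_ prefix plays
 * the role a namespace would play in C++. */
enum {
    ADDRESS_STREET_NAME_LENGTH = 50,
    ADDRESS_CITY_NAME_LENGTH = 30
};

struct address {
    char street_name[ADDRESS_STREET_NAME_LENGTH];
    char city_name[ADDRESS_CITY_NAME_LENGTH];
};

/* Copies a street name, truncating it to fit. Every size below is
 * derived from the one constant, so changing 50 to 75 is a one-line edit. */
void address_set_street(struct address *addr, const char *street)
{
    strncpy(addr->street_name, street, sizeof addr->street_name - 1);
    addr->street_name[sizeof addr->street_name - 1] = '\0';
}
```

Because the copy uses sizeof on the field rather than repeating the number, the literal 50 appears exactly once in the program.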

Isolate the flexible portions of the code

Since customizations and extensions of the code typically deal with a small subset of the code, it is a good idea to isolate that code. We can use the following techniques.

  1. Use separate files and include them in the code.
  2. Use separate classes that encapsulate the changing code.

We will discuss these more in the next section.
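As a rough sketch of the idea in C, the code below keeps the part that is expected to change (a pricing rule) behind a function pointer, so the stable part never needs editing. The discount rules and all the names are hypothetical; a function pointer is only one of several ways to draw this boundary, and separate files or classes would serve the same purpose.

```c
/* The part expected to change: a pricing rule. Hypothetical example. */
typedef double (*discount_rule)(double subtotal);

/* One customization; in a real project each rule could live in its own file. */
double no_discount(double subtotal)
{
    (void)subtotal;   /* unused on purpose */
    return 0.0;
}

/* Another customization: a flat 10 off orders of 100 or more. */
double holiday_discount(double subtotal)
{
    return subtotal >= 100.0 ? 10.0 : 0.0;
}

/* The stable part: it never needs editing when the rule changes. */
double invoice_total(double subtotal, discount_rule rule)
{
    return subtotal - rule(subtotal);
}
```

A caller picks the rule, for example invoice_total(200.0, holiday_discount), and adding a new rule touches no existing function.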

Writing Code: Content of the Program

Have you read any good code lately? Code that makes you feel like exploring where you can progressively zoom into the topics you need to understand, and glance through the topics you already understand? Or, the code that lets you skip portions of unneeded code so that you can get to the portion you need to read?

Well, if you have not read such code, you should read code. Reading code is one of the best ways to learn to write code, just as with prose. Some samples of code I have liked over the years include Ghostscript (a software PostScript interpreter), the Scheme interpreter from Rice University, and several code packages from GNU software.

There are two skills that are required to write good programs: OO analysis and Structural decomposition. If the program is too small, then OO does not really help. If the program is too large, then there are other techniques you need to use. For example, you would create your own layers, systems as an edifice to make your program possible.

OO Analysis

There is a lot of hype surrounding OO technologies. Without going into all the theory, I want to cover only one area. Given a program, how do you go about designing your objects and how do you code them?

First of all there are different purposes to objects. We can broadly classify them into the following:

  1. Domain objects
  2. Linguistic Objects
  3. Glue Objects

The domain objects are the ones that you encounter in your domain. Suppose you are designing a patient record history system. You can get all the domain objects by listing all the concrete and abstract nouns in there. All the activities become methods on objects.

Linguistic objects are the ones that are imposed by the language. That is, these are the objects you wish were part of the language, but are not. Almost all data structures fall into this category. Almost all the classes that abstract some utility function fall into this category too.

Glue objects hold several similar activities together. These are abstract entities that provide convenient holders.

For example, patient, hospital, ward, doctor, nurse, examination, record, and patient-file all become domain objects. Classes such as patient set are linguistic objects. A glue object can be HIPAA (some compliance rules) with methods to evaluate if a system is HIPAA compliant.

Structural Decomposition

Once you start reading code, you realize one basic tenet: at most, your brain will comprehend code spread over 25 or so lines. Anything beyond that, your brain will try to break into chunks. When writing prose, would you have a paragraph stretching over pages? Think of the code the same way.

Different programming languages offer different compositional units. The most common one is functional or procedural decomposition. In this model, you take a large function and break it down into sub-functions. There are two reasons for decomposition:

  1. Reuse of the code: If properly decomposed, the parts of abstraction can be reused elsewhere. For example, suppose you refer to a patient by SSN (Social Security Number, a unique identifier in the US), then there must be code that pulls up the patient record given SSN. That code, if isolated, can be referred through proper function.
  2. Ease of understanding: Even if there is no reuse, sometimes, breaking the code into smaller parts makes it more readable. It lets the writer code at a uniform level, preferably that of the domain, hiding the details of exceptional logic and language objects into functions that do not clutter the mainline code. 

Normally, people do not decompose the code as an afterthought. They write the code from top down, resulting in a code that is naturally readable, and naturally well decomposed. Here is an example.

Consider the case of a patient record system. When a patient walks in for an appointment, here is what the system might do: First his SSN is entered. If there is a previous history, that is pulled up. If there is no history, his information is collected and entered into the system. After that, the purpose of his visit is entered into the system. Next, the name of the doctor is entered. His insurance information may be verified and the name of the verifier may be noted. The patient is advised of his rights and that fact should be noted in the system.

Suppose we write the entire process in one function, clubbing together the database retrieval, the addition of information to the database, and the validation of the SSN. The system can get unreadable, and we will lose the information at the domain level (as presented in the preceding paragraph). Here is a possible rendition of the code:

//This code is executed when the patient visits for the first time:
Get Patient Information
If the patient is not there in the system, add him to the system.
Obtain the purpose of the visit
Enter the name of the provider
Verify the insurance Information
Obtain the proof of advising of the rights.

Turning it into something more "code like":

// This code gets executed when the patient visits for the first time
// Get the patient information
ssn = getSSN();
Patient = getPatient(ssn);
// If the patient is not there in the system, add him to the system.
if (Patient == null) { Patient = createPatient(ssn); }
// Obtain the purpose of the visit
recordPurposeOfVisit(Patient);
// Enter the name of the provider
enterProvidersName(Patient);
// Verify the insurance Information
verifyInsurance(Patient);
// Obtain the proof of advising of the rights.
adviseTheRights(Patient);

Of course, this code is not real. However, it helps you in two ways. You can retain the comments, which show what we are trying to do. It also tells us more about the kind of objects we need. Here, we have two nouns: Patient and Visit. We may also have a linguistic object like session that deals with the current session with the patient. We may also have a glue object like PatientRights.

With that in mind, we will redo the code.

// This code gets executed when the patient visits for the first time
// Get the patient information
currentPatient = Patient.getPatient(currentSession.getSSN());
// If the patient is not there in the system, add him to the system.
if (currentPatient == null) { currentPatient = createPatient(currentSession); }
// Obtain the purpose of the visit
currentVisit = new Visit(currentPatient);
currentVisit.recordPurpose(currentSession.getPurpose());
// Enter the name of the provider
currentVisit.recordProvider(currentSession.getProvider());
// Verify if the insurer is still current.
currentPatient.verifyInsurer(currentSession);
// Advise the rights, and record that information.
currentPatient.adviseRights(PatientRights.getRights(currentPatient),
                            currentSession.getOperator());

Depending on the objects you may have and the complexity of the code, this can result in a few hundred lines. For example, the simple function getSSN may become something like:

// Get the Social Security Number from the user and validate
SSN = getInput("Enter SSN:");
if (isValid(SSN)) return SSN;
for (int i = 0; i < MAX_NUMBER_OF_TRIES; i++) {
  SSN = getInput("Invalid SSN. Enter SSN:");
  if (isValid(SSN)) return SSN;
}
throw InvalidInput("Invalid SSN");

While this code is certainly needed, if we placed it in the main code, it would clutter up the rest of the code. By separating it out, we have achieved our two purposes: reuse (in case we need to get the SSN from the user somewhere else) and readability. Notice that validating the SSN is pushed into yet another routine. Separating such non-domain logic from a domain-specific constraint (the maximum number of times you can ask for the SSN) achieves the same goals.
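The same separation can be sketched in C. Everything here is an assumption made for illustration: the "123-45-6789" format check, the limit of 3 tries, the input_fn callback that stands in for real user input, and the canned_input stub that replays fixed answers. Unlike the pseudocode above, this version counts the first prompt among the tries and returns NULL instead of throwing, since C has no exceptions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_NUMBER_OF_TRIES = 3 };   /* domain constraint (assumed value) */

/* Hypothetical input source: returns the text typed for a prompt. */
typedef const char *(*input_fn)(const char *prompt);

/* Non-domain logic: format check only. Assumes the form "123-45-6789". */
bool is_valid_ssn(const char *ssn)
{
    size_t i;

    if (ssn == NULL || strlen(ssn) != 11)
        return false;
    for (i = 0; i < 11; i++) {
        if (i == 3 || i == 6) {
            if (ssn[i] != '-')
                return false;
        } else if (!isdigit((unsigned char)ssn[i])) {
            return false;
        }
    }
    return true;
}

/* Domain logic: ask up to MAX_NUMBER_OF_TRIES times, then give up.
 * Returns the valid SSN, or NULL where the pseudocode throws. */
const char *get_ssn(input_fn get_input)
{
    const char *prompt = "Enter SSN:";
    int attempt;

    for (attempt = 0; attempt < MAX_NUMBER_OF_TRIES; attempt++) {
        const char *ssn = get_input(prompt);
        if (is_valid_ssn(ssn))
            return ssn;
        prompt = "Invalid SSN. Enter SSN:";
    }
    return NULL;
}

/* Stand-in for keyboard input so the sketch can be exercised:
 * replays canned answers in order, then returns empty strings. */
static const char *canned_answers[] = { "12345", "abc-de-fghi", "123-45-6789" };
static size_t next_answer = 0;

const char *canned_input(const char *prompt)
{
    (void)prompt;   /* a real implementation would display the prompt */
    if (next_answer >= sizeof canned_answers / sizeof canned_answers[0])
        return "";
    return canned_answers[next_answer++];
}
```

Calling get_ssn(canned_input) walks through the two bad answers and returns the third. The mainline code sees a single call, while the format rules and the retry policy each sit in their own small function.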

Here are the rules of thumb with this approach:

  1. Have functions that are smaller than 25 lines. The best way to create new functions is to abstract some part of the problem (domain, technical, or linguistic) and provide a function for that. It should always be possible to get such an abstraction going.
  2. Use the English description you write as you refine the code as comments. It captures the domain-level activities and tells the reader what you are doing.
  3. For functions with complex logic, include the algorithm before the function body and after the comment section.
  4. If the function is large, break it down into blocks, where each block does some unique activity. Use a one-line comment to describe that activity.
  5. Use an appropriate formatting scheme to cut down on excessive lines. For example, an ornate commenting scheme is not good. Placing empty lines that do not indicate some semantic separation to the reader is not good. Also, use the K&R scheme of indenting to maximize the information-to-lines ratio.

    Summary for the impatient

    If you are a manager

    1. Set up the templates for files. Refer to some sample templates for .cpp and .java.
    2. Set up the metrics programs so that you can monitor weak spots in the code.
    3. Set up Checkstyle so that you can identify stylistic deviations in the code.

    If you are a developer

    1. Use templates to start writing code.
    2. Develop the code in top-down fashion. Write each step and expand the step as a separate function. Exploit commonalities and increase the flexibility by parameterizing the code.
    3. Write comments on what you are doing rather than how you are doing it.
    4. Make sure that there are no metrics violations.

    Appendix

    Resources: [WORK IN PROGRESS]

    An xemacs file for C and C++ programs

    If you are writing code in Unix and C/C++, XEmacs is one of the best choices out there. You will have to refer to the book for the full details about using it for developing code. Here is code.el that you can drop in your .xemacs folder in your home directory. It does the following:

    1. Sets up the right indenting (no tabs, only spaces).
    2. Hooks up ";" and "{" so that the code is automatically indented.
    3. Adds a function menu that lists all the functions in the code.
    4. Adds line numbers to each line while editing.

    For a full set of emacs files that supports automated templates for any new files, and complete IDE, refer to the book. For further information on configuring emacs, you can refer to emacswiki.

    Jalopy: An indentation program for Java

    Jalopy is an indentation program for Java. Most of the traditional coding style guidelines can be automated using Jalopy. Here is a configuration file jalopy.xml that implements a sensible set of conventions when used as an Eclipse plugin.

    Checkstyle: A style checker program for Java

    Checkstyle enforces the style, including the number of lines in a program. The standard configuration shipped with Checkstyle is too inflexible for everyday use. Here is a Checkstyle configuration which can be imported into your Eclipse configuration: aalayance-checkstyleconfig.xml.

    Sample files: .cpp, .h, and .java

    sample.h file:

    // $Id$
    // Defines all the conversion constants for MKS to FPS system.

    sample.cpp file:

    // $Id$
    // Implements the conversion routines from MKS to FPS and vice versa.
    #include "sample.h"
    ...

    Sample.java file:

    // $Id$
    // Implements the class for communication protocols.
    package com.kanneganti.example;
    ...

    The end.

