Get Professional

How to turn bad code into good code

22 Sep , 2017  

Cleaning up bad codeIf you have a full-time coding gig, know what you’ll be doing most of the time?

Maintaining and re-purposing old code.

Writing new code is fun, but getting a few more miles out of existing code is better.

Why? Because:

  • The most risky part of development has been completed – the problem solving and solution design.
  • The code’s been tested (we hope!)
  • The code’s survived production! That’s the ultimate test.
  • Hours spent not re-writing code saves time and money.

These point gladden the hearts of project and delivery managers because they’re all about reducing risk, saving time and saving money.

But it means no fun for programmers.

So the best thing we can do is make our code super-easy to maintain, because chances are you’re going to be the one maintaining it!

That’s why today’s article is all about taking some bad code and turning it into good code.

Bad Code Example

This isn’t anyone’s production code – it’s code I wrote for a codewars.com challenge!

To make a bad version of the code, I reverted it to more or less the first version of my solution:

(function() {
  const b64Table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const b64Pads = [
    {ascii: '', bits: ''},
    {ascii: '==', bits: '0000'},
    {ascii: '=', bits: '00'} ];

  String.prototype.toBase64 = function() {
    const cBytes = [];
    let bstr;
    for (let i = 0; i < this.length; i += 1) {
      bstr = (this.charCodeAt(i)).toString(2);
      cBytes.push(('0'.repeat(8) + bstr).substring(bstr.length));
    }

    const bStr = cBytes.join('');
    const b64Chunks = [];
    let block = '';
    for (let i = 0; i < bStr.length; i += 6) {
      block = bStr.substr(i,6);
      if (block.length !== 6) { block += b64Pads[block.length / 2].bits; }
      b64Chunks.push(b64Table.charAt(parseInt(block, 2)));
    }

    return b64Chunks.join('') + b64Pads[this.length % 3].ascii;
  };

})();

// Examples taken from https://en.wikipedia.org/wiki/Base64
assert.equal('any carnal pleasure.'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZS4=');
assert.equal('any carnal pleasure'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZQ==');
assert.equal('any carnal pleasur'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3Vy');

(Like the comment says, those examples come from Wikipedia. I’m not being weird.)

So, what is this code doing? It encodes a regular string in Base64, which is a way of representing any kind of data as regular, printable characters.

This makes it easy to share any kind of data as text (the example code only uses strings).

If you’re not familiar with Base64, it doesn’t matter for the purposes of today’s exercise. I’ll go over the algorithm for it briefly in a moment (you can read about it here if you’re curious).

But it’s kind of useful for our example code to be incomprehensible.

Because that’s the situation programmers are faced with every day when they’re maintaining code!

All the matters for today is that we know this code is correct, in that it produces the right results.

But we want to make it easier to understand and reuse. Why?

  • It makes the code faster to understand.
  • It’s easier to modify the code.
  • It’s easier to test the code.
  • This all saves us time and makes our job easier.

But how does bad code happen in the first place?

Remember how I said that I reverted my code to the first version of my solution?

Bad code is code that was left as soon as it ‘worked’.

Sometimes that happens because some programmers just don’t care about making it good.

More often, it’s because the programmer was so busy and under so much pressure that they move on to the next piece of code as soon as their current code works.

While this saves one programmer some time on that particular day, bad code can cost companies a lot of time and money in the long run.

The Nature of Good Code

Every programmer is going to have their own definition of ‘good code’; this guy wrote a whole book about it.

But here’s a few features of good code that most programmers will agree upon:

  • Names for variables and functions are informative.
  • Functions aren’t overly complex.
  • When numbers, characters and strings have a special purpose or meaning, they’re assigned to a named constant.
  • A function performs one task, and only one task.

That’s enough to be going on with.

Like I said, every programmer is going to have their own ‘good code’ checklist.

We’ll apply those principles to my bad code, and see if we can make it good.

But first, let’s see how the Base64 algorithm works.

Encoding in Base64 – Cliff Notes

Here’s how we do it:

Let’s say we want to encode “ABC” in Base64.

First thing we do is turn each character into a number using the standard utf8 encoding:

A: 65
B: 66
C: 67

Then we turn each of those numbers into a binary string:

65: 01000001
66: 01000010
67: 01000011

Then stick them all together into one big long binary string:

010000010100001001000011

Now, we chop that big string up into 6-digit chunks:

010000 010100 001001 000011

Then turn each of those 6-digit binary values into a decimal value:

010000: 16
010100: 20
001001: 9
000011: 3

So we have 16, 20, 9 and 3. They’re indexes. We have a set of Base64 characters that we can map those indexes to:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

That list starts at 0. So our index values map like this:

16: Q
20: U
9: J
3: D

So the Base64 representation of ‘ABC’ is ‘QUJD’.

Boy, algorithms always seem like so much trouble written out longhand.

Thanks goodness for computers!

Anyway, there’s still a couple details we need to address. Like, what if we’re encoding only one character, ‘A’?

Since the binary encoding of ‘A’ is 01000001 , we can’t split it up into chunks of 6 bits evenly. So we fill it out with zeros:

010000 01 + 0000 = 16 16 = QQ

Which gives us two 16s, and therefore two Q’s.

But that won’t decode properly, so we need to add ‘==’ to the end, and our Base64 string becomes:

QQ==

And likewise, the string AB needs to be padded with 00, and will have only one ‘=’ added to the end:

QUI=

So let’s start fixing up the code.

Bad code goes good

Here’s our bad code again (I’ll keep the asserts at the end so we can test it easily):

(function() {
  const b64Table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const b64Pads = [
    {ascii: '', bits: ''},
    {ascii: '==', bits: '0000'},
    {ascii: '=', bits: '00'} ];

  String.prototype.toBase64 = function() {
    const cBytes = [];
    let bstr;
    for (let i = 0; i < this.length; i += 1) {
      bstr = (this.charCodeAt(i)).toString(2);
      cBytes.push(('0'.repeat(8) + bstr).substring(bstr.length));
    }

    const bStr = cBytes.join('');
    const b64Chunks = [];
    let block = '';
    for (let i = 0; i < bStr.length; i += 6) {
      block = bStr.substr(i,6);
      if (block.length !== 6) { block += b64Pads[block.length / 2].bits; }
      b64Chunks.push(b64Table.charAt(parseInt(block, 2)));
    }

    return b64Chunks.join('') + b64Pads[this.length % 3].ascii;
  };

})();

// Examples taken from https://en.wikipedia.org/wiki/Base64
assert.equal('any carnal pleasure.'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZS4=');
assert.equal('any carnal pleasure'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZQ==');
assert.equal('any carnal pleasur'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3Vy');

Step 1

Let take a look at all our initial constants, and see what we think of their names.

I’m happy with the first two constants, b64Table  and b64Pads . As a programmer I have a reasonable idea of what they’re for.

Since they’re not global constants, I’ll leave them in camelCase and avoid renaming them to B64_TABLE and B64_PADS. So ugly!

Step 2

Next, inside the toBase64()  function, we have a constant called cBytes . What’s cBytes ? This isn’t obvious to me at all.

Immediately after it is bstr , which isn’t very obvious either. Let’s see how they’re used to to work out a better name for them:

    const cBytes = [];
    let bstr;
    for (let i = 0; i < this.length; i += 1) {
      bstr = (this.charCodeAt(i)).toString(2);
      cBytes.push(('0'.repeat(8) + bstr).substring(bstr.length));
    }

So this is a classic case of looping through every element in a list, except in this case the list is a string and we’re examining every character in the string.

The first thing to happen in the loop is to execute bstr = (this.charCodeAt(i)).toString(2); which retrieves the character code at position i in the string and converts it to a binary string with toString(2). So if our string is ‘ABC’ and i is 0 (zero), then bstr becomes 1000001.

After that we use .push() to add something to the cBytes array. But what?

Let’s do some substitution:

// Assuming bstr is '1000001' (which is 65, or 'A')
('0'.repeat(8) + bstr).substring(bstr.length)

// Becomes
('00000000' + '1000001').substring(bstr.length);

// Becomes
('000000001000001').substring(7);

// Becomes '01000001'

Which is to say, we zero-pad our binary string to make sure it’s eight digits long.

First, let’s introduce some sensible names:

  • bstr can become binaryString
  • cBytes will be binaryStrings

Next, I’ll abstract some code into funcctions:

  • The code for creating a binary string
  • The code that creates a zero-padded value.

Not because I think we’ll use that code again (although it seems likely), but because it’ll make the logic easier to understand.

The first improvement:

const toBinStr = function(n) {
  if (typeof n !== 'number') {
    throw new TypeError(`Expected parameter type number, received ${typeof n}`);
  }
  return (n).toString(2);
}

const zPad = function (str, padLen){
  return ('0'.repeat(padLen) + str).substring(str.length);
}

String.prototype.toBase640 = function() {
const binaryStrings  = [];
let binaryString;
for (let i = 0; i < this.length; i += 1) {
  binaryString = toBinStr(this.charCodeAt(i));
  binaryStrings.push(zPad(binaryString, 8));
}

// [cut for brevity]

Which is ok, but it’d be better to get rid of the for loop altogether. We can do this by splitting the string into an array, and using the Array function map():

// [cut for brevity]

let binaryString;
const binaryStrings = this.split('').map( e => {
  binaryString = toBinStr(e.charCodeAt(0));
  return zPad(binaryString, 8);
});

// [cut for brevity]

Now, binaryString is looking redundant, so at the risk of losing legibility I’ll ditch that too:

// [cut for brevity]

const binaryStrings = this.split('').map( e => {
  return zPad(
    toBinStr(e.charCodeAt(0)),
    8
  );
});

// [cut for brevity]

We’re getting there now, but there’s one more thing that’s bugging me – that bare digit 8.

What’s up with that? Bare, unexplained numbers appearing in code are a smell and should always make programmers suspicious.

It seems, I don’t know, redundant to make a constant called BYTE_LEN and assign 8 to it, since when won’t a byte be eight bits long?

Still. Instead, I’ll make a new function called paddedByte, that calls zPad with the length set to 8. The digit 8 appearing in a function called paddedByte should explain itself.

Here’s our new function in action:

// [cut for brevity]

const paddedByte = function(bitStr) {
  return zPad(bitStr, 8);
}

String.prototype.toBase64 = function() {
  const binaryStrings = this.split('').map( e => {
    return paddedByte(toBinStr(e.charCodeAt(0)));
  });

// [cut for brevity]

Now I feel comfortable leaving that there, because when I read it I can see we’re mapping each character in our string to a byte and storing them in an array called binaryStrings.

Step 3

Here’s our code as it stands now:

(function() {
  const b64Table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const b64Pads = [
    {ascii: '', bits: ''},
    {ascii: '==', bits: '0000'},
    {ascii: '=', bits: '00'}
  ];

  const toBinStr = function(n) {
    if (typeof n !== 'number') {
      throw new TypeError(`Expected parameter type number, received ${typeof n}`);
    }
    return (n).toString(2);
  }

  const zPad = function (str, padLen){
    return ('0'.repeat(padLen) + str).substring(str.length);
  }

  const paddedByte = function(bitStr) {
    return zPad(bitStr, 8);
  }

  String.prototype.toBase64 = function() {
    const binaryStrings = this.split('').map( e => {
      return paddedByte(toBinStr(e.charCodeAt(0)));
    });

    const bStr = binaryStrings .join('');
    const b64Chunks = [];
    let block = '';
    for (let i = 0; i < bStr.length; i += 6) {
      block = bStr.substr(i,6);
      if (block.length !== 6) { block += b64Pads[block.length / 2].bits; }
      b64Chunks.push(b64Table.charAt(parseInt(block, 2)));
    }
    return b64Chunks.join('') + b64Pads[this.length % 3].ascii;
  };
})();

// Examples taken from https://en.wikipedia.org/wiki/Base64
assert.equal('any carnal pleasure.'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZS4=');
assert.equal('any carnal pleasure'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3VyZQ==');
assert.equal('any carnal pleasur'.toBase64(), 'YW55IGNhcm5hbCBwbGVhc3Vy');

Now we can concentrate on the last half of the toBase64() function:

    const bStr = binaryStrings .join('');
    const b64Chunks = [];
    let block = '';
    for (let i = 0; i < bStr.length; i += 6) {
      block = bStr.substr(i,6);
      if (block.length !== 6) { block += b64Pads[block.length / 2].bits; }
      b64Chunks.push(b64Table.charAt(parseInt(block, 2)));
    }
    return b64Chunks.join('') + b64Pads[this.length % 3].ascii;

Well, the first line is a howler.

Remember how our previous code had a variable called bstr, now here’s one called bStr. I don’t think I reverted my code properly, because that’s the kind of thing that makes me crazy.

Anyway.

This is where we’ll join all the binary strings together, chop that  big string into 6-bit sections, and use those as index values to look up Base64 codes.

At least we got rid of that other bstr variable, which frees up binaryString for use here.

b64Chunks isn’t well-named either. This is where we’ll store our Base64 values. And block is just a Base64 index!

Here’s an update to the code with some better names:

[ cut for brevity ]

    const binaryString = binaryStrings .join('');
    const b64Chars = [];
    let b64Idx = '';
    for (let i = 0; i < binaryString.length; i += 6) {
      b64Idx = binaryString.substr(i,6);
      if (b64Idx.length !== 6) { b64Idx += b64Pads[b64Idx.length / 2].bits; }
      b64Chars.push(b64Table.charAt(parseInt(b64Idx, 2)));
    }
    return b64Chars.join('') + b64Pads[this.length % 3].ascii;

[ cut for brevity ]

Slightly better. But the code’s still kind of hard to read.

Like with the last section of code, we’ve got a for loop going through a string. The difference this time is that it’s a binary string and we’re processing it 6 characters at a time, since we get a Base64 index from a 6-bit binary value.

What’s the for loop doing?

  • Assigning 6 characters of the binary string to b64Idx
  • Padding the index with zeros if it’s less than 6 characters long.
  • Then it assigns the character in our lookup table that corresponds to the index value in the b64Chars array.

Let’s turn those last two steps into functions, and I think we’ll be done.

That if statement can go into a function called applyB64Pad:

  const applyB64BitPad = function(binaryB64IdxStr) {
    if (binaryB64IdxStr.length !== b64IdxLen) {
      // Idx lengths can be 6, 4 or 2. Since the length is
      // not 6 at this point, we know that length / 2 will
      // correspond to the correct index in the b64Pads
      // array.
      binaryB64IdxStr += b64Pads[binaryB64IdxStr.length / 2].bits;
    }
    return binaryB64IdxStr;
  }

I’ve also taken the opportunity to introduce a new constant value b64IdxLen, which is 6.

Here’s the new function in use:

    for (let i = 0; i < binaryString.length; i += b64IdxLen) {
      b64Idx = binaryString.substr(i, b64IdxLen);
      b64Idx = applyB64BitPad(b64Idx);
      b64Chars.push(b64Table.charAt(parseInt(b64Idx, 2)));
    }

Now, we can take on looking up the Base64 value by it’s index:

  const base64Value = function(base64IdxStr) {
    return b64Table.charAt(parseInt(base64IdxStr, 2));
  }

And use it like this:

    for (let i = 0; i < binaryString.length; i += b64IdxLen) {
      b64Idx = binaryString.substr(i, b64IdxLen);
      b64Idx = applyB64BitPad(b64Idx);
      b64Chars.push(base64Value(b64Idx));
    }

The last line of code in our toBase64 function needs addressing too. Check it out:

return b64Chars.join('') + b64Pads[this.length % 3].ascii;

What’s self-explanatory about that? What’s it doing?

Well, if the length of our Base64 string is not divisible by 3, we need to add either ‘=’ or ‘==’ to it to make sure it is.

Why would we expect it to be divisible by 3 in the first place? Welp, it kind of makes sense. Our original mega-binary string was built from binary strings that were 8 bits long.

Then we chopped it up in to strings 6 bits long.

If it works out to being exactly divisible by 6, then it’s divisible by 3 too.

If not, then we’ll always be either one or two digits off. Anyway, that’s why we can use length % 3  to look up an array index.

Personally, I can’t think of a good name for that whole process. I can’t think of a good name for the constant 3 value either. So I’m just going to stick a comment on it:

// If length isn't divisible by 3, we need to pad by one or two.
// Look up correct pad value based on length % 3.

Finally, some ‘good’ code

(function() {
  const b64IdxLen = 6;
  const b64Table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const b64Pads = [
    {ascii: '', bits: ''},
    {ascii: '==', bits: '0000'},
    {ascii: '=', bits: '00'}
  ];

  const toBinStr = function(n) {
    if (typeof n !== 'number') {
      throw new TypeError(`Expected parameter type number, received ${typeof n}`);
    }
    return (n).toString(2);
  }

  const zPad = function(str, padLen){
    return ('0'.repeat(padLen) + str).substring(str.length);
  }

  const paddedByte = function(bitStr) {
    return zPad(bitStr, 8);
  }

  const applyB64BitPad = function(binaryB64IdxStr) {
    if (binaryB64IdxStr.length !== b64IdxLen) {
      // Idx lengths can be 6, 4 or 2. Since the length is
      // not 6 at this point, we know that length / 2 will
      // correspond to the correct index in the b64Pads 
      // array.
      binaryB64IdxStr += b64Pads[binaryB64IdxStr.length / 2].bits;
    } 
    return binaryB64IdxStr;
  }

  const base64Value = function(base64IdxStr) {
    return b64Table.charAt(parseInt(base64IdxStr, 2));
  }

  String.prototype.toBase640 = function() {
    const binaryStrings = this.split('').map( e => {
      return paddedByte(toBinStr(e.charCodeAt(0)));
    });

    const binaryString = binaryStrings .join('');
    const b64Chars = [];
    let b64Idx = '';
    for (let i = 0; i < binaryString.length; i += b64IdxLen) {
      b64Idx = binaryString.substr(i, b64IdxLen);
      b64Idx = applyB64BitPad(b64Idx);
      b64Chars.push(base64Value(b64Idx));
    }

    // If length isn't divisible by 3, we need to pad by one or two.
    // Look up correct pad value based on length % 3
    return b64Chars.join('') + b64Pads[this.length % 3].ascii;
  };
})();

// Examples taken from https://en.wikipedia.org/wiki/Base64
assert.equal('any carnal pleasure.'.toBase640(), 'YW55IGNhcm5hbCBwbGVhc3VyZS4='); 
assert.equal('any carnal pleasure'.toBase640(), 'YW55IGNhcm5hbCBwbGVhc3VyZQ=='); 
assert.equal('any carnal pleasur'.toBase640(), 'YW55IGNhcm5hbCBwbGVhc3Vy'); 

I’ve hedged calling it ‘good’, because a lot of programmers will argue all up and down that this is bad:

  • It’s longer than the original version.
  • It’s inefficient (you can do the same thing in around 10 lines of code without all that concatenating and type casting).

In many ways, arguing the merits of a Base64 implementation doesn’t make much sense anyway, since it’ll be implemented by most programming environments, sometimes with pre-cached values.

But I’ll reiterate what I said at the beginning: this is code is more obvious. It’s easier and faster to understand, which makes it faster to maintain and reduces the risk of introducing errors.

For me, until time or space efficiency is proven to be a problem, obvious code always wins.

 

By


Leave a Reply

Your email address will not be published. Required fields are marked *