Geekout: How to Make Short URLs

(Yet another geeky post. Apologies to non-programmers, though you might enjoy making some short URLs.)

Keeping URLs short is becoming important. It began years ago with tinyurl.com, mostly because email clients were chopping URLs that wouldn’t fit on one line. Text-messaging/SMS sites like Twitter have intensified the need for keeping things short.

Last week my friend Danny Newman asked me for help creating maximally short URLs for one of his projects. Danny wanted to use at most 5 characters. Given that there are 93 usable symbols for each slot, that is enough to encode nearly 7 billion unique addresses. (That’s 93^5 = 6956883693.) Not bad!
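
To see how big a number each length can hold, here is a small sketch (the helper `largestForLength` is a hypothetical name, not part of the original code). With a `radix`-symbol alphabet, every number from 0 to radix^len - 1 fits in `len` characters or fewer:

```javascript
// Largest number that fits in `len` characters of a `radix`-symbol alphabet,
// i.e. radix^len - 1. Repeated multiplication keeps the result an exact
// integer as long as it stays below Number.MAX_SAFE_INTEGER.
function largestForLength(radix, len) {
  let total = 1;
  for (let i = 0; i < len; i++) {
    total *= radix;
  }
  return total - 1;
}

for (let len = 1; len <= 5; len++) {
  console.log("length " + len + ": 0 to " + largestForLength(93, len));
}
// upper bounds printed: 92, 8648, 804356, 74805200, 6956883692
```

The last bound, 6956883692, is 93^5 - 1; one more and the code needs a sixth character.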

I came up with code (below) that converts a number into the shortest possible representation in a URL path. (E.g. the number may be an ID from your database, or the output of a hash function.) If you’re viewing this post on wanderingstan.com, the page lets you try it out: see how big you have to make the number before the URL gets longer.
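
As a worked example of the conversion, take 12345 with the 93-character alphabet from the code below: divide by 93 repeatedly, and the remainders, read from last to first, are the digits.

```latex
\begin{aligned}
12345 &= 132 \cdot 93 + 69 \\
132   &= 1 \cdot 93 + 39 \\
1     &= 0 \cdot 93 + 1
\end{aligned}
\qquad\Longrightarrow\qquad
12345 = 1 \cdot 93^2 + 39 \cdot 93 + 69
```

Digits 1, 39 and 69 are the characters 1, D and ^ in that alphabet, so 12345 becomes the three-character path 1D^.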

And here is the code. This is in JavaScript, but it ports easily.

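  function convertNumberToURLchars(N, padding) {
    // Unique chars for the URL "digits". Each must appear only once,
    // or two different numbers would map to the same code.
    // Caution: '#' starts a fragment, '?' a query, '%' an escape and '/'
    // a new path segment; RFC 3986 also disallows unescaped '<', '>', '"',
    // '`', '{', '}', '[', ']', '|' and '^' in a path.
    var chars = "0123456789"
              + "abcdefghijklmnopqrstuvwxyz"
              + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              + "$-_+*;|^~`<#%/?@&";   // ';' assumed safe as a last char

    // These chars will not be recognized as part of the URL by certain
    // email clients (Outlook) if they are the last char in the URL.
    chars += "=:{}()[]'>,.!" + '"';

    var radix = chars.length;   // 93
    var URLchars = "";
    // Drops sign and fraction; exact only up to 2^53 - 1
    // (Number.MAX_SAFE_INTEGER).
    var Q = Math.floor(Math.abs(N));
    var R;

    // Repeated division by the radix: each remainder picks the next
    // character, working from the rightmost (least significant) one.
    do {
      R = Q % radix;
      URLchars = chars.charAt(R) + URLchars;
      Q = (Q - R) / radix;
    } while (Q > 0);

    // Left-pad with the "zero" character up to the requested length.
    while (URLchars.length < padding) {
      URLchars = chars.charAt(0) + URLchars;
    }

    return URLchars;
  }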
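
To go the other way, for instance when a server receives a short path and needs the database ID back, the conversion runs in reverse: multiply by the radix and add each character's position. This is a minimal sketch; the name `convertURLcharsToNumber` and the `ALPHABET` constant are hypothetical, and the alphabet is assumed to match the encoder's exactly, with every character appearing once (a repeated character would make decoding ambiguous).

```javascript
// Hypothetical inverse of convertNumberToURLchars: short code -> number.
// ALPHABET is an assumed copy of the encoder's 93 chars, each listed once.
const ALPHABET = "0123456789"
  + "abcdefghijklmnopqrstuvwxyz"
  + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  + "$-_+*;|^~`<#%/?@&"
  + "=:{}()[]'>,.!" + '"';

function convertURLcharsToNumber(code, chars = ALPHABET) {
  const radix = chars.length;
  let n = 0;
  for (const c of code) {
    const digit = chars.indexOf(c);
    if (digit < 0) {
      throw new Error("Not a short-URL character: " + c);
    }
    // Horner's rule: shift the value one digit left, then add the new digit.
    n = n * radix + digit;
  }
  return n;
}

console.log(convertURLcharsToNumber("1D^")); // 12345
```

Padding doesn't change the result: "00012" and "12" both decode to 95.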

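Small point: As noted in the comments, some punctuation characters (14 in all) won’t be counted as part of the URL by certain email clients when they are the last character. Because the last character is the lowest digit, this happens for every number N where N % 93 is 79 or more, not only at the top of the range: 79 gives http://example.org/=, and 6956883692, the largest 5-character number, gives http://example.org/""""". So skipping the last 14 numbers for a given URL length isn’t enough; you have to skip every number whose last digit lands on one of those 14 characters, or let the last slot use only the first 79 characters. Danny wanted 5 characters, so instead of 6956883693 addresses (0 through 6956883692) he’ll have to settle for 79 × 93^4 = 5909610879.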
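
One way to keep those 14 characters out of the last slot entirely is a mixed-radix encoding: the final digit is base 79 (drawing only on the first 79 characters) and the others are base 93. This is a sketch, not the code from this post; the names `encodeTrailingSafe`, `decodeTrailingSafe`, `ALPHABET` and `SAFE_COUNT` are hypothetical, and the alphabet (with ';' so that every character is unique) is an assumption.

```javascript
// Hypothetical mixed-radix scheme: last char from the first SAFE_COUNT
// chars only (assumed safe at the end of a URL), other chars from all 93.
const ALPHABET = "0123456789"
  + "abcdefghijklmnopqrstuvwxyz"
  + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  + "$-_+*;|^~`<#%/?@&"      // indices 0-78: assumed safe anywhere
  + "=:{}()[]'>,.!" + '"';  // indices 79-92: never in the last slot
const SAFE_COUNT = 79;
const RADIX = ALPHABET.length; // 93

function encodeTrailingSafe(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative safe integer");
  }
  let code = ALPHABET.charAt(n % SAFE_COUNT); // final slot: base 79
  let q = Math.floor(n / SAFE_COUNT);
  while (q > 0) {                             // earlier slots: base 93
    code = ALPHABET.charAt(q % RADIX) + code;
    q = Math.floor(q / RADIX);
  }
  return code;
}

function decodeTrailingSafe(code) {
  const last = code.length > 0 ? ALPHABET.indexOf(code.charAt(code.length - 1)) : -1;
  if (last < 0 || last >= SAFE_COUNT) {
    throw new Error("code must end in one of the first " + SAFE_COUNT + " characters");
  }
  let q = 0;
  for (const c of code.slice(0, -1)) {
    const d = ALPHABET.indexOf(c);
    if (d < 0) throw new Error("not a short-URL character: " + c);
    q = q * RADIX + d;
  }
  return q * SAFE_COUNT + last;
}

console.log(encodeTrailingSafe(5909610878)); // prints: """"&
console.log(encodeTrailingSafe(5909610879)); // prints: 100000
```

Every number up to 5909610878 still fits in 5 characters, and 0 through 78 need only one.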

Hope you might find this useful someday.

(Thanks to linuxtopia for their radix code sample.)