Geekout: How to Make Short URLs

(Yet another geeky post. Apologies to non-programmers, though you might enjoy making some short URLs.)

Keeping URLs short is becoming important. It began years ago with tinyurl.com, mostly because email clients were chopping URLs that wouldn’t fit on one line. Text-messaging/SMS sites like Twitter have intensified the need for keeping things short.

Last week my friend Danny Newman asked me for help with creating maximally short URLs for one of his projects. Danny wanted to use at most 5 characters. Given that there are 93 valid symbols for each slot, that is enough to encode about 7 billion unique addresses. (That’s 93^5 = 6,956,883,693.) Not bad!
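To see how quickly the address space grows with each extra character, the arithmetic can be sketched in a few lines. The function name `urlCapacity` is hypothetical, and the count assumes every code has exactly the given length (shorter codes padded out to fill it).

```javascript
// Hypothetical helper: how many distinct codes of exactly `length`
// characters exist when each slot can hold one of `radix` symbols.
// Uses repeated multiplication so the result stays an exact integer.
function urlCapacity(radix, length) {
  var total = 1;
  for (var i = 0; i < length; i++) {
    total *= radix;
  }
  return total;
}

console.log(urlCapacity(93, 4)); // 74805201
console.log(urlCapacity(93, 5)); // 6956883693
```

Going from 4 to 5 characters multiplies the available addresses by 93, which is why a fifth slot is enough for most projects.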

I came up with code (below) that converts a number into the shortest possible representation in a URL path. (E.g. the number may be an ID from your database, or the output of a hash function.) If you’re viewing this page on wanderingstan.com, you can try it out here: (See how big you have to make the number before the URL gets longer.)

 

And here is the code. This is in JavaScript, but it ports easily.

  function convertNumberToURLchars(N, padding) {

    // Standard unique chars valid in a URL path
    var chars = "0123456789"
              + "abcdefghijklmnopqrstuvwxyz"
              + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              + "$-_+*,|^~`<#%/?@&";

    // These chars will not be recognized as part
    // of URL by certain email clients (Outlook)
    // if they are the last char in the URL.
    chars += "=:{}()[]'>,.!" + '"';

    var radix = chars.length;
    var URLchars = "";
    var Q = Math.floor(Math.abs(N));
    var R;
    var newDigit;

    // Construct the unique character string,
    // least significant digit first
    while (true) {
      R = Q % radix;
      newDigit = chars.charAt(R);
      URLchars = newDigit + URLchars;
      Q = (Q - R) / radix;
      if (Q == 0) break;
    }

    // Handle padding: prepend the zero digit until
    // the string is `padding` characters long
    for (var i = padding - URLchars.length; i > 0; i--) {
      URLchars = chars.charAt(0) + URLchars;
    }

    return URLchars;
  }
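Because every character has to map back to exactly one number, it can be worth checking the alphabet for repeats before relying on it. Here is a minimal sketch of such a check; the helper name `findDuplicateChars` is hypothetical and not part of the code above.

```javascript
// Same 93-character alphabet as in the function above.
var chars = "0123456789"
          + "abcdefghijklmnopqrstuvwxyz"
          + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          + "$-_+*,|^~`<#%/?@&"
          + "=:{}()[]'>,.!" + '"';

// Hypothetical helper: returns every character that occurs more than
// once in `alphabet`, together with the indices where it occurs.
function findDuplicateChars(alphabet) {
  var seen = {};
  var duplicates = [];
  for (var i = 0; i < alphabet.length; i++) {
    var c = alphabet.charAt(i);
    if (seen.hasOwnProperty(c)) {
      seen[c].push(i);
    } else {
      seen[c] = [i];
    }
  }
  for (var key in seen) {
    if (seen[key].length > 1) {
      duplicates.push({ char: key, indices: seen[key] });
    }
  }
  return duplicates;
}

console.log(chars.length);              // 93
console.log(findDuplicateChars(chars)); // one entry: "," at indices 67 and 89
```

With the alphabet as written, the comma shows up in both the main set and the trailing-punctuation set, so the digits 67 and 89 produce the same character. Dropping one of the two commas would remove that overlap, at the cost of shrinking the alphabet to 92 symbols.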

Small point: As noted in the code comments, some punctuation characters (14 in all) won’t be counted as part of the URL by certain email clients when they are the last character. Because these are the last 14 symbols in the alphabet, the largest numbers for a given URL length are affected. Danny wanted 5 characters, so instead of a top value of 6956883692, he’ll have to settle for 14 less, 6956883678. The former would give a code of http://example.org/""""", the latter gives http://example.org/""""&. The same applies lower down the range: any number whose last base-93 digit is one of those 14 symbols (79 gives 0000=, for example) ends in a problem character and should be skipped too.
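A minimal sketch of checking whether a given number would produce a URL that ends in one of those characters. It assumes the same 93-character alphabet as the code above; the names `safeChars`, `unsafeAtEnd` and `endsInUnsafeChar` are hypothetical.

```javascript
// Alphabet split into the two groups used by the encoder above.
var safeChars = "0123456789"
              + "abcdefghijklmnopqrstuvwxyz"
              + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              + "$-_+*,|^~`<#%/?@&";
var unsafeAtEnd = "=:{}()[]'>,.!" + '"';
var chars = safeChars + unsafeAtEnd;

// Hypothetical helper: true if the code for N would end in a character
// that some email clients drop. The last character of the code is the
// least significant digit, i.e. N modulo the alphabet length.
function endsInUnsafeChar(N) {
  var lastChar = chars.charAt(Math.floor(Math.abs(N)) % chars.length);
  return unsafeAtEnd.indexOf(lastChar) !== -1;
}

console.log(endsInUnsafeChar(6956883692)); // true  (code ends in ")
console.log(endsInUnsafeChar(6956883678)); // false (code ends in &)
console.log(endsInUnsafeChar(79));         // true  (code ends in =)
```

Checking the character itself, rather than comparing the digit against a fixed index, keeps the test correct even if the alphabet is reordered later.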

Hope you might find this useful someday.

(Thanks to linuxtopia for their radix code sample.)


10 Responses to “Geekout: How to Make Short URLs”

  1. Danny Newman Says:

    You rock!

    Thanks again!

  2. Rob Says:

    Stan – you read my mind. I was going to approach this problem this week. This is cool.

  3. Mark Says:

    Thank you for this article. Users will definitely benefit from this compilation, and I will definitely make good use of it.

  4. Anonymous Says:

    Just give a try to http://www.stopurl.com

  5. short url Says:

    Nice.. you can also try http://tubeurl.com

  6. Derek Scruggs Says:

    Here’s another approach that allows you to preserve some of the SEO juice. It doesn’t shorten the urls per se, just makes your web server more tolerant of long ones that are broken. Wouldn’t be too hard to adapt.

    http://mttips.com/tolerate_broken_urls.html

  7. Tom Says:

    Very cool. Can you post the code for converting the URL back to a number?

  8. stan Says:

    Oops! Should have included that in the original post. Here you go:

    function convertURLcharsToNumber(URLstring) {

      // Standard unique chars valid in a URL path
      // (must match the alphabet used for encoding)
      var chars = "0123456789"
                + "abcdefghijklmnopqrstuvwxyz"
                + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "$-_+*,|^~`<#%/?@&";

      // These chars will not be recognized as part
      // of URL by certain email clients (Outlook)
      // if they are the last char in the URL.
      chars += "=:{}()[]'>,.!" + '"';

      var radix = chars.length;
      var number = 0;
      var slot = 0;
      var ch;

      // Work backwards from the last character,
      // which is the least significant digit
      while (URLstring.length > 0) {
        ch = URLstring.slice(-1);
        number = number + (chars.indexOf(ch) * Math.pow(radix, slot));
        slot++;
        URLstring = URLstring.slice(0, -1);
      }

      return number;
    }

    // Sample: This returns the original number, 399398203
    alert(convertURLcharsToNumber("5vOOv"));
    
  9. Anonymous Says:

    I tested this code, but I found that if you loop from, say, 1m to 2m you will get clashes, which makes it kind of unusable in its current form. Shame!

  10. Bojan Babic Says:

    Nice post, but the algorithm needs optimization.

    Bojan Babic