(Yet another geeky post. Apologies to non-programmers, though you might enjoy making some short URLs.)
Keeping URLs short is becoming important. It began years ago with tinyurl.com, mostly because email clients were chopping URLs that wouldn’t fit on one line. Short-message services like Twitter have intensified the need for keeping things short.
Last week my friend Danny Newman asked me for help with creating maximally short URLs for one of his projects. Danny wanted to use at most 5 characters. Given that there are 93 valid symbols for each slot, that is enough to encode about 7 billion unique addresses. (That’s 93^5 = 6,956,883,693.) Not bad!
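The count for a given length is just the alphabet size raised to that length. A minimal sketch to check the arithmetic (the helper name `capacity` is only for illustration and is not part of the code in this post):

```javascript
// Number of distinct strings of exactly `length` symbols drawn from an
// alphabet of `radix` symbols. Hypothetical helper, for illustration only.
function capacity(radix, length) {
    var total = 1;
    for (var i = 0; i < length; i++) {
        total *= radix; // stays exact while the result is below 2^53
    }
    return total;
}

console.log(capacity(93, 4)); // 74805201
console.log(capacity(93, 5)); // 6956883693
```

So dropping from 5 characters to 4 shrinks the space from roughly 7 billion to roughly 75 million.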
I came up with code (below) that converts a number into the shortest possible representation in a URL path. (E.g. the number may be an ID from your database, or the output of a hash function.) If you’re viewing this page on wanderingstan.com, you can try it out here: (See how big you have to make the number before the URL gets longer.)
And here is the code. This is in JavaScript, but it ports easily.
    function convertNumberToURLchars(N, padding) {
        // Standard unique chars valid in a URL path
        var chars = "0123456789" +
            "abcdefghijklmnopqrstuvwxyz" +
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "$-_+*,|\^~`<#%/?@&";
        // These chars will not be recognized as part
        // of URL by certain email clients (Outlook)
        // if they are the last char in the URL.
        chars += "=:{}()[]'>,.!" + '"';
        // Caution: "," appears in both groups (indices 67 and 89),
        // so two digits share one character and some numbers collide.
        var radix = chars.length;
        var URLchars = "";
        var Q = Math.floor(Math.abs(N));
        var R;
        // Construct the unique character string,
        // least significant digit first
        while (true) {
            R = Q % radix;
            var newDigit = chars.charAt(R);
            URLchars = newDigit + URLchars;
            Q = (Q - R) / radix;
            if (Q == 0) break;
        }
        // Handle padding: left-pad with the zero digit
        for (var i = padding - URLchars.length; i > 0; i--) {
            URLchars = chars[0] + URLchars;
        }
        return URLchars;
    }
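The conversion only gives a unique string per number if every symbol in the alphabet occurs exactly once. A small sanity check along these lines can catch a repeated character. This is a sketch: `findDuplicateChars` is a hypothetical helper, and the alphabet is retyped here from the function above.

```javascript
// Return the characters that occur more than once in `alphabet`.
// Hypothetical helper, not part of the original post.
function findDuplicateChars(alphabet) {
    var seen = {};
    var duplicates = [];
    for (var i = 0; i < alphabet.length; i++) {
        var c = alphabet.charAt(i);
        if (seen[c] === true) {
            if (duplicates.indexOf(c) === -1) duplicates.push(c);
        } else {
            seen[c] = true;
        }
    }
    return duplicates;
}

// The same 93 characters as in convertNumberToURLchars.
var chars = "0123456789" +
    "abcdefghijklmnopqrstuvwxyz" +
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "$-_+*,|^~`<#%/?@&" +
    "=:{}()[]'>,.!" + '"';

console.log(chars.length);              // 93
console.log(findDuplicateChars(chars)); // [ ',' ]
```

Because the comma sits at two positions, two different digit values render as the same character. Replacing one of the commas with a character not already in the list would make the encoding unique again.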
Small point: as noted in the code comments, some punctuation characters (14 in all) won’t be counted as part of the URL by certain email clients when they are the last character. At the top of the range, the way to avoid this is simply not to use the last 14 numbers available for a given URL length. Danny wanted 5 characters, so instead of a maximum value of 6956883692, he’ll have to settle for 14 less: 6956883678. The former would give a code of http://example.org/""""", the latter gives http://example.org/""""&. (More generally, the last character depends only on the number modulo 93, so any number whose remainder is 79 or more ends in one of these characters.)
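Since the final character of the encoded string is determined by the number modulo the radix, a number can be tested before it is used. A rough sketch, assuming the 93-character alphabet above, in which the 14 risky characters occupy indices 79 through 92 (the names `RADIX`, `FIRST_RISKY` and `endsInRiskyChar` are made up for this example):

```javascript
var RADIX = 93;       // size of the alphabet used in the post
var FIRST_RISKY = 79; // index of "=", the first of the 14 risky characters

// True if the encoded form of n would end in one of the 14 characters
// that some email clients drop from the end of a URL.
function endsInRiskyChar(n) {
    return Math.floor(Math.abs(n)) % RADIX >= FIRST_RISKY;
}

console.log(endsInRiskyChar(6956883692)); // true  (last character is ")
console.log(endsInRiskyChar(6956883678)); // false (last character is &)
console.log(endsInRiskyChar(79));         // true  (encodes as "=")
```

An ID generator could skip any value for which this returns true, at the cost of 14 out of every 93 numbers.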
Hope you might find this useful someday.
(Thanks to linuxtopia for their radix code sample.)
You rock!
Thanks again!
Stan – you read my mind. I was going to approach this problem this week. This is cool.
Thank you for this article. Users definitely will benefit from this compilation of websites, and I will definitely make good use of it.
Just give a try to http://www.stopurl.com
Nice.. you can also try http://tubeurl.com
Here’s another approach that allows you to preserve some of the SEO juice. It doesn’t shorten the urls per se, just makes your web server more tolerant of long ones that are broken. Wouldn’t be too hard to adapt.
http://mttips.com/tolerate_broken_urls.html
Very cool. Can you post the code for converting the URL back to a number?
Oops! Should have included that in the original post. Here you go:
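A minimal sketch of the reverse conversion, assuming the same 93-character alphabet as in the post. The name `convertURLcharsToNumber` is hypothetical, and this is not necessarily the code originally posted here.

```javascript
// Hypothetical inverse of convertNumberToURLchars: turns the short
// string back into the number it encodes. Leading padding characters
// ("0") contribute nothing, so padded strings decode correctly.
function convertURLcharsToNumber(URLchars) {
    // Must match the alphabet used for encoding, character for character.
    var chars = "0123456789" +
        "abcdefghijklmnopqrstuvwxyz" +
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
        "$-_+*,|^~`<#%/?@&" +
        "=:{}()[]'>,.!" + '"';
    var radix = chars.length;
    var N = 0;
    for (var i = 0; i < URLchars.length; i++) {
        // indexOf returns the FIRST match, so the repeated "," always
        // decodes as digit 67, never 89.
        var digit = chars.indexOf(URLchars.charAt(i));
        if (digit === -1) {
            throw new Error("Invalid character: " + URLchars.charAt(i));
        }
        N = N * radix + digit;
    }
    return N;
}

console.log(convertURLcharsToNumber("10"));    // 93
console.log(convertURLcharsToNumber("0002Z")); // 247
console.log(convertURLcharsToNumber('""""&')); // 6956883678
```

Numbers whose encoding uses the second comma will not round-trip until the duplicate is removed from the alphabet.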
I tested this code, but I found that if you loop from, say, 1m to 2m you will get clashes, which makes it kind of unusable in its current form. Shame!
nice post but need algorithm optimization
Bojan Babic