System Design Interview: URL Shortener
Design a URL shortener service like tinyurl.com. It is one of the most commonly asked system design interview questions, and there are numerous resources online. My design is by no means the best or most complete, but I hope to offer my thought process from a different angle. After all, there is no single right answer to a system design question; it’s all about focus, tradeoffs, and preference. Let’s dive right in.
For any system design, as always, let’s first clarify who the users are and what they are trying to accomplish with the system. Let’s suppose the user is a typical internet user who has a super long and crappy URL and wants to store or share a much shorter and cleaner version of it. She needs a service (probably a website) through which she can create an association between the long URL and its shortened version. For simplicity, we’ll assume that besides registering the association, the only other function of this service is to return an HTTP redirect to the original URL when it receives a GET request for the shortened one.
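To make these two functions concrete, here is a minimal sketch of the service’s HTTP contract as a plain WSGI application using only the Python standard library. It assumes a hypothetical `POST /` endpoint that takes a form field `url` and responds with the shortened path, plus a `GET /<code>` endpoint that redirects or returns 404. The in-memory `STORE` dict and the `make_code` helper are illustrative placeholders, not part of the original design; the choice of `302 Found` over `301 Moved Permanently` is also an assumption.

```python
import hashlib
from urllib.parse import parse_qs

# Hypothetical in-memory store mapping shortened code -> original URL.
# A real deployment would use a database instead.
STORE: dict[str, str] = {}


def make_code(original: str) -> str:
    # Placeholder code generator: first 7 hex chars of a SHA-256 digest.
    # Collision handling and sanitization are deliberately left out here,
    # so a colliding URL would silently overwrite an existing entry.
    return hashlib.sha256(original.encode("utf-8")).hexdigest()[:7]


def app(environ, start_response):
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if method == "POST" and path == "/":
        # Register: read a form-encoded body such as "url=https%3A%2F%2F..."
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        original = parse_qs(body).get("url", [""])[0]
        if not original:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing url"]
        code = make_code(original)
        STORE[code] = original
        start_response("201 Created", [("Content-Type", "text/plain")])
        return [("/" + code).encode("utf-8")]

    if method == "GET" and len(path) > 1:
        # Redirect: look up the code (path without the leading "/").
        original = STORE.get(path[1:])
        if original is not None:
            # 302 Found: the browser follows the Location header.
            start_response("302 Found", [("Location", original)])
            return [b""]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

For local experimentation, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves this app. A 302 is not cached by browsers by default, so every click still reaches the service; a 301 is cacheable, which would let repeat visits bypass the service entirely.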
A Simplistic Design
OK, this sounds simple enough. Let’s map out the technical details behind that user journey. The service has a webpage with a text box for the original URL and a submit button that, when clicked, sends the original URL to the service’s backend via an HTTP POST. Once the backend receives the original URL, it creates a hash of it. In the case of a hash collision, the backend may add a random nonce and regenerate the hash. The backend may further sanitize the hash (for example, by encoding it into URL-safe characters and truncating it) to form the final shortened URL. The backend then stores the <shortened, original> tuple in the database. There should be an index on the shortened URL for efficient lookup. We don’t want to use the shortened URL directly as the table’s primary key, because its length may change as we update the hashing or sanitization method, and it’s better not to tie ourselves to unnecessary constraints. The backend may also store some metadata along with the tuple, such as the hashing method and the creation timestamp.
When the backend receives a GET request for a shortened URL, it looks up the corresponding original URL in the database and returns an HTTP redirect if one exists, or a 404 if it is not found. See an overview in Figure 1.
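The hash, nonce-on-collision, and sanitization steps can be sketched as follows. This is one possible reading, not the author’s implementation: it assumes SHA-256 as the hash, a hypothetical 7-character base62 encoding as the “sanitization”, a plain dict (`taken`) standing in for the database’s uniqueness check, and a bounded number of nonce retries.

```python
import hashlib
import secrets

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CODE_LENGTH = 7  # assumption: 62**7 gives about 3.5 trillion possible codes


def sanitize(digest: bytes) -> str:
    """Turn a raw digest into a URL-safe base62 code of exactly CODE_LENGTH chars."""
    n = int.from_bytes(digest, "big") % (62 ** CODE_LENGTH)
    chars = []
    for _ in range(CODE_LENGTH):
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))  # left-padded with "0" when n is small


def shorten(original: str, taken: dict[str, str], max_attempts: int = 5) -> str:
    """Return a code for `original`, rehashing with a random nonce on collision."""
    nonce = ""
    for _ in range(max_attempts):
        digest = hashlib.sha256((original + nonce).encode("utf-8")).digest()
        code = sanitize(digest)
        existing = taken.get(code)
        if existing is None:
            taken[code] = original  # free slot: register the association
            return code
        if existing == original:
            return code  # this URL already owns the code; reuse it
        # Genuine collision with a different URL: add a random nonce and rehash.
        nonce = secrets.token_hex(4)
    raise RuntimeError("no free code found; consider a larger CODE_LENGTH")
```

Reducing the digest modulo 62^7 keeps every code exactly seven characters long. Resubmitting a URL whose first-attempt code is already stored for it returns that code instead of creating a duplicate, a design choice the text leaves open. Changing `CODE_LENGTH` later is exactly the kind of change that argues against using the shortened URL as the primary key.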
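The storage layout and the redirect lookup can likewise be sketched with SQLite from Python’s standard library. The table name `urls`, its column names, and the `hash_method` label are hypothetical. The sketch mirrors the points in the text: a surrogate `INTEGER PRIMARY KEY` so the code format can change without touching the key, an index on `shortened` for lookups, and metadata columns. Declaring the index `UNIQUE` is an extra assumption.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id          INTEGER PRIMARY KEY,  -- surrogate key, independent of the code format
    shortened   TEXT NOT NULL,
    original    TEXT NOT NULL,
    hash_method TEXT NOT NULL,        -- metadata: which hashing/sanitization produced the code
    created_at  INTEGER NOT NULL      -- metadata: Unix timestamp in seconds
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_urls_shortened ON urls (shortened);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, shortened: str, original: str,
         hash_method: str = "sha256-base62-v1") -> None:
    # "with conn" commits on success and rolls back (then re-raises) on error.
    with conn:
        conn.execute(
            "INSERT INTO urls (shortened, original, hash_method, created_at) "
            "VALUES (?, ?, ?, ?)",
            (shortened, original, hash_method, int(time.time())),
        )


def resolve(conn: sqlite3.Connection, shortened: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a GET of /<shortened>: 302 if known, else 404."""
    row = conn.execute(
        "SELECT original FROM urls WHERE shortened = ?", (shortened,)
    ).fetchone()
    if row is None:
        return 404, {}
    return 302, {"Location": row[0]}
```

Because of the unique index, inserting a code that is already taken raises `sqlite3.IntegrityError`, which a backend could treat as the collision signal that triggers a nonce retry.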