System Design Interview: URL Shortener

Design a URL shortening service like bit.ly or TinyURL. This is a classic system design interview question that tests understanding of scalability, distributed systems, and trade-offs.

Interview Format (45 minutes)

Time Allocation:

Requirements gathering: 5-8 minutes
High-level design: 10-15 minutes
Deep dive: 15-20 minutes
Scale and edge cases: 5-10 minutes

Step 1: Requirements Gathering (5-8 min)

A strong candidate will ask clarifying questions before jumping to solutions.

Functional Requirements

Good questions to ask:

What's the primary use case? (shortening URLs)
Do shortened URLs expire? (yes, configurable TTL)
Can users customize short URLs? (yes, but most are auto-generated)
Do we need analytics? (yes, click tracking)
Do we need user accounts? (optional, focus on core first)

Agreed requirements:

Given a long URL, return a short URL
Given a short URL, redirect to original long URL
URLs expire after configurable time (default: 1 year)
Track basic analytics (click count)
Custom short codes (optional)

Non-Functional Requirements

Good questions to ask:

Expected scale? (100M URLs, 10B redirects/month)
Read/write ratio? (read-heavy, 100:1)
Latency requirements? (redirects {"<"}100ms)
Availability requirements? (99.9% uptime)
Geographic distribution? (global)

Agreed requirements:

High availability (redirects critical)
Low latency ({"<"}100ms for redirects)
Scalable (10B redirects/month = ~3,800 req/sec average, 20K peak)
Durable (URLs shouldn't be lost)

Calculations

Storage:

100M URLs/year
500 bytes per URL record (original URL + metadata)
500 bytes × 100M = 50GB/year
Over 5 years = 250GB (very manageable)

Bandwidth:

10B redirects/month
Assume 200 byte response
10B × 200 bytes = 2TB/month = 66GB/day
Peak: 20K req/sec × 200 bytes = 4MB/sec

QPS:

Average: 10B/month = 3,800 req/sec
Peak (assume 2x): ~8,000 req/sec

Red flags if candidate:

Doesn't ask clarifying questions
Jumps straight to implementation details
Doesn't think about scale
Ignores read/write ratio

Step 2: High-Level Design (10-15 min)

API Design

POST /api/shorten
Request:
{
  "url": "https://example.com/very/long/url",
  "customCode": "my-link",  // optional
  "expiresAt": "2026-01-01"  // optional
}

Response:
{
  "shortUrl": "https://short.ly/abc123",
  "code": "abc123",
  "expiresAt": "2026-01-01T00:00:00Z"
}

GET /{code}
Response: 302 Redirect to original URL

Good candidate discusses:

RESTful API design
HTTP 302 (temporary) vs 301 (permanent) redirect
Error handling (expired, not found)

Core Components

┌─────────────┐
│   Client    │
└──────┬──────┘
       │
   ┌───▼────────┐
   │  Load      │
   │  Balancer  │
   └───┬────────┘
       │
   ┌───▼──────────────┐
   │  API Servers     │
   │  (Stateless)     │
   └───┬──────────────┘
       │
   ┌───▼───────────────┐
   │    Database       │
   │  (URL mappings)   │
   └───────────────────┘

Tables:

sql

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    click_count INTEGER DEFAULT 0,
    user_id INTEGER  -- optional
);

CREATE INDEX idx_short_code ON urls(short_code);
CREATE INDEX idx_expires_at ON urls(expires_at) WHERE expires_at IS NOT NULL;

Step 3: Deep Dive (15-20 min)

Short Code Generation

Approach 1: Random + Collision Check

python

import random
import string

def generate_short_code(length=6):
    characters = string.ascii_letters + string.digits  # 62 characters
    while True:
        code = ''.join(random.choices(characters, k=length))

        # Check if exists
        if not db.exists(code):
            return code

Pros:

Simple to implement
Evenly distributed

Cons:

Requires database check (latency)
Collision rate increases over time
Not suitable for high-write scenarios

Math:

62^6 = ~57 billion possible codes
After 100M URLs (< 0.2% full), collision probability is low
But still requires DB check

Approach 2: Auto-incrementing ID + Base62 Encoding

python

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num):
    if num == 0:
        return BASE62[0]

    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62

    return ''.join(reversed(result))

def decode_base62(code):
    num = 0
    for char in code:
        num = num * 62 + BASE62.index(char)
    return num

# Example
id = 12345
code = encode_base62(id)  # "dnh"

Pros:

No collisions
No database lookup needed
Deterministic

Cons:

Sequential (predictable)
Exposes number of URLs
Doesn't work with custom codes

Approach 3: Hash Function (MD5/SHA256)

python

import hashlib

def generate_short_code(url):
    # Generate hash
    hash_digest = hashlib.md5(url.encode()).hexdigest()

    # Take first 6 characters
    code = hash_digest[:6]

    # Handle collisions
    if db.exists(code):
        # Add counter to URL and retry
        code = hash_digest[:6] + str(counter)

    return code

Pros:

Same URL always gets same code
Can deduplicate URLs

Cons:

Collisions possible
Still need database check
Longer codes to reduce collisions

Recommended Approach: Combination

python

def generate_short_code(url, custom_code=None):
    if custom_code:
        # Validate and use custom code
        if db.exists(custom_code):
            raise ValueError("Custom code already exists")
        return custom_code

    # Use auto-increment + base62 for generated codes
    next_id = get_next_id()  # from ID generator service
    return encode_base62(next_id)

Strong candidate discusses:

Trade-offs of each approach
Collision handling
Scalability implications
Custom code support

ID Generation at Scale

Problem: Single database auto-increment doesn't scale

Solution: Distributed ID Generation

Option 1: UUID

python

import uuid

id = uuid.uuid4()  # 128-bit

Pros: No coordination needed Cons: 128-bit is too long for short URLs

Option 2: Twitter Snowflake

64-bit ID structure:
[1 bit unused][41 bits timestamp][10 bits machine ID][12 bits sequence]

Allows:
- 2^41 milliseconds (~69 years)
- 2^10 machines (1024)
- 2^12 IDs per millisecond per machine (4096)

python

class SnowflakeIDGenerator:
    def __init__(self, machine_id):
        self.machine_id = machine_id
        self.sequence = 0
        self.last_timestamp = -1

    def generate_id(self):
        timestamp = current_timestamp_ms()

        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                # Wait for next millisecond
                timestamp = wait_next_millis(timestamp)
        else:
            self.sequence = 0

        self.last_timestamp = timestamp

        return ((timestamp << 22) |
                (self.machine_id << 12) |
                self.sequence)

Option 3: Pre-generated Range

python

# Coordination service assigns ranges
# Server 1: 1-100,000
# Server 2: 100,001-200,000

class RangeIDGenerator:
    def __init__(self):
        self.current = get_range_from_coordinator()  # 1-100,000
        self.end = self.current + 100000

    def generate_id(self):
        if self.current >= self.end:
            # Get new range
            self.current = get_range_from_coordinator()
            self.end = self.current + 100000

        self.current += 1
        return self.current

Best for URL shortener: Snowflake or Range-based

Caching Strategy

Read-heavy workload (100:1 ratio) → Perfect for caching

Cache hit ratio target: >99%

Implementation:

python

def redirect(short_code):
    # Check cache first
    long_url = cache.get(f"url:{short_code}")

    if long_url is None:
        # Cache miss: fetch from database
        url_record = db.get_by_code(short_code)

        if url_record is None or url_record.expired():
            return 404

        long_url = url_record.long_url

        # Cache for 24 hours
        cache.set(f"url:{short_code}", long_url, ttl=86400)

    # Async analytics update
    queue.publish("click", {
        "code": short_code,
        "timestamp": now()
    })

    return redirect_to(long_url)

Cache invalidation:

python

# When URL expires or is deleted
def delete_url(short_code):
    db.delete(short_code)
    cache.delete(f"url:{short_code}")

Strong candidate discusses:

Cache-aside pattern
TTL strategy
Invalidation approach
Cache warming for hot URLs

Analytics

Challenge: Updating click_count on every request kills performance

Solution 1: Async Updates

python

# Write to queue instead of DB
def track_click(short_code):
    queue.publish("analytics", {
        "code": short_code,
        "timestamp": now(),
        "user_agent": request.user_agent,
        "ip": request.ip
    })

# Consumer batches updates
def analytics_consumer():
    while True:
        messages = queue.consume(batch_size=1000)

        # Batch update
        updates = Counter(msg["code"] for msg in messages)
        db.batch_update_click_counts(updates)

Solution 2: Write to Time-Series DB

python

# Use InfluxDB or TimescaleDB
def track_click(short_code):
    influxdb.write_point(
        measurement="clicks",
        tags={"code": short_code},
        fields={"count": 1},
        timestamp=now()
    )

# Query for analytics
def get_clicks(short_code, start_date, end_date):
    return influxdb.query(f"""
        SELECT SUM(count)
        FROM clicks
        WHERE code = '{short_code}'
          AND time >= '{start_date}'
          AND time <= '{end_date}'
    """)

Step 4: Scale and Edge Cases (5-10 min)

Scaling to Billions

Updated Architecture:

┌────────────────────────────────────────────────┐
│              CDN / Cloudflare                  │
└────────────────────┬───────────────────────────┘
                     │
┌────────────────────▼───────────────────────────┐
│           Load Balancer (Global)               │
└────────────────────┬───────────────────────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
   ┌────▼────┐  ┌───▼─────┐  ┌──▼──────┐
   │  API    │  │  API    │  │  API    │
   │Server 1 │  │Server 2 │  │Server 3 │
   └────┬────┘  └───┬─────┘  └──┬──────┘
        │           │           │
        └────────┬──┴───────────┘
                 │
        ┌────────▼──────────┐
        │   Redis Cluster   │
        │   (Cache Layer)   │
        └────────┬──────────┘
                 │
        ┌────────▼──────────┐
        │  PostgreSQL       │
        │  (Sharded)        │
        │  - Shard 1: a-m   │
        │  - Shard 2: n-z   │
        └───────────────────┘

Database Sharding:

python

def get_shard(short_code):
    # Hash-based sharding
    shard_count = 16
    shard_id = hash(short_code) % shard_count
    return f"shard_{shard_id}"

def get_url(short_code):
    shard = get_shard(short_code)
    return db_connections[shard].get(short_code)

Rate Limiting

python

from redis import Redis

def check_rate_limit(ip_address):
    key = f"ratelimit:{ip_address}:{current_minute()}"
    count = redis.incr(key)

    if count == 1:
        redis.expire(key, 60)

    if count > 100:  # 100 requests per minute
        raise RateLimitExceeded()

Security Considerations

1. Malicious URLs

python

BLACKLIST_DOMAINS = ["malware.com", "phishing.com"]

def validate_url(url):
    domain = extract_domain(url)
    if domain in BLACKLIST_DOMAINS:
        raise ValueError("URL not allowed")

    # Check against safe browsing API
    if not safe_browsing_check(url):
        raise ValueError("URL flagged as unsafe")

2. Abuse Prevention

python

# Limit URLs per IP
def check_creation_limit(ip_address):
    key = f"create:{ip_address}:{current_day()}"
    count = redis.incr(key)

    if count == 1:
        redis.expire(key, 86400)

    if count > 10:  # 10 URLs per day per IP
        raise TooManyURLs()

3. DDoS Protection

Rate limiting
CAPTCHA for suspicious traffic
WAF (Web Application Firewall)

Monitoring and Alerts

python

# Metrics to track
metrics = {
    "api.latency.p95": target_ms < 100,
    "api.error_rate": target < 0.1%,
    "cache.hit_rate": target > 99%,
    "db.query_time.p95": target < 50ms,
    "redirect.success_rate": target > 99.9%
}

# Alerts
if cache_hit_rate < 95%:
    alert("Cache hit rate dropped")

if p95_latency > 200ms:
    alert("Latency degraded")

Evaluation Rubric

Strong Performance (Hire)

✅ Asks clarifying questions
✅ Considers scale from the start
✅ Discusses multiple approaches with trade-offs
✅ Designs scalable solution
✅ Handles edge cases
✅ Thinks about operations (monitoring, debugging)
✅ Clear communication and diagrams

Adequate Performance (Maybe)

✅ Functional design
✅ Some scaling considerations
⚠️ Limited exploration of alternatives
⚠️ Misses some edge cases
✅ Can be guided to better solutions

Weak Performance (No Hire)

❌ Jumps to solution without requirements
❌ Doesn't consider scale
❌ Can't explain trade-offs
❌ Monolithic thinking
❌ Poor communication

Follow-up Questions

For senior candidates:

How would you handle 1 trillion URLs?
Design the analytics system in detail
How would you migrate this system with zero downtime?
How would you handle geographic distribution?

For staff+ candidates:

Design the organization structure for this service
How would you ensure 99.99% uptime?
Design the disaster recovery plan
How would you handle compliance (GDPR, data retention)?

This question tests foundational distributed systems concepts: caching, sharding, ID generation, and trade-offs. A strong candidate will navigate these topics methodically while engaging in technical discussion.

System Design Interview: URL Shortener