Introduction to System Design
System design is the process of defining the architecture, components, and interfaces of a large-scale software system. In tech interviews, system design questions assess your ability to build scalable, reliable systems beyond just writing code.
Learning system design will make you a better engineer by teaching you how to think in terms of architecture and trade-offs when building software. This guide covers fundamental concepts and key components used to design modern, scalable systems.
Remember: In system design, "everything is a trade-off" – there is no one-size-fits-all solution. Part of your expertise will be reasoning about those trade-offs.
How to Approach System Design Questions
When faced with a system design interview question, follow a structured approach:
Step-by-Step Breakdown
- Clarify Requirements and Scope: Outline use cases, constraints, and assumptions. Ask about target users, read/write ratios, and data volume.
- High-Level Design: Sketch a high-level architecture identifying major components (clients, servers, databases) and data flow.
- Design Core Components: Drill down into key components. Provide details like data models and specific algorithms.
- Address Scalability and Reliability: Identify bottlenecks and discuss solutions like load balancers, caching, and database replication.
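The answers gathered in the first step usually feed a quick capacity estimate. The sketch below shows one way to turn them into average request rates and storage growth. Every input (10 million daily users, 20 reads and 2 writes per user per day, 1,000 bytes per write) is a hypothetical figure chosen for illustration, and `estimate_load` is an illustrative helper, not part of any library.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_load(daily_users, reads_per_user, writes_per_user, bytes_per_write):
    """Return average request rates and daily storage growth."""
    reads_per_day = daily_users * reads_per_user
    writes_per_day = daily_users * writes_per_user
    return {
        "read_qps": reads_per_day / SECONDS_PER_DAY,
        "write_qps": writes_per_day / SECONDS_PER_DAY,
        "read_write_ratio": reads_per_day / writes_per_day,
        "storage_bytes_per_day": writes_per_day * bytes_per_write,
    }


# Hypothetical inputs, chosen only for illustration.
estimate = estimate_load(
    daily_users=10_000_000,
    reads_per_user=20,
    writes_per_user=2,
    bytes_per_write=1_000,
)
```

These are averages; peak traffic is typically some multiple of them, so a real estimate would add a peak factor as a further assumption.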
Fundamental Concepts and Trade-offs
Performance vs. Scalability
- Performance: How fast a system is for a single user
- Scalability: Maintaining performance under increasing load
A system is scalable if adding more resources yields proportional improvement in performance.
Latency vs. Throughput
- Latency: Time to respond to a single request (milliseconds)
- Throughput: Number of operations per unit time
Goal: Maximize throughput while keeping latency within acceptable bounds.
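Latency and throughput are linked by Little's Law: the average number of requests in flight equals throughput multiplied by average latency. The sketch below applies that relationship in both directions. The function names and numbers are illustrative assumptions.

```python
def max_throughput(concurrency, latency_seconds):
    """Upper bound on requests per second for a fixed number of in-flight requests."""
    return concurrency / latency_seconds


def required_concurrency(throughput_rps, latency_seconds):
    """In-flight requests needed to sustain a target throughput."""
    return throughput_rps * latency_seconds


# Hypothetical numbers: 100 concurrent requests at 50 ms each.
ceiling = max_throughput(concurrency=100, latency_seconds=0.05)
# Hypothetical target: 5,000 requests per second at 20 ms each.
in_flight = required_concurrency(throughput_rps=5_000, latency_seconds=0.02)
```

Under these assumptions, the first case tops out near 2,000 requests per second, and the second needs roughly 100 requests in flight at once.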
CAP Theorem
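The CAP theorem states that a distributed data store cannot guarantee consistency, availability, and partition tolerance all at once. Because network partitions cannot be ruled out, the practical choice during a partition is between: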
- CP Systems: Prefer consistency over availability (return errors rather than stale data)
- AP Systems: Prefer availability (serve responses even if potentially stale)
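The difference between the two choices shows up only while a partition is in progress. The toy model below is a sketch under simplifying assumptions (one replica, one value, a manually toggled `partitioned` flag); the `Replica` class is hypothetical and does not model any real database.

```python
class Replica:
    """Toy replica that can be cut off from the primary by a network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None         # last value replicated from the primary
        self.partitioned = False  # True while the primary is unreachable

    def replicate(self, value):
        # Updates from the primary arrive only when there is no partition.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Freshness cannot be confirmed, so refuse rather than risk stale data.
            raise RuntimeError("unavailable: cannot reach primary")
        return self.value  # AP: answer anyway, possibly with stale data


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # lost: the partition blocks replication
```

After this sequence, `ap.read()` returns the stale `"v1"`, while `cp.read()` raises an error instead of answering.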
Consistency Models
| Model | Description | Example |
|---|---|---|
| Strong | All reads see latest write immediately | Traditional RDBMS |
| Eventual | Reads may be stale, but converge over time | DNS, NoSQL DBs |
| Weak | No guarantees on read freshness | VoIP (dropped audio) |
High Availability Patterns
Failover Types:
- Active-Passive: Standby takes over when primary fails
- Active-Active: Both nodes serve traffic; workload shifts on failure
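Active-passive failover is often driven by heartbeats: if the active node stays silent for too long, the standby is promoted. The sketch below assumes a single monitor and a fixed timeout; `ActivePassivePair`, the node names, and the 3-second timeout are all hypothetical. Timestamps can be passed in explicitly so the behavior is easy to follow.

```python
import time
from typing import Optional


class ActivePassivePair:
    """Promote the standby when the active node's heartbeat goes quiet."""

    def __init__(self, primary, standby, timeout_seconds):
        self.active = primary
        self.standby = standby
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = time.monotonic()

    def heartbeat(self, now: Optional[float] = None) -> None:
        """Record that the active node is still alive."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def current_active(self, now: Optional[float] = None) -> str:
        """Return the node that should serve traffic, failing over if needed."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.timeout_seconds:
            # The active node missed its heartbeat window: swap roles.
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now  # give the new active node a fresh window
        return self.active


pair = ActivePassivePair("node-a", "node-b", timeout_seconds=3.0)
pair.heartbeat(now=100.0)
```

With the last heartbeat at `100.0`, a check at `102.0` still returns `node-a`, while a check at `104.5` promotes `node-b`.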
Networking Basics
Domain Name System (DNS)
DNS translates human-friendly domain names into IP addresses.
Key DNS Record Types:
| Record | Purpose | Example |
|---|---|---|
| A | Maps name to IPv4 | example.com → 93.184.216.34 |
| CNAME | Alias to another name | api.example.com → server.example.com |
| NS | Authoritative name servers | Delegates DNS to provider |
| MX | Mail server for domain | Handles email routing |
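To see how A and CNAME records interact, the toy resolver below follows aliases until it reaches an address. It is a sketch over an in-memory table, not a real DNS client. The `example.com` entries mirror the table above, while the address for `server.example.com` is an assumed value taken from a documentation-only IP range.

```python
RECORDS = {
    ("example.com", "A"): "93.184.216.34",
    ("api.example.com", "CNAME"): "server.example.com",
    # Assumption: the table above gives no address for this name.
    ("server.example.com", "A"): "192.0.2.10",
}


def resolve(name, max_hops=8):
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]  # restart the lookup at the alias target
            continue
        raise LookupError(f"no record for {name}")
    raise LookupError("CNAME chain too long")
```

Here `resolve("api.example.com")` follows the alias to `server.example.com` and returns its address. The hop limit guards against alias loops.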
Content Delivery Network (CDN)
A CDN is a globally distributed network that caches and serves content from locations near users.
CDN Modes:
- Pull CDN: Fetches content from origin on first request, then caches
- Push CDN: You upload content in advance to CDN servers
Benefits:
- Lower latency (content served from nearby servers)
- Reduced origin server load
- Better handling of traffic spikes
Load Balancing
A load balancer distributes traffic across multiple servers.
Load Balancing Algorithms
| Algorithm | Description |
|---|---|
| Round Robin | Cycle through servers in order |
| Least Connections | Send to server with fewest active connections |
| IP Hash | Route based on client IP (sticky sessions) |
| Weighted | Distribute based on server capacity |
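Three of the algorithms in the table can be sketched in a few lines each. The class and function names below are illustrative, the server names are placeholders, and production load balancers also handle health checks and weights that this sketch leaves out.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(servers, client_ip):
    """Map a client IP to the same server on every call (sticky routing)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

A stable hash (SHA-256 here) is used instead of Python's built-in `hash()`, which is randomized per process for strings and would break stickiness across restarts.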
Layer 4 vs Layer 7
- Layer 4 (Transport): Routes based on IP/port; faster but less flexible
- Layer 7 (Application): Routes based on URL/headers; more intelligent routing
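The practical difference is what information the routing decision can see. In the sketch below, the Layer 4 function sees only a destination port, while the Layer 7 function inspects the request path. The pool names, ports, and path prefixes are hypothetical.

```python
# Hypothetical routing tables, for illustration only.
L4_RULES = {443: "web-pool", 5432: "db-pool"}
L7_RULES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_layer4(dest_port):
    """Layer 4: decide from transport-level data only (here, the port)."""
    return L4_RULES[dest_port]


def route_layer7(path, default_pool="web-pool"):
    """Layer 7: decide from application-level data (here, the URL path)."""
    for prefix, pool in L7_RULES:
        if path.startswith(prefix):
            return pool
    return default_pool
```

All HTTPS traffic on port 443 looks the same to `route_layer4`, whereas `route_layer7` can send `/api/users` and `/static/logo.png` to different pools.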
Benefits
- Prevents overload on single servers
- Removes unhealthy servers from rotation
- Enables horizontal scaling
- Can handle SSL termination
Reverse Proxy
A reverse proxy sits between clients and internal servers.
Benefits:
- Single public endpoint (hides internal servers)
- SSL termination (offloads encryption)
- Caching (serves common responses directly)
- Security (can implement rate limiting, IP blocking)
- Compression
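Rate limiting at the proxy is commonly implemented with a token bucket, which is one option among several and not something this guide prescribes. The sketch below assumes one bucket per client and takes timestamps as arguments; `TokenBucket` and its parameters are illustrative names.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = now

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.updated_at
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limit: bursts of 2 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
```

With this limit, two requests at time `0.0` pass, a third at the same instant is rejected, and a request one second later passes again.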
Application Layer and Microservices
Separating Web and Application Layers
Benefits of separation:
- Each layer scales independently
- Single responsibility per component
- Easier to maintain and deploy
Microservices Architecture
Service Discovery: Systems like Consul, etcd, or ZooKeeper track service instances and their locations.
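The core idea of service discovery is a registry that instances refresh periodically and that drops instances that fall silent. The in-memory sketch below illustrates that idea only; `ServiceRegistry` is hypothetical and is not the API of Consul, etcd, or ZooKeeper. Addresses and the TTL are made-up values.

```python
class ServiceRegistry:
    """Track service instances; an instance expires if it stops re-registering."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._last_seen = {}  # (service, address) -> time of last registration

    def register(self, service, address, now):
        """Register an instance, or refresh it if already known."""
        self._last_seen[(service, address)] = now

    def lookup(self, service, now):
        """Return addresses of instances seen within the TTL."""
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self.ttl_seconds
        )


registry = ServiceRegistry(ttl_seconds=10)
registry.register("users", "10.0.0.1:8080", now=0)
registry.register("users", "10.0.0.2:8080", now=5)
```

A lookup at time `8` returns both instances. By time `12`, the first instance has gone more than 10 seconds without refreshing and is no longer returned.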
Databases and Storage
Relational Databases (RDBMS)
ACID Properties:
| Property | Description |
|---|---|
| Atomicity | Transactions are all-or-nothing |
| Consistency | Data stays valid after transactions |
| Isolation | Concurrent transactions don't interfere |
| Durability | Committed data survives crashes and restarts |
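Atomicity and consistency are easiest to see in a failed transfer. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, its `CHECK` constraint, and the `transfer` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Transferring 150 from `alice` credits `bob` first, then fails the overdraft check on the debit. The rollback undoes the credit as well, so both balances stay unchanged.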
Scaling Strategies
Master-Slave Replication
- Master handles all writes
- Slaves handle read traffic
- Increases read throughput
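In application code, this split often appears as a small router that sends writes to the master and spreads reads across the slaves. The sketch below is a simplification: it classifies statements by their first keyword and uses placeholder node names. `ReplicatedDatabase` is a hypothetical class.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the master and rotate reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the node that should execute this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


# Placeholder node names, for illustration only.
db = ReplicatedDatabase("master", ["slave-1", "slave-2"])
```

One caveat worth raising in an interview: when replication is asynchronous, a read routed to a slave can briefly miss a write that the master has already accepted.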
Sharding (Horizontal Partitioning)
- Split data across multiple databases
- Each shard handles subset of data
- Allows horizontal scaling
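A common way to decide which shard owns a record is to hash its key. The sketch below uses hash-modulo placement, which is one option among several; `shard_for` and the key format are illustrative assumptions.

```python
import hashlib


def shard_for(key, shard_count):
    """Return the index of the shard responsible for `key`."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across processes.
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical keys spread across 4 shards.
placement = {key: shard_for(key, 4) for key in ("user:1", "user:2", "user:3")}
```

The trade-off with plain modulo placement is that changing `shard_count` remaps most keys. Schemes such as consistent hashing reduce that movement at the cost of extra complexity.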
NoSQL Databases
| Type | Description | Examples | Use Cases |
|---|---|---|---|
| Key-Value | Simple hash table | Redis, Memcached | Caching, sessions |
| Document | JSON documents with queries | MongoDB, CouchDB | User profiles, catalogs |
| Wide-Column | Sparse matrix of rows/columns | Cassandra, HBase | Time-series, logs |
| Graph | Nodes and edges | Neo4j, Neptune | Social networks |
SQL vs NoSQL
Choose SQL when:
- Structured, relational data
- Complex queries and joins
- ACID transactions required
Choose NoSQL when:
- Flexible/evolving schema
- High write throughput
- Horizontal scaling needed
- Simple access patterns
Caching
Caching stores frequently accessed data in fast storage (usually memory).
Caching Strategies
Cache-Aside (Lazy Loading)
- Check cache first
- On miss, query database
- Store result in cache
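The three steps above can be sketched as a small wrapper around a database lookup. A plain dictionary stands in for both the cache and the database here; `CacheAside` and the sample data are hypothetical.

```python
class CacheAside:
    """Read through a cache, loading from the database only on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}
        self.db_reads = 0  # counts how often the database was actually queried

    def get(self, key):
        if key in self._cache:            # 1. check the cache first
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)   # 2. on a miss, query the database
        self._cache[key] = value          # 3. store the result in the cache
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the underlying row changes."""
        self._cache.pop(key, None)


database = {"user:1": "Ada"}  # stand-in for a real database
users = CacheAside(database.get)
```

Calling `users.get("user:1")` twice reaches the database only once. The application is responsible for invalidating entries when the data changes, which is the main cost of this strategy.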
Write-Through
- Write to cache
- Cache writes to database (synchronously)
- Cache always up-to-date
Write-Behind
- Write to cache only
- Cache writes to database asynchronously
- Faster but risks data loss
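The two write strategies differ only in when the database write happens. The sketch below makes that visible with dictionaries standing in for the cache and the database; both class names are hypothetical, and a real write-behind cache would flush from a background task instead of an explicit `flush()` call.

```python
class WriteThroughCache:
    """Write to the cache and the database in the same call."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # synchronous: put() returns after the DB write


class WriteBehindCache:
    """Write to the cache now and to the database later."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = []  # writes accepted but not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        """Persist queued writes in the order they were accepted."""
        while self.pending:
            key, value = self.pending.pop(0)
            self.db[key] = value


through_db, behind_db = {}, {}
through = WriteThroughCache(through_db)
behind = WriteBehindCache(behind_db)
through.put("k", 1)
behind.put("k", 1)
```

At this point `through_db` already holds the value while `behind_db` is still empty. Anything left in `pending` when the process crashes is lost, which is the data-loss risk noted above.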
Cache Eviction Policies
- LRU (Least Recently Used): Evict the item that has gone longest without being accessed
- LFU (Least Frequently Used): Evict the item with the fewest accesses
- TTL (Time to Live): Expire after set time
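LRU is the policy most often asked about in interviews. A minimal sketch using `collections.OrderedDict` from the standard library follows; the `LRUCache` class and its capacity of 2 are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # over capacity: "b" is evicted
```

After this sequence, `"a"` and `"c"` remain and `"b"` is gone. Both `get` and `put` run in constant time on average.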
Asynchronous Processing
Message Queues
Benefits:
- Decouples components
- Smooths out traffic spikes
- Improves reliability (jobs can retry)
- Reduces front-end latency
Popular Technologies:
- RabbitMQ
- Apache Kafka
- Amazon SQS
- Redis (simple queues)
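The decoupling benefit can be shown with an in-process queue: the producer enqueues work and moves on while a worker drains the queue. The sketch below uses Python's standard `queue` and `threading` modules as a stand-in; it does not use the APIs of RabbitMQ, Kafka, SQS, or Redis, and the job names are made up.

```python
import queue
import threading


def run_jobs(job_names):
    """Enqueue jobs, let one background worker process them, return the results."""
    jobs = queue.Queue()
    results = []

    def worker():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: no more work
                    return
                results.append(f"done:{job}")  # stand-in for slow work
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker)
    thread.start()
    for name in job_names:
        jobs.put(name)  # the producer returns immediately
    jobs.put(None)
    jobs.join()         # wait until every queued item has been processed
    thread.join()
    return results


finished = run_jobs(["resize-image", "send-email"])
```

With a single worker, jobs complete in the order they were queued. A real broker adds persistence, acknowledgements, and retries, none of which this sketch models.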
Communication Protocols
TCP vs UDP
| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented | Connectionless |
| Reliability | Reliable delivery (acknowledged, retransmitted) | Best effort |
| Order | Ordered | May arrive out of order |
| Use Case | Web, databases | Video, gaming, DNS |
REST vs RPC
REST (Representational State Transfer):
- Resource-centric (URLs represent resources)
- Uses HTTP methods (GET, POST, PUT, DELETE)
- Stateless
- Typically human-readable payloads (JSON)
GET /users/123
POST /users
PUT /users/123
DELETE /users/123
RPC (Remote Procedure Call):
- Action-centric (method calls)
- Often binary protocols (gRPC, Thrift)
- More efficient but tighter coupling
userService.GetUser(123)
userService.CreateUser(data)
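Both styles can front the same service; they differ in how a request names what it wants. The sketch below is a simplified in-memory illustration, not a real HTTP or gRPC server. `UserService`, `handle_rest`, `handle_rpc`, and the status codes chosen are assumptions for the example.

```python
class UserService:
    """In-memory user store shared by both front ends."""

    def __init__(self):
        self._users = {}
        self._next_id = 1

    def get_user(self, user_id):
        return self._users.get(user_id)

    def create_user(self, data):
        user = {"id": self._next_id, **data}
        self._users[user["id"]] = user
        self._next_id += 1
        return user


def handle_rest(service, method, path, body=None):
    """REST: the HTTP method plus a resource URL select the operation."""
    parts = [p for p in path.split("/") if p]  # "/users/123" -> ["users", "123"]
    if parts[:1] != ["users"]:
        return 404, None
    if method == "POST" and len(parts) == 1:
        return 201, service.create_user(body)
    if method == "GET" and len(parts) == 2:
        user = service.get_user(int(parts[1]))
        return (200, user) if user else (404, None)
    return 405, None


# RPC: the request names a procedure directly. An explicit allowlist
# avoids exposing arbitrary attributes of the service.
RPC_METHODS = {"GetUser": "get_user", "CreateUser": "create_user"}


def handle_rpc(service, name, *args):
    return getattr(service, RPC_METHODS[name])(*args)


service = UserService()
status, created = handle_rest(service, "POST", "/users", {"name": "Ada"})
```

The same user can then be fetched either way: `handle_rest(service, "GET", "/users/1")` or `handle_rpc(service, "GetUser", 1)`.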
Putting It All Together
A complete large-scale architecture combines the components covered so far. The request flow below traces a single read through them.
Request Flow Example
- User requests news feed
- DNS resolves to CDN/LB
- Load balancer routes to web server
- Web server calls Read API
- Read API checks cache (Redis)
- Cache miss → query database
- Store result in cache
- Return response to user
Security Considerations
- Encryption in transit: Always use HTTPS (TLS)
- Encryption at rest: Encrypt sensitive database fields
- Input validation: Prevent SQL injection, XSS
- Authentication: Use secure tokens, OAuth
- Authorization: Principle of least privilege
- Rate limiting: Limit abuse and help mitigate denial-of-service traffic
- Secrets management: Use vaults, not hardcoded keys
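Of the items above, SQL injection is the easiest to demonstrate concretely. The sketch below uses Python's built-in `sqlite3` module to contrast string-built SQL with a parameterized query. The `users` table, the sample rows, and both helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])


def find_user_unsafe(name):
    # Do NOT do this: the input becomes part of the SQL text itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # The ? placeholder passes the value separately from the SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "x' OR '1'='1"
```

With the malicious input, `find_user_unsafe` returns every row because the injected `OR '1'='1'` is always true, while `find_user_safe` treats the whole string as a name and returns nothing.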
Conclusion
System design is about making informed trade-offs. When facing a design problem:
- Start with requirements - understand scale and constraints
- Sketch high-level architecture - identify major components
- Drill into critical components - detail the challenging parts
- Consider scaling and reliability - add redundancy and caching
- Discuss trade-offs - explain why you made each choice
The best engineers don't have perfect solutions – they have clear reasoning for their decisions.
This guide is based on concepts from the System Design Primer and industry best practices. For more in-depth exploration, continue practicing with real system design questions.
