System Design: A Beginner-Friendly Primer

Introduction to System Design

System design is the process of defining the architecture, components, and interfaces of a large-scale software system. In tech interviews, system design questions assess your ability to build scalable, reliable systems beyond just writing code.

Learning system design will make you a better engineer by teaching you how to think in terms of architecture and trade-offs when building software. This guide covers fundamental concepts and key components used to design modern, scalable systems.

Remember: In system design, "everything is a trade-off" – there is no one-size-fits-all solution. Part of your expertise will be reasoning about those trade-offs.

How to Approach System Design Questions

When faced with a system design interview question, follow a structured approach:

Step-by-Step Breakdown

Clarify Requirements and Scope: Outline use cases, constraints, and assumptions. Ask about target users, read/write ratios, and data volume.
High-Level Design: Sketch a high-level architecture identifying major components (clients, servers, databases) and data flow.
Design Core Components: Drill down into key components. Provide details like data models and specific algorithms.
Address Scalability and Reliability: Identify bottlenecks and discuss solutions like load balancers, caching, and database replication.

Fundamental Concepts and Trade-offs

Performance vs. Scalability

Performance: How fast a system is for a single user
Scalability: Maintaining performance under increasing load

A system is scalable if adding more resources yields proportional improvement in performance.

Latency vs. Throughput

Latency: Time to respond to a single request (milliseconds)
Throughput: Number of operations per unit time

Goal: Maximize throughput while keeping latency within acceptable bounds.
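
The two metrics are linked. A minimal sketch using Little's Law (average in-flight requests = throughput × average latency) shows how latency caps throughput for a fixed amount of concurrency. The function name and the numbers are illustrative assumptions, not measurements.

```python
def max_throughput(concurrency: int, avg_latency_s: float) -> float:
    """Little's Law rearranged: throughput = in-flight requests / latency."""
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / avg_latency_s


# Hypothetical server: 100 concurrent workers, 250 ms average latency.
requests_per_second = max_throughput(concurrency=100, avg_latency_s=0.25)
print(requests_per_second)  # 400.0
```

Halving latency in this model doubles the achievable throughput without adding workers.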

CAP Theorem

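In a distributed system, a network partition forces a choice between consistency and availability. Because partitions cannot be ruled out, designs fall into two camps: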

CP Systems: Prefer consistency over availability (return errors rather than stale data)
AP Systems: Prefer availability (serve responses even if potentially stale)
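
The difference shows up when a replica loses contact with the primary. The toy model below is a sketch under that assumption; the `Replica` class and its fields are hypothetical and not taken from any real database.

```python
class Replica:
    """Toy replica that may lose contact with the primary (hypothetical model)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None          # last value replicated from the primary
        self.partitioned = False   # True while the primary is unreachable

    def replicate(self, value):
        # Updates only arrive while the link to the primary is healthy.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: return an error rather than risk serving stale data.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP (or healthy link): answer with the local copy, which may be stale.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # lost: the partition blocks replication

print(ap.read())  # v1 (stale but available)
```

Calling `cp.read()` in the same state raises `RuntimeError` instead of returning the stale value.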

Consistency Models

Model	Description	Example
Strong	All reads see latest write immediately	Traditional RDBMS
Eventual	Reads may be stale, but converge over time	DNS, NoSQL DBs
Weak	No guarantees on read freshness	VoIP (dropped audio)

High Availability Patterns

Fail-over Types:

Active-Passive: Standby takes over when primary fails
Active-Active: Both nodes serve traffic; workload shifts on failure
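
Both fail-over types can be sketched as node-selection functions. This is a simplified illustration: the `Node` class, the `healthy` flag, and the modulo-based spreading are assumptions, and real systems detect failures with heartbeats or health checks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


def pick_active_passive(primary: Node, standby: Node) -> Node:
    """Active-passive: the standby serves only when the primary is down."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy node available")


def pick_active_active(nodes: list[Node], request_id: int) -> Node:
    """Active-active: spread requests over every node that is still healthy."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy node available")
    return healthy[request_id % len(healthy)]


a, b = Node("a"), Node("b")
a.healthy = False  # simulate a failure of node "a"
print(pick_active_passive(a, b).name)      # b
print(pick_active_active([a, b], 7).name)  # b
```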

Networking Basics

Domain Name System (DNS)

DNS translates human-friendly domain names into IP addresses.

Key DNS Record Types:

Record	Purpose	Example
A	Maps name to IPv4	example.com → 93.184.216.34
CNAME	Alias to another name	api.example.com → server.example.com
NS	Authoritative name servers	Delegates DNS to provider
MX	Mail server for domain	Handles email routing
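
To illustrate how A and CNAME records interact, here is a toy resolver over an in-memory record table. It is a sketch, not a real DNS client: the `RECORDS` table is hypothetical, and the address for `server.example.com` is an assumption made for the example.

```python
# Hypothetical zone data; names mirror the table above.
RECORDS = {
    ("example.com", "A"): "93.184.216.34",
    ("server.example.com", "A"): "93.184.216.34",  # assumed address
    ("api.example.com", "CNAME"): "server.example.com",
}


def resolve(name: str, max_hops: int = 5) -> str:
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]  # alias: look up the target instead
            continue
        raise LookupError(f"no record for {name}")
    raise LookupError("too many CNAME hops")


print(resolve("api.example.com"))  # 93.184.216.34
```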

Content Delivery Network (CDN)

A CDN is a globally distributed network that caches and serves content from locations near users.

CDN Modes:

Pull CDN: Fetches content from origin on first request, then caches
Push CDN: You upload content in advance to CDN servers
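
The two modes differ only in when content reaches the edge. A minimal sketch of a single edge node, assuming a hypothetical `fetch_from_origin` callable that stands in for an HTTP request to the origin:

```python
class CDNEdge:
    """Toy edge node supporting both pull and push population."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: path -> content
        self._cache = {}
        self.origin_hits = 0

    def get(self, path: str) -> str:
        if path not in self._cache:
            # Pull mode: the first request for a path goes to the origin.
            self.origin_hits += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def push(self, path: str, content: str) -> None:
        # Push mode: content is uploaded before any user asks for it.
        self._cache[path] = content


edge = CDNEdge(lambda path: f"content of {path}")
edge.get("/logo.png")             # miss: fetched from origin
edge.get("/logo.png")             # hit: served from the edge cache
edge.push("/video.mp4", "bytes")  # pre-loaded, never touches the origin
edge.get("/video.mp4")
print(edge.origin_hits)  # 1
```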

Benefits:

Lower latency (content served from nearby servers)
Reduced origin server load
Better handling of traffic spikes

Load Balancing

A load balancer distributes incoming traffic across multiple servers.

Load Balancing Algorithms

Algorithm	Description
Round Robin	Cycle through servers in order
Least Connections	Send to server with fewest active connections
IP Hash	Route based on client IP (sticky sessions)
Weighted	Distribute based on server capacity
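
Three of these algorithms fit in a few lines each. The sketch below uses hypothetical server names and keeps all state in memory; production load balancers also track health and timeouts.

```python
import itertools
import zlib


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1  # call when the connection closes


def ip_hash(client_ip: str, servers) -> str:
    """The same client IP always maps to the same server."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


servers = ["app1", "app2", "app3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
```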

Layer 4 vs Layer 7

Layer 4 (Transport): Routes based on IP/port; faster but less flexible
Layer 7 (Application): Routes based on URL/headers; more intelligent routing

Benefits

Prevents overload on single servers
Removes unhealthy servers from rotation
Enables horizontal scaling
Can handle SSL termination

Reverse Proxy

A reverse proxy sits between clients and internal servers and forwards requests on the clients' behalf.

Benefits:

Single public endpoint (hides internal servers)
SSL termination (offloads encryption)
Caching (serves common responses directly)
Security (can implement rate limiting, IP blocking)
Compression

Application Layer and Microservices

Separating Web and Application Layers

Benefits of separation:

Each layer scales independently
Single responsibility per component
Easier to maintain and deploy

Microservices Architecture

Service Discovery: Systems like Consul, etcd, or ZooKeeper track service instances and their network locations so that callers can find them at runtime.

Databases and Storage

Relational Databases (RDBMS)

ACID Properties:

Property	Description
Atomicity	Transactions are all-or-nothing
Consistency	Data stays valid after transactions
Isolation	Concurrent transactions don't interfere
Durability	Committed data survives crashes and restarts
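
Atomicity is the easiest property to demonstrate. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the names, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def balance(conn, name):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move money in one transaction; any failure undoes both updates."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


# The overdraft fails after bob was credited, yet bob's balance stays 0.
print(transfer(conn, "alice", "bob", 500), balance(conn, "bob"))  # False 0
```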

Scaling Strategies

Master-Slave Replication

Master handles all writes
Slaves handle read traffic
Increases read throughput
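
A minimal sketch of the read/write split, using dictionaries as stand-ins for database servers. The class and method names are hypothetical, and replication is an explicit step here so that replication lag is visible; real systems stream changes continuously.

```python
import itertools


class ReplicatedDB:
    """Toy router: writes go to the master, reads rotate over the slaves."""

    def __init__(self, slave_count: int):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self._next = itertools.cycle(range(slave_count))

    def write(self, key, value):
        self.master[key] = value  # all writes hit the master

    def replicate(self):
        # Copy the master's state to every slave.
        for slave in self.slaves:
            slave.update(self.master)

    def read(self, key):
        # Round-robin over slaves to spread read traffic.
        return self.slaves[next(self._next)].get(key)


db = ReplicatedDB(slave_count=2)
db.write("user:1", "Ada")
print(db.read("user:1"))  # None: the slaves have not caught up yet
db.replicate()
print(db.read("user:1"))  # Ada
```

The first read returning `None` is the trade-off of asynchronous replication: reads scale out, but a slave may briefly serve stale data.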

Sharding (Horizontal Partitioning)

Split data across multiple databases
Each shard handles subset of data
Allows horizontal scaling
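
One common way to pick a shard is to hash the key. The sketch below assumes four in-memory shards and hypothetical `put`/`get` helpers; it uses `hashlib` rather than Python's built-in `hash()`, which is salted per process for strings.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


shards = [{} for _ in range(4)]  # each dict stands in for one database


def put(key: str, value) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)


put("user:123", {"name": "Ada"})
print(get("user:123"))  # {'name': 'Ada'}
```

A known drawback of plain modulo hashing is that changing the shard count remaps most keys; consistent hashing is the usual way to limit that movement.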

NoSQL Databases

Type	Description	Examples	Use Cases
Key-Value	Simple hash table	Redis, Memcached	Caching, sessions
Document	JSON documents with queries	MongoDB, CouchDB	User profiles, catalogs
Wide-Column	Sparse matrix of rows/columns	Cassandra, HBase	Time-series, logs
Graph	Nodes and edges	Neo4j, Neptune	Social networks

SQL vs NoSQL

Choose SQL when:

Structured, relational data
Complex queries and joins
ACID transactions required

Choose NoSQL when:

Flexible/evolving schema
High write throughput
Horizontal scaling needed
Simple access patterns

Caching

Caching stores frequently accessed data in fast storage (usually memory) so that repeated reads avoid slower backends.

Caching Strategies

Cache-Aside (Lazy Loading)

Check cache first
On miss, query database
Store result in cache
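
The three steps above can be sketched as follows. Dictionaries stand in for the cache and the database, and the `update` method with invalidation is an added assumption (a common companion to cache-aside) rather than part of the steps listed.

```python
class CacheAside:
    def __init__(self, db: dict):
        self.db = db      # stand-in for the database
        self.cache = {}   # stand-in for Redis or Memcached
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:        # 1. check the cache first
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)     # 2. on a miss, query the database
        if value is not None:
            self.cache[key] = value  # 3. store the result in the cache
        return value

    def update(self, key, value):
        # Assumption: invalidate on write so the next read reloads fresh data.
        self.db[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Ada"})
store.get("user:1")    # miss: reads the database
store.get("user:1")    # hit: served from the cache
print(store.db_reads)  # 1
```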

Write-Through

Write to cache
Cache writes to database (synchronously)
Cache always up-to-date

Write-Behind

Write to cache only
Cache writes to database asynchronously
Faster but risks data loss
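
Write-through and write-behind differ only in when the database is updated. A side-by-side sketch, with dictionaries as stand-ins and a hypothetical `flush` method representing the asynchronous writer:

```python
class WriteThroughCache:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}

    def write(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # synchronous: done before write() returns


class WriteBehindCache:
    def __init__(self, db: dict):
        self.db, self.cache = db, {}
        self.pending = {}     # writes not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # the database is updated later

    def flush(self):
        # In a real system a background worker would do this periodically.
        self.db.update(self.pending)
        self.pending.clear()


through_db, behind_db = {}, {}
WriteThroughCache(through_db).write("k", 1)
behind = WriteBehindCache(behind_db)
behind.write("k", 1)
print(through_db, behind_db)  # {'k': 1} {}
behind.flush()
print(behind_db)              # {'k': 1}
```

If the write-behind cache crashes before `flush` runs, everything in `pending` is lost, which is the data-loss risk noted above.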

Cache Eviction Policies

LRU (Least Recently Used): Evict the item that has gone longest without being accessed
LFU (Least Frequently Used): Evict the item with the fewest accesses
TTL (Time to Live): Expire after set time
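
As an illustration of LRU, here is a minimal cache built on `collections.OrderedDict`. The class name and capacity are arbitrary choices for the sketch.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```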

Asynchronous Processing

Message Queues

Benefits:

Decouples components
Smooths out traffic spikes
Improves reliability (jobs can retry)
Reduces front-end latency
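
The decoupling can be sketched with the standard library's in-process `queue.Queue`, which stands in here for a real broker. The job names and the `run_jobs` helper are hypothetical.

```python
import queue
import threading


def run_jobs(names):
    """Enqueue jobs and let one background worker process them in order."""
    jobs = queue.Queue()
    done = []

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel value: stop the worker
                break
            done.append(f"processed {job}")  # stand-in for slow work

    consumer = threading.Thread(target=worker)
    consumer.start()
    for name in names:
        jobs.put(name)  # the producer returns immediately
    jobs.put(None)
    consumer.join()     # wait only so the demo can show the results
    return done


print(run_jobs(["resize-image", "send-email"]))
# ['processed resize-image', 'processed send-email']
```

In a web application the request handler would play the producer role and return to the user as soon as `put` completes.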

Popular Technologies:

RabbitMQ
Apache Kafka
Amazon SQS
Redis (simple queues)

Communication Protocols

TCP vs UDP

Feature	TCP	UDP
Connection	Connection-oriented	Connectionless
Reliability	Reliable delivery (lost packets are retransmitted)	Best effort (no retransmission)
Order	Ordered	May arrive out of order
Use Case	Web, databases	Video, gaming, DNS

REST vs RPC

REST (Representational State Transfer):

Resource-centric (URLs represent resources)
Uses HTTP methods (GET, POST, PUT, DELETE)
Stateless
Human-readable (JSON)

GET /users/123
POST /users
PUT /users/123
DELETE /users/123

RPC (Remote Procedure Call):

Action-centric (method calls)
Often binary protocols (gRPC, Thrift)
More efficient but tighter coupling

userService.GetUser(123)
userService.CreateUser(data)

Putting It All Together

A complete large-scale architecture combines the components covered so far. The flow below traces a single request through them.

Request Flow Example

User requests news feed
DNS resolves to CDN/LB
Load balancer routes to web server
Web server calls Read API
Read API checks cache (Redis)
Cache miss → query database
Store result in cache
Return response to user

Security Considerations

Encryption in transit: Always use HTTPS (TLS)
Encryption at rest: Encrypt sensitive database fields
Input validation: Prevent SQL injection, XSS
Authentication: Use secure tokens, OAuth
Authorization: Principle of least privilege
Rate limiting: Limit abuse and reduce the impact of DDoS attacks
Secrets management: Use vaults, not hardcoded keys
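
Rate limiting is often implemented with a token bucket. The sketch below is one possible version, not a prescribed design: the class name, the parameters, and the injectable `clock` (used so the example is deterministic) are assumptions.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1  # spend one token for this request
            return True
        return False


fake_time = [0.0]  # controllable clock for the demo
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_time[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_time[0] = 1.0                         # one second later
print(bucket.allow())                      # True
```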

Conclusion

System design is about making informed trade-offs. When facing a design problem:

Start with requirements - understand scale and constraints
Sketch high-level architecture - identify major components
Drill into critical components - detail the challenging parts
Consider scaling and reliability - add redundancy and caching
Discuss trade-offs - explain why you made each choice

The best engineers don't have perfect solutions – they have clear reasoning for their decisions.

This guide is based on concepts from the System Design Primer and industry best practices. For more in-depth exploration, continue practicing with real system design questions.
