System Design Requirements: What Every System Really Does
The Three Core Operations
| Operation | What it Does | Examples |
|---|---|---|
| Move | Transfer data between systems | API calls, message queues, streaming |
| Store | Persist data for later use | Databases, caches, file systems |
| Transform | Change data format or derive insights | ETL, aggregations, ML models |
Every feature you build is a combination of these three.
Example: Social Media Post
When you post a photo:
- Move: Photo travels from phone to server
- Store: Save metadata in DB, image in S3
- Transform: Resize image, update feed algorithm
- Move: Push to friends' feeds
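The four steps above can be sketched as a toy pipeline. This is a minimal illustration, not how any real service is built: `Backend`, `resize`, and `handle_post` are hypothetical names, plain dictionaries stand in for the DB and S3, and an image is reduced to its `(width, height)` size.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """In-memory stand-ins for the real stores (all names are hypothetical)."""
    metadata_db: dict = field(default_factory=dict)  # plays the role of the DB
    blob_store: dict = field(default_factory=dict)   # plays the role of S3
    feeds: dict = field(default_factory=dict)        # friend -> list of post ids


def resize(size, max_width=1080):
    """Transform: scale (width, height) down to max_width, keeping aspect ratio."""
    width, height = size
    if width <= max_width:
        return size
    return (max_width, round(height * max_width / width))


def handle_post(backend, post_id, user, size, friends):
    # Move: in a real system, calling this function would be the upload request.
    # Store: metadata goes to the database, the image to blob storage.
    backend.metadata_db[post_id] = {"user": user}
    backend.blob_store[post_id] = size
    # Transform: derive a smaller rendition from the stored original.
    backend.blob_store[f"{post_id}/resized"] = resize(size)
    # Move: push the new post id to each friend's feed.
    for friend in friends:
        backend.feeds.setdefault(friend, []).append(post_id)


backend = Backend()
handle_post(backend, "p1", "alice", (2160, 1440), ["bob", "carol"])
print(backend.feeds)
```

Each function maps to exactly one of the three operations, which makes it easy to see where latency, storage cost, or compute would be spent.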
Key Performance Metrics
When designing systems, you measure success with these metrics:
1. Latency
Time taken to complete one operation.
| Type | Description | Example |
|---|---|---|
| P50 | 50% of requests faster than this | 50ms |
| P95 | 95% of requests faster than this | 200ms |
| P99 | 99% of requests faster than this | 500ms |
⚠️ P99 matters most: it shows the tail latency that your slowest requests hit, which averages and P50 hide.
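To make the percentile definitions concrete, here is a small sketch that computes them from raw samples using the nearest-rank method (one of several common percentile definitions). The `percentile` helper and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [12, 15, 18, 20, 22, 25, 30, 45, 120, 480]

for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only ten samples, P95 and P99 both land on the single slowest request, which is why tail percentiles need a large sample to be meaningful.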
Common latency targets:
| System Type | Acceptable Latency |
|---|---|
| Real-time (games, trading) | < 10ms |
| Web APIs | < 100ms |
| Background jobs | < 1s |
2. Throughput
Number of operations per unit time.
| Metric | Unit | Example |
|---|---|---|
| RPS | Requests per second | 10,000 RPS |
| QPS | Queries per second | 50,000 QPS |
| TPS | Transactions per second | 1,000 TPS |
Throughput vs Latency tradeoff:
| Scenario | Latency | Throughput |
|---|---|---|
| Process one-by-one | Low | Low |
| Batch processing | Higher | Higher |
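One way to see why batching raises both numbers is a simple cost model: assume every call pays a fixed overhead plus a per-item cost. The function name and the 10 ms / 1 ms figures below are illustrative assumptions, and the model ignores the time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0):
    """Return (latency per batch in ms, throughput in items/s) under a fixed-overhead model."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    latency_ms = overhead_ms + per_item_ms * batch_size
    throughput_per_s = batch_size / latency_ms * 1000
    return latency_ms, throughput_per_s


for size in (1, 10, 100):
    latency_ms, throughput = batch_stats(size)
    print(f"batch={size:>3}  latency={latency_ms:6.1f} ms  throughput={throughput:7.1f} items/s")
```

Larger batches amortize the fixed overhead across more items, so throughput climbs, but every item in the batch waits for the whole batch to finish.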
3. Availability
Percentage of time the system is operational.
| Availability | Downtime/Year | Called |
|---|---|---|
| 99% | 3.65 days | Two nines |
| 99.9% | 8.76 hours | Three nines |
| 99.99% | 52.6 minutes | Four nines |
| 99.999% | 5.26 minutes | Five nines |
Most systems aim for 99.9% to 99.99%.
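The downtime figures in the table follow directly from the availability percentage. A short sketch of the arithmetic, assuming a 365-day year (`downtime_per_year` is a name chosen here for illustration):

```python
from datetime import timedelta


def downtime_per_year(availability_percent):
    """Allowed downtime over a 365-day year for a given availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(days=365) * unavailable_fraction


for nines in (99, 99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_per_year(nines)}")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why moving from three nines to five is far more expensive than it sounds.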
4. Consistency
Do all users see the same data at the same time?
| Type | Description | Use Case |
|---|---|---|
| Strong | All reads see latest write | Banking, inventory |
| Eventual | Reads may lag behind writes | Social media, analytics |
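The difference between the two rows can be shown with a toy store that has one primary and one asynchronously updated replica. This is only a sketch: `ReplicatedStore` is a hypothetical class, and reading from the primary is just the simplest way to model a strong read here, whereas real systems typically rely on quorums or consensus protocols.

```python
class ReplicatedStore:
    """Toy primary/replica pair with explicit, delayed replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted by the primary but not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to the replica in the order they were accepted.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_strong(self, key):
        # Always reflects the latest write.
        return self.primary.get(key)

    def read_eventual(self, key):
        # May lag behind until replicate() has run.
        return self.replica.get(key)


store = ReplicatedStore()
store.write("balance", 100)
print(store.read_strong("balance"), store.read_eventual("balance"))
store.replicate()
print(store.read_strong("balance"), store.read_eventual("balance"))
```

Before `replicate()` runs, the eventual read returns `None` while the strong read already returns the new value; afterwards the two agree.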
5. Durability
Will data survive failures?
| Strategy | Durability | Cost |
|---|---|---|
| Single server | Low | Cheap |
| Replicated (3 copies) | High | Medium |
| Multi-region backup | Very High | Expensive |
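A rough way to see why replication helps: if each copy fails independently with some probability over a given period, all copies are lost only when every one of them fails. The 1% figure below is a made-up illustration, not a measured failure rate, and `loss_probability` is a hypothetical helper.

```python
def loss_probability(per_copy_failure, copies):
    """Chance that every copy fails, assuming failures are independent."""
    if not 0 <= per_copy_failure <= 1:
        raise ValueError("per_copy_failure must be a probability")
    if copies < 1:
        raise ValueError("need at least one copy")
    return per_copy_failure ** copies


# Illustrative only: assume a 1% chance of losing any single copy.
for copies in (1, 3):
    print(f"{copies} copy/copies -> {loss_probability(0.01, copies):.6%} chance of total loss")
```

The independence assumption is the weak point: copies in the same rack or data center tend to fail together, which is the case multi-region backups are meant to cover.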
CAP Theorem
You can only guarantee 2 out of 3:
| Property | Meaning |
|---|---|
| Consistency | All nodes see same data |
| Availability | System always responds |
| Partition Tolerance | Works despite network failures |
In distributed systems, network partitions will happen, so during a partition you choose between C and A.
| System Type | Choice | Example |
|---|---|---|
| CP | Consistency over Availability | Banks, bookings |
| AP | Availability over Consistency | Social feeds, caches |
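The CP/AP choice only shows up while a partition is in progress. The toy replica below illustrates the two behaviors; `Replica` and its `mode` flag are hypothetical, and a real system decides this through its replication protocol rather than a switch.

```python
class Replica:
    """Toy node that is either CP or AP when cut off from its peers."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {}
        self.partitioned = False

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP (or no partition): answer with whatever this node has locally.
        return self.data.get(key)


for mode in ("CP", "AP"):
    node = Replica(mode)
    node.data["seats_left"] = 3
    node.partitioned = True
    try:
        print(mode, "->", node.read("seats_left"))
    except RuntimeError as err:
        print(mode, "->", err)
```

The CP node gives up availability to avoid a possibly stale answer, while the AP node keeps responding with data that may be out of date.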
Putting It All Together
When designing a system, ask:
| Question | Metric |
|---|---|
| How fast should it respond? | Latency (P50, P95, P99) |
| How many requests per second? | Throughput (RPS/QPS) |
| How much downtime is acceptable? | Availability (nines) |
| Must all users see same data instantly? | Consistency |
| Can we afford to lose data? | Durability |
Example: Design Requirements for E-commerce
| Requirement | Target | Why |
|---|---|---|
| Latency (P99) | < 200ms | Users abandon slow pages |
| Throughput | 50,000 RPS | Handle flash sales |
| Availability | 99.99% | Downtime = lost revenue |
| Consistency | Strong (for payments) | No double-charging |
| Durability | Multi-region backup | Orders can't be lost |
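The numeric targets in this table can be written down as data and checked against measurements. The sketch below uses the table's latency, throughput, and availability targets; the `Targets` class, the `violations` function, and the measured values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    p99_latency_ms: float = 200.0         # P99 must stay below this
    throughput_rps: int = 50_000          # must sustain at least this
    availability_percent: float = 99.99   # must reach at least this


def violations(targets, p99_latency_ms, throughput_rps, availability_percent):
    """Return a list of human-readable target misses (empty if all are met)."""
    problems = []
    if p99_latency_ms >= targets.p99_latency_ms:
        problems.append(f"P99 latency {p99_latency_ms} ms is not below {targets.p99_latency_ms} ms")
    if throughput_rps < targets.throughput_rps:
        problems.append(f"throughput {throughput_rps} RPS is below {targets.throughput_rps} RPS")
    if availability_percent < targets.availability_percent:
        problems.append(f"availability {availability_percent}% is below {targets.availability_percent}%")
    return problems


# Hypothetical measurements from a load test.
print(violations(Targets(), p99_latency_ms=240.0, throughput_rps=62_000, availability_percent=99.995))
```

Consistency and durability are left out because they are design choices rather than numbers you can read off a dashboard.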
Summary
Every system does three things:
| Operation | Question |
|---|---|
| Move | How does data travel? |
| Store | Where does data live? |
| Transform | How does data change? |
Measure success with:
| Metric | What it Measures |
|---|---|
| Latency | Speed of one request |
| Throughput | Volume of requests |
| Availability | Uptime percentage |
| Consistency | Whether all reads see the same data |
| Durability | Data survival |
Design is about making tradeoffs between these based on your requirements. There's no perfect system, only the right system for your use case.