System Design Requirements: What Every System Really Does

January 27, 2026 (5 min read)
Every system in the world does just three things: Move, Store, and Transform data. Understanding this simplifies how you think about design.

The Three Core Operations

| Operation | What It Does | Examples |
| --- | --- | --- |
| Move | Transfer data between systems | API calls, message queues, streaming |
| Store | Persist data for later use | Databases, caches, file systems |
| Transform | Change data format or derive insights | ETL, aggregations, ML models |

Every feature you build is a combination of these three.


Example: Social Media Post

When you post a photo:

  1. Move → Photo travels from phone to server
  2. Store → Save metadata in DB, image in S3
  3. Transform → Resize image, update feed algorithm
  4. Move → Push to friends' feeds
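
The four steps above can be sketched as a tiny in-memory pipeline. This is only an illustration: `FakeBackend`, `make_thumbnail`, and `handle_upload` are hypothetical names, the dictionaries stand in for a real database, object store, and feed service, and the "resize" simply truncates bytes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackend:
    """In-memory stand-ins for a metadata DB, an object store, and per-user feeds."""
    metadata_db: dict = field(default_factory=dict)
    object_store: dict = field(default_factory=dict)
    feeds: dict = field(default_factory=dict)


def make_thumbnail(image: bytes, max_bytes: int = 4) -> bytes:
    # Transform: placeholder for real resizing; it only truncates the payload.
    return image[:max_bytes]


def handle_upload(backend, post_id, author, image, friends):
    # Step 1 (Move) is implied: `image` has already arrived from the phone.
    # Step 2 (Store): metadata in the "DB", raw bytes in the "object store".
    backend.metadata_db[post_id] = {"author": author, "size": len(image)}
    backend.object_store[post_id] = image
    # Step 3 (Transform): derive a smaller rendition of the image.
    backend.object_store[f"{post_id}/thumb"] = make_thumbnail(image)
    # Step 4 (Move): fan the post out to each friend's feed.
    for friend in friends:
        backend.feeds.setdefault(friend, []).append(post_id)
    return backend


backend = handle_upload(FakeBackend(), "p1", "alice", b"rawimagebytes", ["bob", "carol"])
```

Each line of `handle_upload` maps to exactly one of the three operations, which is the point of the model: a feature is a sequence of moves, stores, and transforms.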

Key Performance Metrics

When designing systems, you measure success with these metrics:


1. Latency

Time taken to complete one operation.

| Type | Description | Example |
| --- | --- | --- |
| P50 | 50% of requests faster than this | 50ms |
| P95 | 95% of requests faster than this | 200ms |
| P99 | 99% of requests faster than this | 500ms |

⚠️ P99 matters most: it captures the slow tail that your unluckiest users feel, and 1% of requests are slower still.
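
To make the percentiles concrete, here is a minimal sketch that computes them with the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration; monitoring tools often use interpolation or histogram approximations instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [20] * 90 + [150] * 8 + [900] * 2

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

With this sample, the mean is 48 ms while P99 is 900 ms, which shows how an average can hide a slow tail.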

Common latency targets:

| System Type | Acceptable Latency |
| --- | --- |
| Real-time (games, trading) | < 10ms |
| Web APIs | < 100ms |
| Background jobs | < 1s |

2. Throughput

Number of operations per unit time.

| Metric | Unit | Example |
| --- | --- | --- |
| RPS | Requests per second | 10,000 RPS |
| QPS | Queries per second | 50,000 QPS |
| TPS | Transactions per second | 1,000 TPS |

Throughput vs Latency tradeoff:

| Scenario | Latency | Throughput |
| --- | --- | --- |
| Process one-by-one | Low | Low |
| Batch processing | Higher | Higher |
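
The tradeoff can be illustrated with a simple cost model, assuming every call pays a fixed overhead (say, a network round trip) plus a per-item cost. The `batch_stats` function and the 10 ms and 1 ms figures are assumptions chosen for illustration, and the model ignores the time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0):
    """Return (latency_ms, items_per_second) for one worker under a fixed-overhead model."""
    # Every item in the batch completes when the whole batch completes.
    batch_time_ms = overhead_ms + per_item_ms * batch_size
    throughput = batch_size / batch_time_ms * 1000
    return batch_time_ms, throughput


one_by_one = batch_stats(1)    # each item pays the full overhead
batched = batch_stats(100)     # overhead is shared across 100 items
```

Under these assumptions, one-by-one processing takes 11 ms per item at roughly 91 items per second, while a batch of 100 takes 110 ms but reaches roughly 909 items per second.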

3. Availability

Percentage of time the system is operational.

| Availability | Downtime per Year | Also Called |
| --- | --- | --- |
| 99% | 3.65 days | Two nines |
| 99.9% | 8.76 hours | Three nines |
| 99.99% | 52.6 minutes | Four nines |
| 99.999% | 5.26 minutes | Five nines |

✅ Most systems aim for 99.9% to 99.99%.
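
The downtime figures follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (`downtime_per_year` is a hypothetical helper name):

```python
from datetime import timedelta


def downtime_per_year(availability_percent: float) -> timedelta:
    """Allowed downtime in a 365-day year for a given availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(days=365) * unavailable_fraction


for percent in (99, 99.9, 99.99, 99.999):
    print(f"{percent}% -> {downtime_per_year(percent)}")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why moving from three nines to four is usually much harder than the numbers suggest.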


4. Consistency

Do all users see the same data at the same time?

| Type | Description | Use Case |
| --- | --- | --- |
| Strong | All reads see latest write | Banking, inventory |
| Eventual | Reads may lag behind writes | Social media, analytics |
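
The difference between the two models shows up in a toy store with one primary and one lagging replica. This is a sketch under simplifying assumptions: `ReplicatedStore` is hypothetical, and reading from the primary is only one way to get strong reads (real systems may use quorums or consensus instead).

```python
class ReplicatedStore:
    """Toy key-value store: one primary, one replica that applies writes only on sync."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Replication catches up: apply queued writes in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_strong(self, key):
        # Strong read: always served from the primary, so it sees the latest write.
        return self.primary.get(key)

    def read_eventual(self, key):
        # Eventual read: served from the replica, which may lag behind.
        return self.replica.get(key)


store = ReplicatedStore()
store.write("balance", 100)
store.sync()
store.write("balance", 40)

strong = store.read_strong("balance")       # sees the latest write
stale = store.read_eventual("balance")      # replica has not caught up yet
store.sync()
converged = store.read_eventual("balance")  # replica has now converged
```

Here `strong` is 40, `stale` is still 100, and `converged` becomes 40 once replication catches up.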

5. Durability

Will data survive failures?

| Strategy | Durability | Cost |
| --- | --- | --- |
| Single server | Low | Cheap |
| Replicated (3 copies) | High | Medium |
| Multi-region backup | Very High | Expensive |
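
A rough way to see why replication helps: if each copy is lost independently with some probability per year, data is lost only when every copy is lost. The sketch below rests on strong assumptions (independent failures, no repair of failed copies, a made-up 1% failure rate), so treat the numbers as illustrative only; `annual_loss_probability` is a hypothetical helper.

```python
def annual_loss_probability(copy_failure_prob: float, copies: int) -> float:
    """Probability that all copies fail in a year, assuming independent failures."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return copy_failure_prob ** copies


single = annual_loss_probability(0.01, 1)  # one server
triple = annual_loss_probability(0.01, 3)  # three replicas
```

Under these assumptions, three copies reduce the annual loss probability from 1 in 100 to about 1 in 1,000,000. Correlated failures, such as a whole data center going down, break the independence assumption, which is the case multi-region backups are meant to cover.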

CAP Theorem

A distributed system can guarantee at most two of these three properties at the same time:

| Property | Meaning |
| --- | --- |
| Consistency | All nodes see same data |
| Availability | System always responds |
| Partition Tolerance | Works despite network failures |

In distributed systems, network partitions will happen, so in practice you choose between C and A while a partition lasts.

| System Type | Choice | Example |
| --- | --- | --- |
| CP | Consistency over Availability | Banks, bookings |
| AP | Availability over Consistency | Social feeds, caches |
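
The CP and AP choices can be sketched as two behaviors of the same node during a partition. This is a toy model: the `Node` class, its `mode` flag, and the `Unavailable` exception are hypothetical, and real systems make this choice through replication and quorum settings rather than a single switch.

```python
class Unavailable(Exception):
    """Raised by a CP node when it cannot confirm it has the latest data."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = "v1"         # last value this node received
        self.partitioned = False  # True when cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise Unavailable("cannot reach peers")
        # AP (or no partition): answer with local data, which may be stale.
        return self.value


cp, ap = Node("CP"), Node("AP")
cp.partitioned = ap.partitioned = True

ap_answer = ap.read()  # AP node still responds
try:
    cp.read()
    cp_answered = True
except Unavailable:
    cp_answered = False  # CP node gives up availability
```

During the partition the AP node answers with possibly stale data, while the CP node returns an error instead of risking an inconsistent read.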

Putting It All Together

When designing a system, ask:

| Question | Metric |
| --- | --- |
| How fast should it respond? | Latency (P50, P95, P99) |
| How many requests per second? | Throughput (RPS/QPS) |
| How much downtime is acceptable? | Availability (nines) |
| Must all users see same data instantly? | Consistency |
| Can we afford to lose data? | Durability |

Example: Design Requirements for E-commerce

| Requirement | Target | Why |
| --- | --- | --- |
| Latency (P99) | < 200ms | Users abandon slow pages |
| Throughput | 50,000 RPS | Handle flash sales |
| Availability | 99.99% | Downtime = lost revenue |
| Consistency | Strong (for payments) | No double-charging |
| Durability | Multi-region backup | Orders can't be lost |
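
The numeric targets in this table can be written down as data and checked against measurements. A minimal sketch: the `Requirements` class, the `violations` function, and the measured values are all hypothetical, and only the three numeric targets are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Numeric targets taken from the e-commerce table above."""
    p99_latency_ms: float = 200
    throughput_rps: int = 50_000
    availability_percent: float = 99.99


def violations(req, measured):
    """Return the names of the targets that the measured values miss."""
    failed = []
    if measured["p99_latency_ms"] >= req.p99_latency_ms:
        failed.append("latency")
    if measured["throughput_rps"] < req.throughput_rps:
        failed.append("throughput")
    if measured["availability_percent"] < req.availability_percent:
        failed.append("availability")
    return failed


# Made-up measurements, e.g. from a load test.
measured = {"p99_latency_ms": 180, "throughput_rps": 42_000, "availability_percent": 99.995}
result = violations(Requirements(), measured)
```

With these made-up measurements, only the throughput target is missed, so `result` contains just `"throughput"`.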

Summary

Every system does three things:

| Operation | Question |
| --- | --- |
| Move | How does data travel? |
| Store | Where does data live? |
| Transform | How does data change? |

Measure success with:

| Metric | What It Measures |
| --- | --- |
| Latency | Speed of one request |
| Throughput | Volume of requests |
| Availability | Uptime percentage |
| Consistency | Whether all readers see the same data |
| Durability | Data survival |

Design is about making tradeoffs between these based on your requirements. There's no perfect system, only the right system for your use case.