System Design
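CAP theorem
In a distributed system, you can only support two of the following three guarantees:
- Consistency
- Availability
- Partition tolerance
Networks are unreliable, so partition tolerance is required in practice and the real trade-off is between consistency and availability.
Finance industry: consistency. Social network: availability.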
Consistency patterns
Weak consistency
After a write, reads may or may not see it. A best-effort approach is taken. Examples: VoIP, video chat, and real-time multiplayer games.
Eventual consistency
After a write, reads will eventually see it. Data is replicated asynchronously. Examples: DNS and email.
Strong consistency
After a write, reads will see it. Data is replicated synchronously. Examples: database systems that need transactions.
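The difference between eventual and strong consistency can be sketched with a toy primary/replica pair. This is only an illustration under simplifying assumptions: the `Primary` and `Replica` classes and the `flush` method are hypothetical names, not a real database API, and replication here is just a dictionary copy.

```python
class Replica:
    """Holds a copy of the data and serves reads."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes and replicates them to one replica."""
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Strong consistency: replicate before acknowledging the write.
            self.replica.data[key] = value
        else:
            # Eventual consistency: queue the write and replicate later.
            self.pending.append((key, value))

    def flush(self):
        """Apply queued writes to the replica (the asynchronous step)."""
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()


strong_replica = Replica()
strong = Primary(strong_replica, synchronous=True)
strong.write("balance", 100)
strong_read = strong_replica.read("balance")    # sees the write immediately

eventual_replica = Replica()
eventual = Primary(eventual_replica, synchronous=False)
eventual.write("balance", 100)
stale_read = eventual_replica.read("balance")   # replica has not caught up yet
eventual.flush()
fresh_read = eventual_replica.read("balance")   # sees the write after replication
```

The synchronous path pays for its guarantee with write latency, since the write is not acknowledged until the replica has it.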
Availability patterns
Two complementary patterns support high availability: fail-over and replication.
Fail-over
- Active-passive: With active-passive fail-over, heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active’s IP address and resumes service.
- Active-active: In active-active fail-over, both servers manage traffic, spreading the load between them.
Cons: fail-over
- Fail-over adds more hardware and additional complexity.
- There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.
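Active-passive fail-over can be sketched as a monitor that counts missed heartbeats and promotes the standby. This is a minimal sketch, not a production design: `Server`, `FailoverMonitor`, and the `max_missed` threshold are hypothetical, and a real setup would also move the IP address and deal with split-brain.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.serving = False


class FailoverMonitor:
    """Promotes the passive server after too many missed heartbeats."""
    def __init__(self, active, passive, max_missed=3):
        self.active = active
        self.passive = passive
        self.max_missed = max_missed  # assumed threshold, chosen for illustration
        self.missed = 0
        self.active.serving = True

    def heartbeat(self):
        """Run one heartbeat check and return the server currently serving."""
        if self.active.alive:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Swap roles: the standby takes over service.
                self.active.serving = False
                self.active, self.passive = self.passive, self.active
                self.active.serving = True
                self.missed = 0
        return self.active


a = Server("a")
b = Server("b")
monitor = FailoverMonitor(a, b, max_missed=3)
first = monitor.heartbeat().name                          # "a" is healthy
a.alive = False                                           # the active server fails
names = [monitor.heartbeat().name for _ in range(3)]      # standby takes over on the third miss
```

Any writes accepted by `a` during the missed heartbeats but not yet replicated to `b` are exactly the data-loss window described above.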
CDN (Content delivery network)
A content delivery network is a globally distributed network of proxy servers, serving content from locations closer to the user.
Pull CDNs
Pull CDNs grab new content from your server when the first user requests the content.
Push CDNs
Push CDNs receive new content whenever changes occur on your server.
Cons
- CDN costs could be significant depending on traffic, although this should be weighed against the additional costs you would incur without a CDN.
- Content might be stale if it is updated before the TTL expires. This applies mainly to pull CDNs.
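The pull model and the stale-content problem can be sketched with a small TTL cache. This is an illustrative sketch only: `PullCDN` is a hypothetical class, the origin is a plain dictionary, and the clock is passed in as `now` so the behavior is easy to follow.

```python
class PullCDN:
    """Edge cache that fetches from the origin on a miss or after TTL expiry."""
    def __init__(self, origin, ttl_seconds):
        self.origin = origin          # dict standing in for the origin server
        self.ttl = ttl_seconds
        self.cache = {}               # path -> (content, time fetched)
        self.origin_fetches = 0

    def get(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]           # cache hit, possibly stale
        content = self.origin[path]   # miss or expired: pull from the origin
        self.origin_fetches += 1
        self.cache[path] = (content, now)
        return content


origin = {"/logo.png": "v1"}
cdn = PullCDN(origin, ttl_seconds=60)
first = cdn.get("/logo.png", now=0)     # first request pulls from the origin
origin["/logo.png"] = "v2"              # origin changes before the TTL expires
stale = cdn.get("/logo.png", now=30)    # still serves the cached copy
fresh = cdn.get("/logo.png", now=61)    # TTL expired, pulls the new version
```

A push CDN avoids the stale window by uploading on every change, at the cost of pushing content that may never be requested.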
Load balancer
Load balancers distribute incoming client requests to computing resources such as application servers and databases.
Layer 4 load balancing
TCP, UDP
Layer 7 load balancing
HTTP, HTTPS
HAProxy and Nginx both support layer 4 and layer 7 load balancing. Nginx uses its stream module for layer 4.
Cons
- The load balancer can become a performance bottleneck if it does not have enough resources.
- Introducing a load balancer increases complexity.
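How a load balancer spreads requests can be sketched with a round-robin policy that skips unhealthy servers. Round-robin is an assumption chosen for illustration, since these notes do not prescribe a policy, and `RoundRobinBalancer` is a hypothetical class rather than how HAProxy or Nginx are implemented.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked as down."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers")
        # Skip unhealthy servers; the loop ends because at least one is healthy.
        for server in self._cycle:
            if server in self.healthy:
                return server


lb = RoundRobinBalancer(["app1", "app2", "app3"])
picks = [lb.next_server() for _ in range(4)]   # rotation wraps back to app1
lb.mark_down("app2")
after = [lb.next_server() for _ in range(3)]   # app2 is skipped from now on
```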
Reverse proxy
A reverse proxy is a web server that centralizes internal services and provides a unified interface to the public.
Additional benefits include:
- Increased security
- Increased scalability and flexibility
- SSL termination
- Caching
- Compression
- Static content
Application layer
The single responsibility principle advocates for small and autonomous services that work together.
Microservices
Examples: order, user, and search services.
Service Discovery
Systems such as Consul, etcd, and ZooKeeper help services find each other by keeping track of registered names, addresses, and ports.
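The core idea of service discovery, a registry mapping service names to live instances, can be sketched in memory. This is a hypothetical `ServiceRegistry` written for illustration; it does not reflect the Consul, etcd, or ZooKeeper APIs, and the addresses are made up.

```python
class ServiceRegistry:
    """In-memory map from service name to its registered instances."""
    def __init__(self):
        self._services = {}  # name -> list of (host, port)

    def register(self, name, host, port):
        self._services.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        instances = self._services.get(name, [])
        if (host, port) in instances:
            instances.remove((host, port))

    def lookup(self, name):
        # Return a copy so callers cannot mutate the registry.
        return list(self._services.get(name, []))


registry = ServiceRegistry()
registry.register("order", "10.0.0.1", 8080)
registry.register("order", "10.0.0.2", 8080)
registry.register("user", "10.0.0.3", 9090)
registry.deregister("order", "10.0.0.1", 8080)   # instance shut down
orders = registry.lookup("order")
unknown = registry.lookup("search")              # nothing registered yet
```

Real systems add health checks so that crashed instances are removed without an explicit deregister call.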
Cons
Microservices can add complexity in terms of deployment and operations.
Database
ACID is a set of properties of relational database transactions.
- Atomicity - Each transaction is all or nothing
- Consistency - Any transaction will bring the database from one valid state to another
- Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
- Durability - Once a transaction has been committed, it will remain so
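Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper below are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to `bob` succeeds first, but the rollback undoes it, so the database moves only between valid states.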
Master-slave replication
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to master or a new master is provisioned.
Cons
- Additional logic is needed to promote a slave to a master.
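The read-only mode and the promotion step can be sketched as follows. This is a simplified, hypothetical `ReplicatedStore`: replication is synchronous, reads go only to slaves (the master could serve them as well), and promotion just picks the first slave.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Routes writes to the master and reads to the slaves in rotation."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next = 0

    def write(self, key, value):
        if self.master is None:
            raise RuntimeError("read-only: no master available")
        self.master.data[key] = value
        for slave in self.slaves:       # replicate the write to every slave
            slave.data[key] = value

    def read(self, key):
        slave = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return slave.data.get(key)

    def fail_master(self):
        self.master = None

    def promote(self):
        """Promote the first slave to master (the 'additional logic')."""
        self.master = self.slaves.pop(0)


store = ReplicatedStore(Node("m"), [Node("s1"), Node("s2")])
store.write("k", "v1")
store.fail_master()
read_only_value = store.read("k")       # reads still work without a master
try:
    store.write("k", "v2")
    write_rejected = False
except RuntimeError:
    write_rejected = True               # writes fail until a slave is promoted
store.promote()
store.write("k", "v2")
after_promotion = store.read("k")
```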
Master-master replication
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
Cons
- You’ll need a load balancer or you’ll need to make changes to your application logic to determine where to write.
- Most master-master systems are either loosely consistent or have increased write latency due to synchronization.
Cons: replication
- There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
- Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down replaying them and cannot serve as many reads.
- The more read slaves there are, the more you have to replicate, which leads to greater replication lag.
- On some systems, writing to the master can use multiple threads for parallel writing, while read replicas only support sequential writing with a single thread.
- Replication adds more hardware and additional complexity.
Federation
Sharding
Denormalization
SQL tuning
NoSQL
SQL or NoSQL
Reasons for SQL:
- Structured data
- Strict schema
- Relational data
- Need for complex joins
- Transactions
- Clear patterns for scaling
Reasons for NoSQL:
- Semi-structured data
- Dynamic or flexible schema
- Non-relational data
- No need for complex joins
- Need to store many TB (or PB) of data
Cache
Caching improves page load times and can reduce the load on your servers and databases.
Client caching
Caches can be located on the client side, such as in the browser.
CDN caching
CDNs are considered a type of cache.
Web server caching
Web servers can also cache requests, returning responses without having to contact application servers.
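Web server caching can be sketched with `functools.lru_cache` from the Python standard library. The `application_server` and `web_server` functions are hypothetical stand-ins, and a real web server cache would also need expiry and invalidation.

```python
from functools import lru_cache

app_calls = 0  # counts how often the application server is contacted


def application_server(path):
    """Stand-in for an expensive call to the application tier."""
    global app_calls
    app_calls += 1
    return f"rendered {path}"


@lru_cache(maxsize=128)
def web_server(path):
    """Serve a response, reusing the cached one for repeated paths."""
    return application_server(path)


first = web_server("/home")    # cache miss: contacts the application server
second = web_server("/home")   # cache hit: served from the web server's cache
```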