Explaining different distributed system terminologies to a 10-year-old
What comes to your mind when you hear the word "distributed"? Something divided into pieces. Similarly, when you divide a single big system into smaller systems that talk to one another, it is called a distributed system. Now, the textbook definition is:
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
Just as we test your understanding through exams in maths, science, social science, and godforsaken English, we test a distributed system on its Scalability, Reliability, and Availability.
Scalability: The capability of a system to grow and handle increasing demand. There are 2 types of scaling:
- Horizontal scaling: To scale horizontally means to add more servers to our network.
- Vertical scaling: To scale vertically means to increase the capacity of the same machine (for example, by adding more CPU, memory, or storage).
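To make the difference concrete, here is a back-of-the-envelope sketch. The throughput numbers and the helper names `servers_needed` and `fits_on_one_machine` are hypothetical, chosen only for illustration: horizontal scaling asks "how many machines do we need?", while vertical scaling asks "is one bigger machine enough?"

```python
import math

def servers_needed(peak_requests_per_sec: int, capacity_per_server: int) -> int:
    """Horizontal scaling: how many identical servers cover the peak load."""
    # Round up: a partial server's worth of traffic still needs a whole server.
    return math.ceil(peak_requests_per_sec / capacity_per_server)

def fits_on_one_machine(peak_requests_per_sec: int, machine_capacity: int) -> bool:
    """Vertical scaling: can a single (possibly upgraded) machine handle the load?"""
    return peak_requests_per_sec <= machine_capacity

# Hypothetical numbers: 4,500 requests/sec at peak, 1,000 requests/sec per server.
print(servers_needed(4500, 1000))       # 5
print(fits_on_one_machine(4500, 1000))  # False: upgrade the machine or add servers
print(fits_on_one_machine(4500, 8000))  # True: after a vertical upgrade
```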
Reliability: The word speaks for itself. In real life, if someone is reliable, they deliver their work no matter what. Similarly, a system is reliable if it keeps delivering its service even when some of its hardware or software fails.
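One common way to get this kind of reliability is to keep several copies (replicas) of a service and fail over to the next copy when a call fails. The sketch below assumes each replica is just a Python function that either returns a result or raises an exception; the names `call_with_failover`, `broken_replica`, and `healthy_replica` are hypothetical.

```python
from typing import Callable, Sequence

def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a hardware/software failure on this replica
            errors.append(exc)
    # Only when every replica has failed does the caller see an error.
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")

def broken_replica(request: str) -> str:
    raise ConnectionError("disk crashed")

def healthy_replica(request: str) -> str:
    return f"served {request}"

print(call_with_failover([broken_replica, healthy_replica], "GET /home"))
# served GET /home
```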
Availability: A simple measure of the percentage of time that a system, service, or machine remains operational under normal conditions.
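Availability is usually quoted as a number of "nines". A short worked calculation shows how much downtime each figure allows over a 365-day year (leap years are ignored for simplicity):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def downtime_hours_per_year(availability_percent: float) -> float:
    """Maximum downtime per year allowed by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for percent in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(percent)
    print(f"{percent}% available -> {hours:.2f} h ({hours * 60:.1f} min) of downtime per year")
```

For example, 99.9% ("three nines") allows roughly 8.76 hours of downtime per year, while 99.99% allows only about 53 minutes.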
Load Balancer: When we work in a team, the captain divides the work and continuously monitors the performance of every team member. Similarly, a load balancer in a distributed environment tracks the performance of all the servers and distributes load (traffic) among them to ensure high availability and responsiveness.
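Load balancers can use many strategies. One of the simplest is round robin: hand requests to the servers in turn, skipping any server that has failed its health check (the captain noticing that a team member is out). The sketch below is a toy in-memory model; the class `RoundRobinBalancer` and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: round-robin over the servers currently marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server to try next

    def mark_down(self, server):
        self.healthy.discard(server)  # e.g. after a failed health check

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, cycling through the list in order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
lb.mark_down("server-b")
print([lb.pick() for _ in range(4)])  # ['server-a', 'server-c', 'server-a', 'server-c']
```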
Caching: Think of a cache as a cheat sheet: it holds only the important information, not all of it, because it is limited in size. A cache is like short-term memory: it is limited in space but faster to access than the original data source. Caches can exist at all levels of an architecture, but they are often found at the level nearest to the front end, where they return data quickly without taxing downstream levels.
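Because a cache is small, it must decide what to throw away when it is full. A popular policy (one of several) is least-recently-used (LRU) eviction. The sketch below uses Python's `collections.OrderedDict`; the class name `LRUCache` and the `load_from_db` stand-in for a slow data source are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, load):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]       # cache hit: fast path
        value = load(key)               # cache miss: go to the slow source
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value

calls = []
def load_from_db(key):
    calls.append(key)  # record every trip to the (hypothetical) slow database
    return key.upper()

cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, load_from_db)
print(calls)  # ['a', 'b', 'c', 'b']: 'b' was evicted when 'c' arrived
```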
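CDN (Content Delivery Network): A CDN is a network of caches used to serve static content (such as images, stylesheets, and videos) from locations close to the user. If the requested content is not available in the CDN, the request is forwarded to the backend microservice for the result.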
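Conceptually, each CDN edge server behaves like a cache in front of the backend: serve the content locally if a fresh copy exists, otherwise forward the request and keep the result for a while. The sketch below models a single edge server with a time-to-live (TTL); `EdgeServer`, `fetch_from_origin`, and the 60-second TTL are hypothetical, and real CDNs also rely on HTTP caching headers and many other details not shown here.

```python
import time

class EdgeServer:
    """One CDN edge node: serves cached static files, falls back to the backend."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.cache = {}     # path -> (content, time it was stored)

    def get(self, path):
        entry = self.cache.get(path)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0], "HIT"              # fresh copy at the edge
        content = self.fetch_from_origin(path)  # missing or stale: ask the backend
        self.cache[path] = (content, self.clock())
        return content, "MISS"

origin_requests = []
def fetch_from_origin(path):
    origin_requests.append(path)
    return f"<contents of {path}>"

edge = EdgeServer(fetch_from_origin)
print(edge.get("/logo.png")[1])  # MISS: the first request goes to the backend
print(edge.get("/logo.png")[1])  # HIT: served from the edge
```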
Proxy: A proxy is an intermediate layer between the client and the server. You might have heard the phrase "let's use a proxy to hide our IP." Proxies are typically used to filter requests, log requests, or sometimes transform requests (by adding or removing headers, encrypting or decrypting, or compressing a resource). A proxy is also an ideal place to put a cache.
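To make the proxy's jobs concrete, here is a toy model in which a "server" is a plain Python function that takes a request dictionary. The proxy sits in between and logs, filters, and transforms each request before passing it on. The request format, the blocked-path rule, the made-up IP addresses, and the helper `make_proxy` are all assumptions for illustration, not how any real proxy is implemented.

```python
def make_proxy(server, proxy_ip, blocked_paths, log):
    """Wrap a server handler so every request passes through the proxy first."""
    def proxy(request):
        log.append(f"{request['client_ip']} -> {request['path']}")  # log the request
        if request["path"] in blocked_paths:                          # filter it
            return "403 Forbidden (blocked by proxy)"
        forwarded = dict(request)
        forwarded["client_ip"] = proxy_ip  # transform: server sees the proxy's IP
        headers = dict(request.get("headers", {}))
        headers.pop("Cookie", None)        # transform: strip a header before forwarding
        forwarded["headers"] = headers
        return server(forwarded)
    return proxy

def origin_server(request):
    return f"hello {request['client_ip']}, you asked for {request['path']}"

log = []
proxy = make_proxy(origin_server, proxy_ip="203.0.113.7", blocked_paths={"/ads"}, log=log)
print(proxy({"client_ip": "198.51.100.23", "path": "/news", "headers": {"Cookie": "id=42"}}))
# hello 203.0.113.7, you asked for /news
print(proxy({"client_ip": "198.51.100.23", "path": "/ads"}))
# 403 Forbidden (blocked by proxy)
```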
Reverse Proxy: A Reverse Proxy retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client, appearing as if they originated from the proxy server itself.
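A reverse proxy sits in front of the servers rather than in front of the client. In the toy sketch below, it picks a backend by URL path prefix and returns that backend's answer as its own, so the client only ever talks to the proxy. The route table, the backend functions `users_service` and `images_service`, and the class `ReverseProxy` are hypothetical.

```python
def users_service(path: str) -> str:
    return f"user data for {path}"

def images_service(path: str) -> str:
    return f"image bytes for {path}"

class ReverseProxy:
    """Routes each request to a backend by path prefix; the client never sees which."""

    def __init__(self, routes):
        # Check longer prefixes first so a specific route wins over a general one.
        self.routes = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)

    def handle(self, path: str) -> str:
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                body = backend(path)
                return f"200 OK from proxy: {body}"  # looks like it came from the proxy
        return "404 Not Found"

reverse_proxy = ReverseProxy({"/users": users_service, "/images": images_service})
print(reverse_proxy.handle("/users/42"))  # 200 OK from proxy: user data for /users/42
print(reverse_proxy.handle("/unknown"))   # 404 Not Found
```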
Master-Slave and Replication
Generally, a single database is considered a single point of failure: if the database goes down, the entire system goes down. This main database is called the master. To overcome this problem, we create duplicates of the database, called slaves, which keep the system reliable if the master instance goes down. Copying the entire database at every scheduled interval is not practical, because it is an expensive and complex operation. So instead, whenever the master gets an update, it ripples that update through to the slaves. Each slave replies with a message stating that it has received the update successfully, which allows the master to send subsequent updates.
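The update-then-acknowledge flow described above can be sketched in a few lines. This is a simplified, synchronous, in-memory model: the class names `Master` and `Slave` follow the article's terminology, while numbering the updates and having a slave refuse an out-of-order update are assumptions made to mirror the description, not how any particular database implements replication.

```python
class Slave:
    """A copy of the data that applies updates sent by the master."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many updates this slave has applied so far

    def apply(self, seq, key, value):
        if seq != self.applied + 1:
            return False  # out of order: refuse, so no acknowledgment is sent
        self.data[key] = value
        self.applied = seq
        return True       # the acknowledgment ("update received successfully")

class Master:
    """Accepts writes and ripples each one through to every slave."""

    def __init__(self, slaves):
        self.data = {}
        self.log = []  # ordered list of (key, value) updates
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        seq = len(self.log)
        for slave in self.slaves:
            # Send only the new update, not a full copy of the database.
            if not slave.apply(seq, key, value):
                raise RuntimeError(f"{slave.name} did not acknowledge update {seq}")

replicas = [Slave("slave-1"), Slave("slave-2")]
master = Master(replicas)
master.write("balance:alice", 100)
master.write("balance:alice", 80)
print([s.data for s in replicas])  # [{'balance:alice': 80}, {'balance:alice': 80}]
```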
CAP Theorem
The CAP theorem states that it is impossible for a distributed system to simultaneously provide all three of the following properties: Consistency, Availability, and Partition tolerance. In other words, while designing a distributed system we can pick only two of them:
Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
Availability: Every request receives a response, whether it succeeds or fails. Availability is achieved by replicating the data across different servers.
Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
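Because network partitions do happen in practice, the interesting question is what a node does while it cannot reach its peers. The toy model below contrasts the two choices for a single value stored on two nodes; the class `TwoNodeStore` and its `partitioned` flag are hypothetical. A "CP" store rejects the write to stay consistent, while an "AP" store accepts it and stays available, at the cost of the two nodes temporarily disagreeing.

```python
class TwoNodeStore:
    """Toy model: one value stored on nodes A and B, linked by a network."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node_a = None
        self.node_b = None
        self.partitioned = False  # True when A and B cannot talk to each other

    def write_via_a(self, value):
        if not self.partitioned:
            self.node_a = self.node_b = value  # both nodes updated together
            return "ok"
        if self.mode == "CP":
            # Consistency over availability: refuse rather than let A and B disagree.
            return "rejected"
        # Availability over consistency: accept on A; B keeps serving the old value.
        self.node_a = value
        return "ok"

for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write_via_a(1)
    store.partitioned = True
    result = store.write_via_a(2)
    print(mode, result, store.node_a, store.node_b)
# CP rejected 1 1
# AP ok 2 1
```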
Publisher and Subscriber
Publish-subscribe is a widely used messaging pattern in which senders of messages, called publishers, do not send messages directly to specific receivers, called subscribers. Instead, publishers categorize published messages into classes, often called topics, without knowing which subscribers, if any, there may be. Similarly, subscribers express interest in one or more topics and receive only the messages they are interested in, without knowing which publishers, if any, there are.
In other words, the publisher and the subscriber never know about the existence of one another. So how do they communicate?
In short, another component, called a message broker, is known to both the publisher and the subscriber. The publisher sends the message to the message broker, and the message broker filters the message and delivers it to the right subscribers.
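Here is a minimal in-process message broker that matches this description. The class name `MessageBroker`, the topic strings, and the use of plain callback functions as subscribers are assumptions for illustration; real brokers also add networking, persistence, and delivery guarantees.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class MessageBroker:
    """Keeps track of who is interested in which topic and forwards messages."""

    def __init__(self):
        self.subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver to every subscriber of the topic; return how many received it."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

broker = MessageBroker()
inbox_sports, inbox_weather = [], []
broker.subscribe("sports", inbox_sports.append)  # a subscriber only knows the broker
broker.subscribe("weather", inbox_weather.append)

# The publisher only knows the broker and a topic name, not who is listening.
broker.publish("sports", "Final score: 2-1")
broker.publish("weather", "Rain expected tomorrow")
broker.publish("music", "New album out")  # no subscribers: nobody receives it
print(inbox_sports, inbox_weather)
# ['Final score: 2-1'] ['Rain expected tomorrow']
```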