List the three main software components that may fail when the client process invokes a method in the server object, giving an example of failure in each case. Suggest how the components can be made to tolerate one another's failures.
The software components involved (the client process, the server process, and the communication channel between them) can fail in the following ways:
A process or the channel in between may exhibit arbitrary behavior, in which it stops or takes an incorrect step. E.g. the data or file sent may be corrupted because the OS or other software malfunctions.
A process completes a send but the message is not put in its outgoing message buffer. E.g. the server may have crashed before receiving the request.
A message is put in a process's incoming message buffer, but that process does not receive it. E.g. the client that placed the request may never receive the reply, for instance because the server delayed it for too long.
Failures can be handled in the following ways:
By detecting failures
Checksums can be used to detect corrupted data in a transmitted message or file. Other failures, such as a crashed remote server on the Internet, cannot be detected with certainty, but they can be suspected and appropriate steps can be taken.
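As a minimal sketch of checksum-based detection, the sender can append a CRC-32 of the payload and the receiver can recompute it. The function names attach_checksum and verify_checksum and the 4-byte trailer layout are assumptions made for illustration, not part of any particular protocol.

```python
import zlib
from typing import Optional


def attach_checksum(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_checksum(frame: bytes) -> Optional[bytes]:
    """Return the payload if the stored CRC matches, otherwise None."""
    if len(frame) < 4:
        return None  # too short to contain a checksum at all
    payload = frame[:-4]
    received = int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received else None
```

A receiver that gets None back knows the message was corrupted and can discard it or ask for it again. A checksum of this kind detects accidental corruption only; it does not tell the receiver whether the remote process has crashed.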
Some failures can be masked. A message that fails to arrive can be retransmitted. File data can be written to a pair of disks so that if one copy is corrupted, the other may still be correct.
In the worst case these techniques may not work, because the data on the second disk may be corrupted too, or the message may not get through in a reasonable time however often it is retransmitted.
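The retransmission idea, together with its worst case, can be sketched as a bounded retry loop. Here send_once is a hypothetical callable standing in for one attempt over the channel; it is assumed to return the reply, or None if no reply arrived before a timeout.

```python
from typing import Callable, Optional


def send_with_retries(
    send_once: Callable[[bytes], Optional[bytes]],
    message: bytes,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Retransmit the message until a reply arrives or the attempts run out."""
    for _ in range(max_attempts):
        reply = send_once(message)  # assumed: None means the attempt timed out
        if reply is not None:
            return reply
    # Worst case: the message never got through, so report the failure
    # to the caller instead of retrying forever.
    return None
```

Bounding the number of attempts reflects the point that retransmission cannot mask every failure: at some point the caller has to be told that the request did not succeed.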
Clients can be designed to tolerate failures. E.g. when a web browser cannot contact a web server, it does not make the user wait forever while it keeps trying; it informs the user about the problem.
Recovery from failures can be achieved by using redundant components. A few examples are as follows:
There should be at least two different routes between any two routers in the Internet.
In the Domain Name System, every name table should be replicated in at least two different servers.
The database must be replicated in several servers to ensure that the data remains accessible after failure of any single server.
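The replication examples above can be illustrated with a small client-side failover sketch. Each replica is modeled as a hypothetical callable that returns the value for a key or raises ReplicaUnavailable; both names are assumptions for illustration and do not correspond to a real database API.

```python
from typing import Callable, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Raised by a (hypothetical) replica that has crashed or is unreachable."""


def read_from_replicas(replicas: Sequence[Callable[[str], str]], key: str) -> str:
    """Try each replica in turn and return the first successful answer."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaUnavailable as exc:
            last_error = exc  # remember the failure and try the next replica
    # Every replica failed, so redundancy could not hide the failure.
    raise ReplicaUnavailable("all replicas failed") from last_error
```

With two or more replicas, the read still succeeds after the failure of any single server, which is the property the replication examples aim for.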