POS355 Week 4 Distributed System Failures

  • Submitted By: lallen337
  • Date Submitted: 03/10/2014 10:08 AM
  • Category: Technology
  • Words: 877
  • Page: 4

Distributed System Failures
POS355
A distributed system consists of a collection of independent computers connected through a network and distribution middleware. Each computer has its own local memory. The computers synchronize their activities and share the resources of the system, so that users perceive the system as a single, integrated computing facility. All the computers work together to act as one large computer. A distributed system is not a perfect system, however. It is still subject to faults such as network link failure, site failure, machine failure, and storage failure.
A communication failure can stem from the failure of a link, the failure of a site, or the loss of a message, and it is hard to detect exactly which of these has occurred. In a machine failure, the machine stops before performing an erroneous operation that is visible to other processors. A storage medium failure can cause the loss of the operating system, applications, or data. Both site failure and storage failure could also occur in a centralized system, whereas a network link failure can occur only in a networked, distributed system.
To detect link and site failure, a handshaking procedure is used. Suppose two sites have a direct link between them. At regular intervals, the two sites send each other a message. If the first site does not receive a message from the second site within a predetermined amount of time, it assumes that the second site is not communicating. This can be because the link is down, the message was lost in transit, or the second site is down. The first site can then choose to keep waiting for a message from the second site or to send a message asking whether the second site is awake. If the first site is able to send a message to the second site via a different path and receives a response, then it knows that the direct link between the sites is the part that is down. If the first site does not...
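
The handshaking procedure described above can be illustrated with a minimal sketch. It is a toy model, not a real network implementation: the names Network, probe, diagnose, and Diagnosis are hypothetical, message loss is ignored, and a "probe" simply stands for sending an "are you awake?" message and waiting for a reply within the timeout. The final branch, where no reply arrives on either path, is an assumption based on the three possible causes listed in the paragraph.

```python
from enum import Enum


class Diagnosis(Enum):
    ALL_OK = "second site is up and the direct link works"
    DIRECT_LINK_DOWN = "second site is up, but the direct link is down"
    UNKNOWN_FAILURE = "some failure occurred; the cause cannot be determined"


class Network:
    """Toy model of two sites joined by a direct link and one alternate path."""

    def __init__(self, direct_link_up, alternate_path_up, second_site_up):
        self.direct_link_up = direct_link_up
        self.alternate_path_up = alternate_path_up
        self.second_site_up = second_site_up

    def probe(self, route):
        """Return True if an 'are you awake?' message sent over `route`
        would be answered before the timeout expires."""
        if route == "direct":
            route_up = self.direct_link_up
        else:
            route_up = self.alternate_path_up
        # A reply needs both a working route and a running second site.
        return route_up and self.second_site_up


def diagnose(network):
    """What the first site can conclude after probing the second site."""
    if network.probe("direct"):
        return Diagnosis.ALL_OK
    if network.probe("alternate"):
        # A reply over a different path shows the second site is up,
        # so the fault must be in the direct link.
        return Diagnosis.DIRECT_LINK_DOWN
    # No reply on either path: a down site and down links look the same
    # from the first site's point of view (assumption, see lead-in).
    return Diagnosis.UNKNOWN_FAILURE
```

For example, diagnose(Network(False, True, True)) returns Diagnosis.DIRECT_LINK_DOWN, because the second site answers over the alternate path even though the direct link is down.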
