CompTIA Network+ Rapid Review: Network Concepts

  • 12/15/2012

Objective 1.8: Given a scenario, implement the following network troubleshooting methodology

One of the key elements of troubleshooting a network problem is having a plan of action. Many troubleshooting calls are from users who are improperly using software, and these can often be cleared up immediately with some remedial training. When you are faced with what appears to be a real problem, however, you should follow a set troubleshooting procedure, which consists of a series of steps similar to those in this objective.

Exam need to know

  • Identify the problem

    For example: What questions should the troubleshooter ask the user?

  • Establish a theory of probable cause

    For example: What are all of the possible causes of the problem?

  • Test the theory to determine cause

    For example: What can you do to determine whether your theory is correct?

  • Establish a plan of action to resolve the problem and identify potential effects

    For example: What needs to be done to resolve the problem fully?

  • Implement the solution or escalate as necessary

    For example: Under what conditions must the problem be escalated?

  • Verify full system functionality and, if applicable, implement preventative measures

    For example: Is there anything that can be done to prevent the problem from recurring?

  • Document findings, actions, and outcomes

    For example: What mechanisms does the organization have in place to maintain a history of the problem and its solution?

Identify the problem

The first step in troubleshooting a network problem is to determine exactly what is going wrong and to note how the problem affects the network so that you can assign it a priority. It is sometimes difficult to determine the exact nature of the problem from the description given by a relatively inexperienced user, but part of the process of narrowing down the cause of a problem involves obtaining accurate information about what has occurred. Users are often vague about what they were doing when they experienced the problem, or even what the indications of the problem were.

Begin by asking the user questions like the following:

  • What exactly were you doing when the problem occurred?
  • Have you had any other problems with your computer lately?
  • Was the computer behaving normally just before the problem occurred?
  • Has any hardware or software been installed, removed, or reconfigured recently?
  • Did you or anyone else do anything to try to resolve the problem?

When a computer or other network component that used to work properly now does not, it stands to reason that some change has occurred. When a user reports a problem, it is important to determine how the computing environment changed immediately before the malfunction. Unfortunately, getting this information from the user can often be difficult. On a network with properly established maintenance and documentation procedures, you should be able to determine whether the user’s computer has been upgraded or modified recently.
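
Where such documentation includes a recorded baseline of each workstation's configuration, finding out what changed can be as simple as comparing that baseline with the machine's current state. The following Python sketch assumes both snapshots are available as dictionaries mapping an item (a driver, an installed application, a network setting) to its version or value; the function name `diff_snapshots` and the sample data are hypothetical, not part of any particular inventory tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: item name -> version or setting value.
Snapshot = Dict[str, str]
Change = Tuple[str, str, Optional[str], Optional[str]]  # (kind, item, old, new)


def diff_snapshots(baseline: Snapshot, current: Snapshot) -> List[Change]:
    """Return every item that was added, removed, or modified since the baseline."""
    changes: List[Change] = []
    for item in sorted(baseline.keys() | current.keys()):
        old = baseline.get(item)
        new = current.get(item)
        if old is None:
            changes.append(("added", item, None, new))
        elif new is None:
            changes.append(("removed", item, old, None))
        elif old != new:
            changes.append(("modified", item, old, new))
    return changes


baseline = {"nic_driver": "1.2", "dns_server": "10.0.0.53", "vpn_client": "4.0"}
current = {"nic_driver": "1.3", "dns_server": "10.0.0.53", "remote_tool": "2.1"}
for change in diff_snapshots(baseline, current):
    print(change)
```

Each reported change is a candidate for the list of probable causes; unchanged items (here, the DNS server setting) drop out automatically.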

Major changes, such as the installation of new hardware or software, are obvious possible causes of the problem, but you must also watch for more subtle changes. For example, an increase in network traffic levels, as disclosed by a protocol analyzer, can contribute to a reduction in network performance.
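
A protocol analyzer or traffic monitor supplies the raw measurements, but a change in traffic only becomes meaningful when compared with a baseline recorded while the network was performing normally. This Python sketch assumes per-interval traffic figures (for example, bytes per minute) have already been exported as lists of numbers; the function names and the 50 percent threshold are illustrative assumptions, not a standard.

```python
from statistics import mean
from typing import Sequence


def traffic_increase(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Return the relative change in average traffic (0.5 means 50 percent higher)."""
    base_avg = mean(baseline)
    if base_avg == 0:
        raise ValueError("baseline average is zero; cannot compute a ratio")
    return (mean(current) - base_avg) / base_avg


def traffic_is_suspect(
    baseline: Sequence[float], current: Sequence[float], threshold: float = 0.5
) -> bool:
    # Hypothetical rule of thumb: flag traffic that exceeds the baseline by more than `threshold`.
    return traffic_increase(baseline, current) > threshold


print(traffic_increase([100, 120, 80], [180, 160, 170]))
```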

True or false: The priority you assign to a problem report should, in most cases, be based primarily on the number of users the problem affects.

Answer: True. Although there can be political and economic factors that affect your decision, the general rule is that the more users who are affected, the higher the priority of the problem.
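
The general rule in the answer above can be expressed as a sort key: most users affected first, with earlier reports breaking ties. This is a minimal Python sketch; the `ProblemReport` fields and sample reports are hypothetical, and the political and economic factors mentioned above are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemReport:
    summary: str
    users_affected: int
    reported_order: int  # position in the queue; earlier reports win ties


def prioritize(reports: List[ProblemReport]) -> List[ProblemReport]:
    """Order reports so the problem affecting the most users comes first."""
    return sorted(reports, key=lambda r: (-r.users_affected, r.reported_order))


reports = [
    ProblemReport("Single workstation will not boot", 1, 1),
    ProblemReport("Shared drive unreachable for accounting", 25, 2),
    ProblemReport("Printer queue stalled on floor 3", 12, 3),
]
for report in prioritize(reports):
    print(report.users_affected, report.summary)
```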

Establish a theory of probable cause

After gathering all the information you can, make a list of all the possible problems that fit the circumstances, from the mundane to the extreme. A user’s inability to access a website could be caused by a problem in the user’s computer, a problem in the web server, or anywhere in between. When you first begin the troubleshooting process, your list of possibilities might include everything from an unplugged network cable to solar flares. As you gather more information, you should be able to rule out a lot of the possible causes on your list and work your way down to a manageable few.
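
The process of ruling out causes as information comes in can be sketched as a list of candidate causes, each paired with the observation that would eliminate it. In this Python sketch the causes and fact names (`link_light_on`, `other_sites_reachable`, `site_works_for_other_users`) are hypothetical examples for a user who cannot reach a website; a fact that has not been gathered yet never rules anything out.

```python
from typing import Callable, Dict, List

Facts = Dict[str, bool]

# Hypothetical candidate causes for "user cannot reach a website".
# Each cause maps to a test that returns True if the gathered facts rule it out.
RULED_OUT_BY: Dict[str, Callable[[Facts], bool]] = {
    "Unplugged or faulty patch cable": lambda f: f.get("link_light_on", False),
    "No IP connectivity from the workstation": lambda f: f.get("other_sites_reachable", False),
    "Web server outage": lambda f: f.get("site_works_for_other_users", False),
    "Internet connection down": lambda f: f.get("other_sites_reachable", False),
}


def remaining_causes(facts: Facts) -> List[str]:
    """Return the causes that the facts gathered so far have not eliminated."""
    return [cause for cause, ruled_out in RULED_OUT_BY.items() if not ruled_out(facts)]


print(remaining_causes({}))  # nothing known yet, so every cause is still possible
print(remaining_causes({"link_light_on": True, "other_sites_reachable": True}))
```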

The final step of this phase is to select the item from your list that seems to be the most probable cause of the problem. Don’t be afraid to question the obvious. There’s an old doctors’ axiom that says, “When you hear hoofbeats, think horses, not zebras.” In the context of network troubleshooting, this means that when you look for the probable cause of a problem, start with the obvious cause first.

True or false: The most obvious cause of a problem is usually the correct one.

Answer: False. IT troubleshooting is rarely well served by simplistic axioms such as this one. The cause of a problem can just as easily be obscure as obvious.

Test the theory to determine the cause

When you have established your theory of the probable cause of the problem, the next step is to test that theory. If you have isolated the problem to a particular piece of equipment, try to determine whether hardware or software is the culprit. If it is a hardware problem, you might replace the unit that is at fault or use an alternative that you know is functioning properly.

In some cases, the only way to test your theory involves resolving the problem. For example, if you suspect that a computer’s inability to access the network is due to a bad patch cable, the only way to test your theory is to replace the patch cable with one you know is good. If that works, then your theory is confirmed.

Confirming your theory might actually resolve the problem, but that is not always so. If the problem affects multiple computers, each of which will require modifications, then you might be able to confirm your theory by modifying just one of them to see whether your procedure works.
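
Confirming a multi-computer fix on a single machine first can be sketched as follows. The `apply_fix` and `verify` callables are placeholders (assumptions) for whatever change and check your theory calls for; the hypothetical `confirm_on_pilot` function modifies only the first host and hands back the rest for the plan of action.

```python
from typing import Callable, List, Tuple


def confirm_on_pilot(
    hosts: List[str],
    apply_fix: Callable[[str], None],
    verify: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Modify one affected host and check whether the problem is gone.

    Returns (theory_confirmed, hosts_still_to_fix).
    """
    if not hosts:
        raise ValueError("no affected hosts supplied")
    pilot, remaining = hosts[0], list(hosts[1:])
    apply_fix(pilot)  # only the pilot machine is touched at this stage
    return verify(pilot), remaining


# Stand-in fix and check: "fixing" a host records it, and a host verifies once it is recorded.
changed = set()
confirmed, remaining = confirm_on_pilot(["ws01", "ws02", "ws03"], changed.add, lambda h: h in changed)
print(confirmed, remaining)
```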

If your test concludes that your theory is incorrect, then you have to go back to your list of possible causes and decide which of the remaining ones is the next most probable. Then the whole testing process begins again. It is not unusual for a troubleshooter to disprove several theories before arriving at the correct one.
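
The loop described above (test the most probable theory, and if it is disproved, move on to the next) can be sketched in Python as shown here. The likelihood values are the troubleshooter's own estimates, and each test is a hypothetical callable that returns `True` when it confirms its theory. A result of `None` means every theory on the list was disproved, which sends you back to gathering information or toward escalation.

```python
from typing import Callable, List, Optional, Tuple

# (cause, estimated likelihood, test); likelihoods are the troubleshooter's guesses.
Theory = Tuple[str, float, Callable[[], bool]]


def find_cause(theories: List[Theory]) -> Tuple[Optional[str], List[str]]:
    """Test theories from most to least probable; stop at the first one confirmed.

    Returns (confirmed cause or None, causes disproved along the way, in test order).
    """
    disproved: List[str] = []
    for cause, _likelihood, test in sorted(theories, key=lambda t: t[1], reverse=True):
        if test():
            return cause, disproved
        disproved.append(cause)
    return None, disproved


theories = [
    ("Faulty NIC", 0.2, lambda: False),
    ("Bad patch cable", 0.5, lambda: False),
    ("Failed switch port", 0.3, lambda: True),
]
print(find_cause(theories))
```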

Depending on the size of your organization and the chain of command, you might have to escalate the problem by bringing it to someone with greater responsibility than yours, someone who can determine whether, and when, you can safely test your theory.

True or false: The easiest way to test if a hardware component has malfunctioned is to replace it with one that you know is working properly.

Answer: True. Replacing the suspected component is a sure way of testing it, but it is not always the most practical or most economical way. A component that is vital to the company’s operation or extremely expensive might not be easily replaceable, in which case you must find another solution.

Establish a plan of action to resolve the problem and identify potential effects

If your theory is proven correct and your solution needs to be implemented on a larger scale, the next step of the process is to create a complete plan of what needs to be done to fully resolve the issue. The plan should include all service interruptions that will be needed and all potential effects on the rest of the network. If the plan includes taking critical network components offline, then it should include the ramifications of that downtime and scheduling recommendations for work during off hours.
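
One part of the plan that lends itself to a mechanical check is whether a proposed outage fits within off hours. This Python sketch assumes an off-hours window of 22:00 to 06:00 local time, which is an arbitrary example; real schedules also need to account for weekends, holidays, and time zones, which this code ignores. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumed off-hours window; it wraps past midnight.
OFF_START = time(22, 0)
OFF_END = time(6, 0)


def off_hours_block_end(moment: datetime) -> Optional[datetime]:
    """Return when the off-hours block containing `moment` ends, or None during business hours."""
    if moment.time() >= OFF_START:
        return datetime.combine(moment.date() + timedelta(days=1), OFF_END)
    if moment.time() < OFF_END:
        return datetime.combine(moment.date(), OFF_END)
    return None


def outage_fits_off_hours(start: datetime, end: datetime) -> bool:
    """True if a planned outage from start to end lies entirely within one off-hours block."""
    if end <= start:
        raise ValueError("outage must end after it starts")
    block_end = off_hours_block_end(start)
    return block_end is not None and end <= block_end


print(outage_fits_off_hours(datetime(2012, 12, 15, 23, 0), datetime(2012, 12, 16, 2, 0)))
```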

It is important, throughout the troubleshooting process, to keep an eye on the big network picture and not become too involved in the problems experienced by one user (or application or LAN). While resolving one problem, you could inadvertently create another that is more severe or that affects more users.

True or false: Server troubleshooting takes precedence over user productivity.

Answer: False. This is almost never true, especially when user productivity is directly equated with generation of revenue. Server outages should be planned for off hours and coordinated with all of the management personnel involved.

Implement the solution or escalate as necessary

When you have a solution to the problem mapped out and ready, it is time to implement it. If the solution falls within your area of responsibility, you can go ahead and do what is needed. However, if the solution involves other areas, or if special permission is required for the expenditures needed to execute your plan, then this is the time to escalate the issue to someone higher up in your organization’s chain of command.

True or false: Escalation of a problem only occurs when a troubleshooter is unable to arrive at a satisfactory solution.

Answer: False. A well-organized IT department has a chain of command that specifies who is responsible for each area of the network. Escalation of a troubleshooting issue should occur whenever it falls under a superior’s area of responsibility.
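
The escalation rule in the answer above can be reduced to two questions: does the solution reach outside the troubleshooter's area of responsibility, and does it require spending beyond the troubleshooter's authority? This Python sketch encodes exactly those two conditions; the class names, area labels, and spending limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Solution:
    areas_touched: Set[str]  # parts of the network the fix will change
    cost: float              # expenditure the fix requires


@dataclass
class Technician:
    areas_responsible: Set[str]
    spending_limit: float


def must_escalate(solution: Solution, tech: Technician) -> bool:
    """Escalate if the fix reaches outside the technician's areas or budget authority."""
    outside_area = not solution.areas_touched <= tech.areas_responsible
    over_budget = solution.cost > tech.spending_limit
    return outside_area or over_budget


tech = Technician({"workstations", "printers"}, 200.0)
print(must_escalate(Solution({"workstations"}, 25.0), tech))
print(must_escalate(Solution({"workstations", "email server"}, 0.0), tech))
```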

Verify full system functionality and, if applicable, implement preventative measures

Even if you have already performed small-scale tests to confirm your theory, after your solution is completely implemented, you must test again to confirm its success. To fully test whether the problem is resolved, you should return to the very beginning of the process and repeat the task that originally brought it to light. If the problem no longer occurs, you should test any other functions related to the changes you made, to ensure that fixing one problem has not created another.
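
When the original symptom and the related functions can be expressed as network reachability checks, part of this verification can be scripted. This Python sketch assumes each check is a TCP connection to a `(host, port)` pair; it uses only the standard library's `socket.create_connection`, and the function names and example hosts are hypothetical. A successful TCP connection shows only that the service is listening, so it complements, rather than replaces, repeating the user's actual task.

```python
import socket
from typing import Dict, Tuple

# A check is a (host, TCP port) pair; this format is an assumption.
Check = Tuple[str, int]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def verify_fix(original_task: Check, related: Dict[str, Check]) -> Dict[str, bool]:
    """Repeat the check that exposed the problem, then check each related function."""
    results = {"original task": tcp_reachable(*original_task)}
    for name, (host, port) in related.items():
        results[name] = tcp_reachable(host, port)
    return results


# Example (hypothetical hosts):
# verify_fix(("intranet.example.com", 443),
#            {"file server": ("files.example.com", 445), "mail": ("mail.example.com", 25)})
```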

At this point, the time you spend documenting the troubleshooting process becomes worthwhile. Repeat exactly the procedures that reproduced the problem to ensure that the trouble the user originally experienced has been completely eliminated, and not just temporarily masked. If the problem was intermittent to begin with, it might take some time to ascertain whether the solution has been effective. It might be necessary to check with the user several times to make sure that the problem is not recurring.
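
Because a single successful test proves little when a problem is intermittent, it helps to repeat the check many times over a period and count failures. In this Python sketch, `check` is a placeholder for whatever test reproduced the original problem, and `interval_s` is an assumed pause between attempts; the function name is hypothetical. A failure rate of zero over a limited number of trials is evidence that the fix worked, not proof.

```python
import time
from typing import Callable


def failure_rate(check: Callable[[], bool], trials: int, interval_s: float = 0.0) -> float:
    """Run `check` repeatedly and return the fraction of trials that failed."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = 0
    for i in range(trials):
        if not check():
            failures += 1
        if interval_s and i < trials - 1:
            time.sleep(interval_s)  # spread attempts out to catch intermittent faults
    return failures / trials


outcomes = iter([True, True, False, True])
print(failure_rate(lambda: next(outcomes), 4))
```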

If the problem ended up being the result of some network condition, or of an action by a user or administrator, you should consider at this point what must be done to prevent the problem from occurring again. This might involve a change to existing company policy or the creation of a new one.

True or false: Testing a solution to a troubleshooting issue involves recreating the original problem, if possible.

Answer: True. Recreate the original steps that caused the problem to appear, or have the original user do so, to determine whether your solution has been successful.

Document findings, actions, and outcomes

Although it is presented here as a separate step, the process of documenting all of the actions you perform should begin as soon as the user calls for help. A well-organized network support organization should have a system in place in which each problem call is registered as a trouble ticket that will eventually contain a complete record of the problem and the steps taken to isolate and resolve it.
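
A trouble ticket that is opened when the call arrives and appended to at every step might look like the following Python sketch. The class and field names are hypothetical; commercial help desk systems use their own schemas, but the idea is the same: each entry is timestamped, and closing the ticket records the outcome as one more entry.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TroubleTicket:
    ticket_id: int
    reported_by: str
    description: str
    opened: datetime = field(default_factory=datetime.now)
    log: List[Tuple[datetime, str, str]] = field(default_factory=list)
    resolved: Optional[datetime] = None

    def record(self, step: str, note: str, when: Optional[datetime] = None) -> None:
        """Append a timestamped entry (methodology step plus free-text note) to the history."""
        self.log.append((when if when is not None else datetime.now(), step, note))

    def close(self, outcome: str, when: Optional[datetime] = None) -> None:
        """Record the outcome as the final entry and mark the ticket resolved."""
        self.record("Document findings, actions, and outcomes", outcome, when)
        self.resolved = self.log[-1][0]


ticket = TroubleTicket(1001, "jsmith", "Cannot open the intranet home page")
ticket.record("Identify the problem", "Started after desk move; no other users affected")
ticket.record("Test the theory to determine cause", "Swapped patch cable; page loads")
ticket.close("Faulty patch cable replaced; user confirmed access")
for entry in ticket.log:
    print(entry)
```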

The final phase of the troubleshooting process is to explain to the user what happened and why. Of course, the average network user is probably not interested in hearing all the technical details, but it is a good idea to let users know whether their actions caused the problem, exacerbated it, or made it more difficult to resolve. Educating users can lead to a quicker resolution next time or can even prevent a problem from occurring altogether.

True or false: Documentation of a troubleshooting effort should begin as soon as the problem is resolved.

Answer: False. Documentation should begin as soon as the problem is reported and continue throughout the troubleshooting process.

Can you answer these questions?

Find the answers to these questions at the end of this chapter.

  1. A user reports a problem to the help desk; after making a concerted troubleshooting effort for several hours, you are unable to resolve the issue. What should you do next?
  2. It is a busy morning at the help desk, and you are currently handling three calls. One appears to be a hard drive failure in a user’s workstation, one is a user unable to access a particular website, and the third consists of several calls reporting that the company email server is unavailable. Which should you handle first?
  3. A user calls the help desk and reports an inability to access any network resources, whether internal or on the Internet. What should you do to determine the scope of the problem?
  4. How do you test whether a network access problem is limited to a single workstation?
