There is a simple follow-these-steps approach to troubleshooting problems. The method goes like this:
A common error report is "I can't connect to the server right now." What could be the problem? It helps to dissect this simple sentence to understand the issues that may be involved. For example:
Is this the only user who has called in reporting network problems? If there are others, do they have similar issues? If so, then right away it's clear you don't need to begin your troubleshooting at the user's computer. Instead, the issue is most likely "out there" somewhere. Maybe your DNS server is offline or your DNS provider's services are experiencing difficulty. Or maybe a router on your internal network is going crazy and dropping packets. Or maybe the server your users are trying to connect to has crashed.
You should also stop and think about any commonalities these users may have. For example, are their machines all on the same subnet? If so, then maybe the default gateway for that subnet is misconfigured or the router crashed. Or maybe a contractor working in the plenum crawlspace has accidentally cut a network cable connecting the subnet's workgroup switch to the department's main Ethernet backbone switch. Or maybe someone malicious has installed a rogue DHCP server on that subnet and it's stealing machines as their leases come up for renewal and assigning them unroutable addresses to create a denial of service condition.
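One way to look for a shared subnet is to bucket the IP addresses from the incoming reports. The following is a minimal sketch, not a definitive tool: the addresses are made up for illustration, and the /24 prefix length is an assumption that should be replaced with whatever mask your subnets actually use.

```python
import ipaddress
from collections import Counter

def group_by_subnet(addresses, prefix_len=24):
    """Count reporting hosts per subnet, assuming one fixed prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts

# Hypothetical client addresses collected from Help Desk tickets
reports = ["10.1.20.15", "10.1.20.77", "10.1.20.130", "10.1.35.9"]

for subnet, count in group_by_subnet(reports).most_common():
    print(subnet, count)
# prints:
# 10.1.20.0/24 3
# 10.1.35.0/24 1
```

If most reports cluster in one subnet, as in this made-up sample, that subnet's gateway, switch uplink, and DHCP service are reasonable places to look first.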
If it's only that one user who has the problem, though, then it's probably time to play braindead and start asking questions like "OK, is your computer turned on? Is the network cable securely attached at the back of your machine?" and so on.
A good question to ask this user is "What do you mean by connect?" That's because "connect" is a technical-sounding word that users often use to impress the Help Desk and show they know what they're talking about. Well, they usually don't. Why? Because there are different kinds of connectivity, including MAC-level communications, TCP sessions, password authentication, access rights and privileges, NAT-traversal connectivity, firewall pass-through, application-level sessions, and so on.
What kind of connectivity problem are they actually having? What are they trying to do when they say they want to "connect to" the server? Are they trying to access a share on that server? Do they get an "Access denied" message when they do this? Are they getting a login box prompting them for credentials? Is it rejecting their credentials? Are they having trouble finding the share in Active Directory? Is it a mapped drive they are having problems with? Are they trying to browse to find the server in My Network Places? And so on.
And is it just that server they're having trouble connecting to, or are they having problems connecting to anything on the network? Determining the scope of the problem here is important: Is connectivity failing in just one way or many ways?
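Because "connect" can fail at several layers, it can help to test the layers one at a time and note the first one that breaks. The sketch below covers only two of them, name resolution and the TCP session, using Python's standard `socket` module. The function name `check_layers` and its stage labels are illustrative, and authentication, permissions, and application-level checks are deliberately left out.

```python
import socket

def check_layers(host, port, timeout=3.0):
    """Return the first stage that fails, or 'ok' if both stages succeed."""
    # Stage 1: can the name (or address) be resolved at all?
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "name resolution"
    # Stage 2: can a TCP session be opened to that port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return "tcp connect"
    return "ok"
```

For a file server you might call `check_layers("fileserver.example.com", 445)`, where the host name is a placeholder. A result of "ok" says nothing about credentials or share permissions; it only rules out the lower layers, which narrows down what to ask the user next.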
You've got this user over here, and this server over there, and the network between. They can't connect. Why? Well, where exactly is that server anyway? Is it on the user's subnet? On an adjacent subnet? In a different department? On a different floor? In a different building? On a different continent? What kind of network connects the user with that particular server?
A wired Ethernet LAN? A wireless LAN (WLAN)? A fractional T1 line? Frame Relay? A VPN tunnel over the Internet? A dial-up modem connection? Cable modem or DSL?
First determine the type of connection (possibly several types) between the user and the server, and then ponder where things might break down. Maybe the CSU/DSU has gone wonky; try power-cycling it or contacting your service provider, who should be monitoring it. Maybe the janitor bumped a power bar while cleaning the server room and an Ethernet switch has gone offline.
Check for an alert message from your network management software, assuming you're using managed switches. Maybe there's been a power blackout at the remote branch office where that server is located. Call them on the phone and see what's happening.
And is it server or servers? Is the user having trouble connecting to only that server or to other servers as well? Are others having problems connecting to other servers also? What are the commonalities (if any) between all the servers being affected? Or apparently being affected -- remember, the problem may be with the users' computers or more likely with the network infrastructure itself.
The time element is crucial in troubleshooting. Did the problem just start happening? When was the last time you successfully connected to the server? How long has it been going on? Is it continuous or intermittent?
Intermittent network problems involving unreliable WAN links and other issues can be difficult to troubleshoot, especially if they're transient, i.e. brief and occasional.
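If you can run a periodic probe (a ping or a connection attempt every few seconds) and record whether each attempt succeeded, a quick summary can show whether a failure is continuous or intermittent. This is a rough sketch under that assumption; the function name and the sample results are hypothetical.

```python
def summarize_probes(results):
    """Summarize periodic probe results, where True means the probe succeeded."""
    if not results:
        raise ValueError("need at least one probe result")
    failures = results.count(False)
    longest = current = 0
    for ok in results:
        # Track the longest run of consecutive failed probes
        current = 0 if ok else current + 1
        longest = max(longest, current)
    if failures == 0:
        pattern = "no failures"
    elif failures == len(results):
        pattern = "continuous"
    else:
        pattern = "intermittent"
    return {
        "pattern": pattern,
        "loss_pct": 100.0 * failures / len(results),
        "longest_failure_run": longest,
    }

# Hypothetical results from eight probes sent at regular intervals
sample = [True, False, False, True, True, False, True, True]
print(summarize_probes(sample))
# prints: {'pattern': 'intermittent', 'loss_pct': 37.5, 'longest_failure_run': 2}
```

A short probe window can easily miss a transient fault, so a "no failures" result here is weak evidence on its own.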
Time can also help you relate the problem to other circumstances that might be impacting your network. Did the problem start this morning at 10 am? What else happened on your network around then? Were patches applied by a WSUS server? Did scheduled maintenance on a domain controller occur? Was a construction crew in the building compound using a backhoe to repair a water main break?
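If you keep any kind of change log, matching the reported start time against it is easy to automate. The sketch below assumes a hypothetical log of (timestamp, description) pairs and a 30-minute window on either side of the reported time; both are illustrative choices, and the window is symmetric because users' recollection of when a problem began is rarely exact.

```python
from datetime import datetime, timedelta

def events_near(problem_start, events, window_minutes=30):
    """Return events logged within window_minutes of problem_start, oldest first."""
    window = timedelta(minutes=window_minutes)
    nearby = [(when, what) for when, what in events
              if abs(when - problem_start) <= window]
    return sorted(nearby)

# Hypothetical change log entries; dates and times are invented for illustration
change_log = [
    (datetime(2024, 5, 6, 2, 0), "Scheduled maintenance on domain controller"),
    (datetime(2024, 5, 6, 9, 45), "Patches applied by WSUS server"),
    (datetime(2024, 5, 6, 10, 10), "Construction crew working near building entrance"),
]

for when, what in events_near(datetime(2024, 5, 6, 10, 0), change_log):
    print(when.strftime("%H:%M"), what)
# prints:
# 09:45 Patches applied by WSUS server
# 10:10 Construction crew working near building entrance
```

An event that falls inside the window is only a lead to follow up, not proof of cause.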
TCP/IP troubleshooting is structured around three critical areas:
Asking the right questions is also critical to good troubleshooting. Learning when to be methodical and when to take a mental leap is the essence of the art of troubleshooting, and it involves full use of both your left brain (logic) and right brain (intuition). Network troubleshooting generally isn't as easy as 1-2-3. In other words, it's often more of an art (i.e. based on intuition) than a science (based on a methodology).
Finally, getting your hands dirty and actually testing things to try and isolate the problem is critical, and to do this you need a toolbox of troubleshooting tools you know how to use.