GOOD2USE Knowledge Network Microsoft Exchange 2000 Server

PC Knowledge Base - Microsoft Exchange 2000 Server - Tier III: Branch Servers


Tier III servers provide the "postoffice" function: they route messages sent from the branch to other branches, to other sites, and outside the environment by sending each message to the appropriate upstream postoffice.
Postoffice servers send e-mail upstream to bridgehead servers. A problem on a postoffice server affects only the users within that postoffice: their messages will not travel outside it.
A postoffice server failure can cause messages to queue both on the failed server itself and on any site attempting to communicate with it. The messages remain queued until the connection is re-established or they are removed with a diagnostic tool.

Consider scheduled downtime when monitoring Exchange services and processes. Exchange administrators should be aware of servers scheduled for maintenance to avoid false alerts from the monitors.
Also, temporarily disable any "auto-fix" monitors during scheduled maintenance. One suggestion is to disable all monitors during the part of the day in which maintenance is scheduled, after first making sure that the Exchange administrators and the staff performing the maintenance have coordinated.
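The scheduling suggestion above can be expressed as a gate that each monitor checks before alerting or attempting an auto-fix. This is a minimal Python sketch, assuming a single daily maintenance window given as local start and end times; the function names and the 22:00 to 02:00 window are hypothetical and are not part of Exchange or of any monitoring product.

```python
from datetime import datetime, time


def in_maintenance_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the daily window [start, end).

    A window whose end is earlier than its start (for example 22:00 to
    02:00) is treated as crossing midnight.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight: inside if at/after start or before end.
    return t >= start or t < end


def monitor_enabled(now: datetime, start: time, end: time) -> bool:
    """Hypothetical gate: all monitors, including auto-fix ones, stay off in the window."""
    return not in_maintenance_window(now, start, end)


if __name__ == "__main__":
    # Hypothetical nightly maintenance window, 22:00-02:00.
    window = (time(22, 0), time(2, 0))
    print(monitor_enabled(datetime(2001, 1, 1, 23, 30), *window))  # False
    print(monitor_enabled(datetime(2001, 1, 1, 12, 0), *window))   # True
```

Treating a window whose end precedes its start as crossing midnight lets an overnight maintenance slot, the common case, be described by a single pair of times.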
All EventLog ID numbers assume use of Microsoft Exchange version 5.0. EventLog IDs for Exchange 5.5 may differ, but the problem description and resolution will remain the same.

1 = High priority, notify immediately;
2 = Medium priority, notify within 1 hour;
3 = Low priority, notify within 24 hours.
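A monitoring script can encode this legend as a lookup table that turns a detection time into a notification deadline. This is a minimal Python sketch, assuming that the Severity values 1, 2, and 3 in the table below correspond to the high, medium, and low priorities listed here; the names `NOTIFY_WITHIN` and `notify_by` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed mapping of the table's Severity column to notification deadlines:
# 1 = high (notify immediately), 2 = medium (within 1 hour),
# 3 = low (within 24 hours).
NOTIFY_WITHIN = {
    1: timedelta(0),
    2: timedelta(hours=1),
    3: timedelta(hours=24),
}


def notify_by(severity: int, detected_at: datetime) -> datetime:
    """Return the latest time by which an alert of this severity must be sent."""
    try:
        return detected_at + NOTIFY_WITHIN[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity}") from None


if __name__ == "__main__":
    detected = datetime(2001, 1, 1, 10, 0)
    print(notify_by(2, detected))  # 2001-01-01 11:00:00
```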

Problem Description	Method of Detection	Recommended Action	Monitoring Interval	Severity	Threshold
Database problems
Database too fragmented	EventLog ID 65 detected	Defragment the database with Edbutil (to be done by Exchange administrators only).	15 min.	2	Once every 3 months
Database in inconsistent state (the -550 error can also be logged for the Directory or Information Store database after a power failure; it usually means that the database is in an inconsistent state and cannot start)	EventLog error -550 detected	Confirm that the database is in an inconsistent state, and then try a defragmentation repair. Stop all services and back up all files before you run Edbutil.exe manually. Step 1: To check the state, run Edbutil.exe with the /MH option against the problem database and redirect the output to a text file, for example EDBUTIL /MH c:\exchsrvr\dsadata\dir.edb >c:\edbdump.txt, EDBUTIL /MH c:\exchsrvr\mdbdata\priv.edb >c:\edbdump.txt, or EDBUTIL /MH c:\exchsrvr\mdbdata\pub.edb >c:\edbdump.txt. View Edbdump.txt and confirm that the database state is inconsistent. Step 2: If it is, and the database will not start because of the -550 error, restore the database from the online backup, replay the logs, and restart the now-consistent database. Step 3: If and only if the online backup is unavailable, repair the database with the following Edbutil syntax: EDBUTIL /R /DS (use /ISPRIV or /ISPUB instead of /DS to repair the private or public information store). Note that the Repair option (/D [database] /R, which repairs while defragmenting) differs from the Recovery option (/R); do not run EDBUTIL /D /R unless specifically directed by Microsoft PSS. Refer to Knowledge Base article Q143235 for information on running the Recovery option (EDBUTIL /R).	15 min.	2	1
Database reaching capacity	EventLog ID 1112 detected, or IS size has reached 80% of logical disk capacity	This event is normally logged after the database has shut down on reaching capacity. Run edbutil /d to defragment the database and free up space, then restart the Information Store service once edbutil has completed.	20 min.	2	1
Database cache hit rate too low	Monitor the database buffer cache hit ratio for the IS and DS databases	A hit ratio that frequently falls below 95% indicates that the DS and IS buffers are too small. The buffers can be increased if there is sufficient RAM; to do so, run perfwiz -v manually.	30 min.	3	Baseline
MTA process is down	Monitor the number of threads in use by the MTA	Restart the MTA service. If it fails to restart, restart all Exchange services in order.	10 min.	1	1
Directory updates failed	EventLog ID 1171 detected (exception event)	A directory service problem followed by a 1214 error in the event log indicates that the server is failing while adding or deleting a directory object. Contact Microsoft PSS for troubleshooting.	15 min.	2	1
Directory updates failed	EventLog ID 1214 detected (KCC event)	The Knowledge Consistency Checker (KCC) did not complete successfully. This indicates corruption in the directory schema that may affect more than one server in a site or organizational unit. Contact Microsoft PSS for troubleshooting.	15 min.	2	1
Directory Services pending replications too high	Monitor the number of pending replications in the DS	A large lag in directory updates may indicate a network connectivity problem with other bridgehead servers. Confirm that this server can still ping the other bridgehead servers it uses for directory replication. This can also occur between servers in the same site.	30 min.	2	Baseline
Directory Services remaining replication updates not decreasing	Monitor the number of objects being processed by the DS	Indicates that the server is failing to exchange directory replication messages, because of either network issues or directory problems. Check the server's event logs for details.	30 min.	2	Baseline
Internal connection to IS failed	Monitor the number of logons to the IS	If the server has active mailboxes but zero connections, a problem exists. Check the status of the IS service. If the service is up but errors persist, check that the IP stack is working properly on the server: for example, ping 127.0.0.1, run IPCONFIG, and ping the server's gateway. Use a test account to check whether a connection to the server can be made.	15 min.	2	Baseline
Overall Exchange problems
Exchange services down	Monitor the Service Control Manager to detect the status of services	Check all Exchange services. Restart any services that are down, in order. Gracefully reboot the server if necessary.	5 min.	1	1
Exchange process dead	Monitor the CPU and thread utilization of the Exchange processes	Check all Exchange services. Restart any services that are down, in order. Gracefully reboot the server if necessary.	5 min.	1	Baseline
Runaway Exchange process	Monitor the CPU and memory utilization of the Exchange processes	Check all Exchange services. Restart any services that are down, in order. Gracefully reboot the server if necessary.	5 min.	1	Baseline
Paging too high	Monitor the paging frequency of the operating system (pagefile usage)	Excessive paging indicates that more memory is needed. If paging persists, treat it as a bug.	5 min.	2	Baseline
Low logical disk free space	Monitor free space on the logical disks of the Exchange servers	Delete unnecessary files to free up disk space. Add disk capacity if necessary.	15 min.	1	Baseline
Memory utilization too high	Monitor the memory utilization of the Exchange processes	Tune the memory utilization of Exchange. Add RAM if necessary.	15 min.	1	Baseline
CPU queue length too high	Monitor the processor queue length over a prolonged period	Use Performance Monitor to identify CPU bottlenecks and resolve them as necessary.	30 min.	2	Baseline
Compaq Insight Manager errors	Monitor the internal temperature of the server	Check hardware for errors.	10 min.	1	Baseline
Compaq Insight Manager errors	Monitor any critical IDE or SCSI disk failures	Check hardware for errors.	10 min.	1	Baseline
Compaq Insight Manager errors	Monitor NIC failures	Check hardware for errors.	10 min.	1	Baseline
Compaq Insight Manager errors	Monitor any fan failures	Check hardware for errors.	10 min.	1	Baseline
Compaq Insight Manager errors	Monitor any correctable memory errors	Check hardware for errors.	10 min.	1	Baseline
Network utilization high	Monitor total bytes per second processed by the network interface card (NIC)	Check and/or tune the performance of the NIC.	10 min.	3	Baseline
ICMP errors	Monitor the receipt time for ICMP packets	Check and/or tune the performance of the NIC.	10 min.	3	Baseline
ICMP errors	Monitor the level of unreachable destinations	Check and/or tune the performance of the NIC.	10 min.	3	Baseline
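Several thresholds in the table reduce to simple arithmetic on counter readings: the 95% buffer cache hit ratio, the Information Store reaching 80% of logical disk capacity, a replication backlog that is not decreasing, and zero IS logons on a server with active mailboxes. The following Python sketch assumes the readings have already been collected (for example from Performance Monitor); collection is not shown, and the function names, the sampling approach, and reading "frequently" as "in more than half of the samples" are hypothetical choices, not part of Exchange.

```python
from typing import Sequence

CACHE_HIT_FLOOR = 95.0    # percent; from the cache hit rate row
STORE_DISK_PERCENT = 80   # IS size vs. logical disk; from the capacity row


def cache_hits_too_low(samples: Sequence[float], max_low_fraction: float = 0.5) -> bool:
    """True if the buffer cache hit ratio was below 95% "frequently".

    The table does not define "frequently"; treating it as more than half
    of the samples is an assumption (adjust max_low_fraction as needed).
    """
    if not samples:
        return False
    low = sum(1 for pct in samples if pct < CACHE_HIT_FLOOR)
    return low / len(samples) > max_low_fraction


def store_near_capacity(store_bytes: int, disk_total_bytes: int) -> bool:
    """True once the Information Store has reached 80% of the logical disk.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return store_bytes * 100 >= STORE_DISK_PERCENT * disk_total_bytes


def backlog_not_decreasing(remaining: Sequence[int]) -> bool:
    """True if a non-empty replication backlog never shrank between samples.

    `remaining` holds successive readings of the remaining-updates count.
    An empty backlog (latest reading 0) is healthy and is not flagged.
    """
    if len(remaining) < 2 or remaining[-1] == 0:
        return False
    return all(later >= earlier for earlier, later in zip(remaining, remaining[1:]))


def store_logons_suspicious(active_mailboxes: int, logons: int) -> bool:
    """Active mailboxes but zero IS logons indicates a connection problem."""
    return active_mailboxes > 0 and logons == 0


if __name__ == "__main__":
    print(cache_hits_too_low([99.1, 94.0, 92.5]))      # True: 2 of 3 samples low
    print(store_near_capacity(8 * 2**30, 10 * 2**30))  # True: 80% reached
    print(backlog_not_decreasing([120, 120, 135]))     # True: backlog not shrinking
    print(store_logons_suspicious(250, 0))             # True
```

For the capacity check, the total size of a logical disk can be read with the standard library's `shutil.disk_usage(path).total`; summing the sizes of priv.edb and pub.edb under c:\exchsrvr\mdbdata (the paths used in the table) is one assumed way to measure the Information Store size.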
