PC Knowledge Base - Microsoft Exchange 2000 Server - Tier I: IMS Servers

Tier I servers host the IMS (Internet Mail Service), which carries mail from the internal system to the external gateways for access to the Internet. IMS servers send mail outside the system through the UNIX sendmail host, so that host must be operational for outbound delivery to work. Although the sendmail host is the responsibility of the UNIX OS support team, port #25 on these boxes should be monitored nonetheless.
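A port #25 check of this kind can be scripted rather than run by hand over Telnet. The following is a minimal sketch, not a supplied monitoring tool: the function names, the 10-second timeout, and the host passed in are all illustrative assumptions. It relies only on the standard SMTP convention that a ready server greets a new connection with reply code 220.

```python
import socket


def is_smtp_ready(banner: str) -> bool:
    """Return True when the greeting starts with SMTP reply code 220 (service ready)."""
    return banner.startswith("220")


def read_smtp_banner(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Connect to host:port and return the first chunk the server sends, stripped."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = conn.recv(1024)
    return data.decode("ascii", errors="replace").strip()


def check_ims(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """True if a 220 greeting was received; False on refusal, timeout, or other socket error."""
    try:
        return is_smtp_ready(read_smtp_banner(host, port, timeout))
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

A monitor could call `check_ims` with the name of the server to be checked (a placeholder here) at each polling interval and raise an alert when it returns False.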

IMS problems affect only messages that travel outside the environment. They do not affect site-to-site or post-office-to-post-office operations. However, an IMS failure causes messages destined for external addresses to queue at the bridgehead server that is trying to send them. Each message remains in the queue until the IMS is operational again or until the message is removed from the queue with a diagnostic tool.

Consider scheduled downtime when monitoring Exchange services and processes. Exchange administrators should know which servers are scheduled for maintenance so that the monitors do not raise false alerts.
Before maintenance begins, make sure the Exchange administrators and those performing the maintenance have coordinated their efforts. Then temporarily disable any "auto-fix" type of monitors for the duration of the scheduled maintenance. Suggestion: disable all monitors during the part of the day in which maintenance is scheduled to occur.
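One way to keep monitors quiet during scheduled maintenance is to have the monitoring script consult a list of maintenance windows before alerting. The sketch below is an assumption about how that could be done; the function names and the idea of storing windows as (start, end) pairs are illustrative and do not come from any particular monitoring product.

```python
from datetime import datetime


def in_maintenance(now, windows):
    """Return True if `now` falls inside any (start, end) maintenance window.

    Each window is a pair of datetime values; the end time is exclusive.
    """
    return any(start <= now < end for start, end in windows)


def should_alert(now, windows):
    """Alerts (and any auto-fix actions) are suppressed during maintenance."""
    return not in_maintenance(now, windows)
```

For example, with a window from 02:00 to 04:00 on a given night, `should_alert` returns False for a check at 03:00 and True for a check at 04:00.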

All EventLog ID numbers assume use of Microsoft Exchange version 5.0. EventLog IDs for Exchange 5.5 may differ, but the problem description and resolution will remain the same.

Severity definition:

  1 = High priority, notify immediately;
  2 = Medium priority, notify within 1 hour;
  3 = Low priority, notify within 24 hours.

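The severity levels translate directly into notification deadlines. The sketch below shows one way to encode them; the names `NOTIFY_WITHIN` and `notify_deadline` are hypothetical, and "notify immediately" is modeled as a zero delay.

```python
from datetime import datetime, timedelta

# Maximum delay before notification, keyed by severity level.
NOTIFY_WITHIN = {
    1: timedelta(0),         # High priority: notify immediately
    2: timedelta(hours=1),   # Medium priority: notify within 1 hour
    3: timedelta(hours=24),  # Low priority: notify within 24 hours
}


def notify_deadline(severity, detected_at):
    """Return the latest time by which someone must be notified."""
    return detected_at + NOTIFY_WITHIN[severity]
```

An unknown severity raises `KeyError`, which is deliberate in this sketch: a monitor reporting a level outside 1 to 3 is probably misconfigured.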
| Problem Description | Method of Detection | Recommended Action | Monitoring Interval | Severity | Threshold |
|---|---|---|---|---|---|
| Connectivity | | | | | |
| Unable to send mail upstream | Telnet to port #25 of the IMS to obtain an "ESMTP spoken here" response | Troubleshoot TCP/IP status on the machine. Ensure port #25 is operational and not used by another application. Check the c:\winnt\system32\drivers\etc\services file to determine which application may be using port #25. | 15 min. | 1 | 1 |
| Database problems | | | | | |
| Database too fragmented | EventLog ID 65 detected | Use "edbutil" to defragment the database (should be done by Exchange administrators only) | 15 min. | 2 | 1 time every 3 months |
| Database in inconsistent state (This message may also appear in the Directory or Information Store database in the case of a power failure. This error usually means that the database is in an inconsistent state and cannot start.) | EventLog ID error -550 has occurred | Confirm that the database is in an inconsistent state, and then try a defragmentation repair | 15 min. | 2 | 1 |
| Database reaching capacity | EventLog ID 1112 detected or IS size reached 80% of logical disk capacity | Normally logged after the database has shut down for reaching capacity. Run edbutil /d on the server to free up space. After edbutil completes, restart the Information Store. | 20 min. | 2 | 1 |
| Database cache hit rate too low | Monitor the database buffer cache hit ratio for the IS and DS databases | DS and IS buffers can be increased if there is sufficient RAM. If the hit ratios frequently fall below 95%, the buffers are too small. To correct the problem, manually run perfwiz -v. | 30 min. | 3 | Baseline |
| MTA messages per second too low or too high | Monitor the number of messages being processed by the MTA | Check the status of the MTA and the CPU and memory consumption of the processes | 15 min. | 1 | Baseline |
| MTA process is down | Monitor the number of threads in use by the MTA and EventLog ID 2110 detected | Restart the MTA service. If the service fails to restart, restart ALL Exchange services in order. | 10 min. | 1 | 1 |
| MTA Work Queue length too high | Monitor MTA queue length on server | Check that the MTA service is up, and check the MTA service on upstream connections (i.e. if the MTA queue length of the bridgehead server is too high, check the MTA on the IMS) | 15 min. | 2 | 3 Baseline |
| Directory updates failed | EventLog ID 1171 detected - exception event | Directory Service problem. When followed by a 1214 error in the Event log, this indicates a server failing on a deletion or addition of a directory object. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1 |
| Directory updates failed | EventLog ID 1214 detected - KCC event | The knowledge consistency checker fails to complete successfully. Indicates a corruption in the Directory schema that may affect more than one (1) server in a site or Organizational Unit. Contact Microsoft PSS for troubleshooting. | 15 min. | 2 | 1 |
| Directory Services Pending Replications too high | Monitor the number of pending replications in the DS | A large lag in directory updates may indicate a problem with network connectivity to other bridgehead servers. Confirm that this server can still ping the other bridgehead servers it uses for directory replication. This can also occur with servers in the same site. | 30 min. | 2 | Baseline |
| Directory Services remaining replication updates not decreasing | Monitor the number of objects being processed by the DS | Indicates that the server is failing to exchange directory replication messages, because of either network issues or directory problems. Check the Event logs on the server for details. | 30 min. | 2 | Baseline |
| IMS NDRs are high | Monitor the NDR for the IMS. EventLog ID 1001 detected. EventLog ID 1026 or 2007 detected. | Confirm that the sendmail UNIX relay host is available from the server. EventLog ID 1001 means the IMS service has been stopped or has shut down. EventLog ID 1026 or 2007: contact Microsoft PSS. | 30 min. | 2 | Baseline |
| IMS inbound queue too high | Monitor the IMS inbound queue | Check the IMS service | 30 min. | 2 | Baseline |
| IMS outbound queue too high | Monitor the IMS outbound queue | Check the IMS service | 30 min. | 2 | Baseline |
| Overall Exchange problems | | | | | |
| Exchange services down | Monitor the service control manager to detect status of services | Check all Exchange services. Restart services that are down in order. Gracefully reboot server if necessary. | 5 min. | 1 | 1 |
| Exchange process dead | Monitor the CPU and thread utilization of the Exchange processes | Check all Exchange services. Restart services that are down in order. Gracefully reboot server if necessary. | 5 min. | 1 | Baseline |
| Runaway Exchange process | Monitor the CPU and memory utilization of the Exchange processes | Check all Exchange services. Restart services that are down in order. Gracefully reboot server if necessary. | 5 min. | 1 | Baseline |
| Paging too high | Monitor the paging frequency of the operating system (pagefile usage) | Excessive paging indicates that a memory upgrade is needed. If paging persists, treat as bug. | 5 min. | 2 | Baseline |
| Low logical disk free space | Monitor the logical disk space of the Exchange machines | Delete unnecessary files to free up disk space. Install additional disk space if necessary. | 15 min. | 1 | Baseline |
| CPU Queue Length too high | Monitor the overall queue length of the CPU over a prolonged period | Use Performance Monitor to identify CPU bottlenecks and rectify as necessary | 30 min. | 2 | Baseline |
| Compaq Insight Manager errors | Monitor the internal temperature of server | Check hardware for errors | 10 min. | 1 | Baseline |
| Compaq Insight Manager errors | Monitor any critical IDE or SCSI disk failures | Check hardware for errors | 10 min. | 1 | Baseline |
| Compaq Insight Manager errors | Monitor NIC failures | Check hardware for errors | 10 min. | 1 | Baseline |
| Compaq Insight Manager errors | Monitor any fan failures | Check hardware for errors | 10 min. | 1 | Baseline |
| Compaq Insight Manager errors | Monitor any correctable memory errors | Check hardware for errors | 10 min. | 1 | Baseline |
| Network utilization high | Monitor total bytes per second processed by the network interface card | Check and/or tune performance of the NIC | 10 min. | 3 | Baseline |
| ICMP errors | Monitor the receipt time for ICMP packets | Check and/or tune performance of the NIC | 10 min. | 3 | Baseline |
| ICMP errors | Monitor the level of unreachable destinations | Check and/or tune performance of the NIC | 10 min. | 3 | Baseline |
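
Two of the checks in the table use fixed numeric limits rather than a baseline: the Information Store reaching 80% of logical disk capacity, and the buffer cache hit ratio frequently falling below 95%. The sketch below shows how a script might evaluate them once the raw numbers have been collected. The function names are hypothetical, and the reading of "frequently" as "more than half of the recent samples" is an assumption, since the table does not define it.

```python
def store_near_capacity(store_bytes, disk_capacity_bytes, limit_percent=80):
    """True when the store occupies at least limit_percent of the logical disk."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return store_bytes * 100 >= limit_percent * disk_capacity_bytes


def cache_hit_rate_low(samples, floor=95.0, frequent_fraction=0.5):
    """True when more than frequent_fraction of the samples fall below floor.

    `samples` is a list of hit-ratio percentages from recent polls.
    The frequent_fraction default is an assumption, not a documented value.
    """
    if not samples:
        return False
    below = sum(1 for s in samples if s < floor)
    return below / len(samples) > frequent_fraction
```

How the inputs are gathered (Performance Monitor counters, file sizes, and so on) is left out deliberately; only the threshold logic is illustrated.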


© Copyright 1998-1999 GOOD2USE