Diagnostic Tools

"Why is my check failing?"

We get this question often. It isn't always immediately obvious what's going on with a web site or service that is "down," and additional information about what our probes are seeing can be helpful. For example, if your website is timing out, is it the web server, a DNS problem, or maybe packet loss on the network? The NodePing Diagnostic Tools allow you to run several utilities to see what our probes are seeing about your web site or service. These tools can be useful to narrow down where the failure is occurring so you can get things fixed and service restored as quickly as possible.

Tools List

  • Ping
  • Traceroute
  • MTR
  • Dig
  • Page Load
  • Screenshot

Ping

Use the ping tool to help detect packet loss or routing problems between our probe servers and your services. We'll send 10 ICMP packets to the target IP or FQDN from the probe you choose. Look for the percentage of packet loss in the summary line at the bottom of the results.
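If you paste ping results into a script, the loss figure can be pulled out of the summary line. The following is a minimal sketch that assumes the summary uses the common Linux ping wording ("N packets transmitted, M received, X% packet loss"). The function name and the sample text are illustrative and are not actual NodePing output.

```python
import re

# Illustrative sample only; real results may be worded differently.
SAMPLE_PING = """\
--- example.com ping statistics ---
10 packets transmitted, 8 received, 20% packet loss, time 9012ms
rtt min/avg/max/mdev = 11.9/12.4/13.0/0.3 ms
"""


def packet_loss_percent(ping_output):
    """Return the packet loss percentage from a ping summary, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


print(packet_loss_percent(SAMPLE_PING))
```

A result of None means the summary line was not found, which usually means the text was truncated or uses a different format than this sketch assumes.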

Traceroute

The traceroute tool helps determine the point at which a route is failing. It can also show which firewall is blocking our probes. The results show the route taken from our probe to your target IP or FQDN. Each line is a "hop" and shows the latency between our probe and that node along the route. Check whether the last line is your destination. If it isn't, there is a routing issue, a firewall is blocking the traffic, or the server is down.
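The "is the last line my destination?" test can be sketched in a few lines. This example assumes the classic Linux traceroute layout, where each responding hop shows its IPv4 address in parentheses and silent hops show "* * *". The sample text, addresses, and function name are made up for illustration.

```python
import re

# Illustrative sample only; 192.0.2.10 is the (hypothetical) target.
SAMPLE_TRACE = """\
traceroute to example.com (192.0.2.10), 30 hops max, 60 byte packets
 1  10.0.0.1 (10.0.0.1)  0.412 ms  0.398 ms  0.371 ms
 2  198.51.100.1 (198.51.100.1)  5.102 ms  5.087 ms  5.061 ms
 3  * * *
"""

IPV4_IN_PARENS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def last_responding_hop(trace_output):
    """Return the IP of the last hop that answered, or None if none did."""
    last = None
    # Skip the first line: it is the header and names the target itself.
    for line in trace_output.splitlines()[1:]:
        match = IPV4_IN_PARENS.search(line)
        if match:
            last = match.group(1)
    return last


target = "192.0.2.10"
print(last_responding_hop(SAMPLE_TRACE) == target)
```

Here the last responding hop is 198.51.100.1 rather than the target, so the comparison is false and the hop after 198.51.100.1 is where to start looking.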

MTR

Use the mtr tool to help detect packet loss and routing problems. It's like ping and traceroute combined. Running an mtr is a good place to start when you see a 'Timeout' failure. Look at the "Loss%" column to see packet loss. If the last line isn't your destination, it likely indicates a routing issue, a firewall, or a server that is down.
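Reading the "Loss%" column and the last line can also be done programmatically. The sketch below assumes the text layout of an mtr report, where hop lines contain "|--" followed by the host and the loss figure. The sample report, addresses, and function names are illustrative.

```python
# Illustrative sample only, laid out like an mtr text report.
SAMPLE_REPORT = """\
HOST: probe                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1              0.0%    10    0.4   0.5   0.3   0.9   0.2
  2.|-- 198.51.100.1          0.0%    10    5.1   5.3   4.9   6.0   0.3
  3.|-- 192.0.2.10           20.0%    10   12.1  12.4  11.9  13.0   0.3
"""


def parse_hops(report):
    """Return a list of (host, loss_percent) tuples, one per hop line."""
    hops = []
    for line in report.splitlines():
        if "|--" not in line:
            continue  # skip the header and any blank lines
        fields = line.split("|--", 1)[1].split()
        hops.append((fields[0], float(fields[1].rstrip("%"))))
    return hops


def summarize(report, target):
    """Say whether the last hop is the target and how much loss it shows."""
    hops = parse_hops(report)
    if not hops:
        return "no hops found"
    last_host, last_loss = hops[-1]
    if last_host != target:
        return "destination not reached"
    return f"destination reached with {last_loss}% loss"


print(summarize(SAMPLE_REPORT, "192.0.2.10"))
```

The loss on the final hop is the figure that matters most here, since it reflects what actually reaches your server.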

Dig

The dig tool is for finding DNS issues. Look for errors like "no servers could be reached", which indicates that the DNS server is unavailable. A missing "ANSWER SECTION" means there is no resolution for that FQDN.
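The two signals described above translate directly into string checks. This is a minimal sketch; the function name, the returned labels, and the sample outputs are illustrative assumptions rather than anything the tool itself produces.

```python
# Illustrative samples only.
SAMPLE_UNREACHABLE = ";; connection timed out; no servers could be reached\n"
SAMPLE_NO_ANSWER = """\
;; QUESTION SECTION:
;missing.example.com.        IN      A

;; AUTHORITY SECTION:
example.com.  3600  IN  SOA  ns1.example.com. admin.example.com. 1 7200 900 1209600 86400
"""
SAMPLE_RESOLVED = """\
;; QUESTION SECTION:
;www.example.com.            IN      A

;; ANSWER SECTION:
www.example.com.     300     IN      A       192.0.2.10
"""


def classify_dig_output(output):
    """Map dig output to one of three rough outcomes."""
    if "no servers could be reached" in output:
        return "dns server unavailable"
    if "ANSWER SECTION" not in output:
        return "no resolution"
    return "resolved"


for sample in (SAMPLE_UNREACHABLE, SAMPLE_NO_ANSWER, SAMPLE_RESOLVED):
    print(classify_dig_output(sample))
```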

Page Load

For passing HTTP checks, this tool provides an analysis of a full page load, including JavaScript, CSS, and other assets. Use the HAR viewer to get details on each HTTP request and to see which assets are loading slowly. The "Total response" time, listed in milliseconds, indicates the page speed.
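If you have the page load data as a HAR document, the slowest assets can be ranked with a short script. The sketch below relies only on the standard HAR layout (log, entries, each with a "time" in milliseconds and a request "url"). The sample data is made up, and whether a HAR file can be exported from the viewer is an assumption.

```python
import json

# Illustrative HAR fragment; times are in milliseconds.
SAMPLE_HAR = json.loads("""
{
  "log": {
    "entries": [
      {"time": 120, "request": {"url": "https://example.com/"}},
      {"time": 840, "request": {"url": "https://example.com/app.js"}},
      {"time": 95,  "request": {"url": "https://example.com/style.css"}}
    ]
  }
}
""")


def slowest_requests(har, count=3):
    """Return up to `count` (url, time_ms) pairs, slowest first."""
    entries = har["log"]["entries"]
    ranked = sorted(entries, key=lambda entry: entry["time"], reverse=True)
    return [(entry["request"]["url"], entry["time"]) for entry in ranked[:count]]


for url, time_ms in slowest_requests(SAMPLE_HAR):
    print(f"{time_ms} ms  {url}")
```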

Screenshot

Use the screenshot tool for your HTTP checks to get a visual snapshot of your site.

Troubleshooting

Here are some tips for troubleshooting some of the most common failures seen for the various check types.

HTTP, HTTP Content, HTTP Advanced, HTTP Parse checks

HTTP checks can be the most complex of our checks to troubleshoot because they rely on so many other services to function.

HTTP checks usually require DNS, IP routing, proper firewall configuration, a working SSL certificate, a functioning database, and a running webserver. It's no wonder we receive more questions about failed HTTP checks than about any other check type.

One of the fastest ways to diagnose an HTTP failure is to create checks for all the dependent services (DNS, ping, port, and SSL). If your HTTP check fails but none of the other checks fail, it's a good indicator that the webserver is the cause of the trouble. If your ping check fails, you can figure that the server is offline, routing is broken, or your datacenter is experiencing packet loss.
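The reasoning above can be written down as a small decision function. This is only a sketch: the check labels ("http", "ping", "dns", "port", "ssl") and the function name are hypothetical, and the middle rule, which points at a failing DNS, port, or SSL check, is an assumption added to make the example complete.

```python
def likely_cause(results):
    """Suggest where to look, given {check_label: passing?} results.

    Labels missing from `results` are treated as passing.
    """
    if results.get("http", True):
        return "http check is passing"
    if not results.get("ping", True):
        return "server offline, routing broken, or packet loss"
    # Assumption: a failing dependency is a better lead than the webserver.
    failed = [name for name in ("dns", "port", "ssl") if not results.get(name, True)]
    if failed:
        return "failing dependency: " + ", ".join(failed)
    return "webserver is the likely cause"


print(likely_cause({"http": False, "ping": True, "dns": True, "port": True, "ssl": True}))
print(likely_cause({"http": False, "ping": False}))
```

The first call describes an HTTP failure with every dependency passing, so it points at the webserver. The second has a failing ping check, so it points at the server or the network instead.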

Timeout failures can be the hardest to diagnose. We recommend you start by running an MTR to see if there are any routing problems or packet loss between our probes and your service. If the MTR doesn't show any problems, run a dig test against the FQDN in your URL to see if a slow or failed DNS service is causing the timeout. Please note that page load and screenshot tests are not useful for HTTP checks that are timing out.

500 errors usually indicate a problem on the server itself. It could be the database is offline or your box has run out of memory. For 500 errors, you'll want to contact your hosting company's support.

Not found errors are usually seen on HTTP Content and HTTP Advanced checks when the text you've configured on the check does not appear exactly as you have it set. Our probes see the HTML, not the rendered page your browser shows you, so special characters are likely encoded and the text may have HTML tags in or around it. Also, any text that is loaded via JavaScript libraries isn't in the initial HTML, and since our checks do not run JavaScript on the pages, it won't be seen.
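A short example shows why text that is visible in a browser can be missing from the raw HTML. It assumes the content match is a plain substring search on the response body, which is an illustration and not a description of how the probes are implemented. The HTML snippet and the function name are made up.

```python
import html

# What the server sends. A browser renders this as:
#   Terms & Conditions for Acme Widgets
RAW_HTML = "<p>Terms &amp; Conditions for <b>Acme</b> Widgets</p>"


def raw_contains(raw_html, text):
    """Plain substring match against the unrendered HTML."""
    return text in raw_html


print(raw_contains(RAW_HTML, "Terms & Conditions"))      # False: "&" is encoded
print(raw_contains(RAW_HTML, "Terms &amp; Conditions"))  # True: matches the source
print(raw_contains(RAW_HTML, "Acme Widgets"))            # False: a tag sits between the words
print(html.escape("Terms & Conditions"))                 # Terms &amp; Conditions
```

Viewing the page source, rather than the rendered page, is the quickest way to find the exact string to configure.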

PING checks

Our ICMP PING checks are fairly simple and there are only a few things that can go wrong. Either the server is offline (turned off), a firewall is blocking the pings, the network is unroutable, or there's packet loss somewhere along the route.

Timeout failures can be verified by running the ping diagnostic tool from the same location your check runs from. This will verify that the pings are, in fact, not being replied to. To check for routing and packet loss problems, run an mtr diagnostic. It will show you where along the route our pings stop receiving a reply. If the mtr shows that it's sometimes able to reach your server, you'll be able to see a rough approximation of the packet loss between our probe and your server.

DNS checks

DNS can be difficult to troubleshoot since changes sometimes take many hours to propagate. Use our dig diagnostic tool to query a specific FQDN against a specific DNS server (or the default DNS server on our probe) to see if resolution is correct.

SSH checks

The most common issue we see with SSH checks is timeouts that the owner is unable to reproduce. These are usually caused by unresponsive DNS servers for the PTR records of the SSH host. OpenSSH clients, by default, do a reverse IP lookup when connecting to a server. If the DNS servers are not responding, the SSH check may time out before ever trying to connect. The challenge comes when someone who often connects to that SSH server is able to connect, but our probes are not. This is because their client has cached the PTR record, so the OpenSSH client uses the cache and then connects. We recommend creating DNS checks for the PTR records of the IP addresses of the servers you have SSH checks for. You can also use our dig diagnostic tool to see if the PTR records are available.
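To query a PTR record you need the reverse lookup name for the IP address. A minimal sketch using Python's standard ipaddress module is shown below. The function name and the addresses are examples only.

```python
import ipaddress


def ptr_name(ip):
    """Return the reverse-DNS name to query for a PTR record."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("192.0.2.10"))   # 10.2.0.192.in-addr.arpa
print(ptr_name("2001:db8::1"))  # a long name ending in .ip6.arpa
```

The resulting name is what you would query as a PTR record, either in a DNS check or with the dig diagnostic tool.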

SMTP, POP3, IMAP4 checks

Our various email checks are similar to HTTP checks in that they rely on several other services to function properly. DNS, routing, and packet loss can all cause mail checks to fail.

Timeout failures should first be verified by running an MTR to see if there are any routing problems or packet loss between our probes and your service. If the MTR doesn't show any problems, run a dig test against the check's FQDN to see if a slow or failed DNS service is causing the timeout.

Advanced Troubleshooting

If you're still having trouble determining the cause of an outage, NodePing support is always available and happy to help. Send an email to support@nodeping.com with the check label or id and we'll do what we can to help sort it out.

If you have any questions, get in touch at support@nodeping.com, or use our Contact form.