FiveM Server Monitoring: Uptime Alerts, Graphs and Catching Crashes Early

There are two ways to find out your server is down. The first is a player DM at 2am that says "server's dead?" The second is an alert that fired three minutes earlier, while you were already restarting the process. Good FiveM server monitoring is the difference between those two worlds, and the shift is a mindset change before it is a tooling one.

Reacting means players are your alerting system: they notice the outage, get annoyed, and you find out after the damage is done. Monitoring means you own the signal. You know the process is healthy, you know when tickrate drops, and you know a restart failed before the next wave hits an empty server. This article covers what to watch, the tools at each level, and how to alert staff without the noise.

What "Up" Actually Means

A server can be "running" and still be useless: alive while the game loop is stalled, or responding at the OS level while players time out. Define "up" in layers and check each.

Process alive: the FXServer process exists, and port 30120 accepts connections.
Game responding: the FiveM heartbeat answers, and :30120/players.json and :30120/dynamic.json return valid data.
Actually playable: players can connect and aren't stuck on a black loading screen.

The dynamic.json and players.json endpoints are the cheapest, most honest health check you have. If dynamic.json returns your hostname and player count, the server is genuinely serving; if it times out or returns garbage, you have a real problem no matter what top says.
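As a rough illustration of that check, here is a minimal Python sketch using only the standard library. The function names are made up for this example, and it assumes dynamic.json carries the hostname in a "hostname" field and the player count in a "clients" field; confirm both against your own server's output before relying on it.

```python
import http.client
import json
import urllib.request


def parse_dynamic(body):
    """Validate a dynamic.json payload and pull out hostname and player count.

    Raises ValueError when the body is not the JSON object we expect.
    Field names ("hostname", "clients") are assumptions to verify locally.
    """
    data = json.loads(body)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if "hostname" not in data or "clients" not in data:
        raise ValueError("missing hostname or clients field")
    try:
        players = int(data["clients"])
    except TypeError as exc:
        raise ValueError("clients is not a number") from exc
    return {"hostname": str(data["hostname"]), "players": players}


def check_server(base_url, timeout=5.0):
    """Return (healthy, detail). A timeout or garbage response counts as down."""
    try:
        with urllib.request.urlopen(base_url + "/dynamic.json", timeout=timeout) as resp:
            info = parse_dynamic(resp.read())
    except (OSError, ValueError, http.client.HTTPException) as exc:
        return False, "unhealthy: %s" % exc
    return True, "%s: %d players" % (info["hostname"], info["players"])
```

Calling check_server("http://your-server:30120") from a machine outside your network gives you a yes/no answer plus a one-line detail you can log or forward to an alert.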

The Core Metrics Worth Watching

Don't graph everything on day one. Start with the signals that predict outages or explain them afterward.

Player count trends: a sudden drop to zero is a crash; a slow bleed over an hour is usually a performance or content problem.
Host CPU, RAM and disk: FiveM is largely single-thread-bound, so watch per-core CPU, not just the average. A full disk crashes you silently.
Server tickrate and frame time: rising frame time (ms per tick) warns you a resource is misbehaving before players start rubber-banding.
Restart success: a scheduled restart that silently fails leaves a stale process that degrades all night.
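To make the per-core point concrete, here is a small sketch, with hypothetical function names and thresholds, that shows how a healthy-looking average can hide one pinned core, alongside a disk headroom check. The per-core percentages are assumed to come from whatever collector you already use (for example psutil's cpu_percent with percpu=True, or node_exporter); the 90% threshold is only a starting point.

```python
import shutil


def cpu_summary(per_core, saturated_at=90.0):
    """Summarize per-core CPU percentages.

    Reports the average next to the hottest core, because a single
    saturated core can stall the game loop while the average looks fine.
    """
    if not per_core:
        raise ValueError("need at least one core sample")
    hottest = max(per_core)
    return {
        "average": round(sum(per_core) / len(per_core), 1),
        "hottest_core": hottest,
        "saturated": hottest >= saturated_at,
    }


def disk_free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

With samples of 97, 5, 4 and 6 percent across four cores, the average is 28 percent, which looks idle, while the summary still flags the box as saturated.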

Tools, From Simple to Serious

Build this up in stages; each level is useful on its own.

External uptime monitor: a free service or tiny script that hits :30120/dynamic.json every minute from outside your network, catching the case where the whole box is unreachable.
txAdmin: if you run it, you already have player graphs, performance stats, and a live console. It is the fastest way to see tickrate and player history with no extra setup.
Prometheus + node_exporter + Grafana: the proper stack. node_exporter exposes host CPU, RAM, disk and network; Prometheus stores it; Grafana draws the dashboards. A small exporter that polls dynamic.json or players.json puts player count on the same graphs. Frame time is not in those endpoints, so it has to come from the server side, for example txAdmin's performance data or a custom resource.
Dead-man's-switch checks: a healthchecks.io-style service where your restart cron pings a URL on success. If the ping doesn't arrive on schedule, it alerts you. This catches a restart that never ran, which a normal uptime check can't see. Pair it with an external reachability check and you cover the two failure modes that blindside most operators.
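The dead-man's-switch idea can be sketched as a thin wrapper around your restart command. This is an illustration under assumptions, not a drop-in script: the function names are invented here, and the ping URL is whatever your monitoring service gives you for that check.

```python
import subprocess
import urllib.request


def http_ping(url, timeout=10):
    """Hit the check URL; the monitoring service records the time of arrival."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_and_report(command, ping_url, ping=http_ping):
    """Run the restart command and ping only when it exits with status 0.

    On failure we deliberately stay silent: the missing ping is the signal,
    and the monitoring service alerts once it is overdue.
    """
    try:
        result = subprocess.run(command)
    except OSError:
        return False  # command missing or not executable: no ping
    if result.returncode != 0:
        return False
    ping(ping_url)
    return True
```

Your cron entry would then call something like run_and_report(["/opt/fivem/restart.sh"], PING_URL), where both the script path and PING_URL are placeholders for your own setup. If the job never runs at all, nothing pings, and that silence is exactly what gets you alerted.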

Alerting Without the Noise

An alerting setup that cries wolf gets muted, and a muted alert is no alert. Aim for few messages, each one meaning something.

Use Discord webhooks so alerts land where your team already lives; a simple curl to a webhook URL is enough.
Alert on sustained conditions, not single blips: "down for 3 consecutive checks" beats "one failed ping."
Use hysteresis: fire when CPU stays above 90% for five minutes, recover below 75%, so you don't get a flapping storm.
Separate severity: a full disk or crash loop should ping a person; a brief tickrate dip can sit on the dashboard.
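The sustained-condition and hysteresis rules above fit in one small state machine. The sketch below is one way to write it, with a hypothetical class name; it assumes you feed it one sample per check interval, so "five minutes" becomes five samples at a one-minute interval. The Discord helper posts the standard webhook JSON body with a "content" field; the User-Agent value is arbitrary and set explicitly on the assumption that the default urllib one may be rejected.

```python
import json
import urllib.request


class HysteresisAlert:
    """Fire after N consecutive samples above a high mark; clear below a low mark."""

    def __init__(self, fire_above, clear_below, fire_after):
        self.fire_above = fire_above
        self.clear_below = clear_below
        self.fire_after = fire_after
        self.firing = False
        self._streak = 0

    def update(self, value):
        """Feed one sample. Returns "fire", "recover", or None if nothing changed."""
        if not self.firing:
            # A single good sample resets the streak, so blips never fire.
            self._streak = self._streak + 1 if value > self.fire_above else 0
            if self._streak >= self.fire_after:
                self.firing = True
                return "fire"
        elif value < self.clear_below:
            # Values between the two marks keep the alert firing (no flapping).
            self.firing = False
            self._streak = 0
            return "recover"
        return None


def send_discord(webhook_url, message, timeout=10):
    """POST a plain text message to a Discord webhook and return the HTTP status."""
    body = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json", "User-Agent": "fivem-monitor-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

For the CPU rule you would create HysteresisAlert(90, 75, 5) and call send_discord only when update returns "fire" or "recover". The same class covers "down for 3 consecutive checks" if you feed it 1 for a failed check and 0 for a passing one, with HysteresisAlert(0.5, 0.5, 3).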

Logs and Crash Loops

Uptime checks tell you the server is down; logs tell you why and help you catch a crash loop before it empties the room. A crash loop fools a naive monitor: the process keeps coming back, so the port test passes intermittently while players never stay connected.

Tail the server console to a file and watch for repeated stack traces or the same resource erroring on every boot.
Count restarts per hour; more than a couple of unscheduled ones is a loop, and it should alert loudly.
Watch for the lines that precede a hang: script timeouts, OneSync errors, or a resource that never finishes loading.
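Counting restarts per hour is a sliding-window problem. Here is a minimal sketch, with a hypothetical class name and a limit of two unscheduled restarts per hour taken from the rule of thumb above; how you detect a restart (a startup line in the console log, a process start time change) is left to your setup.

```python
from collections import deque


class RestartCounter:
    """Track unscheduled restarts inside a sliding time window."""

    def __init__(self, window_seconds=3600, max_unscheduled=2):
        self.window_seconds = window_seconds
        self.max_unscheduled = max_unscheduled
        self._times = deque()

    def record(self, timestamp, scheduled=False):
        """Record a restart at `timestamp` (seconds).

        Returns True when the number of unscheduled restarts still inside
        the window exceeds the limit, which we treat as a crash loop.
        """
        if not scheduled:
            self._times.append(timestamp)
        # Drop restarts that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.max_unscheduled
```

Scheduled restarts are passed with scheduled=True so your nightly restart never counts toward the loop threshold.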

When a loop starts, the fastest recovery is usually rolling back the last resource you changed.

A Dashboard the Whole Team Can Read

Your monitoring is only as good as the people who read it. Build one Grafana dashboard, pinned in your staff Discord, that a non-technical moderator can glance at: a green/red "is it up" panel, player count, host CPU and RAM, and frame time. Keep it to one screen so a mod can answer "is the server okay right now?" in two seconds.
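The "is it up" and player count panels need a metric source that Prometheus can scrape. One way to provide it, sketched here with only the standard library, is a tiny HTTP endpoint that renders the Prometheus text format. The metric names fivem_up and fivem_players are invented for this example, and read_state stands in for whatever function returns your latest health check result.

```python
from http.server import BaseHTTPRequestHandler


def render_metrics(up, players=None):
    """Render health state in the Prometheus text exposition format."""
    lines = [
        "# HELP fivem_up 1 if the last health check succeeded, else 0.",
        "# TYPE fivem_up gauge",
        "fivem_up %d" % (1 if up else 0),
    ]
    if players is not None:
        lines += [
            "# HELP fivem_players Connected players at the last check.",
            "# TYPE fivem_players gauge",
            "fivem_players %d" % players,
        ]
    return "\n".join(lines) + "\n"


def make_handler(read_state):
    """Build a request handler; read_state() must return (up, players)."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            up, players = read_state()
            body = render_metrics(up, players).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

Serving it takes one more line with http.server.HTTPServer, for example HTTPServer(("0.0.0.0", 9200), make_handler(read_state)).serve_forever(), where port 9200 is an arbitrary choice. In Grafana, fivem_up maps naturally onto a stat panel with green and red thresholds.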

Tuning, Protection and Better Building Blocks

Monitoring tells you something is wrong; fixing the cause is the next step. When your graphs point at frame time and resource load, dig into deeper performance tuning to find which scripts are burning your tick budget. Many "crashes" are really attacks or abuse, so pairing dashboards with proper server protection and health checks helps you tell the two apart. The best long-term fix is upstream: running well-built scripts that don't leak memory or stall the main thread means fewer alerts to begin with.

The payoff isn't fancy graphs; it's that you stop being surprised. You catch the failed restart at 3am from an alert instead of a player complaint at noon, and you spot a resource leak on a trend line days before it crashes the box. Start with one external check on :30120/dynamic.json and a Discord webhook today, then add the rest of the stack as you grow.