Prometheus monitoring and alerting

Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores metrics as time-series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Users can monitor the internal state of a QuestDB instance through an HTTP endpoint that QuestDB exposes on port 9003. This document describes how to enable metrics on this endpoint, how to configure Prometheus to scrape metrics from a QuestDB instance, and how to enable alerting from QuestDB to Prometheus Alertmanager.

For guidance on what metrics to monitor and alerting strategies, see Monitoring and alerting.

Prerequisites

Scraping Prometheus metrics from QuestDB

QuestDB exposes Prometheus metrics on a /metrics HTTP endpoint on port 9003. Before metrics can be queried, they must be enabled with the metrics.enabled key in the server configuration:

/path/to/server.conf
metrics.enabled=true

When running QuestDB in Docker, expose port 9003 and enable metrics with the QDB_METRICS_ENABLED environment variable:

Docker
docker run \
-e QDB_METRICS_ENABLED=TRUE \
-p 8812:8812 -p 9000:9000 -p 9003:9003 -p 9009:9009 \
-v "$(pwd):/var/lib/questdb" \
questdb/questdb:9.3.3

To verify that QuestDB is exposing metrics correctly, navigate to http://<questdb_ip>:9003/metrics in a browser, where <questdb_ip> is the IP address of the instance. Alternatively, run curl as in the following example:

Given QuestDB running at 127.0.0.1
curl http://127.0.0.1:9003/metrics
# TYPE questdb_json_queries_total counter
questdb_json_queries_total 0

# TYPE questdb_memory_tag_MMAP_DEFAULT gauge
questdb_memory_tag_MMAP_DEFAULT 77872

# TYPE questdb_memory_malloc_count gauge
questdb_memory_malloc_count 659

# ...
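For scripted checks, the following Python sketch fetches the endpoint and parses the text exposition format shown above into a dictionary that maps each series to its value. It assumes a local instance at 127.0.0.1:9003. The function names are hypothetical, and the parser only handles simple sample lines like those above. It is not a complete implementation of the Prometheus text format.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {series: value}.

    Minimal sketch: "# TYPE"/"# HELP" comments and blank lines are skipped,
    an optional {label="..."} block stays part of the series key, and an
    optional trailing timestamp is ignored. Escaping rules are not handled.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Keep name{labels} together, since label values may contain spaces.
            close = line.rindex("}")
            series, rest = line[: close + 1], line[close + 1 :]
        else:
            series, rest = line.split(None, 1)
        # The first token after the series is the value; a timestamp may follow.
        samples[series] = float(rest.split()[0])
    return samples


def fetch_metrics(url: str = "http://127.0.0.1:9003/metrics") -> dict:
    """Fetch the QuestDB metrics endpoint and parse the response body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Against a running instance, fetch_metrics()["questdb_memory_malloc_count"] returns the same value as the questdb_memory_malloc_count line in the curl output.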

To configure Prometheus to scrape these metrics, provide the QuestDB instance IP and port 9003 as a target. The following example configuration file questdb.yml assumes there is a running QuestDB instance on localhost (127.0.0.1) with port 9003 available:

questdb.yml
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'questdb'

scrape_configs:
  - job_name: 'questdb'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:9003']

Start Prometheus and pass this configuration on launch:

prometheus --config.file=questdb.yml

By default, Prometheus listens on 0.0.0.0:9090. Navigating to http://localhost:9090/targets should show that QuestDB is being scraped successfully:

Prometheus targets tab showing a QuestDB instance status
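The Prometheus HTTP API also exposes the same target status at /api/v1/targets, which is useful for scripted health checks. The Python sketch below assumes Prometheus on localhost:9090 and the job name 'questdb' from questdb.yml. The helper names are hypothetical. Each active target reports a health field of "up", "down", or "unknown".

```python
import json
import urllib.request


def questdb_target_health(targets_response: dict, job: str = "questdb") -> dict:
    """Map each active target of `job` to its health ("up", "down" or "unknown")."""
    active = targets_response["data"]["activeTargets"]
    return {
        t["labels"]["instance"]: t["health"]
        for t in active
        if t["labels"].get("job") == job
    }


def fetch_targets(prometheus: str = "http://localhost:9090") -> dict:
    """Call the Prometheus API endpoint that backs the /targets page."""
    with urllib.request.urlopen(f"{prometheus}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

With the configuration above, questdb_target_health(fetch_targets()) should report the 127.0.0.1:9003 instance as "up".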

In the graphing tab of Prometheus (http://localhost:9090/graph), use autocomplete to find and graph QuestDB-specific metrics. All of them are prefixed with questdb_:

Prometheus graphing tab showing QuestDB instance metrics on a chart
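Expressions typed into the graphing tab can also be evaluated through the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The following is a minimal Python sketch. It assumes Prometheus on localhost:9090 and uses hypothetical helper names. It flattens an instant-vector result into a dictionary keyed by the instance label.

```python
import json
import urllib.parse
import urllib.request


def query_url(expr: str, prometheus: str = "http://localhost:9090") -> str:
    """Build an instant-query URL; the PromQL expression is URL-encoded."""
    return f"{prometheus}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def vector_values(response: dict) -> dict:
    """Flatten an instant-vector result into {instance: value}."""
    if response["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    # Each sample value is a [timestamp, "string value"] pair.
    return {
        r["metric"].get("instance", ""): float(r["value"][1])
        for r in response["data"]["result"]
    }


def instant_query(expr: str, prometheus: str = "http://localhost:9090") -> dict:
    with urllib.request.urlopen(query_url(expr, prometheus), timeout=5) as resp:
        return vector_values(json.load(resp))
```

For example, instant_query("rate(questdb_committed_rows_total[1m])") would return the committed rows per second for each scraped QuestDB instance.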

The following metrics are available:

Commit metrics

| Metric | Type | Description |
|---|---|---|
| questdb_commits_total | counter | Total commits of all types (in-order and out-of-order) executed on database tables. |
| questdb_o3_commits_total | counter | Total out-of-order (O3) commits executed on database tables. |
| questdb_committed_rows_total | counter | Total rows committed to database tables. |
| questdb_physically_written_rows_total | counter | Total rows physically written to disk. Greater than questdb_committed_rows_total with out-of-order ingestion, because out-of-order rows can force existing data to be rewritten. Write amplification is physically written rows divided by committed rows. |
| questdb_rollbacks_total | counter | Total rollbacks executed on database tables. |

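These counters are cumulative since startup, so a ratio taken from a single scrape averages over the whole uptime. The Python sketch below computes write amplification over the interval between two scrapes instead. It assumes each scrape has already been parsed into a dict mapping metric name to value, and the function name is hypothetical. In Prometheus, the equivalent windowed expression is rate(questdb_physically_written_rows_total[5m]) / rate(questdb_committed_rows_total[5m]).

```python
from typing import Optional


def write_amplification(prev: dict, curr: dict) -> Optional[float]:
    """Physically written rows / committed rows between two scrapes.

    Returns None when no rows were committed in the interval, or when a
    counter went backwards (for example after a restart), because the
    ratio is undefined in those cases.
    """
    committed = curr["questdb_committed_rows_total"] - prev["questdb_committed_rows_total"]
    written = (
        curr["questdb_physically_written_rows_total"]
        - prev["questdb_physically_written_rows_total"]
    )
    if committed <= 0 or written < 0:
        return None
    return written / committed
```

A value close to 1.0 suggests mostly in-order ingestion. Larger values mean each committed row caused several rows to be written to disk.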

Query metrics

| Metric | Type | Description |
|---|---|---|
| questdb_json_queries_total | counter | Total REST API queries, including retries. |
| questdb_json_queries_completed_total | counter | Successfully executed REST API queries. |
| questdb_json_queries_cached | gauge | Number of REST API queries currently cached. |
| questdb_json_queries_cache_hits_total | counter | Total cache hits for REST API (JSON) queries. |
| questdb_json_queries_cache_misses_total | counter | Total cache misses for REST API (JSON) queries. |
| questdb_pg_wire_queries_total | counter | Total PGWire queries. |
| questdb_pg_wire_queries_completed_total | counter | Successfully executed PGWire queries. |
| questdb_pg_wire_select_queries_cached | gauge | Number of PGWire SELECT queries currently cached. |
| questdb_pg_wire_update_queries_cached | gauge | Number of PGWire UPDATE queries currently cached. |
| questdb_pg_wire_select_cache_hits_total | counter | Total cache hits for PGWire SELECT queries. |
| questdb_pg_wire_select_cache_misses_total | counter | Total cache misses for PGWire SELECT queries. |
| questdb_pg_wire_errors_total | counter | Total PGWire (PostgreSQL wire protocol) errors. |
| questdb_unhandled_errors_total | counter | Total unhandled errors. A rising value usually indicates critical service degradation. |

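The hit and miss counters above can be combined into a cache hit ratio. The following Python sketch assumes a scrape already parsed into a dict of metric name to value and uses hypothetical function names. Because the counters are cumulative since startup, the result is a lifetime ratio. For a recent-window ratio in PromQL, divide rate(...hits_total[5m]) by the sum of the hits and misses rates.

```python
from typing import Optional


def hit_ratio(hits: float, misses: float) -> Optional[float]:
    """hits / (hits + misses), or None if there were no cache lookups."""
    lookups = hits + misses
    return hits / lookups if lookups > 0 else None


def cache_hit_ratios(m: dict) -> dict:
    """Lifetime cache hit ratios for REST API and PGWire SELECT queries."""
    return {
        "json": hit_ratio(
            m.get("questdb_json_queries_cache_hits_total", 0.0),
            m.get("questdb_json_queries_cache_misses_total", 0.0),
        ),
        "pg_wire_select": hit_ratio(
            m.get("questdb_pg_wire_select_cache_hits_total", 0.0),
            m.get("questdb_pg_wire_select_cache_misses_total", 0.0),
        ),
    }
```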

Connection metrics

| Metric | Type | Description |
|---|---|---|
| questdb_http_connections | gauge | Currently active HTTP connections. |
| questdb_line_tcp_connections | gauge | Currently active ILP TCP connections. |
| questdb_pg_wire_connections | gauge | Currently active PGWire connections. |

TLS certificate metrics (QuestDB Enterprise)

These gauges report the number of seconds until the active TLS certificate expires for each endpoint. Values update on certificate reload, making it straightforward to set up alerting for upcoming expirations.

| Metric | Type | Description |
|---|---|---|
| questdb_tls_cert_ttl_seconds_http | gauge | Seconds until the TLS certificate expires for the HTTP endpoint. |
| questdb_tls_cert_ttl_seconds_http_min | gauge | Seconds until the TLS certificate expires for the minimal HTTP server endpoint (http.min, port 9003 by default). |
| questdb_tls_cert_ttl_seconds_line | gauge | Seconds until the TLS certificate expires for the ILP endpoint. |
| questdb_tls_cert_ttl_seconds_pg | gauge | Seconds until the TLS certificate expires for the PGWire endpoint. |

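Because these gauges are expressed in seconds, an expiry alert reduces to a threshold comparison. In PromQL this is, for example, questdb_tls_cert_ttl_seconds_http < 14 * 86400. The Python sketch below applies the same check to a scrape parsed into a dict. The 14-day warning window and the function name are arbitrary assumptions, not QuestDB defaults.

```python
SECONDS_PER_DAY = 86_400

TLS_TTL_METRICS = (
    "questdb_tls_cert_ttl_seconds_http",
    "questdb_tls_cert_ttl_seconds_http_min",
    "questdb_tls_cert_ttl_seconds_line",
    "questdb_tls_cert_ttl_seconds_pg",
)


def expiring_certs(m: dict, warn_days: float = 14) -> dict:
    """Return {metric: days_left} for certificates expiring within warn_days.

    Metrics missing from the scrape are skipped.
    """
    soon = {}
    for name in TLS_TTL_METRICS:
        if name in m:
            days_left = m[name] / SECONDS_PER_DAY
            if days_left < warn_days:
                soon[name] = days_left
    return soon
```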

WAL metrics

| Metric | Type | Description |
|---|---|---|
| questdb_wal_written_rows_total | counter | Total rows written to WAL. |
| questdb_wal_apply_written_rows_total | counter | Total rows written during WAL apply. |
| questdb_wal_apply_physically_written_rows_total | counter | Total rows physically written during WAL apply. |
| questdb_wal_apply_rows_per_second | gauge | Rows applied per second during WAL apply. |
| questdb_wal_seq_txn | gauge | Sum of all committed transaction sequence numbers. Used together with questdb_wal_writer_txn. |
| questdb_wal_writer_txn | gauge | Sum of all applied transaction sequence numbers. Equals questdb_wal_seq_txn when no WAL transactions are pending. The difference between the two (the lag) is the number of committed but not yet applied transactions. A steadily growing lag indicates that QuestDB cannot keep up with writes. |

Renamed WAL metrics

questdb_wal_seq_txn_total and questdb_wal_writer_txn_total have been renamed to questdb_wal_seq_txn and questdb_wal_writer_txn respectively.
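The WAL lag is questdb_wal_seq_txn - questdb_wal_writer_txn, which can be graphed directly in Prometheus. The Python sketch below computes the lag from one parsed scrape and flags a lag that has increased over several consecutive scrapes. The rule "strictly increasing over the last three samples" is an arbitrary heuristic chosen for illustration, not a QuestDB recommendation, and the function names are hypothetical.

```python
def wal_lag(m: dict) -> float:
    """Committed WAL transactions not yet applied, summed over all tables."""
    return m["questdb_wal_seq_txn"] - m["questdb_wal_writer_txn"]


def lag_is_growing(lags: list, min_samples: int = 3) -> bool:
    """True if the most recent `min_samples` lag values are strictly increasing."""
    if len(lags) < min_samples:
        return False
    recent = lags[-min_samples:]
    return all(a < b for a, b in zip(recent, recent[1:]))
```

A lag that rises briefly and then returns to zero is normal under bursty ingestion. The check above only fires when the lag keeps rising across scrapes.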

JVM garbage collection metrics

| Metric | Type | Description |
|---|---|---|
| questdb_jvm_major_gc_count_total | counter | Number of times major GC was triggered. |
| questdb_jvm_major_gc_time_total | counter | Total time spent on major GC (ms). |
| questdb_jvm_minor_gc_count_total | counter | Number of times a minor GC pause was triggered. |
| questdb_jvm_minor_gc_time_total | counter | Total time spent on minor GC pauses (ms). |
| questdb_jvm_unknown_gc_count_total | counter | Number of times GC of unknown type was triggered. Non-zero only on non-mainstream JVMs. |
| questdb_jvm_unknown_gc_time_total | counter | Total time spent on GC of unknown type (ms). Non-zero only on non-mainstream JVMs. |

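Because the GC time counters are in milliseconds, the share of wall-clock time spent in GC between two scrapes is the increase in GC time divided by the interval in milliseconds. In PromQL, rate(questdb_jvm_major_gc_time_total[5m]) / 1000 gives the same fraction for major GC. The following is a minimal Python sketch over two parsed scrapes. The function name is hypothetical, and counters missing from a scrape are treated as zero.

```python
GC_TIME_METRICS = (
    "questdb_jvm_major_gc_time_total",
    "questdb_jvm_minor_gc_time_total",
    "questdb_jvm_unknown_gc_time_total",
)


def gc_time_fraction(prev: dict, curr: dict, interval_seconds: float) -> float:
    """Fraction of the interval spent in GC (all GC types combined).

    The *_gc_time_total counters are in milliseconds, so the interval is
    converted to milliseconds before dividing.
    """
    gc_ms = sum(curr.get(name, 0.0) - prev.get(name, 0.0) for name in GC_TIME_METRICS)
    return gc_ms / (interval_seconds * 1000)
```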

JVM memory metrics

| Metric | Type | Description |
|---|---|---|
| questdb_memory_jvm_free | gauge | Free Java heap memory (bytes). |
| questdb_memory_jvm_total | gauge | Current Java heap size (bytes). |
| questdb_memory_jvm_max | gauge | Maximum Java heap memory (bytes). |

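Used heap is the current heap size minus the free heap. Comparing used heap against the maximum shows how close the JVM is to its limit. In PromQL this is (questdb_memory_jvm_total - questdb_memory_jvm_free) / questdb_memory_jvm_max. The Python sketch below does the same for a parsed scrape. The function name is hypothetical.

```python
def heap_usage(m: dict) -> dict:
    """Used heap in bytes, and as a fraction of the maximum heap."""
    used = m["questdb_memory_jvm_total"] - m["questdb_memory_jvm_free"]
    return {"used_bytes": used, "used_of_max": used / m["questdb_memory_jvm_max"]}
```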

Native memory metrics

| Metric | Type | Description |
|---|---|---|
| questdb_memory_mem_used | gauge | Native memory currently allocated. |
| questdb_memory_rss | gauge | Resident Set Size (Linux/Unix) or Working Set Size (Windows). |
| questdb_memory_malloc_count | gauge | Number of native memory allocations. |
| questdb_memory_realloc_count | gauge | Number of native memory reallocations. |
| questdb_memory_free_count | gauge | Number of times native memory was freed. |

Native memory tag metrics

These gauges track memory allocated by specific QuestDB subsystems.

| Metric | Description |
|---|---|
| questdb_memory_tag_MMAP_DEFAULT | Mmapped files. |
| questdb_memory_tag_MMAP_O3 | O3 mmapped files. |
| questdb_memory_tag_MMAP_TABLE_WRITER | Table writer mmapped files. |
| questdb_memory_tag_MMAP_TABLE_READER | Table reader mmapped files. |
| questdb_memory_tag_MMAP_INDEX_READER | Index reader mmapped files. |
| questdb_memory_tag_MMAP_INDEX_WRITER | Index writer mmapped files. |
| questdb_memory_tag_MMAP_INDEX_SLIDER | Indexed column view mmapped files. |
| questdb_memory_tag_MMAP_BLOCK_WRITER | Block writer mmapped files. |
| questdb_memory_tag_MMAP_IMPORT | Import operations. |
| questdb_memory_tag_MMAP_PARALLEL_IMPORT | Parallel import operations. |
| questdb_memory_tag_MMAP_PARTITION_CONVERTER | Partition converter operations. |
| questdb_memory_tag_MMAP_SEQUENCER_METADATA | Sequencer metadata. |
| questdb_memory_tag_MMAP_TABLE_WAL_READER | Table WAL reader mmapped files. |
| questdb_memory_tag_MMAP_TABLE_WAL_WRITER | Table WAL writer mmapped files. |
| questdb_memory_tag_MMAP_TX_LOG | Transaction log mmapped files. |
| questdb_memory_tag_MMAP_TX_LOG_CURSOR | Transaction log cursor mmapped files. |
| questdb_memory_tag_MMAP_UPDATE | Update operations. |
| questdb_memory_tag_NATIVE_DEFAULT | Untagged native memory. |
| questdb_memory_tag_NATIVE_O3 | O3 operations. |
| questdb_memory_tag_NATIVE_RECORD_CHAIN | SQL record chains. |
| questdb_memory_tag_NATIVE_TREE_CHAIN | SQL tree chains. |
| questdb_memory_tag_NATIVE_COMPACT_MAP | SQL compact maps. |
| questdb_memory_tag_NATIVE_FAST_MAP | SQL fast maps. |
| questdb_memory_tag_NATIVE_FAST_MAP_INT_LIST | Fast map integer list. |
| questdb_memory_tag_NATIVE_LONG_LIST | Long lists. |
| questdb_memory_tag_NATIVE_HTTP_CONN | HTTP connections. |
| questdb_memory_tag_NATIVE_PGW_CONN | PGWire connections. |
| questdb_memory_tag_NATIVE_REPL | Replication tasks. |
| questdb_memory_tag_NATIVE_CB1 | Circular buffer 1. |
| questdb_memory_tag_NATIVE_CB2 | Circular buffer 2. |
| questdb_memory_tag_NATIVE_CB3 | Circular buffer 3. |
| questdb_memory_tag_NATIVE_CB4 | Circular buffer 4. |
| questdb_memory_tag_NATIVE_CB5 | Circular buffer 5. |
| questdb_memory_tag_NATIVE_CIRCULAR_BUFFER | Circular buffers. |
| questdb_memory_tag_NATIVE_DIRECT_BYTE_SINK | Direct byte sink. |
| questdb_memory_tag_NATIVE_DIRECT_CHAR_SINK | Direct char sink. |
| questdb_memory_tag_NATIVE_DIRECT_UTF8_SINK | Direct UTF-8 sink. |
| questdb_memory_tag_NATIVE_FUNC_RSS | Function RSS. |
| questdb_memory_tag_NATIVE_GROUP_BY_FUNCTION | Group by function. |
| questdb_memory_tag_NATIVE_ILP_RSS | ILP RSS. |
| questdb_memory_tag_NATIVE_IMPORT | Native import operations. |
| questdb_memory_tag_NATIVE_INDEX_READER | Native index reader. |
| questdb_memory_tag_NATIVE_IO_DISPATCHER_RSS | IO dispatcher RSS. |
| questdb_memory_tag_NATIVE_JIT | JIT compilation. |
| questdb_memory_tag_NATIVE_JIT_LONG_LIST | JIT long list. |
| questdb_memory_tag_NATIVE_JOIN_MAP | Join map. |
| questdb_memory_tag_NATIVE_LATEST_BY_LONG_LIST | Latest by long list. |
| questdb_memory_tag_NATIVE_LOGGER | Logger. |
| questdb_memory_tag_NATIVE_MIG | MIG operations. |
| questdb_memory_tag_NATIVE_MIG_MMAP | MIG mmapped files. |
| questdb_memory_tag_NATIVE_OFFLOAD | Offload operations. |
| questdb_memory_tag_NATIVE_PARALLEL_IMPORT | Native parallel import. |
| questdb_memory_tag_NATIVE_PATH | Path operations. |
| questdb_memory_tag_NATIVE_ROSTI | Rosti operations. |
| questdb_memory_tag_NATIVE_SAMPLE_BY_LONG_LIST | Sample by long list. |
| questdb_memory_tag_NATIVE_SQL_COMPILER | SQL compiler. |
| questdb_memory_tag_NATIVE_TABLE_READER | Native table reader. |
| questdb_memory_tag_NATIVE_TABLE_WAL_WRITER | Native table WAL writer. |
| questdb_memory_tag_NATIVE_TABLE_WRITER | Native table writer. |
| questdb_memory_tag_NATIVE_TEXT_PARSER_RSS | Text parser RSS. |
| questdb_memory_tag_NATIVE_TLS_RSS | TLS RSS. |
| questdb_memory_tag_NATIVE_UNORDERED_MAP | Unordered map. |

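When investigating high memory usage, it is often enough to see which tags dominate. In PromQL, topk(5, {__name__=~"questdb_memory_tag_.*"}) returns the five largest series. The Python sketch below does the same for a parsed scrape. The function name and the default of five tags are illustrative assumptions.

```python
import heapq

TAG_PREFIX = "questdb_memory_tag_"


def top_memory_tags(m: dict, n: int = 5) -> list:
    """Return the n largest memory tags as (tag, value) pairs, largest first."""
    tags = {k[len(TAG_PREFIX):]: v for k, v in m.items() if k.startswith(TAG_PREFIX)}
    return heapq.nlargest(n, tags.items(), key=lambda kv: kv[1])
```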

Worker metrics

| Metric | Type | Description |
|---|---|---|
| questdb_workers_job_start_micros_max | gauge | Maximum time to start a worker job (microseconds). |
| questdb_workers_job_start_micros_min | gauge | Minimum time to start a worker job (microseconds). |

Most of the above metrics are volatile: they are collected from the current database start and reset when QuestDB restarts. The exceptions are questdb_wal_seq_txn and questdb_wal_writer_txn, because transaction sequence numbers are persistent.
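Because volatile counters start again from zero after a restart, a plain difference between two scrapes can go negative. Prometheus functions such as rate() and increase() already compensate for counter resets. For ad-hoc scripts, the same reset rule can be applied as in the Python sketch below (function name hypothetical). A drop in value is treated as a restart, after which the counter is assumed to have counted up from zero. Unlike increase(), the sketch does no extrapolation.

```python
def counter_increase(samples: list) -> float:
    """Total increase of a counter across successive scrapes, tolerating resets."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # After a reset, the counter restarted from zero, so curr is the increase.
        total += curr - prev if curr >= prev else curr
    return total
```

For example, the samples 100, 150, 20, 50 yield an increase of 50 + 20 + 30 = 100.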

Configuring Prometheus Alertmanager

note

Full details on logging configuration can be found in the Logging & Metrics documentation.

QuestDB includes a log writer that sends any message logged at the critical level (the default, set by w.alert.level in the example below) to Prometheus Alertmanager over a TCP/IP socket connection. To configure this writer, add it to the writers list in log.conf alongside the other log writers.

Alertmanager may be started via Docker with the following command:

docker run -p 127.0.0.1:9093:9093 --name alertmanager quay.io/prometheus/alertmanager
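Before wiring QuestDB to Alertmanager, you can check Alertmanager on its own by posting a hand-made alert to its v2 API (/api/v2/alerts). The alert should then appear in the Alertmanager UI at http://127.0.0.1:9093. The following is a minimal Python sketch. The alert name, labels, and annotation are arbitrary placeholders, and the function names are hypothetical. This is a standard Alertmanager API call, not the message format QuestDB itself sends.

```python
import json
import urllib.request


def build_test_alert(alertname: str = "QuestDBTestAlert") -> bytes:
    """Alertmanager v2 payload: a JSON list containing one alert."""
    alerts = [
        {
            "labels": {"alertname": alertname, "severity": "critical"},
            "annotations": {"summary": "Manual test alert"},
        }
    ]
    return json.dumps(alerts).encode("utf-8")


def send_test_alert(alertmanager: str = "http://127.0.0.1:9093") -> int:
    """POST the test alert and return the HTTP status code."""
    req = urllib.request.Request(
        f"{alertmanager}/api/v2/alerts",
        data=build_test_alert(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```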

To discover the IP address of this container, run the following command, which specifies alertmanager as the container name:

docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' alertmanager

To run QuestDB and point it at Alertmanager, first create a file ./conf/log.conf with the following contents. In this example, 172.17.0.2 is the IP address of the Alertmanager container, as reported by the docker inspect command above.

./conf/log.conf
# Which writers to enable
writers=stdout,alert

# stdout
w.stdout.class=io.questdb.log.LogConsoleWriter
w.stdout.level=INFO

# Prometheus Alerting
w.alert.class=io.questdb.log.LogAlertSocketWriter
w.alert.level=CRITICAL
w.alert.alertTargets=172.17.0.2:9093
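In this file, each name listed in writers is paired with a set of w.<name>.* keys. The Python sketch below is a hypothetical pre-flight check, not a QuestDB tool. It parses a log.conf in the simple key=value form shown above and reports writers that are listed but have no w.<name>.class entry, which is the kind of mismatch a typo in a writer name would cause.

```python
def parse_log_conf(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def missing_writer_classes(conf: dict) -> list:
    """Writers named in `writers` that lack a w.<name>.class entry."""
    names = [w.strip() for w in conf.get("writers", "").split(",") if w.strip()]
    return [n for n in names if f"w.{n}.class" not in conf]
```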

Start up QuestDB in Docker using the following command:

docker run \
-p 9000:9000 -p 8812:8812 -p 9009:9009 -p 9003:9003 \
-v "$(pwd):/var/lib/questdb" \
questdb/questdb:9.3.3

When alerts are sent successfully, the QuestDB logs show both the Sending and Received entries:

2021-12-14T18:42:54.222967Z I i.q.l.LogAlertSocketWriter Sending: 2021-12-14T18:42:54.122874Z I i.q.l.LogAlertSocketWriter Sending: 2021-12-14T18:42:54.073978Z I i.q.l.LogAlertSocketWriter Received [0] 172.17.0.2:9093: {"status":"success"}
2021-12-14T18:42:54.223377Z I i.q.l.LogAlertSocketWriter Received [0] 172.17.0.2:9093: {"status":"success"}
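To confirm delivery from a log file rather than by eye, the following Python sketch extracts the Alertmanager acknowledgements from LogAlertSocketWriter lines. It counts only lines where Received directly follows the logger name. Text that appears after a Sending: prefix, as in the first line above, is therefore not counted a second time. The function name is hypothetical, and the regular expression assumes the log line layout shown above.

```python
import json
import re

# timestamp, level letter, logger name, then "Received [n] host:port: {json}"
RECEIVED = re.compile(
    r"^(\S+) \w \S+LogAlertSocketWriter Received \[(\d+)\] (\S+): (\{.*\})\s*$"
)


def alert_acks(log_lines: list) -> list:
    """Return (target, status) pairs for each acknowledged alert."""
    acks = []
    for line in log_lines:
        match = RECEIVED.match(line)
        if match:
            body = json.loads(match.group(4))
            acks.append((match.group(3), body.get("status")))
    return acks
```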