{"id":269212,"date":"2023-07-31T13:01:23","date_gmt":"2023-07-31T18:01:23","guid":{"rendered":"https:\/\/www.webscale.com\/?p=269190"},"modified":"2023-12-29T15:30:58","modified_gmt":"2023-12-29T20:30:58","slug":"prometheus-querying","status":"publish","type":"post","link":"https:\/\/www.webscale.com\/blog\/prometheus-querying\/","title":{"rendered":"Prometheus Querying – Breaking Down PromQL"},"content":{"rendered":"
Prometheus has its own language specifically dedicated to queries called\u00a0PromQL<\/a>. It is a powerful functional expression language, which lets you filter with Prometheus\u2019 multi-dimensional time-series labels. The result of each expression can be shown as a graph, viewed as tabular data within Prometheus\u2019 own expression browser, or consumed by external systems via the\u00a0HTTP API<\/a>.<\/p>\n PromQL can be a difficult language to understand, particularly if you are faced with an empty input field and have to formulate queries on your own. This article is a primer dedicated to the basics of how to run Prometheus queries.<\/p>\n You will need only three tools:<\/p>\n Through query building, you will end up with a graph per CPU by the deployment.<\/p>\n The core of any PromQL query is the metric name of a time-series. Indeed, all Prometheus metrics are time-based data. There are four parts to every metric. Taking the\u00a0 The parts are:<\/p>\n Each distinct metric_name & label combination is called a\u00a0time-series<\/strong>\u00a0(often just called a series in the documentation). If each series has only a single value for each timestamp, as in the above example, the collection of series returned from a query is called an\u00a0instant-vector<\/strong>. If each series has multiple values, it is referred to as a\u00a0range-vector<\/strong>. These are generated by appending a time selector to the instant-vector in square brackets (e.g. [5m] for five minutes). The instant vector and range vector are two of the four expression types in PromQL; the other two are scalar, a simple numeric floating point value, and string, a simple string value. See Range Selectors below for more on range-vectors.<\/p>\n All of these metrics are scraped from\u00a0exporters<\/a>. Prometheus scrapes these metrics at regular intervals. 
The setting for when the intervals should occur is specified by the scrape_interval in the prometheus.yaml config. Most scrape intervals are 30s. This means that every 30s, there will be a new data point with a new timestamp. The value may or may not have changed, but at every scrape_interval, there will be a new datapoint.<\/p>\n There are four types of metrics:<\/p>\n We\u2019re going to deal with counters for this analysis, as it\u2019s the most common metric type.<\/p>\n The structure of a basic Prometheus query looks very much like a metric. You start with a metric name. If you just query\u00a0 Next, you can filter the query using labels. Label filters support four operators:<\/p>\n Label filters go inside the {} after the metric name, so an equality match looks like:<\/p>\n which will return only\u00a0 Regex matches use the\u00a0RE2 syntax<\/a>. If you\u2019re familiar with\u00a0PCRE<\/a>, it will look much the same, but it doesn\u2019t support backreferences (which really shouldn\u2019t matter here anyway).<\/p>\n You can also use multiple label filters, separated by a comma. Multiple label filters are an \u201cAND\u201d query, so in order to be returned, a metric must match all the label filters. will return all\u00a0 By appending a\u00a0range duration<\/a>\u00a0to a query, we get multiple values for each timestamp. This is referred to as a range-vector. The values for each timestamp will be the values recorded in the time series back in time, taken from the timestamp for the length of time given in the range duration.<\/p>\n As an example, take\u00a0 We get two values for each series because the varnish scrape config specifies that it has a 30 second interval, so if you look at the timestamps after the @ symbol in the value, you can see that they are exactly 30 seconds apart. 
If you graphed these series without the range selector and inspected the value of the lines at those timestamps, it would show these values.<\/p>\n Now, range-vectors can\u2019t be graphed because they have multiple values for each timestamp. If you select the Graph tab in the Prometheus web UI on a range-vector, you\u2019ll see this message:<\/p>\n A range-vector is typically generated in order to then apply a function to it to get an instant-vector, which can be graphed (only instant vectors can be graphed). Prometheus has\u00a0many functions<\/a>\u00a0for both instant and range vectors. The more commonly used functions for working with range-vectors are:<\/p>\n Your selection of range duration will determine how granular your chart is. A [1m] duration, for instance, will give a very spiky chart, making it difficult to visualize a trend, looking something like this:<\/p>\n For a one hour view, [5m] would show a decent view:<\/p>\n For longer time-spans, you may want to set a longer range duration to help smooth out spikes and achieve more of a long-term trend view. Compare the three day view with a [5m] duration to a [30m] duration:<\/p>\n\n
Prometheus Querying<\/h2>\n
varnish_main_client_req<\/code>\u00a0metric as an example:<\/p>\n
\n
\n
varnish_main_client_req<\/code>)<\/li>\n
namespace=\"section-b4a199920b24b\"<\/code>). Each metric will have at least a\u00a0
job<\/code>\u00a0label, which corresponds to the scrape config in the prometheus config.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n
\n
\n
varnish_main_client_req<\/code>\u00a0is an example of this, which provides the total number of HTTP requests that the varnish instance has handled in its life.<\/li>\n
node_memory_utilisation<\/code>, which provides the current percentage of memory used on each node.<\/li>\n
Query Structure<\/h2>\n
varnish_main_client_req<\/code>, every one of those metrics for every varnish pod in every namespace will get returned. If you do this in Grafana, you risk crashing the browser tab as it tries to render so many data points simultaneously.<\/p>\n
\n
=<\/code>\u00a0equal<\/li>\n
!=<\/code>\u00a0not-equal<\/li>\n
=~<\/code>\u00a0matches regex<\/li>\n
!~<\/code>\u00a0doesn\u2019t match regex<\/li>\n<\/ul>\n
varnish_main_client_req{namespace=\"section-9469f9cc28d8d\"}<\/code><\/pre>\n
varnish_main_client_req<\/code>\u00a0metrics with that exact namespace.<\/p>\n
\nFor instance,<\/p>\nvarnish_main_client_req{namespace=~\".*3.*\",namespace!~\".*env4.*\"}<\/code><\/pre>\n
varnish_main_client_req<\/code>\u00a0metrics with a 3 in their namespace that don\u2019t also contain\u00a0env4<\/em>.<\/p>\n
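<p>As a further sketch of combining matchers, the query below pairs a not-equal match with a regex alternation. The namespace is the one used in the equality example; the <code>job<\/code> values <code>varnish<\/code> and <code>varnish-canary<\/code> are hypothetical and only illustrate the syntax:<\/p>\n
<pre><code># Hypothetical job names: series from either job, excluding one namespace\nvarnish_main_client_req{namespace!=\"section-9469f9cc28d8d\",job=~\"varnish|varnish-canary\"}<\/code><\/pre>\n
<p>Regex matchers are fully anchored, so <code>job=~\"varnish\"<\/code> would match only the exact value <code>varnish<\/code>. That is why a \u201ccontains\u201d match needs <code>.*<\/code> on both sides of the pattern.<\/p>\n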
Range Selectors<\/h2>\n
varnish_main_client_req{namespace=\"section-9469f9cc28d8d\"}<\/code>. If we add a [1m] range selector we now get this:<\/p>\n
Error executing query: invalid expression type \"range vector\" for range query, must be Scalar or instant Vector<\/code><\/p>\n
\n
rate()<\/code>\u00a0– calculates the per-second average rate of increase of the time series in the range vector over the whole range.<\/li>\n
irate()<\/code>\u00a0– calculates the per-second instant rate of increase of the time series in the range vector\u00a0using only the last two data points in the range<\/strong>.<\/li>\n
increase()<\/code>\u00a0– calculates the increase in the time series over the selected time range. It\u2019s basically rate multiplied by the number of seconds in the time range selector.<\/li>\n<\/ul>\n
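<p>As a rough sketch of how these functions are applied, each query below wraps a range-vector for the namespace used in the earlier examples and returns an instant-vector, which can be graphed. The [5m] duration is only an illustrative choice:<\/p>\n
<pre><code># Per-second request rate, averaged over the last 5 minutes\nrate(varnish_main_client_req{namespace=\"section-9469f9cc28d8d\"}[5m])\n\n# Per-second request rate from the last two samples inside the 5 minute window\nirate(varnish_main_client_req{namespace=\"section-9469f9cc28d8d\"}[5m])\n\n# Approximate number of requests over the last 5 minutes (roughly rate * 300)\nincrease(varnish_main_client_req{namespace=\"section-9469f9cc28d8d\"}[5m])<\/code><\/pre>\n
<p>Each expression is a separate query, so run them one at a time in the expression browser. With a 30 second scrape interval, a [5m] window holds about ten samples per series, which gives <code>rate()<\/code> enough points to smooth over.<\/p>\n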