How to convert metrics data into nested JSON format

I have this plain-text metrics output:

"""
# HELP app_name_self_repair_duration
# TYPE app_name_self_repair_duration gauge
app_name_self_repair_duration 0.265227453
# HELP app_name_db_duration 
# TYPE app_name_db_duration histogram
app_name_db_duration_bucket{query="get_transaction",le="0.01"} 443481
app_name_db_duration_bucket{query="get_transaction",le="0.025"} 447919
app_name_db_duration_bucket{query="get_transaction",le="0.05"} 456449
app_name_db_duration_bucket{query="get_transaction",le="0.1"} 472222
app_name_db_duration_bucket{query="get_transaction",le="0.3"} 472571
app_name_db_duration_bucket{query="get_transaction",le="0.5"} 472605
app_name_db_duration_bucket{query="get_transaction",le="0.8"} 472617
app_name_db_duration_bucket{query="get_transaction",le="1"} 472621
app_name_db_duration_bucket{query="get_transaction",le="1.5"} 472624
app_name_db_duration_bucket{query="get_transaction",le="+Inf"} 472624
app_name_db_duration_sum{query="get_transaction"} 1894.9452645920032
app_name_db_duration_count{query="get_transaction"} 472624
app_name_db_duration_bucket{query="get_transaction_chain",le="0.01"} 54589
app_name_db_duration_bucket{query="get_transaction_chain",le="0.025"} 55305
app_name_db_duration_bucket{query="get_transaction_chain",le="0.05"} 56016
app_name_db_duration_bucket{query="get_transaction_chain",le="0.1"} 59266
app_name_db_duration_bucket{query="get_transaction_chain",le="0.3"} 1461041
app_name_db_duration_bucket{query="get_transaction_chain",le="0.5"} 2154344
app_name_db_duration_bucket{query="get_transaction_chain",le="0.8"} 2159971
app_name_db_duration_bucket{query="get_transaction_chain",le="1"} 2160450
app_name_db_duration_bucket{query="get_transaction_chain",le="1.5"} 2160851
app_name_db_duration_bucket{query="get_transaction_chain",le="+Inf"} 2161614
app_name_db_duration_sum{query="get_transaction_chain"} 589251.955250992
app_name_db_duration_count{query="get_transaction_chain"} 2161614
# HELP app_name_replication_validation_duration 
# TYPE app_name_replication_validation_duration histogram
app_name_replication_validation_duration_bucket{le="0.01"} 953
app_name_replication_validation_duration_bucket{le="0.025"} 17268
app_name_replication_validation_duration_bucket{le="0.05"} 17598
app_name_replication_validation_duration_bucket{le="0.1"} 17705
app_name_replication_validation_duration_bucket{le="0.3"} 17837
app_name_replication_validation_duration_bucket{le="0.5"} 17913
app_name_replication_validation_duration_bucket{le="0.8"} 18019
app_name_replication_validation_duration_bucket{le="1"} 18099
app_name_replication_validation_duration_bucket{le="1.5"} 18266
app_name_replication_validation_duration_bucket{le="+Inf"} 18285
app_name_replication_validation_duration_sum 697.9178646500011
app_name_replication_validation_duration_count 18285
# HELP app_name_crypto_tpm_sign_duration 
# TYPE app_name_crypto_tpm_sign_duration histogram
app_name_crypto_tpm_sign_duration_bucket{le="0.01"} 3663
app_name_crypto_tpm_sign_duration_bucket{le="0.025"} 209901
app_name_crypto_tpm_sign_duration_bucket{le="0.05"} 254318
app_name_crypto_tpm_sign_duration_bucket{le="0.1"} 264880
app_name_crypto_tpm_sign_duration_bucket{le="0.2"} 265617
app_name_crypto_tpm_sign_duration_bucket{le="0.3"} 266163
app_name_crypto_tpm_sign_duration_bucket{le="0.4"} 266329
app_name_crypto_tpm_sign_duration_bucket{le="0.5"} 266329
app_name_crypto_tpm_sign_duration_bucket{le="1"} 266329
app_name_crypto_tpm_sign_duration_bucket{le="+Inf"} 266330
app_name_crypto_tpm_sign_duration_sum 5177.010243896026
app_name_crypto_tpm_sign_duration_count 266330
"""

I want to convert it into this JSON format:


[
  {
    "help": "app_name_db_duration",
    "type": "histogram",
    "metrics": [
      {
        "labels": {
          "method": "query",
          "label_handler": "get_transaction"
        },
        "quantiles": {
          "0.01": "443481",
          "0.025": "447919",
          "0.05": "456449",
          "0.1": "472222"
        },
        "count": "472624",
        "sum": "1894.9452645920032"
      },
      {
        "labels": {
          "method": "query",
          "label_handler": "get_transaction_chain"
        },
        "quantiles": {
          "0.99": "3542.9",
          "0.9": "1202.3",
          "0.5": "1002.8"
        },
        "count": "4",
        "sum": "345.01"
      }
    ]
  },
  {
    "help": "app_name_replication_validation_duration",
    "type": "histogram",
    "metrics": [
      {
        "quantiles": {
          "0.99": "3542.9",
          "0.9": "1202.3",
          "0.5": "1002.8"
        },
        "count": "4",
        "sum": "345.01"
      }
    ]
  },
  {
    "help": "app_name_crypto_tpm_sign_duration",
    "type": "histogram",
    "metrics": [
      {
        "quantiles": {
          "0.99": "3542.9",
          "0.9": "1202.3",
          "0.5": "1002.8"
        },
        "count": "4",
        "sum": "345.01"
      }
    ]
  }
]

I've been trying for a week to produce this JSON format from that text, but with no luck. I'm stuck.

Where did you get stuck?

I’d split the task into two problems: parsing the text and building the output. The first one seems a bit more challenging since you need to build a parser. You can do it by hand, but you can also use NimbleParsec here, which could help make the code more readable.


While you should use a parser, you can also use Regex.

iex> Regex.named_captures ~r/(?<key>[^{]*)(?<opt>{.*})?\ (?<value>.*$)/, "app_name_replication_validation_duration_bucket{le=\"0.01\"} 953"
%{
  "key" => "app_name_replication_validation_duration_bucket",
  "opt" => "{le=\"0.01\"}",
  "value" => "953"
}
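
To get from a raw line to a metric name plus a map of labels, something along these lines might work. It is only a minimal sketch: `LineParser` and its functions are made-up names for illustration, and it assumes label values never contain escaped quotes.

```elixir
defmodule LineParser do
  # Hypothetical helper, not part of any library.
  # Returns {name, labels_map, value_string}, or :skip for comments
  # and anything that does not look like a sample line.
  def parse("#" <> _rest), do: :skip

  def parse(line) do
    regex = ~r/^(?<key>[^{\s]+)(?:\{(?<labels>.*)\})?\s+(?<value>\S+)$/

    case Regex.named_captures(regex, line) do
      %{"key" => key, "labels" => labels, "value" => value} ->
        {key, parse_labels(labels), value}

      nil ->
        :skip
    end
  end

  # Turns `query="get_transaction",le="0.01"` into a map.
  # An empty string (no labels on the line) yields an empty map.
  defp parse_labels(labels) do
    ~r/(\w+)="([^"]*)"/
    |> Regex.scan(labels)
    |> Map.new(fn [_match, k, v] -> {k, v} end)
  end
end
```

Because every label ends up in one map, a line with several labels (for example both `query` and `le`) needs no special handling.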

defmodule Parser do

  def run(content) do
    content
    |> String.split("\n", trim: true)
    |> Enum.reduce(%{current: %{}, metrics: []}, fn line, acc ->

      %{ content: line, current: acc.current, metrics: acc.metrics}
      |> parse_help()
      |> parse_type()
      |> parse_metric()
    end)
    |> finalize()
  end

  defp finalize(%{current: current, metrics: metrics}) do
    [current | metrics]
  end

  defp parse_help(acc = %{content: content, current: current}) do
    case Regex.run(~r/HELP (.*)/, content) do
      [_, name] ->
        case Map.get(current, :name) do
          nil ->
            Map.put(acc, :current, %{name: String.trim(name) })
          _ ->
            acc
            |> Map.update!(:metrics, &[current | &1])
            |> Map.put(:current, %{name: name |> String.trim() })
        end
        
      _ ->
        acc
    end
  end

  defp parse_type(acc = %{content: content, current: current}) do
    case Regex.run(~r/TYPE (.*) (.*)/, content) do
      [_, _, type] ->
        %{ acc | current: Map.put(current, :type, type) }
      _ ->
        acc
    end
  end

  defp parse_metric(acc = %{content: content, current: current = %{name: metric_name}}) do    
    case Regex.run(~r/^#{metric_name}_(.*){(.*)} (.*)$/, content) do
      [_, "bucket", labels, value] ->
        %{ acc | current: add_bucket(labels, current, value) }
      [_, "sum", _labels, value] ->
        %{ acc | current: Map.put(current, :sum, value) }
      [_, "count", _labels, value] ->
        %{ acc | current: Map.put(current, :count, value) }
      nil ->
        case Regex.run(~r/^#{metric_name} (.*)$/, content) do
          [_, value] ->
            %{ acc | current: Map.put(current, :value, value) }
          _ ->
            acc
        end
      _ ->
        acc
    end
  end

  defp add_bucket(labels, current, value) do
    labels
    |> String.split(",", trim: true)
    |> Enum.reduce(current, fn label, acc ->
      [k, v] = String.split(label, "=")

      case k do
        "le" ->
          Map.update(acc, :quantiles, %{ v => value }, &Map.put(&1, v, value))
        _ ->
          Map.update(acc, :labels, %{ k => v}, &Map.put(&1, k, v))
      end
    end)
  end
end

With this code I'm halfway there, but the challenge is updating nested maps. If you look at the JSON format above, the metrics have two layers of nesting, and I'm having trouble updating those layers. The other challenge is that some lines have multiple labels as well as bucket values, and I have to capture those too.

I like your approach. Can you please elaborate on it?

The idea would be to split the parsing phase from the transformation phase, so you end up with two smaller, isolated problems to solve. At a high level, the code would be:

def run(content) do
  content
  |> parse()
  |> to_json()
end

The job of parse would be to turn the text into some data structure, probably one that is close to the input. The job of to_json would be to transform the intermediate data structure to the desired JSON.

Looking just at the example you gave, I’d go with something like this for the data structure:

defmodule Metric do
  defstruct [
    # e.g. app_name_self_repair_duration
    :name,
    # "gauge" or "histogram"
    :type,
    # only for type "gauge"
    :gauge_value,
    # only for type "histogram"
    :histogram_buckets
  ]
end

defmodule HistogramBucket do
  defstruct [
    # e.g. get_transaction
    :name,
    # list of tuples like {"0.01", 443481}, or a separate struct
    :quantiles,
    :sum,
    :count
  ]
end

It should be pretty straightforward to build a list of %Metric{}s from the input you’re given and then transform it to the output JSON.
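
As a rough illustration of the transformation step, the nesting can be handled by grouping the samples of one histogram by their labels with `le` removed, so that each distinct label set becomes one entry. This is only a sketch under assumptions: `HistogramGrouping` is a made-up name, and the input shape (`{suffix, labels, value}` tuples for a single histogram) is a hypothetical intermediate format, not something the parser above produces.

```elixir
defmodule HistogramGrouping do
  # Hypothetical sketch. `samples` is a list of {suffix, labels, value}
  # tuples for ONE histogram, where suffix is "bucket", "sum" or "count"
  # and labels is a map such as %{"query" => "get_transaction", "le" => "0.01"}.
  def series(samples) do
    samples
    # Samples that differ only in "le" belong to the same series.
    |> Enum.group_by(fn {_suffix, labels, _value} -> Map.delete(labels, "le") end)
    |> Enum.map(fn {labels, group} ->
      Enum.reduce(group, %{labels: labels, buckets: %{}}, &put_sample/2)
    end)
  end

  # A bucket line goes one level deeper, keyed by its "le" bound.
  defp put_sample({"bucket", %{"le" => le}, value}, acc),
    do: put_in(acc, [:buckets, le], value)

  defp put_sample({"sum", _labels, value}, acc), do: Map.put(acc, :sum, value)
  defp put_sample({"count", _labels, value}, acc), do: Map.put(acc, :count, value)
end
```

Grouping first means no deeply nested map ever has to be updated in place while reading lines. The resulting list of plain maps can then be handed to whichever JSON encoder you use.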


Thank you @stefanchrobot, I’m working on an idea given to me by my senior.

I’m done converting the text into the JSON structure.
Now I want to aggregate the histogram metrics
to find out the response time in milliseconds.
I have the details below:

less than or equal to 10 ms ["0.01"] => 443481 events

less than or equal to 25 ms ["0.025"] => 447919 events

less than or equal to 50 ms ["0.05"] => 456449 events

less than or equal to 100 ms ["0.1"] => 472222 events

less than or equal to 300 ms ["0.3"] => 472571 events

less than or equal to 500 ms ["0.5"] => 472605 events

less than or equal to 800 ms ["0.8"] => 472617 events

less than or equal to 1000 ms ["1"] => 472621 events

less than or equal to 1500 ms ["1.5"] => 472624 events

Prometheus has built-in tools for this,
and Elasticsearch also has its own tools,
but I could not work out what logic they apply.
I want to write my own functions to do the calculation and then use ECharts to show the results on a graph.

What logic should I apply to calculate the response time?
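
One possible approach, which is roughly what Prometheus's `histogram_quantile` does as far as I understand it: the bucket counters are cumulative, so find the bucket that contains the target rank (`q * total`) and interpolate linearly between that bucket's lower and upper bound. The average response time is simply `sum / count`. Below is a minimal sketch. `HistogramStats` is a made-up module, `:infinity` stands in for `le="+Inf"`, and it assumes the buckets are sorted, end with the `+Inf` bucket, and contain at least one observation.

```elixir
defmodule HistogramStats do
  # Hypothetical module. `buckets` is a list of {upper_bound, cumulative_count}
  # sorted by upper bound, with :infinity standing in for le="+Inf".

  # Cumulative counts -> number of events that fell into each bucket alone.
  def per_bucket(buckets) do
    {result, _last} =
      Enum.map_reduce(buckets, 0, fn {le, cumulative}, prev ->
        {{le, cumulative - prev}, cumulative}
      end)

    result
  end

  # Estimate the q-quantile (0 < q <= 1) by linear interpolation inside
  # the bucket that contains the target rank. Assumes total count > 0.
  def quantile(buckets, q) do
    {_le, total} = List.last(buckets)
    find(buckets, q * total, 0.0, 0)
  end

  # The rank falls into the +Inf bucket: the best available answer is the
  # highest finite upper bound seen so far.
  defp find([{:infinity, _cumulative} | _], _rank, lower, _prev), do: lower

  # The rank falls into this bucket: interpolate between its bounds.
  defp find([{upper, cumulative} | _], rank, lower, prev) when cumulative >= rank do
    lower + (upper - lower) * (rank - prev) / (cumulative - prev)
  end

  # Otherwise move on; this bucket's upper bound is the next lower bound.
  defp find([{upper, cumulative} | rest], rank, _lower, _prev) do
    find(rest, rank, upper * 1.0, cumulative)
  end
end
```

With the `get_transaction` buckets above, `HistogramStats.quantile(buckets, 0.5)` lands in the first bucket, so the estimated median is somewhere below 10 ms. The per-bucket counts from `per_bucket/1` are probably what you want to feed into a bar chart.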