Migrating From GCP Cloud Monitoring to Prometheus and Grafana on Akamai


Cloud Monitoring is an observability solution from Google Cloud Platform (GCP). It allows users to monitor their applications, infrastructure, and services within the GCP ecosystem as well as in external and hybrid environments. Cloud Monitoring provides real-time insights into system health, performance, and availability by collecting metrics, logs, and traces.

This guide walks through how to migrate standard GCP Cloud Monitoring service logs, metrics, and monitoring to a Prometheus and Grafana software stack on a Linode instance. To illustrate the migration process, an example Flask-based Python application running on a separate instance is configured to send logs and metrics to Cloud Monitoring, and then modified to integrate with Prometheus and Grafana. While this guide uses a Flask application as an example, the principles can be applied to any workload currently monitored via Cloud Monitoring.

Introduction to Prometheus and Grafana

Prometheus is a time-series database that collects and stores metrics from applications and services. It provides a foundation for monitoring system performance using the PromQL query language to extract and analyze granular data. Prometheus autonomously scrapes (pulls) metrics from targets at specified intervals, efficiently storing data through compression while retaining the most critical details. It also supports alerting based on metric thresholds, making it suitable for dynamic, cloud-native environments.
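
For example, a PromQL expression like the following (an illustrative query that assumes a counter named http_requests_total is being scraped) returns the per-second request rate averaged over the last five minutes:

    rate(http_requests_total[5m])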

Grafana is a visualization and analytics platform that integrates with Prometheus. It enables users to create real-time, interactive dashboards, visualize metrics, and set up alerts to gain deeper insights into system performance. Grafana can unify data from a wide array of data sources, including Prometheus, to provide a centralized view of system metrics.

Prometheus and Grafana are considered industry standards and are commonly used together to monitor service health, detect anomalies, and issue alerts. Because both are open source and platform-agnostic, they can be deployed across a diverse range of cloud providers and on-premises infrastructure. Organizations often adopt these tools to reduce operational costs while gaining greater control over how data is collected, stored, and visualized.

Prometheus and Grafana Marketplace App
If you prefer an automatic deployment rather than the manual installation steps in this guide, Prometheus and Grafana can be deployed through our Prometheus and Grafana Marketplace app.

Before You Begin

  1. If you do not already have a virtual machine to use, create a Compute Instance for the Prometheus and Grafana stack using the steps in our Get Started and Create a Compute Instance guides:

    • Prometheus and Grafana instance requirements: Linode 8 GB Shared CPU plan, Ubuntu 24.04 LTS distribution

    Use these steps if you prefer to use the Linode CLI to provision resources.

    The following command creates a Linode 8 GB compute instance (g6-standard-4) running Ubuntu 24.04 LTS (linode/ubuntu24.04) in the Miami datacenter (us-mia):

    linode-cli linodes create \
        --image linode/ubuntu24.04 \
        --region us-mia \
        --type g6-standard-4 \
        --root_pass PASSWORD \
        --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" \
        --label monitoring-server

    Note the following key points:

    • Replace the region as desired.
    • Replace PASSWORD with a secure alternative for your root password.
    • This command assumes that an SSH public/private key pair exists, with the public key stored as id_rsa.pub in the user’s $HOME/.ssh/ folder.
    • The --label argument specifies the name of the new server (monitoring-server).

    To emulate a real-world workload, the examples in this guide use an additional optional instance to run an example Flask Python application. This application produces sample metrics and is used to illustrate configuration changes when switching from GCP Cloud Monitoring to an alternative monitoring solution. This instance can live on GCP or other infrastructure (such as a Linode) as long as it is configured to send metrics to GCP Cloud Monitoring.

    • Example Flask app instance requirements: 1 GB Shared CPU, Ubuntu 24.04 LTS distribution
  2. Follow our Set Up and Secure a Compute Instance guide to update each system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.

Note
This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, see the Users and Groups guide.

Install Prometheus as a Service

  1. To install Prometheus, log in to your Linode instance via SSH as your limited sudo user:

    ssh SUDO_USER@LINODE_IP
  2. Create a dedicated user for Prometheus, disable its login, and create the necessary directories for Prometheus:

    sudo useradd --no-create-home --shell /bin/false prometheus
    sudo mkdir /etc/prometheus
    sudo mkdir /var/lib/prometheus
  3. Download the latest version of Prometheus from its GitHub repository:

    wget https://github.com/prometheus/prometheus/releases/download/v2.55.1/prometheus-2.55.1.linux-amd64.tar.gz

    This guide uses version 2.55.1. Check the project’s releases page for the latest version that aligns with your instance’s operating system.

  4. Extract the compressed file and navigate to the extracted folder:

    tar xzvf prometheus-2.55.1.linux-amd64.tar.gz
    cd prometheus-2.55.1.linux-amd64
  5. Copy the prometheus and promtool binaries to /usr/local/bin:

    sudo cp prometheus /usr/local/bin
    sudo cp promtool /usr/local/bin

    The prometheus binary is the main monitoring application, while promtool is a companion utility for validating configuration files and querying a Prometheus server. An example configuration check is shown after these steps.

  6. Copy the configuration files and directories to the /etc/prometheus folder you created previously:

    sudo cp -r consoles /etc/prometheus
    sudo cp -r console_libraries /etc/prometheus
    sudo cp prometheus.yml /etc/prometheus/prometheus.yml
  7. Set the correct ownership permissions for Prometheus files and directories:

    sudo chown -R prometheus:prometheus /etc/prometheus
    sudo chown -R prometheus:prometheus /var/lib/prometheus
    sudo chown prometheus:prometheus /usr/local/bin/prometheus
    sudo chown prometheus:prometheus /usr/local/bin/promtool
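
Before creating the service, you can optionally use promtool to validate the Prometheus configuration file. If the configuration is valid, the output should indicate SUCCESS:

    promtool check config /etc/prometheus/prometheus.yml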

Create a systemd Service File

A systemd service configuration file must be created to run Prometheus as a service.

  1. Create the service file using the text editor of your choice. This guide uses nano.

    sudo nano /etc/systemd/system/prometheus.service

    Add the following content to the file, and save your changes:

    File: /etc/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Service
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
        --config.file=/etc/prometheus/prometheus.yml \
        --storage.tsdb.path=/var/lib/prometheus \
        --web.console.templates=/etc/prometheus/consoles \
        --web.console.libraries=/etc/prometheus/console_libraries
    
    [Install]
    WantedBy=multi-user.target
  2. Reload the systemd configuration files to apply the new service file:

    sudo systemctl daemon-reload
  3. Using systemctl, start the prometheus service and enable it to automatically start after a system reboot:

    sudo systemctl start prometheus
    sudo systemctl enable prometheus
  4. Verify that Prometheus is running:

    systemctl status prometheus

    The output should display active (running), confirming a successful setup:

    ● prometheus.service - Prometheus Service
         Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
         Active: active (running) since Thu 2024-12-05 16:11:57 EST; 5s ago
       Main PID: 1165 (prometheus)
          Tasks: 9 (limit: 9444)
         Memory: 16.2M (peak: 16.6M)
            CPU: 77ms
         CGroup: /system.slice/prometheus.service

    When done, press the Q key to exit the status output and return to the terminal prompt.

  5. Open a web browser and visit your instance’s IP address on port 9090 (Prometheus’s default port):

    http://IP_ADDRESS:9090

    The Prometheus UI should appear:

    Note
    Prometheus settings are configured in the /etc/prometheus/prometheus.yml file. This guide uses the default values. For production systems, consider enabling authentication and other security measures to protect your metrics.

Install the Grafana Service

Grafana provides an apt repository, reducing the number of steps needed to install and update it on Ubuntu.

  1. Install the necessary package to add new repositories:

    sudo apt install software-properties-common -y
  2. Import and add the public key for the Grafana repository:

    wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
    sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
  3. Update the package index and install Grafana:

    sudo apt update
    sudo apt install grafana -y
  4. The installation process already sets up the systemd configuration for Grafana. Start and enable the Grafana service:

    sudo systemctl start grafana-server
    sudo systemctl enable grafana-server
  5. Run the following command to verify that Grafana is active (running):

    systemctl status grafana-server
    ● grafana-server.service - Grafana instance
         Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: enabled)
         Active: active (running) since Thu 2024-12-05 13:57:10 EST; 8s ago
           Docs: http://docs.grafana.org
       Main PID: 3434 (grafana)
          Tasks: 14 (limit: 9444)
         Memory: 71.4M (peak: 80.4M)
            CPU: 2.971s
         CGroup: /system.slice/grafana-server.service

Connect Grafana to Prometheus

  1. Open a web browser and visit your instance’s IP address on port 3000 (Grafana’s default port) to access the Grafana web UI:

    http://IP_ADDRESS:3000
  2. Log in using the default credentials of admin for both the username and password:

    Grafana login page showing fields for entering username and password.

  3. After logging in, you are prompted to enter a secure replacement for the default password:

    Grafana user interface prompting for a new password after the first login.

  4. Add Prometheus as a data source by expanding the Home menu, navigating to the Connections entry, and clicking Add new connection:

    Grafana home menu with the option to add a new connection under the Connections section.

  5. Search for and select Prometheus.

  6. Click Add new data source.

  7. In the URL field, enter http://localhost:9090.

  8. Click Save & Test to confirm the connection.

    If successful, your Grafana installation is now connected to the Prometheus installation running on the same Linode.
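
Alternatively, Grafana supports file-based provisioning of data sources. The following is a minimal sketch of a provisioning file (the file name is arbitrary; Grafana reads any YAML file in this directory) that points Grafana at the local Prometheus server:

File: /etc/grafana/provisioning/datasources/prometheus.yaml

    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://localhost:9090
        isDefault: true

After adding the file, restart Grafana with sudo systemctl restart grafana-server for the change to take effect.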

Configure Example Flask Server

This guide demonstrates the migration process using an example Flask app running on a separate instance from which metrics and logs can be collected.

  1. Log in to the instance running the example Flask application as a user with sudo privileges.

  2. Create a directory for the project named example-flask-app and navigate into it:

    mkdir example-flask-app
    cd example-flask-app
  3. Using a text editor of your choice, create a file called app.py:

    nano app.py

    Give it the following contents. Replace YOUR_PROJECT_ID with your actual project ID:

    File: app.py
    import json
    import logging
    import time
    
    from flask import Flask, request
    from google.cloud import monitoring_v3 # Note: pip install google-cloud-monitoring
    
    logging.basicConfig(filename='flask-app.log', level=logging.INFO)
    logger = logging.getLogger(__name__)
    
    app = Flask(__name__)
    
    # Google Cloud Monitoring setup
    metric_client = monitoring_v3.MetricServiceClient()
    project_id = 'YOUR_PROJECT_ID'  # replace with your project ID
    project_name = f"projects/{project_id}"
    
    @app.before_request
    def start_timer():
        request.start_time = time.time()
    
    @app.after_request
    def send_latency_metric(response):
        latency = time.time() - request.start_time
    
        # Send latency metric to Google Cloud Monitoring
        series = monitoring_v3.TimeSeries()
        series.metric.type = 'custom.googleapis.com/EndpointLatency'
        series.resource.type = 'global'
        series.metric.labels['endpoint'] = request.path
        series.metric.labels['method'] = request.method
    
        point = monitoring_v3.Point()
        now = time.time()
        seconds = int(now)
        nanos = int((now - seconds) * 10**9)
        point.interval.end_time.seconds = seconds
        point.interval.end_time.nanos = nanos
        point.value.double_value = latency
    
        series.points.append(point)
        metric_client.create_time_series(name=project_name, time_series=[series])
    
        return response
    
    @app.route('/')
    def hello_world():
        logger.info("A request was received at the root URL")
        return {'message': 'Hello, World!'}, 200
    
    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)

    When done, save your changes, and close the text editor.

  4. Create a separate text file called requirements.txt:

    nano requirements.txt

    Provide it with the following basic dependencies for the Flask application to function, and save your changes:

    File: requirements.txt
    Flask==3.0.3
    itsdangerous==2.2.0
    Jinja2==3.1.4
    MarkupSafe==2.1.5
    Werkzeug==3.0.4
  5. A virtual environment is required to run pip commands in Ubuntu 24.04 LTS. Use the following command to install python3.12-venv:

    sudo apt install python3.12-venv
  6. Using the venv utility, create a virtual environment named venv within the example-flask-app directory:

    python3 -m venv venv
  7. Activate the venv virtual environment:

    source venv/bin/activate
  8. Use pip to install the example Flask application’s dependencies from the requirements.txt file:

    pip install -r requirements.txt
  9. Also using pip, install google-cloud-monitoring, which is required for interfacing with GCP resources:

    pip install google-cloud-monitoring
  10. Exit the virtual environment:

    deactivate

Create a systemd Service File

  1. Create a systemd service file for the example Flask app:

    sudo nano /etc/systemd/system/flask-app.service

    Provide the file with the following content, replacing USERNAME with your actual sudo user:

    File: /etc/systemd/system/flask-app.service
    [Unit]
    Description=Flask Application Service
    After=network.target
    
    [Service]
    User=USERNAME
    WorkingDirectory=/home/USERNAME/example-flask-app
    ExecStart=/home/USERNAME/example-flask-app/venv/bin/python /home/USERNAME/example-flask-app/app.py
    Restart=always
    
    [Install]
    WantedBy=multi-user.target

    Save your changes when complete.

  2. Reload the systemd configuration files to apply the new service file, then start and enable the service:

    sudo systemctl daemon-reload
    sudo systemctl start flask-app
    sudo systemctl enable flask-app
  3. Verify that the flask-app service is active (running):

    systemctl status flask-app
    ● flask-app.service - Flask Application Service
         Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
         Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
       Main PID: 4413 (python)
          Tasks: 1 (limit: 9444)
         Memory: 20.3M (peak: 20.3M)
            CPU: 196ms
         CGroup: /system.slice/flask-app.service

    Once the Flask application is running, GCP Cloud Monitoring can monitor its data.

  4. Generate data by issuing an HTTP request using the following cURL command. Replace FLASK_APP_IP_ADDRESS with the IP address of the instance where the Flask app is running:

    curl http://FLASK_APP_IP_ADDRESS:8080

    You should receive the following response:

    {"message": "Hello, World!"}

Migrate from GCP Cloud Monitoring to Prometheus and Grafana

Migrating from GCP Cloud Monitoring to Prometheus and Grafana requires planning to ensure the continuity of your monitoring capabilities. Transitioning from GCP Cloud Monitoring provides greater control over data storage and handling while unlocking the advanced customization and visualization features offered by Prometheus and Grafana.

Assess Current Monitoring Requirements

Begin by cataloging all metrics currently monitored in GCP Cloud Monitoring. So that you can recreate similar monitoring with Prometheus, identify common metrics for web applications, such as latency, request rates, CPU usage, and memory consumption. Remember to document existing alert configurations, as alerting strategies must also be ported to Prometheus Alertmanager.
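
The gcloud CLI can help with this inventory. For example, the following commands list the current project's dashboards and alerting policies (at the time of writing, the policies command is part of the gcloud alpha component, so availability may vary):

    gcloud monitoring dashboards list
    gcloud alpha monitoring policies list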

For the example Python Flask application, GCP Cloud Monitoring collects key data such as API request counts, request latency, and application logs. The specifics may vary depending on your application. Below are examples of metrics visualized in GCP Cloud Monitoring dashboards:

  • API Requests Over Time: This dashboard tracks the total number of API requests served by the application:

  • CPU Utilization: This metric monitors the CPU usage of the underlying infrastructure without requiring additional configuration.

  • API Request Latency: This dashboard visualizes the amount of time it takes to serve API requests:

These are metrics typically tracked for a web application. GCP Cloud Monitoring provides them by default when the application is deployed on a GCP Compute Engine instance, without the need to modify the application code. Documenting these existing metrics and alerts helps you configure equivalent monitoring using Prometheus and Grafana.

Export Existing Cloud Monitoring Logs and Metrics

GCP Cloud Logging integrates with Cloud Monitoring and allows you to create sinks that export logs to different destinations. Sinks can be configured to filter logs for a specific application, exporting only relevant entries. Below is an example sink that facilitates the export of logs from GCP:
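
For example, a sink that exports only the example application's log entries to a Cloud Storage bucket can be created with a command along these lines (a sketch; BUCKET_NAME and the log filter are placeholders to adapt to your environment):

    gcloud logging sinks create flask-app-sink \
        storage.googleapis.com/BUCKET_NAME \
        --log-filter='resource.type="gce_instance" AND textPayload:"flask-app"'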

The Cloud Monitoring API allows you to programmatically retrieve metric data. Once this data is retrieved, it can be stored locally or sent to another monitoring system. The Google Cloud Managed Service for Prometheus includes an adapter to fetch GCP metrics directly. This avoids the need for manual exporting or scripts, providing real-time observability as if the metrics were local to Prometheus.
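
For example, a short script along the following lines (a sketch that uses the same google-cloud-monitoring library as the example application; the file name export_metrics.py and YOUR_PROJECT_ID are placeholders) retrieves the last hour of the custom EndpointLatency metric so it can be stored or reformatted elsewhere:

File: export_metrics.py

    import time

    from google.cloud import monitoring_v3

    project_id = "YOUR_PROJECT_ID"  # placeholder: replace with your project ID
    client = monitoring_v3.MetricServiceClient()

    # Query the last hour of the custom latency metric created by the example app
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": int(now)},
            "start_time": {"seconds": int(now) - 3600},
        }
    )

    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": 'metric.type = "custom.googleapis.com/EndpointLatency"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    # Print each point; in practice, write the data out to a file or another system
    for series in results:
        for point in series.points:
            print(series.metric.labels, point.value.double_value)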

GCP Cloud Monitoring has default data retention policies that may limit the availability of historical data. Ensure the exported data frequency meets system requirements, especially when using the API since data may need to be reformatted to match the destination’s schema. For example, some destinations may require data formatted as JSON, while others may need CSV.

To avoid unexpected costs, review GCP’s billing policies. GCP may charge for API calls and data exports, especially when querying metrics at high frequency.

Expose Application Metrics to Prometheus

Prometheus works differently from GCP Cloud Monitoring: rather than having the application push data, Prometheus pulls (scrapes) metrics from the monitored application. After assessing or exporting metrics as needed, modify the application to expose a metrics endpoint that Prometheus can scrape, so that it collects the same metrics previously sent to GCP Cloud Monitoring. This process varies from application to application.

For the example Flask application in this guide, the prometheus_flask_exporter library is a commonly used option for instrumenting Flask applications to expose Prometheus metrics.

  1. Reactivate the venv virtual environment:

    source venv/bin/activate
  2. Use pip to install the prometheus_client and prometheus_flask_exporter libraries:

    pip install prometheus_client prometheus_flask_exporter
  3. Exit the virtual environment:

    deactivate
  4. Using a text editor of your choice, open the app.py file for the Flask application:

    nano app.py

    Replace the file’s current GCP Cloud Monitoring-specific contents with the Prometheus-specific code below:

File: app.py
 import logging
 import random
 import time

 from flask import Flask
 from prometheus_flask_exporter import PrometheusMetrics

 logging.basicConfig(filename="flask-app.log", level=logging.INFO)
 logger = logging.getLogger(__name__)

 app = Flask(__name__)
 metrics = PrometheusMetrics(app)

 metrics.info("FlaskApp", "Application info", version="1.0.0")


 @app.route("/")
 def hello_world():
     logger.info("A request was received at the root URL")
     return {"message": "Hello, World!"}, 200


 @app.route("/long-request")
 def long_request():
     n = random.randint(1, 5)
     logger.info(
         f"A request was received at the long-request URL. Slept for {n} seconds"
     )
     time.sleep(n)
     return {"message": f"Long running request with {n=}"}, 200


 if __name__ == "__main__":
     app.run(host="0.0.0.0", port=8080)

 These lines use the `prometheus_flask_exporter` library to:

 -   Instrument the Flask app for Prometheus metrics.
 -   Expose default and application-specific metrics at the `/metrics` endpoint.
 -   Provide metadata such as version information via `metrics.info`.

1.  Save and close the file, then restart the `flask-app` service:

 ```command
 sudo systemctl restart flask-app
 ```

1.  Verify that the `flask-app` service is `active (running)`:

 ```command
 systemctl status flask-app
 ```

 ```output
  flask-app.service - Flask Application Service
      Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
      Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
    Main PID: 4413 (python)
       Tasks: 1 (limit: 9444)
      Memory: 20.3M (peak: 20.3M)
         CPU: 196ms
      CGroup: /system.slice/flask-app.service
 ```

1.  Test to see if the Flask app is accessible by issuing the following cURL command. Replace FLASK_APP_IP_ADDRESS with the IP address of the instance where the Flask app is running:

 ```command
 curl http://FLASK_APP_IP_ADDRESS:8080
 ```

 You should receive the following response:

 ```output
 {"message": "Hello, World!"}
 ```

1.  To view the metrics, open a web browser and visit the following URL:

 ```command
 http://FLASK_APP_IP_ADDRESS:8080/metrics
 ```

 The metrics shown include `flask_http_request_duration_seconds` (request latency) and `flask_http_request_total` (total number of requests).

### Configure Prometheus to Ingest Application Metrics

1.  Using a text editor, open and modify the Prometheus configuration at `/etc/prometheus/prometheus.yml` to include the Flask application as a scrape target:

 ```command
 sudo nano /etc/prometheus/prometheus.yml
 ```

 Append the following content to the `scrape_configs` section of the file, replacing FLASK_APP_IP_ADDRESS with the IP address of the instance running the Flask application:

 ```file {title="/etc/prometheus/prometheus.yml"}
   - job_name: 'flask_app'
     static_configs:
       - targets: ['FLASK_APP_IP_ADDRESS:8080']
 ```

 This configuration tells Prometheus to scrape metrics from the Flask application running on port `8080`.

1.  Save the file, and restart Prometheus to apply the changes:

 ```command
 sudo systemctl restart prometheus
 ```

1.  To verify that Prometheus is successfully scraping the Flask app, open a web browser and navigate to the Prometheus user interface on port 9090 (Prometheus's default port). Replace INSTANCE_IP_ADDRESS with the IP address of your monitoring-server instance:

 ```command
 http://INSTANCE_IP_ADDRESS:9090
 ```

1.  In the Prometheus UI, click the **Status** tab and select **Targets**. You should see the Flask application service listed as a target with a status of `UP`, indicating that Prometheus is successfully scraping metrics from the application.

 ![Prometheus UI showing the status and targets of monitored services.](prometheus-ui-targets.png)

### Create a Grafana Dashboard with Application Metrics

Grafana serves as the visualization layer, providing an interface for creating dashboards from Prometheus metrics.

1.  Open a web browser and visit the following URL to access the Grafana UI on port 3000 (Grafana's default port). Replace INSTANCE_IP_ADDRESS with the IP address of your monitoring-server instance:

 ```command
 http://INSTANCE_IP_ADDRESS:3000
 ```

1.  Navigate to the **Dashboards** page:

 ![Grafana home menu with the Dashboards section selected.](grafana-home-menu-dashboards.png)

1.  Create a new dashboard in Grafana by clicking **Create dashboard**:

 ![Grafana Dashboards page with an option to create a new dashboard.](grafana-dashboards-overview.png)

1.  Click **Add visualization**:

 ![Grafana interface showing the Add Visualization dialog for creating a new graph.](grafana-add-visualization.png)

1.  In the resulting dialog, select the **prometheus** data source:

 ![Grafana data source selection dialog with Prometheus highlighted.](grafana-prometheus-datasource.png)

1.  To duplicate the GCP Cloud Monitoring metrics for the Flask application, first click on the **Code** tab in the right-hand side of the panel editor:

 ![Grafana panel editor with the Code tab selected for entering a PromQL query.](grafana-panel-editor-query-code.png)

1.  Input the following PromQL query to calculate the average latency for an endpoint:

 ```command
 flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"} /
 flask_http_request_duration_seconds_count{method="GET",path="/",status="200"}
 ```

1.  After entering the formula, click **Run queries** to execute the PromQL query. The chart should update with data pulled from Prometheus:

 ![Grafana dashboard displaying a latency graph for a Flask application, based on Prometheus data.](grafana-latency-dashboard.png)

 This visualization replicates GCP Cloud Monitoring's latency metrics, detailing the average latency over time for a specific endpoint. Prometheus also provides default labels such as method, path, and status code for additional granularity in analysis.

## Additional Considerations and Concerns

### Cost Management

GCP Cloud Monitoring incurs [costs](https://cloud.google.com/stackdriver/pricing) for log storage and retention, data ingestion, API calls, and alerting policies. Migrating to Prometheus and Grafana eliminates these charges but introduces infrastructure costs for compute, storage, maintenance, and network traffic. Additionally, since Prometheus is primarily designed for short-term data storage, setting up a long-term storage solution may also increase costs.

**Recommendation**:

-   Estimate infrastructure costs for Prometheus and Grafana by assessing current GCP Cloud Monitoring data volume and access usage.
-   Access the [Google Cloud Billing](https://console.cloud.google.com/billing) report to determine a baseline for costs related to GCP Cloud Monitoring and Cloud Logging.
-   Use Prometheus's default short-term storage for real-time data, and configure a long-term storage solution for essential data to optimize costs.
-   Employ Grafana's alerting and dashboards strategically to reduce high-frequency scrapes and unnecessary data retention.
-   Regularly review and refine retention policies and scraping intervals to balance cost against visibility needs.

### Data Consistency and Accuracy

GCP Cloud Monitoring automates metric collection with built-in aggregation, whereas Prometheus relies on manual configuration through exporters and application instrumentation. Prometheus stores raw data with high granularity, but does not provide the same level of aggregated historical data as GCP Cloud Monitoring. This may lead to gaps in insights if retention isn't properly managed.

**Recommendation**:

-   Set up Prometheus exporters such as the [Node Exporter](https://prometheus.io/docs/guides/node-exporter/) (for host metrics) or [custom exporters](https://prometheus.io/docs/instrumenting/writing_exporters/) (for application metrics).
-   Configure scrape intervals to capture data at regular intervals.
-   Verify that custom instrumentation is accurate for critical metrics such as latency, requests, and resource usage.
-   Use the [remote-write capability](https://prometheus.io/docs/specs/remote_write_spec/) from Prometheus to write data to a remote storage backend like [Thanos](https://thanos.io/) or [Cortex](https://cortexmetrics.io/) for historical data retention. This ensures that older data remains accessible and aggregated at a lower resolution, which is similar to GCP's approach to historical data.
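
For example, remote write is enabled with a block like the following in `/etc/prometheus/prometheus.yml` (the receiver URL is a placeholder for your Thanos, Cortex, or other long-term storage endpoint):

```file {title="/etc/prometheus/prometheus.yml"}
remote_write:
  - url: "http://REMOTE_STORAGE_HOST:19291/api/v1/receive"
```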

### GCP Cloud Monitoring Aggregated Data Versus Prometheus Raw Data

GCP Cloud Monitoring aggregates data automatically and can provide a straightforward approach to historical trend analysis. Prometheus captures high-resolution, raw data, which can require custom queries to derive similar insights.

**Recommendation**:

-   Leverage Grafana's dashboards to create aggregated views of Prometheus metrics.
-   Apply queries that aggregate data over larger time windows to create a summarized view similar to GCP Cloud Monitoring.
-   Use Prometheus [query functions](https://prometheus.io/docs/prometheus/latest/querying/functions/) such as [`rate`](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate), [`avg_over_time`](https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time), and [`sum_over_time`](https://prometheus.io/docs/prometheus/latest/querying/functions/#aggregation_over_time) to replicate GCP Cloud Monitoring's aggregated trends.
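
For example, the following queries (assuming the `flask_app` scrape job configured earlier in this guide) approximate the aggregated views GCP Cloud Monitoring provides by default. The first returns the per-second request rate averaged over five-minute windows; the second returns the average request latency over the past hour:

```command
rate(flask_http_request_total{status="200"}[5m])
increase(flask_http_request_duration_seconds_sum[1h]) / increase(flask_http_request_duration_seconds_count[1h])
```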

### Alert System Migration

GCP Cloud Monitoring alerts are configured with thresholds and conditions that must be translated into query-based alert rules in Prometheus.

**Recommendation**:

-   Audit existing GCP Cloud Monitoring alerts and replicate them using Prometheus's Alertmanager.
-   Refine alert thresholds based on the type and granularity of data collected by Prometheus.
-   Integrate Alertmanager with any existing notification systems (e.g. email, Slack, etc.) to maintain consistency in how teams are alerted to critical events.
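
For example, a GCP Cloud Monitoring latency alert could be recreated as a Prometheus alerting rule similar to the following sketch (the file path, threshold, and labels are placeholders; the rule file must also be listed under `rule_files` in `/etc/prometheus/prometheus.yml`, and Alertmanager must be installed and configured to route the resulting notifications):

```file {title="/etc/prometheus/rules/flask-app-alerts.yml"}
groups:
  - name: flask-app-alerts
    rules:
      - alert: HighRequestLatency
        expr: rate(flask_http_request_duration_seconds_sum[5m]) / rate(flask_http_request_duration_seconds_count[5m]) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average request latency has exceeded 500ms for 5 minutes"
```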

### Security and Access Controls

GCP Cloud Monitoring integrates with GCP's Identity and Access Management (IAM) system for Role-Based Access Control (RBAC). This helps manage who can view, edit, or delete logs and metrics. Prometheus and Grafana require manual configuration of security and access controls.

Securing Prometheus and Grafana involves setting up user authentication (e.g. OAuth, LDAP, etc.) and ensuring metrics and dashboards are only accessible to authorized personnel. Additionally, data in transit should be encrypted using TLS to maintain security.

**Recommendation**:

-   Configure Grafana with an RBAC policy and integrate it with an authentication system like OAuth or LDAP.
-   Enable TLS for Prometheus to secure data in transit.
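
For example, Prometheus reads TLS and basic authentication settings from a web configuration file passed with the `--web.config.file` flag. A minimal sketch (the certificate paths and bcrypt password hash are placeholders) looks like the following:

```file {title="/etc/prometheus/web-config.yml"}
tls_server_config:
  cert_file: /etc/prometheus/prometheus.crt
  key_file: /etc/prometheus/prometheus.key

basic_auth_users:
  # Replace with a bcrypt hash, for example one generated with: htpasswd -nBC 10 "" | tr -d ':\n'
  admin: $2y$10$REPLACE_WITH_BCRYPT_HASH
```

To apply it, add `--web.config.file=/etc/prometheus/web-config.yml` to the `ExecStart` line of the Prometheus systemd unit and restart the service.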

### Separate Log and Metric Responsibilities

Prometheus is primarily designed for metrics collection and does not include built-in capabilities for managing logs. Since GCP Cloud Monitoring natively combines logs and metrics, migration requires decoupling those functions.

**Recommendation**:

-   Use a specialized log aggregation solution alongside Prometheus and Grafana for collecting, aggregating, and querying logs:
 -   [**Grafana Loki**](https://grafana.com/grafana/loki/) is designed to integrate with Grafana. It provides log querying capabilities within Grafana's existing interface, giving a unified view of metrics and logs in a single dashboard.
 -   [**Fluentd**](https://www.fluentd.org/) is a log aggregator that can forward logs to multiple destinations, including object storage for long-term retention, and can work with both Loki and ELK.
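
For example, if you adopt Grafana Loki, its Promtail agent can ship the example application's `flask-app.log` to Loki with a configuration along these lines (a sketch; the Loki URL and log path are placeholders):

```file {title="/etc/promtail/promtail-config.yml"}
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://LOKI_HOST:3100/loki/api/v1/push

scrape_configs:
  - job_name: flask-app
    static_configs:
      - targets:
          - localhost
        labels:
          job: flask-app
          __path__: /home/USERNAME/example-flask-app/flask-app.log
```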
