Migrating From AWS CloudWatch to Prometheus and Grafana on Akamai
AWS CloudWatch is a monitoring and observability service designed to collect and analyze metrics, logs, and events from AWS resources and applications. It provides insights into the performance and health of infrastructure, letting users generate real-time alerts and dashboards for proactive monitoring.
While CloudWatch can be useful for AWS environments, organizations may seek alternative solutions to reduce costs or increase flexibility across multiple cloud platforms. Prometheus and Grafana offer an open source, platform-agnostic alternative.
This guide walks through how to migrate standard AWS CloudWatch service logs, metrics, and monitoring to a Prometheus and Grafana software stack on a Linode instance. To illustrate the migration process, an example Flask-based Python application running on a separate instance is configured to send logs and metrics to CloudWatch, and then modified to integrate with Prometheus and Grafana. While this guide uses a Flask application as an example, the principles can be applied to any workload currently monitored via AWS CloudWatch.
Introduction to Prometheus and Grafana
Prometheus is a time-series database that collects and stores metrics from applications and services. It provides a foundation for monitoring system performance, using the PromQL query language to extract and analyze granular data. Prometheus scrapes (pulls) metrics from targets at specified intervals and stores the samples efficiently in a compressed on-disk format. It also supports alerting rules based on metric thresholds, making it suitable for dynamic, cloud-native environments.
Grafana is a visualization and analytics platform that integrates with Prometheus. It enables users to create real-time, interactive dashboards, visualize metrics, and set up alerts to gain deeper insights into system performance. Grafana can unify data from a wide array of data sources, including Prometheus, to provide a centralized view of system metrics.
Prometheus and Grafana are considered industry standard, and are commonly used together to monitor service health, detect anomalies, and issue alerts. Because they are both open source and platform-agnostic, they can be deployed across a diverse range of cloud providers and on-premises infrastructures. Organizations often adopt these tools to reduce operational costs while gaining greater control over how data is collected, stored, and visualized.
Before You Begin
If you do not already have a virtual machine to use, create a Compute Instance for the Prometheus and Grafana stack using the steps in our Get Started and Create a Compute Instance guides:
- Prometheus and Grafana instance requirements: Linode 8 GB Shared CPU plan, Ubuntu 24.04 LTS distribution
Use these steps if you prefer to use the Linode CLI to provision resources.
The following command creates a Linode 8 GB compute instance (`g6-standard-4`) running Ubuntu 24.04 LTS (`linode/ubuntu24.04`) in the Miami datacenter (`us-mia`):
```
linode-cli linodes create \
  --image linode/ubuntu24.04 \
  --region us-mia \
  --type g6-standard-4 \
  --root_pass PASSWORD \
  --authorized_keys "$(cat ~/.ssh/id_rsa.pub)" \
  --label monitoring-server
```
Note the following key points:
- Replace the `region` as desired.
- Replace PASSWORD with a secure alternative for your root password.
- This command assumes that an SSH public/private key pair exists, with the public key stored as `id_rsa.pub` in the user’s `$HOME/.ssh/` folder.
- The `--label` argument specifies the name of the new server (`monitoring-server`).
To emulate a real-world workload, the examples in this guide use an additional optional instance to run an example Flask Python application. This application produces sample metrics and is used to illustrate configuration changes when switching from AWS CloudWatch to an alternative monitoring solution. This instance can live on AWS or other infrastructure (such as a Linode) as long as it is configured to send metrics to AWS CloudWatch.
- Example Flask app instance requirements: 1 GB Shared CPU, Ubuntu 24.04 LTS distribution
Follow our Set Up and Secure a Compute Instance guide to update each system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
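This guide is written for a non-root user. Commands that require elevated privileges are prefixed with `sudo`. If you’re not familiar with the `sudo` command, see the Users and Groups guide.
Install Prometheus as a Service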
To install Prometheus, log in via SSH to your Linode instance as your limited sudo user:
ssh SUDO_USER@LINODE_IP
Create a dedicated user for Prometheus, disable its login, and create the necessary directories for Prometheus:
```
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
```
Download the Prometheus release archive from the project’s GitHub repository:
wget https://github.com/prometheus/prometheus/releases/download/v2.55.1/prometheus-2.55.1.linux-amd64.tar.gz
This guide uses version `2.55.1`. Check the project’s releases page for the latest version and for the archive that matches your instance’s operating system and CPU architecture.
Extract the compressed file and navigate to the extracted folder:
```
tar xzvf prometheus-2.55.1.linux-amd64.tar.gz
cd prometheus-2.55.1.linux-amd64
```
Copy both the `prometheus` and `promtool` binaries to `/usr/local/bin`:
```
sudo cp prometheus /usr/local/bin
sudo cp promtool /usr/local/bin
```
The `prometheus` binary is the main monitoring application. `promtool` is a command-line utility for checking configuration and rule files and for querying a running Prometheus server.
Copy the configuration files and directories to the `/etc/prometheus` folder you created previously:
```
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
sudo cp prometheus.yml /etc/prometheus/prometheus.yml
```
Set the correct ownership permissions for Prometheus files and directories:
```
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
```
Create a systemd Service File
A `systemd` service configuration file must be created to run Prometheus as a service.
Create the service file using the text editor of your choice. This guide uses `nano`.
```
sudo nano /etc/systemd/system/prometheus.service
```
Add the following content to the file, and save your changes:
- File: /etc/systemd/system/prometheus.service
```ini
[Unit]
Description=Prometheus Service
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
```
Reload the `systemd` configuration files to apply the new service file:
```
sudo systemctl daemon-reload
```
Using `systemctl`, start the `prometheus` service and enable it to automatically start after a system reboot:
```
sudo systemctl start prometheus
sudo systemctl enable prometheus
```
Verify that Prometheus is running:
systemctl status prometheus
The output should display `active (running)`, confirming a successful setup:
```
● prometheus.service - Prometheus Service
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-12-05 16:11:57 EST; 5s ago
   Main PID: 1165 (prometheus)
      Tasks: 9 (limit: 9444)
     Memory: 16.2M (peak: 16.6M)
        CPU: 77ms
     CGroup: /system.slice/prometheus.service
```
When done, press the Q key to exit the status output and return to the terminal prompt.
Open a web browser and visit your instance’s IP address on port `9090` (Prometheus’s default port):
http://IP_ADDRESS:9090
The Prometheus UI should appear:
Note: Prometheus settings are configured in the `/etc/prometheus/prometheus.yml` file. This guide uses the default values. For production systems, consider enabling authentication and other security measures to protect your metrics.
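You can also check from a script instead of a browser. Prometheus serves a readiness endpoint at `/-/ready` that returns HTTP 200 once the server can accept traffic. The sketch below uses only the Python standard library. The `is_ready` helper name is hypothetical, and `IP_ADDRESS` is the same placeholder used above.

```python
import http.client
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if Prometheus answers GET /-/ready with HTTP 200."""
    url = f"{base_url.rstrip('/')}/-/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, an error status (raised as
        # HTTPError), a malformed response, or a bad URL: treat as not ready.
        return False
```

For example, `is_ready("http://IP_ADDRESS:9090")` should return `True` once Prometheus has finished starting. It returns `False` if the port is unreachable.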
Install the Grafana Service
Grafana provides an `apt` repository, reducing the number of steps needed to install and update it on Ubuntu.
Install the necessary package to add new repositories:
sudo apt install software-properties-common -y
Import the public key for the Grafana repository, then add the repository:
```
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
```
Update the package index and install Grafana:
```
sudo apt update
sudo apt install grafana -y
```
The installation process already sets up the `systemd` configuration for Grafana. Start and enable the Grafana service:
```
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
```
Run the following command to verify that Grafana is `active (running)`:
```
systemctl status grafana-server
```
```
● grafana-server.service - Grafana instance
     Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-12-05 13:57:10 EST; 8s ago
       Docs: http://docs.grafana.org
   Main PID: 3434 (grafana)
      Tasks: 14 (limit: 9444)
     Memory: 71.4M (peak: 80.4M)
        CPU: 2.971s
     CGroup: /system.slice/grafana-server.service
```
Connect Grafana to Prometheus
Open a web browser and visit your instance’s IP address on port `3000` (Grafana’s default port) to access the Grafana web UI:
http://IP_ADDRESS:3000
Log in using the default credentials: `admin` for both the username and password. After logging in, you are prompted to enter a secure replacement for the default password.
Add Prometheus as a data source by expanding the Home menu, navigating to the Connections entry, and clicking Add new connection:
Search for and select Prometheus.
Click Add new data source.
In the URL field, enter `http://localhost:9090`.
Click Save & Test to confirm the connection.
If successful, your Grafana installation is now connected to the Prometheus installation running on the same Linode.
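You can also create the same data source through Grafana’s HTTP API by sending a `POST` request to `/api/datasources`. This can be handy when provisioning several servers. The following is a minimal sketch using the Python standard library. It assumes basic authentication with the admin account, and the `build_datasource_request` helper is hypothetical. It only builds the request, so you can inspect it before sending it.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             prometheus_url: str = "http://localhost:9090"
                             ) -> urllib.request.Request:
    """Build (but do not send) a POST to Grafana's /api/datasources endpoint."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",      # Grafana's built-in Prometheus plugin
        "url": prometheus_url,     # URL as seen from the Grafana server
        "access": "proxy",         # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, call `urllib.request.urlopen(build_datasource_request("http://IP_ADDRESS:3000", "admin", "YOUR_PASSWORD"))`. Skip this step if you already added Prometheus through the UI, to avoid creating a duplicate.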
Configure Example Flask Server
This guide demonstrates the migration process using an example Flask app running on a separate instance from which metrics and logs can be collected.
Log in to the instance running the example Flask application as a user with `sudo` privileges.
Create a directory for the project named `example-flask-app` and navigate into it:
```
mkdir example-flask-app
cd example-flask-app
```
Using a text editor of your choice, create a file called `app.py`:
```
nano app.py
```
Give it the following contents:
- File: app.py
```python
import boto3  # Note: pip install boto3
import json
import logging
import time

from flask import Flask, request

logging.basicConfig(filename='flask-app.log', level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)

# AWS CloudWatch setup
cloudwatch = boto3.client('cloudwatch')

@app.before_request
def start_timer():
    request.start_time = time.time()

@app.after_request
def send_latency_metric(response):
    latency = time.time() - request.start_time

    # Send latency metric to CloudWatch
    cloudwatch.put_metric_data(
        Namespace='FlaskApp',
        MetricData=[
            {
                'MetricName': 'EndpointLatency',
                'Dimensions': [
                    {
                        'Name': 'Endpoint',
                        'Value': request.path
                    },
                    {
                        'Name': 'Method',
                        'Value': request.method
                    }
                ],
                'Unit': 'Seconds',
                'Value': latency
            }
        ]
    )

    return response

@app.route('/')
def hello_world():
    logger.info("A request was received at the root URL")
    return {'message': 'Hello, World!'}, 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
The example Flask application in this guide collects endpoint latency metrics and sends them to CloudWatch using the `put_metric_data` API from Boto3. Application logs are written to a local file (`flask-app.log`), which can then be ingested into CloudWatch Logs (for example, by the CloudWatch agent) for centralization.
When done, save your changes, and close the text editor.
Create a separate text file called `requirements.txt`:
```
nano requirements.txt
```
Provide it with the following basic dependencies for the Flask application to function, and save your changes:
- File: requirements.txt
```
Flask==3.0.3
itsdangerous==2.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
Werkzeug==3.0.4
```
A virtual environment is required to run `pip` commands in Ubuntu 24.04 LTS. Use the following command to install `python3.12-venv`:
```
sudo apt install python3.12-venv
```
Using the `venv` utility, create a virtual environment named `venv` within the `example-flask-app` directory:
```
python3 -m venv venv
```
Activate the `venv` virtual environment:
```
source venv/bin/activate
```
Use `pip` to install the example Flask application’s dependencies from the `requirements.txt` file:
```
pip install -r requirements.txt
```
Also using `pip`, install the `boto3` library, which is required for interfacing with AWS resources:
```
pip install boto3
```
Exit the virtual environment:
deactivate
Create a systemd Service File
Create a `systemd` service file for the example Flask app:
```
sudo nano /etc/systemd/system/flask-app.service
```
Provide the file with the following content, replacing USERNAME with your actual `sudo` user:
- File: /etc/systemd/system/flask-app.service
```ini
[Unit]
Description=Flask Application Service
After=network.target

[Service]
User=USERNAME
WorkingDirectory=/home/USERNAME/example-flask-app
ExecStart=/home/USERNAME/example-flask-app/venv/bin/python /home/USERNAME/example-flask-app/app.py
Restart=always

[Install]
WantedBy=multi-user.target
```
Save your changes when complete.
Reload the `systemd` configuration files to apply the new service file, then start and enable the service:
```
sudo systemctl daemon-reload
sudo systemctl start flask-app
sudo systemctl enable flask-app
```
Verify that the `flask-app` service is `active (running)`:
```
systemctl status flask-app
```
```
● flask-app.service - Flask Application Service
     Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
   Main PID: 4413 (python)
      Tasks: 1 (limit: 9444)
     Memory: 20.3M (peak: 20.3M)
        CPU: 196ms
     CGroup: /system.slice/flask-app.service
```
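Once the Flask application is running, it sends a latency metric to CloudWatch for every request it handles. For Boto3 to authenticate, AWS credentials and a default region must be configured on the instance (for example, with `aws configure`).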
Generate data by issuing an HTTP request using the following cURL command. Replace FLASK_APP_IP_ADDRESS with the IP address of the instance where the Flask app is running:
curl http://FLASK_APP_IP_ADDRESS:8080
You should receive the following response:
{"message": "Hello, World!"}
Migrate from AWS CloudWatch to Prometheus and Grafana
Migrating from AWS CloudWatch to Prometheus and Grafana requires careful planning. It is important to ensure the continuity of your monitoring capabilities while leveraging the added control over data handling and advanced features of Prometheus and Grafana.
Assess Current Monitoring Requirements
Before migrating to Prometheus and Grafana, it’s important to understand what metrics and logs are currently being collected by CloudWatch and how they are used. This may vary depending on your application.
Metrics such as endpoint latency are collected for every HTTP request, along with HTTP method details. Application logs record incoming requests, exceptions, and warnings. For example, when the sample Flask application is configured with AWS CloudWatch, it emits logs like the following:



CloudWatch also visualizes metrics in graphs. For instance, by querying the endpoint latency metrics sent by the Flask application, a graph may look like this:



Export Existing CloudWatch Logs and Metrics
AWS includes tools for exporting CloudWatch data for analysis or migration. For example, CloudWatch logs can be exported to an S3 bucket, making them accessible outside AWS and enabling them to be re-ingested into other tools.
To export CloudWatch Logs to S3, use the following `create-export-task` command from the system where your AWS CLI is configured:
```
aws logs create-export-task \
    --log-group-name LOG_GROUP \
    --from START_TIME \
    --to END_TIME \
    --destination S3_BUCKET_NAME \
    --destination-prefix cloudwatch-logs/
```
Replace the following placeholders with your specific values:
- LOG_GROUP: The name of the log group to export.
- START_TIME and END_TIME: The start and end of the time range to export, as Unix timestamps in milliseconds.
- S3_BUCKET_NAME: The name of your S3 bucket.
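The `--from` and `--to` values are Unix epoch timestamps in milliseconds. As a small worked example, the following Python sketch converts UTC datetimes into those values. The `to_epoch_ms` name and the example dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. tzinfo=timezone.utc")
    # timedelta // timedelta yields an exact integer count of milliseconds.
    return (dt - EPOCH) // timedelta(milliseconds=1)


start = to_epoch_ms(datetime(2024, 12, 1, tzinfo=timezone.utc))
end = to_epoch_ms(datetime(2024, 12, 6, tzinfo=timezone.utc))
print(start, end)  # 1733011200000 1733443200000
```

These values would then be passed as `--from 1733011200000 --to 1733443200000`.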
Expose Application Metrics to Prometheus
Prometheus works differently from CloudWatch. With CloudWatch, the application pushes data to the monitoring service, as the example app does with `put_metric_data`. Prometheus instead pulls (scrapes) metrics from an HTTP endpoint that the monitored application exposes. After assessing or exporting metrics as needed, modify the application to expose the same metrics it previously sent to CloudWatch so that Prometheus can scrape them. This process varies from application to application.
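The following hypothetical sketch illustrates the pull model without any third-party libraries. It uses Python’s built-in `http.server` to serve a single counter in the Prometheus text exposition format at `/metrics`. This is not how the example app is instrumented (that uses `prometheus_flask_exporter`). The metric name `demo_requests_total` and port `8000` are arbitrary.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# A lock keeps the counter safe if you switch to ThreadingHTTPServer.
_lock = threading.Lock()
_requests_total = 0


def record_request() -> None:
    """Increment the hypothetical request counter."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP demo_requests_total Total requests handled.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {value}\n"
    )


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            # Prometheus pulls this page on every scrape.
            body = render_metrics().encode()
            content_type = "text/plain; version=0.0.4"
        else:
            record_request()
            body = b'{"message": "Hello, World!"}'
            content_type = "application/json"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve until interrupted; point a Prometheus scrape job at this port."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server. Every request outside `/metrics` increments the counter, and Prometheus reads the current value whenever it scrapes `/metrics`. The application never sends anything on its own.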
For the example Flask application in this guide, the `prometheus_flask_exporter` library is a widely used option for instrumenting Flask applications to expose Prometheus metrics.
Reactivate the `venv` virtual environment:
```
source venv/bin/activate
```
Use `pip` to install the `prometheus_client` and `prometheus_flask_exporter` libraries:
```
pip install prometheus_client prometheus_flask_exporter
```
Exit the virtual environment:
deactivate
Using a text editor of your choice, open the `app.py` file for the Flask application:
```
nano app.py
```
Replace the file’s current AWS-specific contents with the Prometheus-specific code below:
- File: app.py
```python
import logging
import random
import time

from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

logging.basicConfig(filename="flask-app.log", level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)
metrics = PrometheusMetrics(app)

metrics.info("FlaskApp", "Application info", version="1.0.0")


@app.route("/")
def hello_world():
    logger.info("A request was received at the root URL")
    return {"message": "Hello, World!"}, 200


@app.route("/long-request")
def long_request():
    n = random.randint(1, 5)
    logger.info(
        f"A request was received at the long-request URL. Slept for {n} seconds"
    )
    time.sleep(n)
    return {"message": f"Long running request with {n=}"}, 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```
This uses the `prometheus_flask_exporter` library to:
- Instrument the Flask app for Prometheus metrics.
- Expose default and application-specific metrics at the `/metrics` endpoint.
- Provide metadata such as version information via `metrics.info`.
Save and close the file, then restart the `flask-app` service:
```
sudo systemctl restart flask-app
```
Verify that the `flask-app` service is `active (running)`:
```
systemctl status flask-app
```
```
● flask-app.service - Flask Application Service
     Loaded: loaded (/etc/systemd/system/flask-app.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-12-05 17:26:18 EST; 1min 31s ago
   Main PID: 4413 (python)
      Tasks: 1 (limit: 9444)
     Memory: 20.3M (peak: 20.3M)
        CPU: 196ms
     CGroup: /system.slice/flask-app.service
```
Test to see if the Flask app is accessible by issuing the following cURL command. Replace FLASK_APP_IP_ADDRESS with the IP address of the instance where the Flask app is running:
curl http://FLASK_APP_IP_ADDRESS:8080
You should receive the following response:
{"message": "Hello, World!"}
To view the metrics, open a web browser and visit the following URL:
http://FLASK_APP_IP_ADDRESS:8080/metrics
The metrics shown include `flask_http_request_duration_seconds` (a histogram of request latency) and `flask_http_request_total` (the total number of requests).
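To check these numbers without Prometheus, you can parse the text returned by `/metrics` directly. The following simplified Python sketch computes the cumulative average latency for one label combination by dividing the `_sum` series by the `_count` series. The regular expressions are an assumption that covers the simple lines produced here. This is not a full exposition-format parser: it does not handle escaped quotes or timestamps.

```python
import re

# Simplified patterns: 'name{k="v",...} value' lines only.
# Escaped quotes in label values and optional timestamps are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')
PREFIX = "flask_http_request_duration_seconds_"


def parse_samples(text: str):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def average_latency(text: str, method: str = "GET", path: str = "/",
                    status: str = "200") -> float:
    """Cumulative average latency: _sum divided by _count for one label set."""
    wanted = {"method": method, "path": path, "status": status}
    series = {}
    for name, labels, value in parse_samples(text):
        # Exact label match skips _bucket lines, which carry an extra "le" label.
        if name.startswith(PREFIX) and labels == wanted:
            series[name[len(PREFIX):]] = value
    return series["sum"] / series["count"]
```

For example, fetch the text with `urllib.request.urlopen("http://FLASK_APP_IP_ADDRESS:8080/metrics").read().decode()` and pass it to `average_latency`. The function raises `KeyError` until at least one matching request has been recorded.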
Configure Prometheus to Ingest Application Metrics
Log back in to the Prometheus & Grafana instance.
Using a text editor, open the Prometheus configuration at `/etc/prometheus/prometheus.yml` and modify it to include the Flask application as a scrape target:
```
sudo nano /etc/prometheus/prometheus.yml
```
Append the following content to the `scrape_configs` section of the file, replacing FLASK_APP_IP_ADDRESS with the IP address of the instance running the Flask application:
- File: /etc/prometheus/prometheus.yml
```yaml
  - job_name: 'flask_app'
    static_configs:
      - targets: ['FLASK_APP_IP_ADDRESS:8080']
```
This configuration tells Prometheus to scrape metrics from the Flask application running on port `8080`. Prometheus requests the `/metrics` path by default.
Save the file, and restart Prometheus to apply the changes:
sudo systemctl restart prometheus
To verify that Prometheus is successfully scraping the Flask app, open a web browser and navigate to the Prometheus user interface on port 9090. This is the default port used for Prometheus. Replace INSTANCE_IP_ADDRESS with the IP of your instance:
http://INSTANCE_IP_ADDRESS:9090
In the Prometheus UI, click the Status tab and select Targets. You should see the Flask application service listed as a target with a status of `UP`, indicating that Prometheus is successfully scraping metrics from the application.
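You can also script the same check against Prometheus’s HTTP API, which lists scrape targets at `/api/v1/targets`. In this Python sketch, `fetch_targets` and `target_health` are hypothetical helper names. The response fields used (`data.activeTargets`, `labels`, `health`) come from the targets endpoint.

```python
import json
import urllib.request


def fetch_targets(prometheus_url: str, timeout: float = 10.0) -> dict:
    """GET /api/v1/targets from Prometheus and decode the JSON body."""
    url = f"{prometheus_url.rstrip('/')}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload: dict) -> dict:
    """Map "job/instance" to the target's health ("up", "down", or "unknown")."""
    health = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        key = f"{labels.get('job')}/{labels.get('instance')}"
        health[key] = target.get("health", "unknown")
    return health
```

For example, `print(target_health(fetch_targets("http://INSTANCE_IP_ADDRESS:9090")))` should report `up` for the `flask_app` job once scraping succeeds.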
Create a Grafana Dashboard with Application Metrics
Grafana serves as the visualization layer, providing an interface for creating dashboards from Prometheus metrics.
Open a web browser and visit the following URL to access the Grafana UI on port 3000 (the default port for Grafana). Replace INSTANCE_IP_ADDRESS with the IP of your instance:
http://INSTANCE_IP_ADDRESS:3000
Navigate to the Dashboards page:
Create a new dashboard in Grafana by clicking Create dashboard:
Click Add visualization:
In the resulting dialog, select the prometheus data source:
To duplicate the CloudWatch metrics for the Flask application, first click on the Code tab in the right-hand side of the panel editor:
Input the following PromQL query to calculate the average latency for an endpoint:
flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"} / flask_http_request_duration_seconds_count{method="GET",path="/",status="200"}
After entering the formula, click Run queries to execute the PromQL query. The chart should update with data pulled from Prometheus:
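This visualization approximates CloudWatch’s endpoint latency graph for a particular endpoint. The `_sum` and `_count` series are cumulative counters, so this ratio gives the average latency since the application started. To get an average over a recent window, which is closer to CloudWatch’s per-period statistics, divide the `rate()` of each series instead (for example, over `[5m]`). The exporter also attaches labels such as `method`, `path`, and `status` for additional granularity in analysis.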
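To retrieve the same numbers outside Grafana, you can send PromQL to Prometheus’s HTTP API at `/api/v1/query`. The sketch below uses only the standard library. It builds the request URL and extracts values from an instant-vector response. It uses a `rate()`-based variant of the query above, which averages over the last five minutes rather than since the application started. The five-minute window and the helper names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# rate()-based variant: average latency of GET / (HTTP 200) over the last 5
# minutes. The 5m window is an example; the result is NaN if no requests
# arrived in the window (0 / 0).
QUERY = (
    'rate(flask_http_request_duration_seconds_sum{method="GET",path="/",status="200"}[5m])'
    " / "
    'rate(flask_http_request_duration_seconds_count{method="GET",path="/",status="200"}[5m])'
)


def query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{params}"


def vector_values(response: dict) -> list:
    """Return (labels, value) pairs from an instant-vector response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    # Each result's "value" is [unix_timestamp, "value_as_string"].
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


def run_query(prometheus_url: str, promql: str = QUERY) -> list:
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(query_url(prometheus_url, promql), timeout=10) as resp:
        return vector_values(json.load(resp))
```

Pasting `QUERY` into the Grafana Code editor should produce a per-window latency line instead of the cumulative one.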
Additional Considerations and Concerns
Cost Management
CloudWatch incurs costs based on the number of API requests, log volume, and data retention. As monitoring scales, these costs can increase. Prometheus is open source, with no licensing or usage charges, which offers potential cost savings.
However, infrastructure costs for running Prometheus and Grafana are still a consideration. Running them requires provisioning compute and storage resources, plus ongoing expenses for maintenance and network traffic. Additionally, since Prometheus is primarily designed for short-term data storage, setting up a long-term storage solution may also increase costs.
Recommendation:
- Estimate infrastructure costs for Prometheus and Grafana by assessing current CloudWatch data volume and access usage.
- Utilize object storage or other efficient long-term storage mechanisms to minimize costs.
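To put rough numbers on storage, the Prometheus documentation gives a capacity formula: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample. After compression, each sample takes roughly 1 to 2 bytes. The Python sketch below applies that formula. The 10,000 active series is a made-up input. Fifteen days is Prometheus’s default retention, and 15 seconds matches the scrape interval in the sample `prometheus.yml`.

```python
def estimate_disk_bytes(active_series: int, scrape_interval_s: float,
                        retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """needed_disk_space = retention_seconds * samples_per_second * bytes_per_sample.

    samples_per_second is approximated as active_series / scrape_interval_s.
    bytes_per_sample defaults to the pessimistic end of the 1-2 byte range.
    """
    retention_seconds = retention_days * 86_400
    total_samples = retention_seconds * active_series / scrape_interval_s
    return total_samples * bytes_per_sample


# Hypothetical inputs: 10,000 series, 15 s scrapes, 15 days of retention.
estimate = estimate_disk_bytes(10_000, 15, 15)
print(f"{estimate / 1e9:.2f} GB")  # 1.73 GB
```

You can replace the inputs with the series count shown on Prometheus’s status pages once real workloads are being scraped.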
Data Consistency and Accuracy
CloudWatch aggregates metrics over set intervals, whereas Prometheus collects high-resolution raw metrics. Therefore, migrating from CloudWatch to Prometheus can raise potential concerns about data consistency and accuracy during and after the transition.
Recommendation:
- Tune Prometheus scrape intervals to capture the necessary level of detail without overwhelming storage or compute capacities.
- Validate that CloudWatch metrics correctly map to Prometheus metrics, with the appropriate time resolutions.
CloudWatch Aggregated Data Versus Prometheus Raw Data
Aggregated data from CloudWatch offers a high-level view of system health and application performance, and can be helpful for monitoring broader trends. Alternatively, the raw data from Prometheus enables detailed analyses and granular troubleshooting. Both approaches have their use cases, and it’s important to understand which is most appropriate for you.
While Prometheus has the ability to collect raw data, consider whether CloudWatch’s aggregation is more useful, and how to replicate that with Grafana dashboards or Prometheus queries.
Recommendation:
- Create Grafana dashboards that aggregate Prometheus data for overall system-level insights.
- Leverage Prometheus’s detailed, raw metrics for fine-grained data analysis.
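To validate that Prometheus data lines up with what CloudWatch showed, you can roll raw samples up into fixed periods and compute the same statistics CloudWatch reports (`Average`, `Maximum`, `Minimum`, `SampleCount`). The Python sketch below does this for a list of `(timestamp, value)` pairs. The 60-second default period is an assumption, so set it to match your CloudWatch graphs. Within Prometheus itself, functions such as `avg_over_time()` and `max_over_time()` serve a similar purpose.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, period_s: int = 60) -> dict:
    """Roll (unix_timestamp, value) samples up into fixed periods.

    Returns {period_start: {"Average", "Maximum", "Minimum", "SampleCount"}},
    mirroring the statistic names CloudWatch uses.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its period.
        buckets[int(ts // period_s) * period_s].append(value)
    return {
        start: {
            "Average": mean(values),
            "Maximum": max(values),
            "Minimum": min(values),
            "SampleCount": len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

For example, `aggregate([(0, 1.0), (15, 3.0), (65, 2.0)])` places the first two samples in the period starting at `0` (average `2.0`) and the third in the period starting at `60`.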
Alert System Migration
CloudWatch’s integrated alerting system is tightly coupled with AWS services and allows for alerts based on metric thresholds, log events, and more. In the Prometheus ecosystem, Prometheus evaluates alerting rules written in PromQL and forwards firing alerts to Alertmanager, which deduplicates, groups, and routes them to notification channels.
Migrating an alerting setup requires translating existing CloudWatch alarms into Prometheus alert rules. Consider how the thresholds and conditions set in CloudWatch translate to query-based alerts in Prometheus.
Recommendation:
- Audit all CloudWatch alarms and replicate them as Prometheus alerting rules, with notifications handled by Alertmanager.
- Refine alert thresholds based on the type of data collected by Prometheus.
- Integrate Alertmanager with any existing notification systems (such as email or Slack) to maintain consistency in how teams are alerted to critical events.
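As a starting point for the audit, you can translate the threshold logic of a CloudWatch alarm (as returned by `aws cloudwatch describe-alarms`) mechanically. The PromQL expression itself still has to be written by hand. The following Python sketch is a hypothetical helper, not an official conversion tool. It maps `ComparisonOperator` to a PromQL comparison and approximates `Period` × `EvaluationPeriods` with the rule’s `for` duration. It ignores `DatapointsToAlarm` (M-of-N alarms) and missing-data handling, and the `severity` label is only an example convention.

```python
# CloudWatch ComparisonOperator values mapped to PromQL comparison operators.
OPERATORS = {
    "GreaterThanThreshold": ">",
    "GreaterThanOrEqualToThreshold": ">=",
    "LessThanThreshold": "<",
    "LessThanOrEqualToThreshold": "<=",
}


def alarm_to_rule(alarm: dict, promql_expr: str) -> dict:
    """Translate a CloudWatch alarm's threshold into a Prometheus alerting rule.

    promql_expr must be written by hand to match the alarm's metric.
    """
    # Raises KeyError for operators not covered above (e.g. anomaly detection).
    op = OPERATORS[alarm["ComparisonOperator"]]
    # CloudWatch alarms after EvaluationPeriods breaching periods of Period
    # seconds; approximate that with the rule's "for" duration.
    hold_seconds = alarm["Period"] * alarm["EvaluationPeriods"]
    return {
        "alert": alarm["AlarmName"],
        "expr": f"({promql_expr}) {op} {alarm['Threshold']}",
        "for": f"{hold_seconds}s",
        "labels": {"severity": "warning"},  # example convention, adjust freely
        "annotations": {
            "summary": alarm.get("AlarmDescription", alarm["AlarmName"]),
        },
    }
```

To use the output, wrap the resulting rules as `{"groups": [{"name": "cloudwatch-migrated", "rules": [...]}]}` and write them to a YAML file (for example with PyYAML’s `yaml.safe_dump`, an extra dependency). Then list the file under `rule_files` in `prometheus.yml` and validate it with `promtool check rules`.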
Security and Access Controls
CloudWatch integrates with AWS Identity and Access Management (IAM) for role-based access control (RBAC), which controls who can view, edit, or delete logs and metrics. Prometheus and Grafana require manual configuration of security and access controls.
Securing Prometheus and Grafana involves setting up user authentication (for example, OAuth or LDAP) and ensuring metrics and dashboards are only accessible to authorized personnel. To maintain security, data in transit should be encrypted using TLS.
Recommendation:
- Implement secure access controls from the start.
- Configure Grafana with a well-defined RBAC policy and integrate it with an authentication system, such as OAuth or LDAP.
- Enable TLS for Prometheus to secure data in transit, and restrict access to sensitive metrics.
Separate Log and Metric Responsibilities
Since Prometheus is primarily a metrics-based monitoring solution, it does not have built-in capabilities for handling logs in the same way CloudWatch does. Therefore, it’s important to decouple log management needs from metric collection when migrating.
Recommendation:
- Introduce a specialized log aggregation solution alongside Prometheus and Grafana for collecting, aggregating, and querying logs:
- Grafana Loki is designed to integrate with Grafana. It provides log querying capabilities within Grafana’s existing interface, giving a unified view of metrics and logs in a single dashboard.
- Fluentd is a log aggregator that can forward logs to multiple destinations, including object storage for long-term retention. It works with both Loki and the ELK stack (Elasticsearch, Logstash, and Kibana).
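The following Python sketch shows how logs reach Loki. It builds a request for Loki’s push API (`POST /loki/api/v1/push`), which accepts JSON streams made of labels plus `[timestamp, line]` pairs, with nanosecond timestamps encoded as strings. The helper names and the `job` label are hypothetical. In practice, a log shipper such as Fluentd would tail `flask-app.log` and push it for you.

```python
import json
import time
import urllib.request
from typing import Optional


def loki_payload(lines: list, labels: dict, ts_ns: Optional[int] = None) -> dict:
    """Build a Loki push body containing one stream with the given labels."""
    base = time.time_ns() if ts_ns is None else ts_ns
    return {
        "streams": [
            {
                "stream": labels,
                # [timestamp in nanoseconds as a string, log line]; each line is
                # offset by 1 ns so entries keep their original order.
                "values": [[str(base + i), line] for i, line in enumerate(lines)],
            }
        ]
    }


def push_to_loki(loki_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to /loki/api/v1/push and return the HTTP status."""
    req = urllib.request.Request(
        f"{loki_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # Loki typically answers 204 No Content on success
```

For example, `push_to_loki("http://LOKI_HOST:3100", loki_payload(["test line"], {"job": "flask_app"}))`, where `LOKI_HOST` is a placeholder and `3100` is Loki’s default HTTP port.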