---
title: "AI Tools - Monitor your cloud resources"
description: "Discover how to monitor your AI Tools resources"
url: https://docs.ovhcloud.com/pt/guides/public-cloud/ai-machine-learning/ai-solutions-resources
lang: pt
lastUpdated: 2025-09-22
---
# AI Tools - Monitor your cloud resources

## Objective

**AI Tools** offers comprehensive monitoring capabilities for your cloud resources, ensuring optimal performance and efficiency.

This guide will walk you through accessing and interpreting the various metrics provided by the monitoring dashboard, which is accessible for AI Notebooks, AI Training, and AI Deploy services through a dedicated UI on **Grafana**.

## Requirements

- An AI Project created inside a [Public Cloud project](https://www.ovhcloud.com/pt/public-cloud/) in your OVHcloud account
- An [AI user](/pt/guides/public-cloud/ai-machine-learning/ai-users.md)
- Access to the <ManagerLink to="/#/pci/projects">OVHcloud Control Panel</ManagerLink> or [the OVHcloud AI CLI](/pt/guides/public-cloud/ai-machine-learning/ai-cli-install-client.md) installed on your computer
- A running **OVHcloud AI Tool** (AI Notebooks, AI Training, or AI Deploy)


***

### OVHcloud Control Panel Access

- **Direct link:** <ManagerLink to="/#/pci/projects">Public Cloud Projects</ManagerLink>
- **Navigation path:** <code className="action">Public Cloud</code> > Select your project

***


## Instructions

### Monitoring Grafana Access

The monitoring dashboard for **AI Tools** is available at a dedicated URL. This URL is listed in the AI Tool details, which you can view in the Control Panel (UI) or retrieve with the [ovhai CLI](/pt/guides/public-cloud/ai-machine-learning/ai-cli-install-client.md). The monitoring URL is structured as follows: _`https://monitoring.<REGION>.ai.cloud.ovh.net/d/gpu?var-job=<JOB-ID>`_.
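As an illustration of this structure, the sketch below splits a monitoring URL back into its region and ID using plain shell parameter expansion. The function name `parse_monitoring_url` is hypothetical (it is not part of the `ovhai` CLI), and the sketch assumes the URL always contains a `monitoring.<REGION>.` host prefix and a `var-job=` query parameter.

```bash
# Hypothetical helper: print "<REGION> <ID>" for a given monitoring URL.
# Assumes the host starts with "monitoring.<REGION>." and the query has "var-job=".
parse_monitoring_url() {
  local url="$1"
  local host region query job_id

  # Reject URLs that do not carry the var-job parameter.
  case "$url" in
    *var-job=*) ;;
    *) echo "no var-job parameter in URL" >&2; return 1 ;;
  esac

  host="${url#https://}"        # drop the scheme
  host="${host%%/*}"            # keep only the host name
  region="${host#monitoring.}"  # drop the "monitoring." prefix
  region="${region%%.*}"        # keep the first label, e.g. "gra"

  query="${url#*var-job=}"      # everything after "var-job="
  job_id="${query%%&*}"         # stop at the next query parameter, if any

  printf '%s %s\n' "$region" "$job_id"
}

# Placeholder ID, same format as the examples in this guide.
parse_monitoring_url 'https://monitoring.gra.ai.cloud.ovh.net/d/gpu?var-job=abcdefgh-ijkl-mnop-qrst-uvwxyz01234567'
```

The helper only relies on the host prefix and the `var-job` parameter, so it does not depend on the dashboard path segment in the URL.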

To fetch your AI Tool monitoring URL, you can use either the CLI or the Control Panel UI:


**Using the Control Panel (UI)**

Click <ManagerLink to="/#/pci/projects">this link</ManagerLink> to access your Public Cloud project, then go to the <code className="action">AI & Machine Learning</code> category in the left menu and choose the <code className="action">AI Notebooks</code>, <code className="action">AI Training</code> or <code className="action">AI Deploy</code> section, depending on the AI Tool you are using.
This opens a table listing your instances, where you can find the one you need along with its general information. To view the instance details, click either the instance name or the <code className="action">...</code> button and then <code className="action">Manage</code>.

![image](/images/public-cloud/ai-machine-learning/gi-11-concepts-resources/00_access_tool_details.png)

On the details page, click the <code className="action">Grafana Dashboard</code> button, under <code className="action">Usage monitoring</code>, to open your Grafana monitoring UI.
![image](/images/public-cloud/ai-machine-learning/gi-11-concepts-resources/01_access_grafana_ui.png)

**Using ovhai CLI**

To follow this part, make sure you have installed the [ovhai CLI](/pt/guides/public-cloud/ai-machine-learning/ai-cli-install-client.md) on your computer or on an instance.
If you have not already done so, log in to the `ovhai` CLI. Once logged in, you can list your existing notebooks, jobs, or apps by running one of the following commands, depending on the AI Tool you are using:
```bash
ovhai notebook list
ovhai job list
ovhai app list
```
Using the ID of the instance you are interested in, you can retrieve more information, including the monitoring link, by executing one of the following commands:
```bash
ovhai notebook get <NOTEBOOK-ID>
ovhai job get <JOB-ID>
ovhai app get <APP-ID>
```
Replace `<NOTEBOOK-ID>`, `<JOB-ID>`, or `<APP-ID>` with the ID of your notebook, job, or app, respectively.
The output will look similar to the following:
```
Created At: 01-01-25 14:00:00
Id:         abcdefgh-ijkl-mnop-qrst-uvwxyz01234567
Spec:
  Command:
  Default Http Port:    8080
  Env Vars:             ~
  Image:                ovhcom/ai-training-pytorch
  Labels:
    ovh/id:   abcdefgh-ijkl-mnop-qrst-uvwxyz01234567
    ovh/type: job
  Name:                 ai-tool-demo
  Resources:
    Cpu:               28
    Ephemeral Storage: 3.0 TiB
    Flavor:            h100-1-gpu
    Gpu:               1
    Gpu Brand:         NVIDIA
    Gpu Memory:        79.6 GiB
    Gpu Model:         H100
    Memory:            350.0 GiB
    Private Network:   0 bps
    Public Network:    5.0 Gbps
  Shutdown:             ~
  Ssh Public Keys:      ~
  Timeout:              0
  Timeout Auto Restart: false
  Unsecure Http:        false
  Volumes:              ~
  Grpc Port:            0
Status:
  Data Sync:            ~
  Duration:             111s
  External Ip:          51.210.38.76
  History:
    DATE                  STATE
    19-09-25 14:16:43     QUEUED
    19-09-25 14:16:44     INITIALIZING
    19-09-25 14:16:44     PENDING
    19-09-25 14:16:46     RUNNING
  Info:
    Message:   Job is running
  Info Url:             https://ui.gra.ai.cloud.ovh.net/job/abcdefgh-ijkl-mnop-qrst-uvwxyz01234567
  Ip:                   10.42.178.59
  Monitoring Url:       https://monitoring.gra.ai.cloud.ovh.net/d/job?var-job=abcdefgh-ijkl-mnop-qrst-uvwxyz01234567&from=1758291343926
  Ssh Url:              ~
  State:                RUNNING
  Url:                  https://abcdefgh-ijkl-mnop-qrst-uvwxyz01234567.job.gra.ai.cloud.ovh.net
  Volumes:              ~
  Grpc Address:         abcdefgh-ijkl-mnop-qrst-uvwxyz01234567.job-grpc.gra.ai.cloud.ovh.net:443
Updated At: 19-09-25 14:16:48
User:       user-abcdefghijkl
```
The AI Tool monitoring URL can be found in the **Monitoring Url** field, in the `Status` section near the end of the output.
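If you prefer not to read the field by eye, a small filter can pull it out of the text output. The sketch below is only an illustration: `extract_monitoring_url` is a hypothetical helper name, and it assumes the field label is printed exactly as `Monitoring Url:` as in the example output above. The sample lines are copied from that example so the snippet runs without calling the CLI.

```bash
# Hypothetical helper: read `ovhai ... get` text output on stdin and
# print the value of the first "Monitoring Url:" field.
extract_monitoring_url() {
  awk -F'Monitoring Url:' '
    /Monitoring Url:/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $2)  # trim surrounding whitespace
      print $2
      exit                             # stop at the first match
    }'
}

# Sample lines taken from the example output above (placeholder ID).
sample_output='  Info Url:             https://ui.gra.ai.cloud.ovh.net/job/abcdefgh-ijkl-mnop-qrst-uvwxyz01234567
  Ip:                   10.42.178.59
  Monitoring Url:       https://monitoring.gra.ai.cloud.ovh.net/d/job?var-job=abcdefgh-ijkl-mnop-qrst-uvwxyz01234567&from=1758291343926
  State:                RUNNING'

printf '%s\n' "$sample_output" | extract_monitoring_url
```

With a real instance, the same filter could be applied to the command output directly, for example `ovhai job get <JOB-ID> | extract_monitoring_url`.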


### Monitoring UI Details for AI Tools

The monitoring dashboard for **AI Tools** provides detailed insights into the resource usage of your instances. The available panels vary depending on the service (AI Notebooks, AI Training, or AI Deploy).

#### AI Notebooks and AI Training Jobs

For AI Notebooks and AI Training, the following panels are available:

- **GPUs (int)**: Number of GPUs allocated to your AI Tool.
- **GPU Average Usage (%)**: Average GPU usage percentage.
- **GPU Average Temperature (°C)**: Average temperature of the GPUs.
- **GPU Average Power Usage (W)**: Average power usage of the GPUs.
- **CPU Usage (%)**: Overall CPU usage percentage.
- **Memory Usage (GB)**: Usage and limit of memory allocated to your notebook or job.

**Detailed GPU Metrics**

- **GPU Usage**: Utilization of each GPU allocated to your notebook or job.
- **GPU Memory Usage**: Usage and limit of memory for each GPU.
- **SM Clocks**: Streaming Multiprocessor clocks for each GPU.
- **GPU Memory Clocks**: Memory clocks for each GPU.
- **Framebuffer Used**: Amount of framebuffer memory used.
- **GPU Power Usage**: Power usage of each GPU.
- **GPU Temperature**: Temperature of each GPU.
- **CPU Usage**: Overall CPU usage.
- **Memory Usage**: Usage and limit of memory allocated to your notebook or job.
- **Network Usage**: Input and output traffic on your notebook or job.
- **Ephemeral Storage Usage**: Usage and limit of ephemeral storage allocated to your notebook or job.
- **Volumes Total Size**: Total size of volumes attached to your notebook or job.
- **Volumes Total File Count**: Total number of files in attached volumes.

![image](/images/public-cloud/ai-machine-learning/gi-11-concepts-resources/02_resource_dashboard.png)

#### AI Deploy Applications

For AI Deploy, which lets you deploy applications, the panels are grouped into the following categories:

**Resource Usage**

- **GPU Average Usage**: Average GPU usage percentage.
- **GPU Average Memory Usage**: Average memory usage of the GPUs.
- **CPU Average Usage**: Average CPU usage percentage.
- **Ephemeral Storage Usage**: Usage and limit of ephemeral storage allocated to your app.
- **Network Usage**: Input and output traffic on your app.
- **Memory Total Usage**: Total memory usage.
- **Volumes Total Size**: Total size of volumes attached to your app.
- **Volumes Total File Count**: Total number of files in attached volumes.

**HTTP**

- **HTTP Call Latency**: Response time for HTTP calls.
- **HTTP Call Count**: Number of HTTP calls made.

**Auto-scaling**

- **Replicas**: Number of replicas.
- **Usage Target**: Target usage for scaling.
- **CPU Usage**: Overall CPU usage.
- **Memory Usage**: Usage and limit of memory allocated to your app.
- **GPU Usage**: Utilization of each GPU allocated to your app.
- **GPU Memory Usage**: Usage and limit of memory for each GPU.
- **GPU SM Clocks**: Streaming Multiprocessor clocks.
- **GPU Memory Clocks**: Memory clocks.
- **GPU Power Usage**: Power usage of each GPU.
- **Framebuffer Used**: Amount of framebuffer memory.
- **GPU Temperature**: Temperature of each GPU.

## Warnings

:::warning

- GPU panels (usage, memory) are only available for AI Tools that consume GPUs.

:::

:::warning

- AI Tools can use ephemeral storage for data that is not stored in a synchronised container. If your usage exceeds the ephemeral storage limit, your job will be rejected.

:::
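To keep an eye on how close you are to the ephemeral storage limit, you can convert the usage and limit values read from the **Ephemeral Storage Usage** panel into a percentage. The sketch below is a simple arithmetic illustration: the function names, the 70% threshold, and the sample values are all hypothetical, and it assumes both values are expressed in the same unit with a limit greater than zero.

```bash
# Hypothetical helper: print usage as a percentage of the limit.
# Both arguments must use the same unit (e.g. GiB); limit must be > 0.
storage_usage_percent() {
  awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", used / limit * 100 }'
}

# Hypothetical helper: exit 0 when usage has reached the threshold (in %).
storage_above_threshold() {
  awk -v used="$1" -v limit="$2" -v threshold="$3" \
    'BEGIN { exit !(used / limit * 100 >= threshold) }'
}

# Illustrative reading: 2304 GiB used out of a 3072 GiB (3.0 TiB) limit.
storage_usage_percent 2304 3072

if storage_above_threshold 2304 3072 70; then
  echo "Warning: ephemeral storage usage is above 70% of the limit"
fi
```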

## Feedback

Please send us your questions, feedback, and suggestions to improve the service:

- On the OVHcloud [Discord server](https://discord.gg/ovhcloud)

If you need training or technical assistance to implement our solutions, contact your sales representative or click on [this link](https://www.ovhcloud.com/pt/professional-services/) to get a quote and ask our Professional Services experts for a custom analysis of your project.
