Setting up Cloud Monitoring with a standalone agent
Cloud Monitoring helps you gain visibility into the performance, availability, and health of your applications and infrastructure. You can use Cloud Monitoring and other parts of Google Cloud Observability to monitor, troubleshoot, and operate VMware Engine services at scale.
You define configuration of metrics forwarding separately for each private cloud using a standalone agent. Each private cloud requires a separate agent, hosted in either a Compute Engine VM or a VMware VM.
Once you successfully enable metrics forwarding, you can see metrics in the Cloud Monitoring Metrics Explorer. Resource types and metrics from VMware Engine begin with external.googleapis.com/vmware/vcenter, and the vCenter FQDN is tagged as part of the namespace.
Before you begin
The steps in this document assume that you first do the following:
- Enable the Cloud Monitoring API
- Identify a solution user account to use with the standalone agent, and set a strong password for the solution user account.
- Create a Compute Engine VM or a VMware VM to use as a host for the standalone agent. Compute Engine VMs must be in a Virtual Private Cloud (VPC) network that's peered to the private cloud VPC network containing the resources you want to monitor.
For an example of creating a Compute Engine VM, see the Compute Engine Quickstart using a Linux VM. For an example of creating a VMware VM, see Creating a VMware VM.
If you use a Compute Engine VM to host the standalone agent, then Google manages key rotation for the service account that you connect. However, it can be cost efficient to create a VMware VM if you have unused capacity in your private cloud and you don't mind managing key rotation by yourself.
Regardless of where you create your agent host VM, it must meet the requirements in the next section.
Requirements
Your agent host VM must meet the following system requirements:
- Supported Linux operating systems:
  - CentOS 6, 7, or 8
  - Red Hat Enterprise Linux 6, 7, or 8
  - SLES 12 or 15
  - Ubuntu 14, 16, 18, or 19
- At least 4 GB of RAM
- 300 MB installation space available
- Installation directory set to /opt/bpagent
- curl CLI utility installed
Your agent host VM also needs access to the following addresses to collect and push metrics and logs:
- Port 443 (TCP) for the HTTPS connection to your vCenter Server (default)
- monitoring.googleapis.com:443 (external access)
- logging.googleapis.com:443 (external access)
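The network requirements above can be spot-checked from the agent host VM before you install anything. The following is a minimal sketch and not part of the agent tooling: the function name can_connect is hypothetical, and a successful TCP connection only shows that the port is reachable, not that authentication will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling can_connect("monitoring.googleapis.com", 443) and can_connect("logging.googleapis.com", 443) on the agent host VM, plus the same call with your own vCenter Server FQDN, covers the three addresses listed above.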
Enabling metrics forwarding
The process of setting up your agent host VM and enabling metrics forwarding consists of the following steps:
- Install the agent on the VM
- Specify a service account
- Configure the agent to access your private cloud for metrics
- Configure the agent to access the service account for reporting
- Collect metrics and logs
- Configure a private cloud for syslog forwarding
Installing the standalone agent
To install the agent on your host VM, do the following:
- Connect to your agent host VM.
Run the installation script remotely:
sudo sh -c "$(curl -S https://storage.googleapis.com/gcve-observability-agent/latest/vmware-linux-amd64/installer/install.sh)"
For systems where the agent host VM might not have external network access, you can download the agent and installation script on a machine that does have access, copy both files to the agent host VM, and then install from the local files:
To get the standalone agent, run the following:
curl -S https://storage.googleapis.com/gcve-observability-agent/latest/vmware-linux-amd64/artifacts/bpagent-headless-vmware.tar.gz > agent.tar.gz
To get the installation script, run the following:
curl -S https://storage.googleapis.com/gcve-observability-agent/latest/vmware-linux-amd64/installer/install.sh > install.sh
To install the agent, run the following on your agent host VM:
sudo sh install.sh agent.tar.gz
Specify a service account
Forwarding data from the agent to Cloud Monitoring requires a service account from your Google Cloud project. That service account must have the Monitoring Admin role for metrics and the Logs Writer role for logs.
If you don't have a service account for monitoring and logging applications, create one:
In the Google Cloud console, go to IAM & Admin > Service Accounts.
Click Create service account.
Enter a name, ID, and description for the service account. We recommend noting that the account is used for the agent integration.
Click Create and continue.
For Role, select Monitoring Admin.
For Role, select Logs Writer.
Click Continue.
Click Done.
If you created a VMware VM as your agent host VM, retrieve the service account private key so you can use it to set up the agent:
In the Google Cloud console, go to IAM & Admin > Service Accounts.
Find your service account in the list of service accounts.
In the Actions column, click the service account actions menu and select Manage keys.
Click Add key and select Create new key.
Select the JSON key type, and click Create.
Open the generated JSON file and copy the entire file contents. Metrics and logging collection both use the same JSON key file.
On your agent host VM, copy the JSON key file to the /opt/bpagent/config/destinations/google_cloud directory.
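Before you point the agent at the key file, it can help to confirm that the file you copied is a complete service account key. The sketch below is an illustration only: check_service_account_key and REQUIRED_FIELDS are hypothetical names, and the check covers a handful of fields that service account JSON keys contain rather than validating the key itself.

```python
import json
from typing import List

# A subset of the fields present in a service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> List[str]:
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A present but different type means this is some other kind of credential.
    if data.get("type") not in (None, "", "service_account"):
        problems.append(f"unexpected type: {data['type']}")
    return problems
```

Passing the contents of the copied file to this function and getting an empty list back suggests the file was not truncated or swapped for another credential type during the copy.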
Configure the agent to access your private cloud for metrics
The standalone agent needs access to your private cloud to collect metrics. On your agent host VM, set up access by copying and configuring the vmware_vcenter.yaml file:
Copy vmware_vcenter.yaml to the config/metrics/sources directory:
cp /opt/bpagent/config/metrics/examples/vmware_vcenter.yaml /opt/bpagent/config/metrics/sources
Edit the vmware_vcenter.yaml file to match the information in your VMware Engine environment:
collection_interval: 1m0s
connection_info:
  connection_timeout: "30"
  enable_performance_counters: "true"
  host: VCSA_FQDN
  password: SOLUTION_USER_PASSWORD
  performance_counter_end_time: ""
  performance_counter_query_timeout: "15"
  performance_counter_start_time: ""
  port: "443"
  sdk_path: ""
  ssl_config: "No Verify"
  username: SOLUTION_USER_ACCOUNT
Replace the following:
- VCSA_FQDN: the fully qualified domain name (FQDN) of the vCenter Server Appliance in your private cloud
- SOLUTION_USER_PASSWORD: the password that corresponds to the solution user account being used
- SOLUTION_USER_ACCOUNT: the solution user account that the agent uses to connect to vCenter Server and collect metrics
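If you configure several private clouds, generating the file from the three site-specific values avoids hand-editing mistakes. The following is a sketch under stated assumptions: render_vcenter_source is a hypothetical helper, the keys under connection_info are assumed to be nested as in the example file, and the three substituted values are written as double-quoted YAML strings so that passwords containing characters such as ":" or "#" are preserved.

```python
def render_vcenter_source(host: str, username: str, password: str) -> str:
    """Render vmware_vcenter.yaml text with the three site-specific values filled in."""

    def quoted(value: str) -> str:
        # Escape backslashes and double quotes, then wrap in double quotes,
        # which is how YAML double-quoted strings carry special characters.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "collection_interval: 1m0s",
        "connection_info:",
        '  connection_timeout: "30"',
        '  enable_performance_counters: "true"',
        f"  host: {quoted(host)}",
        f"  password: {quoted(password)}",
        '  performance_counter_end_time: ""',
        '  performance_counter_query_timeout: "15"',
        '  performance_counter_start_time: ""',
        '  port: "443"',
        '  sdk_path: ""',
        '  ssl_config: "No Verify"',
        f"  username: {quoted(username)}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to /opt/bpagent/config/metrics/sources/vmware_vcenter.yaml. Because the file contains a password, restricting its permissions to the account that runs the agent is a reasonable precaution.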
Configure the agent to access the service account for reporting
The standalone agent needs access to Google Cloud Observability to send metrics and logs. Configure the agent to access the project used for reporting through the service account that has monitoring and logging permissions.
On your agent host VM, copy and configure the log_agent.yaml file:
Copy log_agent.example.yaml to log_agent.yaml before editing:
cp /opt/bpagent/config/log_agent.example.yaml /opt/bpagent/config/log_agent.yaml
At the bottom of log_agent.yaml, enter your project ID and the full path to the JSON key file. For agent host VMs created in Compute Engine, remove or comment out the credentials_file line.
...
- id: my_project_destination
  project_id: PROJECT_ID
  type: google_cloud_output
  credentials_file: /opt/bpagent/config/destinations/google_cloud/JSON_KEY_FILE
Replace the following:
- PROJECT_ID: ID of the project where you want to output logs
- JSON_KEY_FILE: name of your service account private key file. Remove or comment out this line for agent host VMs created in Compute Engine.
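The difference between the two host types comes down to whether the credentials_file line is present. The sketch below illustrates that rule with a hypothetical helper, render_destination. It emits only the destination entry shown in the example, at the top level, because the enclosing part of log_agent.yaml is elided in this document. Indent the output to match your file.

```python
from typing import Optional

# Directory where this document says to place the JSON key file.
KEY_DIR = "/opt/bpagent/config/destinations/google_cloud"


def render_destination(project_id: str, key_file: Optional[str] = None) -> str:
    """Render the destination entry; omit credentials_file when key_file is None."""
    lines = [
        "- id: my_project_destination",
        f"  project_id: {project_id}",
        "  type: google_cloud_output",
    ]
    if key_file is not None:
        # VMware-hosted agents authenticate with the service account key file.
        lines.append(f"  credentials_file: {KEY_DIR}/{key_file}")
    return "\n".join(lines) + "\n"
```

Calling render_destination with only a project ID gives the Compute Engine form, and passing a key file name as the second argument gives the VMware VM form.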
Collect metrics and logs
To collect metrics or logs, the standalone agent must be running on your agent host VM. Connect to your agent host VM and use the following commands to start or stop the agent:
To start the agent on your host VM, run the following:
systemctl start bpagent
To stop the agent on your host VM, run the following:
systemctl stop bpagent
Configure a private cloud for syslog forwarding
VMware Engine integrates with Cloud Logging by forwarding syslog messages from vCenter and NSX-T to the standalone agent. The standalone agent is configured to parse both vCenter and NSX-T logs for Cloud Logging to read.
To forward syslog messages from VMware Engine, do the following:
- For vCenter syslog forwarding, follow the steps in Forward vCenter syslog messages. In the Server field, enter the IP address or host name of your agent host VM. The standalone agent uses the TCP communication protocol and listens on port 5142. The standalone agent must be running for the syslog configuration to connect successfully.
- For NSX-T syslog forwarding, follow the steps in Forward NSX-T syslog messages. In the FQDN or IP Address field, enter the IP address or host name of your agent host VM. The standalone agent uses the TCP communication protocol and listens on port 5142.
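To confirm that the agent's listener is accepting connections before you change the vCenter or NSX-T settings, you can send a single test line over TCP from another machine in the same network. This is a minimal sketch: send_test_syslog is a hypothetical helper, the newline-terminated framing is an assumption, and because the agent is configured to parse vCenter and NSX-T logs, an arbitrary test line might not appear in Cloud Logging even when the connection succeeds.

```python
import socket


def send_test_syslog(host: str, message: str, port: int = 5142, timeout: float = 5.0) -> None:
    """Send one newline-terminated syslog line to host:port over TCP."""
    # <14> is the syslog priority for facility "user" (1) and severity "info" (6): 1 * 8 + 6.
    line = f"<14>{message}\n".encode("utf-8")
    # Raises OSError if the connection is refused or times out.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line)
```

If the call raises an OSError, the agent is probably stopped or a firewall rule is blocking TCP port 5142 on the agent host VM.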
Uninstalling the agent
To remove the agent from a VM, connect to the agent VM and run the following commands in the command line:
Stop and disable the standalone agent:
systemctl stop bpagent
systemctl disable bpagent
Run the following commands to remove the standalone agent service:
rm /etc/systemd/system/bpagent.service
rm -rf /opt/bpagent
Update the system configuration based on your service changes and clear out any failed units:
systemctl daemon-reload
systemctl reset-failed
Cloud Monitoring dashboards
After you enable metrics forwarding, you can install predefined dashboards for VMware Engine. The following dashboards provide you with aggregated information across all sources that you specify:
- Overview dashboard: High-level view that lists key resources like data centers, clusters, and VMs.
- Contention dashboard: Resource utilization for storage, CPU, memory, and networking to help you locate top VMs and hosts by resource demand.
- Virtual machine performance dashboard: Virtual machine (VM) instance performance indicators that can be filtered by instance name and used to compare the performance of multiple VMs with each other.
To access a VMware Engine dashboard, do the following:
In the Google Cloud console, go to Monitoring > Dashboards.
In the Sample Library tab, select the VMware category.
Select the dashboard of interest and click Import.
The definitions for these dashboards are also stored on GitHub. For steps to install or view the definitions as custom dashboards, see Install sample dashboards.
Cloud Monitoring alerts
You can use metrics from your integration to trigger alerts and notifications based on custom thresholds and incidents. For example, you can have Cloud Monitoring send you an SMS notification when someone creates a new VM in your private cloud.
For details, see Introduction to alerting.
List of collected metrics
Once you successfully enable metrics forwarding, you can see metrics in the Cloud Monitoring Metrics Explorer. There, resource types and metric types from VMware Engine begin with the prefix external.googleapis.com/vmware/vcenter.
Here's the full list of metrics collected for VMware Engine, with the prefix omitted:
Resource and metric type | Description |
---|---|
cluster.cpu.available | CPU available in a cluster, in megahertz |
cluster.memory.available | Memory available in a cluster, in bytes |
cluster.cpu.threads | Number of CPU threads in a cluster |
cluster.cpu.effective | Effective CPU in a cluster from all running hosts. Hosts that are unresponsive or in maintenance mode are not counted. |
cluster.effective_hosts | Number of effective hosts in a cluster. Hosts that are unresponsive or in maintenance mode are not counted. |
cluster.memory.effective | Effective memory in a cluster from all running hosts. Hosts that are unresponsive or in maintenance mode are not counted. |
cluster.hosts | Number of hosts in a cluster |
cluster.vsan.latency | vSAN latency of a cluster, in microseconds |
cluster.vsan.throughput | vSAN read-write throughput of a cluster, in bytes |
cluster.vsan.iops | vSAN IOPS of a cluster |
cluster.vsan.congestions | vSAN congestion value of a cluster |
cluster.vsan.oio | vSAN outstanding I/O (oio) in a cluster |
datacenter.cpu.average_host_utilization | Average host utilization of a datacenter, as a percentage |
datacenter.clusters | Number of clusters in a datacenter |
datacenter.datastores | Number of datastores in a datacenter |
datacenter.hosts | Number of hosts in a datacenter |
datacenter.host_systems | Number of host systems in a datacenter |
datacenter.hosts/powered_on | Number of powered on hosts in a datacenter |
datacenter.hosts/powered_off | Number of powered off hosts in a datacenter |
datacenter.disk/space | Total disk space in a datacenter, in terabytes |
datastore.capacity_bytes | Capacity of a datastore, in bytes |
datastore.capacity_utilization | Capacity utilization of a datastore, as a percentage |
host_system.network.transmitted_packets | Number of network packets transmitted by the host system |
host_system.network.received_packets | Number of network packets received by the host system |
host_system.dropped_packets | Number of network packets dropped by the host system |
host_system.network.adapters | Number of host system network adapters |
host_system.memory.utilization | Memory utilization of the host system, as a percentage |
host_system.memory.utilization_ratio | Memory utilization ratio of the host system |
host_system.memory.used | Memory used by the host system, in megabytes |
host_system.disk.read | Disk read of the host system, in kilobytes per second |
host_system.disk_latency | Disk latency of the host system, in milliseconds |
host_system.cpu.usage | CPU usage of the host system, as a percentage |
host_system.cpu.utilization_ratio | CPU utilization ratio of the host system |
host_system.cpu.capacity | CPU capacity of the host system, in megahertz |
host_system.cpu.reserved_capacity | Reserved CPU capacity of the host system, in megahertz |
host_system.cpu.average_speed | Average CPU speed of the host system, in megahertz |
host_system.cpu.used | CPU used by the host system, in megahertz |
host_system.vsan.throughput | vSAN read-write throughput of the host system, in bytes |
host_system.vsan.iops | vSAN IOPS of the host system |
host_system.vsan.latency | vSAN latency of the host system, in microseconds |
host_system.vsan.client_cache_hits | vSAN client cache hits of the host system |
host_system.vsan.client_cache_hit_rate | vSAN client cache hit rate of the host system, as a percentage |
host_system.vsan.congestions | vSAN congestion value of the host system |
resource_pool.memory.swapped_bytes | vCenter swapped memory, in megabytes |
resource_pool.memory.shared_bytes | vCenter shared memory, in megabytes |
resource_pool.memory.private_bytes | vCenter private memory, in megabytes |
resource_pool.memory.shares | Number of vCenter memory shares |
resource_pool.memory.overhead_usage_bytes | vCenter memory overhead usage, in megabytes |
resource_pool.memory.host_usage_bytes | vCenter memory host usage, in megabytes |
resource_pool.memory.active_guest_usage_bytes | vCenter memory active guest usage, in megabytes |
resource_pool.memory.balloon_size | Size of the vCenter balloon memory, in megabytes |
resource_pool.cpu.usage | CPU used by vCenter, in megahertz |
resource_pool.cpu.shares | Number of CPU shares in the resource pool, in megabytes |
vm.memory.ballooned | Size of the VM balloon memory, in megabytes |
vm.network.throughput_bytes | Network throughput of the VM, in kilobytes per second |
vm.memory.used_percent | Memory used by the VM, as a percentage of available memory |
vm.memory.usage_bytes | Memory used by the VM, in megabytes |
vm.disk.throughput_bytes | Disk read-write throughput of the VM, in kilobytes per second |
vm.disk.used_percent | Disk usage of the VM, as a percentage of available storage |
vm.disk.usage_bytes | Disk usage of the VM, in bytes |
vm.disk.free_bytes | Available disk space of the VM, in bytes |
vm.cpu.used_percent | CPU usage of the VM, as a percentage of available CPU |
vm.cpu.usage | CPU usage of the VM, in megahertz |
vm.cpu.ready_percent | CPU of the VM that's ready but unable to run, as a percentage |
vm.vsan.throughput | vSAN read-write throughput of the VM, in bytes |
vm.vsan.iops | vSAN IOPS of the VM |
vm.vsan.latency | vSAN latency of the VM, in microseconds |
vm.vsan.readCount | vSAN read count of the VM |
vm.vsan.writeCount | vSAN write count of the VM |
vsphere.cpu.available | CPU available across clusters managed by vSphere, in gigahertz |
vsphere.memory.available | Memory available across clusters managed by vSphere, in gigabytes |
vsphere.clusters.total | Number of clusters managed by vSphere |
vsphere.clusters | Count of clusters managed by vSphere with the cluster status color code (like green, grey, red, or yellow) |
vsphere.cpu.cpus | Total number of host system CPU cores managed by vSphere |
vsphere.datacenters | Number of datacenters managed by vSphere |
vsphere.datastores.total | Number of datastores in vSphere |
vsphere.datastores | Count of datastores managed by vSphere with the datastore status color code (like green, grey, red, or yellow) |
vsphere.disk.space | Total attached disk space in vSphere, in terabytes |
vsphere.host_systems | Number of host systems in vSphere |
vsphere.hosts | Count of host systems managed by vSphere with the host system status color code (like green, grey, red, or yellow) |
vsphere.network.adapters | Number of network adapters in vSphere |
vsphere.hosts.powered_off | Number of powered off hosts in vSphere |
vsphere.hosts.powered_on | Number of powered on hosts in vSphere |
vsphere.virtual_machines.total | Number of VMs across all vSphere clusters |
vsphere.virtual_machines | Count of VMs managed by vSphere with the VM status color code (like green, grey, red, or yellow) |
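Because the table omits the prefix, the full metric type has to be reassembled before you use it in a Metrics Explorer filter or an alerting policy. The sketch below shows one way to do that. The helper names are hypothetical, and it assumes that the prefix ends with a period, so that a short name such as cluster.cpu.available is appended directly. Verify the resulting string against what the Metrics Explorer shows for your project.

```python
# Assumption: the trailing period is part of the prefix.
METRIC_PREFIX = "external.googleapis.com/vmware/vcenter."


def full_metric_type(short_name: str) -> str:
    """Prepend the VMware Engine prefix to a short metric name from the table."""
    return METRIC_PREFIX + short_name


def metric_filter(short_name: str) -> str:
    """Build a Cloud Monitoring filter expression that selects one metric type."""
    return f'metric.type = "{full_metric_type(short_name)}"'
```

For instance, metric_filter("vm.cpu.usage") produces a filter string that selects only the VM CPU usage metric.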