Google Distributed Cloud air-gapped 1.13.1 release notes

June 28, 2024


Google Distributed Cloud (GDC) air-gapped 1.13.1 is available.
See the product overview to learn about the features of Distributed Cloud.

Updated the Canonical Ubuntu OS image version to 20240515 to apply the latest security patches and important updates. To take advantage of the bug and security vulnerability fixes, you must upgrade all nodes with each release. The following security vulnerabilities are fixed:



The following container image security vulnerabilities are fixed:


Fixed a vulnerability with databases running as containers in the system cluster.


Billing:

  • Added the capability to enable partner billing when creating an organization to let Google charge the partner directly.

Cluster management:

Custom IP addressing:

  • Added the ability to override the IP address that is assigned to organizations to enable Direct Connect (DX) Interconnect features.

Database Service:

  • Added a major upgrade to deliver improved security and reliability. All database workloads now run on the service cluster. This upgrade requires removing existing databases: to safeguard your data, export and then delete all existing database clusters before the upgrade. Refer to the Database Service documentation for how to export and import data; an illustrative export sketch follows this list.
  • Added a feature for AlloyDB to support same-zone high availability (HA).
  • Added the ability for AlloyDB to support backup, restore, and point-in-time recovery features.
  • Added the ability for AlloyDB to support data import, export, and advanced migration features.
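
The supported procedure for exporting and importing data is the one in the Database Service documentation. Purely as an illustration of the kind of pre-upgrade export described in the first item in this list, the following sketch dumps a PostgreSQL-compatible database with pg_dump; the host, user, database name, and output path are hypothetical placeholders, not values from this release.

```python
# Illustrative only: a generic PostgreSQL-level export with pg_dump.
# The host, user, database, and output path are hypothetical placeholders;
# follow the Database Service documentation for the supported procedure.
import subprocess

DB_HOST = "10.0.0.10"        # hypothetical database cluster endpoint
DB_USER = "dbadmin"          # hypothetical user with read access
DB_NAME = "appdb"            # hypothetical database to export
DUMP_FILE = "/backups/appdb-pre-upgrade.dump"


def export_database() -> None:
    """Write a compressed, custom-format dump suitable for pg_restore."""
    subprocess.run(
        [
            "pg_dump",
            "--host", DB_HOST,
            "--username", DB_USER,   # password is prompted or taken from PGPASSWORD
            "--format", "custom",    # custom format can be restored with pg_restore
            "--file", DUMP_FILE,
            DB_NAME,
        ],
        check=True,  # raise if pg_dump exits with a non-zero status
    )


if __name__ == "__main__":
    export_database()
```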

Dynamic expansion:

  • Add compute and storage resources with dynamic expansion, without completing a redeployment. GDC versions earlier than 1.13.1 allowed hardware to be added only during a redeployment, a process known as a static expansion.

Harbor-as-a-Service:

  • Added Harbor-as-a-Service (HaaS), a fully managed service that stores and manages container images using Harbor.
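
As a rough usage sketch only, the following shows how a client could tag and push an image to a Harbor-backed registry with the standard Docker CLI driven from Python. The registry hostname, project, and image names are hypothetical placeholders; the actual HaaS endpoint and authentication flow depend on your environment.

```python
# Illustrative only: tagging and pushing an image to a Harbor-backed
# registry with the Docker CLI. The registry host, project, and image
# names below are hypothetical placeholders.
import subprocess

REGISTRY = "harbor.example.org"      # hypothetical HaaS registry endpoint
PROJECT = "my-project"               # hypothetical Harbor project
LOCAL_IMAGE = "my-app:1.0.0"
REMOTE_IMAGE = f"{REGISTRY}/{PROJECT}/{LOCAL_IMAGE}"


def run(*cmd: str) -> None:
    """Run a command and raise if it fails."""
    subprocess.run(cmd, check=True)


# Authenticate (interactive prompt), retag the local image for the
# registry, and push it to the Harbor project.
run("docker", "login", REGISTRY)
run("docker", "tag", LOCAL_IMAGE, REMOTE_IMAGE)
run("docker", "push", REMOTE_IMAGE)
```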

Machine types:

Marketplace:

  • Introduced customizable configuration of marketplace services.
  • Starburst Enterprise (BYOL) is available in the air-gapped marketplace. Starburst Enterprise provides a fast, scalable, distributed MPP SQL engine for your data lakehouse with query federation to many other data sources.
  • Prisma Cloud Compute Edition by Palo Alto Networks (BYOL) is available in the air-gapped marketplace. Prisma Cloud Compute Edition by Palo Alto Networks offers modern protections for distributed applications.

Multi-zone deployments:

  • Added multi-zone functionality, which provides cloud-like high availability and disaster recovery capabilities as a service to simplify managing resources across GDC zones. Multi-zone deployment capabilities are in Preview.

Public Key Infrastructure:

  • When issuing web certificates, you can configure different PKI modes after organization creation. The configurable modes include Infra PKI Fully Managed, BYO-SubCA, BYO-Cert with ACME, and BYO-Cert.
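
The per-mode workflows are environment-specific and documented separately. As a loose illustration of the artifact the BYO-SubCA mode is built around, the sketch below generates a subordinate-CA certificate signing request with the Python cryptography library; the subject name, key size, and file names are hypothetical placeholders, and this is not the GDC-specific procedure.

```python
# Illustrative only: generating a subordinate-CA CSR of the kind a
# BYO-SubCA workflow would have signed by your own root CA. The subject
# name, key size, and output file names are hypothetical placeholders.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate a private key for the subordinate CA.
key = rsa.generate_private_key(public_exponent=65537, key_size=4096)

# Build a CSR that requests CA capabilities (BasicConstraints CA=true).
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, "example-org sub CA"),
    ]))
    .add_extension(
        x509.BasicConstraints(ca=True, path_length=0), critical=True,
    )
    .sign(key, hashes.SHA256())
)

# Write the CSR and key to PEM files for signing and safekeeping.
with open("sub-ca.csr.pem", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))
with open("sub-ca.key.pem", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.PKCS8,
        serialization.NoEncryption(),
    ))
```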

Object storage:

  • Added a Spec.location field to buckets to specify the zone where a bucket's objects are located. If no value is provided during bucket creation, the field is automatically populated with the name of the zone where the bucket is created. Existing buckets automatically have the field populated with the name of the zone where they reside.
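
As a minimal sketch, assuming a Kubernetes-style Bucket custom resource, the following reads a bucket's spec.location with the Kubernetes Python client. The API group, version, namespace, and bucket name are hypothetical placeholders for whatever your installation defines.

```python
# Illustrative only: reading a bucket's spec.location through the
# Kubernetes API. The API group, version, namespace, and resource names
# below are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

bucket = api.get_namespaced_custom_object(
    group="object.example.gdc",   # hypothetical API group
    version="v1",                 # hypothetical API version
    namespace="my-project",       # hypothetical project namespace
    plural="buckets",
    name="my-bucket",
)

# Per the release note above, an omitted location is filled in
# automatically with the zone where the bucket was created.
print(bucket["spec"].get("location"))
```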

Virtual machines (VM):

Vertex AI:

VPN:


Backup and restore:

  • Attempting to restore a backup to a quota-constrained user cluster fails.

Billing:

  • Billing metrics are not correctly emitted to Cortex because the MetricsProxySidecar is missing.

Block storage:

  • Virtual machine launcher pods fail to map volumes.
  • Storage-related failures might make the system unusable.
  • Persistent volumes are created with an incorrect size.
  • When an organization is deactivated, there might be an issue deleting a StorageVirtualMachine.
  • Secrets and certificates are not cleaned up after deactivating an organization.
  • A deletion reconciliation failure can occur in the StorageVirtualMachine.
  • Ansible jobs get stuck during the bare metal upgrade.

Cluster management:

  • The machine-init job fails during cluster provisioning.
  • A database pod running in the service cluster fails to connect to an object storage bucket in the org admin cluster.
  • The preflight check fails.

Database Service:

  • For user-facing databases, the initial provisioning, resizing, or enabling HA on an existing database cluster takes up to 40 minutes longer than before, and the performance is two to three times slower than before.
  • The database service clone does not work for a storage quota-constrained cluster due to a problem with backup and restore.
  • IOPS enforcement might impact storage performance.

DNS:

  • DNSSEC must be explicitly turned off in resolved.conf.
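
One way to turn DNSSEC off explicitly is through the systemd-resolved configuration, either directly in resolved.conf or with a drop-in file. The sketch below takes the drop-in approach and restarts the resolver; the drop-in file name is arbitrary, and your node image or configuration management might already own this setting.

```python
# Illustrative only: explicitly disabling DNSSEC for systemd-resolved
# via a drop-in file, then restarting the resolver. Run as root; the
# drop-in file name is arbitrary.
import pathlib
import subprocess

DROPIN_DIR = pathlib.Path("/etc/systemd/resolved.conf.d")
DROPIN = DROPIN_DIR / "99-disable-dnssec.conf"

# Create the drop-in directory if needed and write the setting.
DROPIN_DIR.mkdir(parents=True, exist_ok=True)
DROPIN.write_text("[Resolve]\nDNSSEC=no\n")

# Restart systemd-resolved so the new setting takes effect.
subprocess.run(["systemctl", "restart", "systemd-resolved"], check=True)
```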

Infrastructure as code (IAC):

  • Excessive GitLab token creation risks filling GitLab databases.

Key Management Service (KMS):

  • When the kms-rootkey-controller memory usage exceeds the 600Mi limit, the controller enters a CrashLoopBackOff due to an OOMKilled status.

Monitoring:

  • Dashboards don't display Vertex AI metrics.
  • The mon-cortex pod has a reconciliation error.
  • The metrics-server-exporter pod in the system cluster is crash looping.
  • The mon-prober-backend-prometheus-config ConfigMap gets reset to include no probe jobs, and alert MON-A0001 is triggered.
  • After configuring the Monitoring service to send alerts, multiple duplicate alerts are automatically created.
  • The ObservabilityPipeline object shows Reconciler error logs that you can safely ignore.

Multi-zone bootstrap:

  • There are no specific roles for bootstrapping multi-zone deployments.
  • The Bootstrap resource that is created is incompatible with the logic that processes it.
  • A required resource is not created during bootstrap, causing components that rely on this resource to not function correctly.

Networking:

  • The node is not reachable.
  • There are connectivity issues to Database Service instances.
  • A PodCIDR is not assigned to nodes even though a ClusterCIDRConfig is created.
  • A VM node has drifted or inaccurate time.

Object storage:

  • Deleting an organization might not be successful.

Operating system:

  • In rare situations, pods are stuck in the init state on a particular node.
  • The bm-system-machine-preflight-check Ansible job for a bare metal or VM node fails with Either ip_tables or nf_tables kernel module must be loaded.

Operations Suite Infrastructure (OI):

  • For Hardware 3.0, launching Smart Storage Administration (SSA) is no longer needed.

Perimeter security:

  • The org system cluster gets stuck during the organization bootstrap.
  • The PANW firewall AddressGroups don't update with the OCITcidr-claim changes, resulting in unresolvable iac.gdch.domain.example domains.

Platform security:

  • When PKI BYO SubCA mode generates a new certificate signing request (CSR) while a previously signed certificate is uploaded to the SubCA, the reconciler doesn't check if the new CSR matches the old signed certificate and marks the cert-manager CertificateRequest custom resource (CR) as Ready. This occurs during SubCA certificate renewal or manual rotation.
  • A known issue in cert-manager results in unsuccessful issuance of PKI bring-your-own (BYO) certificates with Automated Certificate Management Environment (ACME).

Physical servers:

  • The server is stuck at the provisioning state.
  • The server bootstrap fails due to POST issues on the HPE server.
  • The server is stuck in the provisioning state.

Upgrade:

  • The bm-system and other jobs that run Ansible playbooks are stuck at gathering facts.
  • The management IP of a server is unreachable during upgrade.
  • The upgrade fails in the iac-zoneselection-global subcomponent.

Vertex AI:

  • The MonitoringTarget shows a Not Ready status when user clusters are being created, causing pre-trained APIs to continually show an Enabling state in the user interface.
  • The Translation frontend pod and service fail to initialize because the ODS system cluster secret is outdated.

Virtual machines:

  • The BYO image import fails for qcow2 and raw images.
  • Provisioning a disk from a custom image fails.
  • The object storage upgrade shows an error during the postflight or preflight check.

Networking:

  • Fixed an issue where the upgrade failed due to unsuccessful generation of the hairpinlink custom resource.

Add-on Manager:

Version update:

  • The Debian-based image version is updated to bookworm-v1.0.1-gke.1.

Operations Suite Infrastructure (OI):

  • The OI Marvin account, used for configuration management in the OI infrastructure environment, has a 60-day expiration period.