Google Distributed Cloud air-gapped 1.13.1 release notes

June 28, 2024


Google Distributed Cloud (GDC) air-gapped 1.13.1 is available.
See the product overview to learn about the features of Distributed Cloud.

Updated the Canonical Ubuntu OS image version to 20240515 to apply the latest security patches and important updates. To take advantage of the bug and security vulnerability fixes, you must upgrade all nodes with each release. The following security vulnerabilities are fixed:



The following container image security vulnerabilities are fixed:


Fixed a vulnerability with databases running as containers in the system cluster.


Billing:

  • Added the capability to enable partner billing when creating an organization to let Google charge the partner directly.

Cluster management:

Custom IP addressing:

  • Added the ability to override the IP address that is assigned to organizations to enable Direct Connect (DX) Interconnect features.

Database Service:

  • Added a major upgrade to deliver improved security and reliability. All database workloads now run on the service cluster. This upgrade requires removing existing databases: to safeguard your data, export and then delete all existing database clusters before the upgrade. Refer to the Database Service documentation for how to export and import data; an illustrative export sketch follows this list.
  • Added a feature for AlloyDB to support same-zone high availability (HA).
  • Added the ability for AlloyDB to support backup, restore, and point-in-time recovery features.
  • Added the ability for AlloyDB to support data import, export, and advanced migration features.
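
The supported procedure for exporting and importing data is the one in the Database Service documentation. Purely as an illustration of the kind of pre-upgrade export described in the first item in this list, the following sketch dumps a PostgreSQL-compatible database with pg_dump; the host, user, database name, and output path are hypothetical placeholders, not values from this release.

```python
# Illustrative only: a generic PostgreSQL-level export with pg_dump.
# The host, user, database, and output path are hypothetical placeholders;
# follow the Database Service documentation for the supported procedure.
import subprocess

DB_HOST = "10.0.0.10"        # hypothetical database cluster endpoint
DB_USER = "dbadmin"          # hypothetical user with read access
DB_NAME = "appdb"            # hypothetical database to export
DUMP_FILE = "/backups/appdb-pre-upgrade.dump"


def export_database() -> None:
    """Write a compressed, custom-format dump suitable for pg_restore."""
    subprocess.run(
        [
            "pg_dump",
            "--host", DB_HOST,
            "--username", DB_USER,   # password is prompted or taken from PGPASSWORD
            "--format", "custom",    # custom format can be restored with pg_restore
            "--file", DUMP_FILE,
            DB_NAME,
        ],
        check=True,  # raise if pg_dump exits with a non-zero status
    )


if __name__ == "__main__":
    export_database()
```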

Dynamic expansion:

  • Add compute and storage resources with dynamic expansion, without completing a redeployment. GDC versions earlier than 1.13.1 allowed hardware to be added only during a redeployment, a process known as a static expansion.

Harbor-as-a-Service:

  • Added Harbor-as-a-Service (HaaS), a fully managed service that stores and manages container images using Harbor.
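
As a rough usage sketch only, the following shows how a client could tag and push an image to a Harbor-backed registry with the standard Docker CLI driven from Python. The registry hostname, project, and image names are hypothetical placeholders; the actual HaaS endpoint and authentication flow depend on your environment.

```python
# Illustrative only: tagging and pushing an image to a Harbor-backed
# registry with the Docker CLI. The registry host, project, and image
# names below are hypothetical placeholders.
import subprocess

REGISTRY = "harbor.example.org"      # hypothetical HaaS registry endpoint
PROJECT = "my-project"               # hypothetical Harbor project
LOCAL_IMAGE = "my-app:1.0.0"
REMOTE_IMAGE = f"{REGISTRY}/{PROJECT}/{LOCAL_IMAGE}"


def run(*cmd: str) -> None:
    """Run a command and raise if it fails."""
    subprocess.run(cmd, check=True)


# Authenticate (interactive prompt), retag the local image for the
# registry, and push it to the Harbor project.
run("docker", "login", REGISTRY)
run("docker", "tag", LOCAL_IMAGE, REMOTE_IMAGE)
run("docker", "push", REMOTE_IMAGE)
```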

Machine types:

Marketplace:

  • Introduced customizable configuration of marketplace services.
  • Starburst Enterprise (BYOL) is available in the air-gapped marketplace. Starburst Enterprise provides a fast, scalable, distributed MPP SQL engine for your data lakehouse with query federation to many other data sources.
  • Prisma Cloud Compute Edition by Palo Alto Networks (BYOL) is available in the air-gapped marketplace. Prisma Cloud Compute Edition by Palo Alto Networks offers modern protections for distributed applications.

Multi-zone deployments:

  • Added multi-zone functionality, which provides cloud-like high availability and disaster recovery capabilities as a service to simplify managing resources across GDC zones. Multi-zone deployment capabilities are in Preview.

Public Key Infrastructure:

  • When issuing web certificates, you can configure different PKI modes after organization creation. The configurable modes include Infra PKI Fully Managed, BYO-SubCA, BYO-Cert with ACME, and BYO-Cert.
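
The per-mode workflows are environment-specific and documented separately. As a loose illustration of the artifact the BYO-SubCA mode is built around, the sketch below generates a subordinate-CA certificate signing request with the Python cryptography library; the subject name, key size, and file names are hypothetical placeholders, and this is not the GDC-specific procedure.

```python
# Illustrative only: generating a subordinate-CA CSR of the kind a
# BYO-SubCA workflow would have signed by your own root CA. The subject
# name, key size, and output file names are hypothetical placeholders.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Generate a private key for the subordinate CA.
key = rsa.generate_private_key(public_exponent=65537, key_size=4096)

# Build a CSR that requests CA capabilities (BasicConstraints CA=true).
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, "example-org sub CA"),
    ]))
    .add_extension(
        x509.BasicConstraints(ca=True, path_length=0), critical=True,
    )
    .sign(key, hashes.SHA256())
)

# Write the CSR and key to PEM files for signing and safekeeping.
with open("sub-ca.csr.pem", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))
with open("sub-ca.key.pem", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.PKCS8,
        serialization.NoEncryption(),
    ))
```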

Object storage:

  • Added a Spec.location field to buckets to specify the zone where a bucket's objects are located. If no value is provided during bucket creation, the field is automatically populated with the name of the zone where the bucket is created. Existing buckets automatically have the field populated with the name of the zone where they reside.
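
As a minimal sketch, assuming a Kubernetes-style Bucket custom resource, the following reads a bucket's spec.location with the Kubernetes Python client. The API group, version, namespace, and bucket name are hypothetical placeholders for whatever your installation defines.

```python
# Illustrative only: reading a bucket's spec.location through the
# Kubernetes API. The API group, version, namespace, and resource names
# below are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

bucket = api.get_namespaced_custom_object(
    group="object.example.gdc",   # hypothetical API group
    version="v1",                 # hypothetical API version
    namespace="my-project",       # hypothetical project namespace
    plural="buckets",
    name="my-bucket",
)

# Per the release note above, an omitted location is filled in
# automatically with the zone where the bucket was created.
print(bucket["spec"].get("location"))
```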

Virtual machines (VM):

Vertex AI:

VPN:


Backup and restore:

  • Attempting to restore a backup to a quota-constrained user cluster fails.

Billing:

  • Billing metrics are not correctly emitted to Cortex because the MetricsProxySidecar is missing.

Block storage:

  • Virtual machine launcher pods fail to map volumes.
  • Storage-related failures might make the system unusable.
  • Persistent volumes are created with an incorrect size.
  • When an organization is deactivated, there might be an issue deleting a StorageVirtualMachine.
  • Secrets and certificates are not cleaned up after deactivating an organization.
  • A deletion reconciliation failure can occur in the StorageVirtualMachine.
  • Ansible jobs get stuck during the bare metal upgrade.

Cluster management:

  • The machine-init job fails during cluster provisioning.
  • A database pod running in the service cluster fails to connect to an object storage bucket in the org admin cluster.
  • The preflight check fails.

Database Service:

  • For user-facing databases, the initial provisioning, resizing, or enabling HA on an existing database cluster takes up to 40 minutes longer than before, and the performance is two to three times slower than before.
  • The database service clone does not work for a storage quota-constrained cluster due to a problem with backup and restore.
  • IOPS enforcement might impact storage performance.

DNS:

  • DNSSEC must be explicitly turned off in resolved.conf.
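
One way to turn DNSSEC off explicitly is through the systemd-resolved configuration, either directly in resolved.conf or with a drop-in file. The sketch below takes the drop-in approach and restarts the resolver; the drop-in file name is arbitrary, and your node image or configuration management might already own this setting.

```python
# Illustrative only: explicitly disabling DNSSEC for systemd-resolved
# via a drop-in file, then restarting the resolver. Run as root; the
# drop-in file name is arbitrary.
import pathlib
import subprocess

DROPIN_DIR = pathlib.Path("/etc/systemd/resolved.conf.d")
DROPIN = DROPIN_DIR / "99-disable-dnssec.conf"

# Create the drop-in directory if needed and write the setting.
DROPIN_DIR.mkdir(parents=True, exist_ok=True)
DROPIN.write_text("[Resolve]\nDNSSEC=no\n")

# Restart systemd-resolved so the new setting takes effect.
subprocess.run(["systemctl", "restart", "systemd-resolved"], check=True)
```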

Infrastructure as code (IAC):

  • Excessive GitLab token creation risks filling GitLab databases.

Key Management Service (KMS):

  • When the kms-rootkey-controller memory usage exceeds the 600Mi limit, the controller enters a CrashLoopBackOff due to an OOMKilled status.

Monitoring:

  • Dashboards don't display Vertex AI metrics.
  • The mon-cortex pod has a reconciliation error.
  • The metrics-server-exporter pod in the system cluster is crash looping.
  • The mon-prober-backend-prometheus-config ConfigMap gets reset to include no probe jobs, and alert MON-A0001 is triggered.
  • After configuring the Monitoring service to send alerts, multiple duplicate alerts are automatically created.
  • The ObservabilityPipeline object shows Reconciler error logs that you can safely ignore.

Multi-zone bootstrap:

  • There are no specific roles for bootstrapping multi-zone deployments.
  • The Bootstrap resource that is created is incompatible with the logic that processes it.
  • A required resource is not created during bootstrap, causing components that rely on this resource to not function correctly.

Networking:

  • The node is not reachable.
  • There are connectivity issues to Database Service instances.
  • A PodCIDR is not assigned to nodes even though a ClusterCIDRConfig is created.
  • A VM node has drifted or inaccurate time.

Object storage:

  • Deleting an organization might not be successful.

Operating system:

  • In rare situations, pods are stuck in the init state on a particular node.
  • The bm-system-machine-preflight-check Ansible job for a bare metal or VM node fails with Either ip_tables or nf_tables kernel module must be loaded.

Operations Suite Infrastructure (OI):

  • For Hardware 3.0, launching Smart Storage Administration (SSA) is no longer needed.

Perimeter security:

  • The org system cluster gets stuck during the organization bootstrap.
  • The PANW firewall AddressGroups don't update with the OCITcidr-claim changes, resulting in unresolvable iac.gdch.domain.example domains.

Platform security:

  • When PKI BYO SubCA mode generates a new certificate signing request (CSR) while a previously signed certificate is uploaded to the SubCA, the reconciler doesn't check if the new CSR matches the old signed certificate and marks the cert-manager CertificateRequest custom resource (CR) as Ready. This occurs during SubCA certificate renewal or manual rotation.
  • A known issue in cert-manager results in unsuccessful issuance of PKI bring-your-own (BYO) certificates with Automated Certificate Management Environment (ACME).

Physical servers:

  • The server is stuck at the provisioning state.
  • The server bootstrap fails due to POST issues on the HPE server.
  • The server is stuck in the provisioning state.

Upgrade:

  • The bm-system and other jobs that run Ansible playbooks are stuck at gathering facts.
  • The management IP of a server is unreachable during upgrade.
  • The upgrade fails in the iac-zoneselection-global subcomponent.

Vertex AI:

  • The MonitoringTarget shows a Not Ready status when user clusters are being created, causing pre-trained APIs to continually show an Enabling state in the user interface.
  • The Translation frontend pod and service fail to initialize because the ODS system cluster secret is outdated.

Virtual machines:

  • The BYO image import fails for qcow2 and raw images.
  • Provisioning a disk from a custom image fails.
  • The object storage upgrade shows an error during the postflight or preflight check.

Networking:

  • Fixed an issue where the upgrade failed due to unsuccessful generation of the hairpinlink custom resource.

Add-on Manager:

Version update:

  • The Debian-based image version is updated to bookworm-v1.0.1-gke.1.

Operations Suite Infrastructure (OI):

  • The OI Marvin account, used for configuration management in the OI infrastructure environment, has a 60-day expiration period.