
Vertex AI SDK bucket squatting: Model.upload model-swap RCE, fixed in 1.148.0

Ikesan

TL;DR

What happened: In the Vertex AI SDK for Python (google-cloud-aiplatform), Model.upload() derived a predictable default staging bucket name when staging_bucket was not set, opening a model-swap path to RCE. Unit 42 confirmed 1.139.0 and 1.140.0 as vulnerable.

Who is affected: Anyone uploading a local model with Model.upload() without an explicit staging_bucket, where the default staging bucket for the target project and region does not yet exist and an attacker has already created a Cloud Storage bucket with the same name. The attacker needs no permission on the victim project.

What to do: Upgrade google-cloud-aiplatform to 1.148.0 or later, and pass a staging_bucket you own on model upload:

  1. Upgrade google-cloud-aiplatform to 1.148.0+ (pip install -U "google-cloud-aiplatform>=1.148.0")
  2. Set an explicit staging_bucket (a Cloud Storage bucket in your project) on aiplatform.init() or Model.upload()
  3. Check the SDK version in notebooks, CI, training pipelines, and batch runners, not just requirements.txt

Palo Alto Networks Unit 42 disclosed a flaw in the model upload path of the Vertex AI SDK for Python that goes from bucket squatting (an attacker claiming a predictable Cloud Storage bucket name first) to RCE (remote code execution).
It sits in the default staging bucket handling of Model.upload(): when staging_bucket was not set, the SDK built a name of the form PROJECT-vertex-staging-REGION from the project ID and region.
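To make the guessability concrete, here is a minimal sketch that assumes only the PROJECT-vertex-staging-REGION pattern quoted above. The function name is hypothetical and this is not the SDK's own code.

```python
def predictable_staging_bucket(project_id: str, region: str) -> str:
    """Build the pre-fix default name from two values an outsider can learn."""
    return f"{project_id}-vertex-staging-{region}"


# Hypothetical project ID and region, used only for illustration.
print(predictable_staging_bucket("my-project", "us-central1"))
```

Nothing in the name depends on a secret, so anyone who learns the project ID and region can compute it ahead of time.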

Google added a UUID-derived salt to the auto-created staging bucket name in 1.144.0 on March 31, 2026, and added bucket ownership verification in 1.148.0, released on April 15, 2026. As of Unit 42’s disclosure, no CVE for this issue appears in Google’s security bulletins or in Unit 42’s article.

It only checked exists(), not the owner

The problem was that the SDK checked whether a bucket with that name existed, and when it did, never verified that the calling project actually owned it.
Cloud Storage bucket names are unique across all of Google Cloud, so if an attacker creates a bucket with the same name in a different project first, the victim SDK’s existence check passes.
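The difference between the two checks can be sketched with an in-memory stand-in for the global bucket namespace. Everything here (the Bucket class, the dictionary, the function names, the project names) is hypothetical and only models the behavior described above; it does not call Cloud Storage or reproduce the SDK's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    name: str
    owner_project: str


# Stand-in for the global Cloud Storage namespace: one owner per name.
GLOBAL_BUCKETS = {}


def create_bucket(name, owner_project):
    """Claim a name; a second claim on the same name fails, as with real buckets."""
    if name in GLOBAL_BUCKETS:
        raise ValueError(f"bucket name already taken: {name}")
    GLOBAL_BUCKETS[name] = Bucket(name, owner_project)


def exists_only_check(name):
    """Pre-fix behavior as described above: any bucket with this name passes."""
    return name in GLOBAL_BUCKETS


def exists_and_owned_check(name, caller_project):
    """Also require that the bucket belongs to the calling project."""
    bucket = GLOBAL_BUCKETS.get(name)
    return bucket is not None and bucket.owner_project == caller_project


name = "victim-project-vertex-staging-us-central1"
create_bucket(name, owner_project="attacker-project")  # the squat

print(exists_only_check(name))  # True: the squatted bucket is accepted
print(exists_and_owned_check(name, "victim-project"))  # False: owner mismatch
```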

Per Unit 42, the attacker only needs to know the victim project ID and region.
Project IDs leak through documentation, sample code, public logs, error output, and config on GitHub.
Logging into the victim project or stealing an OAuth token is not the entry point.

The conditions are narrow.
The attack works only when the default staging bucket has not yet been created in the target region and staging_bucket is not set explicitly in aiplatform.init() or Model.upload(). Registering a model to Vertex AI from a new project, or in a region used for the first time, is the most likely match.

The model gets swapped within the 2.5s after upload

The attacker grants allAuthenticatedUsers read/write on the squatted bucket so that both the upload from the victim SDK and the read from the Vertex AI service agent go through.
On top of that, they wire a Cloud Function to the Cloud Storage object finalize event, and the moment the victim’s uploaded model.joblib is detected, they swap it for a malicious joblib file.

In Unit 42’s PoC (proof-of-concept code), the window between the victim SDK’s upload and the Vertex AI service agent reading it was about 2.5 seconds.
The Cloud Function reacted in about 804 milliseconds, swapped the model in about 1.4 seconds, and about 2.46 seconds later the service agent read the swapped file.

flowchart TD
  A["Attacker creates GCS bucket<br/>with the predictable name"] --> B["Victim runs Model.upload()<br/>with no staging_bucket"]
  B --> C["SDK uploads the model<br/>to the attacker bucket"]
  C --> D["Cloud Function fires<br/>on object finalize"]
  D --> E["joblib model swapped<br/>for a malicious file"]
  E --> F["Vertex AI service agent<br/>reads the swapped file"]
  F --> G["pickle/joblib deserialization<br/>runs code at deploy time"]

  style G fill:#991b1b,color:#fff

The attacker's code runs during Python deserialization on the model-serving side.
pickle and joblib carry a mechanism that can lead to arbitrary code execution at load time.
They are used as model artifacts as a matter of course in machine learning, but a design that calls joblib.load() / pickle.load() on an untrusted file allows code execution at that point.
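A harmless sketch of that mechanism, using only the standard library: the class name is hypothetical, and os.getcwd stands in for whatever callable an attacker would choose. joblib is not used here, but as noted above it raises the same concern at load time.

```python
import os
import pickle


class NotReallyAModel:
    """Stand-in for a tampered artifact; a real payload would pick a harmful callable."""

    def __reduce__(self):
        # pickle stores "call os.getcwd() with no arguments" instead of object state.
        return (os.getcwd, ())


blob = pickle.dumps(NotReallyAModel())

# Loading runs the stored callable; the result is whatever that callable returned.
loaded = pickle.loads(blob)
print(type(loaded).__name__)
```

The loader never asked for os.getcwd to run. The file's author chose it, which is why loading an artifact from a bucket someone else controls is equivalent to running their code.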

A leaked token reached other model artifacts and log data

Unit 42 tried a payload (the malicious code itself) that, from the swapped model, accesses the Google Compute Engine metadata server and exfiltrates the serving container’s service account OAuth token.
In Unit 42’s test environment, that token could read surrounding resources beyond the single victim deployment.

The confirmed scope was broad. Within the same Google-managed tenant project, it could read the Cloud Storage bucket and the full TensorFlow model artifacts of another deployment. On the victim project side, it could enumerate BigQuery dataset names, table names, and access control lists. From Cloud Logging on the Google-managed tenant side, it surfaced GKE cluster names, running prediction deployments, Google-internal container image URIs, and even Kubernetes system IDs.

The swap itself amounts to “my model gets replaced,” but the service account token the swapped model obtained reached information beyond the unit of deployment.

For an example of a convenient dev/ops path leading to credentials, there is the earlier post on CVE-2025-24964, where the Vitest API server’s WebSocket led to RCE on a local dev machine.
Vitest is a local dev server and this Vertex AI case is the staging step of a cloud SDK, so the targets differ, but in both cases the permissions were taken from a peripheral dev/registration flow rather than the production service itself.

The Miasma posts also looked at this on the assumption that GitHub, npm, Azure, GCP, AWS, and Kubernetes credentials collect on dev machines and CI runners.
In Miasma, which took down 73 Microsoft repositories, opening a source repository was the entry point.
This Vertex AI case makes the temporary bucket of a model upload the entry point.

1.148.0 adds the ownership check

Google’s fix came in two stages.
Per Unit 42, 1.144.0 on March 31, 2026 added a uuid4-derived salt to the staging bucket name. After that, 1.148.0 on April 15, 2026 added bucket ownership verification to Model.upload().
The python-aiplatform changelog on GitHub also records bucket ownership verification for Model.upload() under 1.148.0 Bug Fixes.

Confirm that you are on 1.148.0 or later.
That release does more than make the name harder to guess; it also verifies ownership when an existing bucket is used.

python -m pip show google-cloud-aiplatform
python -m pip freeze | grep google-cloud-aiplatform

In notebooks, the local virtual environment and the kernel can be loading different versions.
Check the google-cloud-aiplatform version separately in Jupyter, Colab Enterprise, Workbench, CI, and the base image of training jobs. Confirm it in the runtime environment, not just in requirements.txt.
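For a check that runs inside the kernel or job itself, a minimal sketch using the standard library's importlib.metadata might look like this. The helper names are hypothetical, and the version parsing is deliberately simple: it ignores pre-release suffixes, so 1.148.0rc1 would count as patched.

```python
import re
from importlib import metadata

PATCHED = (1, 148, 0)  # release the article cites for the ownership check


def parse_release(version):
    """Return (major, minor, patch) from the leading numeric part of a version."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group or 0) for group in match.groups())


def is_patched(version):
    return parse_release(version) >= PATCHED


def installed_status():
    """Describe the google-cloud-aiplatform version visible to this interpreter."""
    try:
        version = metadata.version("google-cloud-aiplatform")
    except metadata.PackageNotFoundError:
        return "google-cloud-aiplatform is not installed in this runtime"
    state = "ok" if is_patched(version) else "upgrade needed"
    return f"google-cloud-aiplatform {version}: {state}"


print(installed_status())
```

Running it in a notebook cell, a CI step, and the training image reports what each runtime actually imports.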

Set staging_bucket yourself and pass it explicitly

In addition to upgrading the SDK, pass a Cloud Storage bucket you own as staging_bucket on model upload.
Even when you hand the SDK a local directory as artifact_uri, the staging destination is Cloud Storage.
Make sure that temporary location sits in your own project and under your own IAM.

from google.cloud import aiplatform

# Set the staging bucket once so later calls stage to a bucket you created and own.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-project-vertex-staging-us-central1",
)

# artifact_uri is a local directory here, so the SDK stages it to the bucket above.
model = aiplatform.Model.upload(
    display_name="my-model",
    artifact_uri="local_model_dir",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",
)

Create this bucket ahead of time with Terraform or similar, and manage it with Uniform bucket-level access, Public Access Prevention, minimal IAM, and logging.
Fixing the name, owner, and permissions on the environment-setup side is easier to trace after an incident than leaving it to “create one if the SDK has none.”

Watch for the same name-generation pattern as CVE-2026-2473

Google’s security bulletins list CVE-2026-2473 as GCP-2026-012, published on February 20, 2026.
That one was a predictable bucket name issue in Vertex AI Experiments 1.21.0 up to but not including 1.133.0, leading from a pre-created Cloud Storage bucket to cross-tenant RCE (remote code execution crossing another tenant boundary), model theft, and model poisoning (an attacker swapping the artifact).
Google states it is mitigated in 1.133.0 and later.

This Model.upload() issue is not CVE-2026-2473 itself.
But the check points are close: name generation, existence check, the missing ownership check, and reachability into the Google-managed tenant side.

On the SDK side, confirm that google-cloud-aiplatform is 1.148.0 or later and, in environments using Vertex AI Experiments, that nothing below 1.133.0 remains. In code, check whether Model.upload(), aiplatform.init(), and pipeline definitions omit staging_bucket. In Cloud Storage, look for buckets in the PROJECT-vertex-staging-REGION form, or with old default names, that have no known owner, overly broad IAM, or no logging.
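The code-side check can be roughed out with Python's ast module. This is a sketch under loose assumptions: it matches any attribute call named init or upload, so it can flag unrelated calls, and an upload() without staging_bucket is fine when aiplatform.init() already set one. Treat the output as a list of places to review, not as findings. The function name is hypothetical.

```python
import ast

WATCHED_CALLS = {"init", "upload"}  # aiplatform.init(...) and Model.upload(...)


def calls_missing_staging_bucket(source):
    """Return (line, call name) for watched calls with no staging_bucket keyword."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        if node.func.attr not in WATCHED_CALLS:
            continue
        keywords = {kw.arg for kw in node.keywords}
        # kw.arg is None for **kwargs, which may carry staging_bucket; skip those.
        if "staging_bucket" in keywords or None in keywords:
            continue
        findings.append((node.lineno, node.func.attr))
    return sorted(findings)


sample = '''
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model.upload(display_name="my-model", artifact_uri="local_model_dir")
'''
print(calls_missing_staging_bucket(sample))
```

The source is only parsed, never imported or executed, so the scan can run over notebooks exported to .py files and pipeline definitions without the SDK installed.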

No real-world exploitation of this Model.upload() issue has been publicly confirmed.
Still, if a vulnerable SDK stays in notebooks or CI, the conditions line up the moment the same code runs again in a new region or new project.