PageJack in Action: CVE-2022-0995 exploit

PageJack is a Linux kernel exploitation technique that can be used to generate a Use After Free (UAF) in the page allocator. In this article, we provide a detailed example of how to use it to exploit a Linux kernel vulnerability from 2022.

Introduction

In this article, we explore how a relatively old CVE can be exploited using PageJack, a modern kernel exploitation technique introduced in 2024 by Zhiyun Qian at Black Hat USA. You can find a link to the full exploit at the end of this article.

The vulnerability (CVE-2022-0995)

CVE-2022-0995 is an out-of-bounds (OOB) write vulnerability caused by an incorrect bounds check in the watch_queue event notification mechanism of the Linux kernel. It affects kernels from 5.8, the release that introduced watch_queue, until the fix that was merged in 5.17-rc8 and backported to the stable branches, and it can lead to privilege escalation.

Root cause analysis

In Linux, the kernel needs a mechanism to notify user space about various events. To achieve this, it implements a pipe-backed ring buffer that stores messages generated by the kernel. User space can then retrieve these messages with the read() system call.

A process can also restrict which notifications it receives: through an ioctl, it can apply filters so that only selected source types and sub-events are delivered and all other notifications are ignored. When a process sets a filter, the kernel invokes the watch_queue_set_filter() function. In affected kernels, a flaw in this function can lead to an out-of-bounds write in the kernel heap.

watch_queue_set_filter() implementation

To set a filter for kernel messages, a user must provide a list of filters for the kernel to apply.
To do so, the user supplies two structures:

```c
struct watch_notification_type_filter {
    __u32 type;
    __u32 info_filter;
    __u32 info_mask;
    __u32 subtype_filter[8];
};

struct watch_notification_filter {
    __u32 nr_filters;
    __u32 __reserved;
    struct watch_notification_type_filter filters[];
};
```

The user specifies the number of filters to apply, as well as the type of each filter, and the filters are passed to the kernel through the IOC_WATCH_QUEUE_SET_FILTER ioctl. The kernel-side handler for this ioctl is the watch_queue_set_filter() function. It takes two parameters: a pointer to a pipe_inode_info structure (which represents the pipe in the kernel) and the filter list provided by the user.

The purpose of this function is to copy all the filters set in user space into the kernel. To do this, the kernel first copies the filter header from user space, duplicates the filter list, counts the number of valid filters provided by the user, and then copies these filters into the kernel heap. This is done with two for loops:

```c
long watch_queue_set_filter(struct pipe_inode_info *pipe,
                            struct watch_notification_filter __user *_filter)
{
    struct watch_notification_type_filter *tf; // Filter list
    struct watch_notification_filter filter;
    struct watch_type_filter *q;
    struct watch_filter *wfilter;
    int ret, nr_filter = 0, i;
    ...
    if (copy_from_user(&filter, _filter, sizeof(filter)) != 0)
        return -EFAULT;
    ...
    tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf));
    ...
    // First loop: count the filters whose type passes the check
    for (i = 0; i < filter.nr_filters; i++) {
        if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
            continue;
        nr_filter++;
    }
    ...
    // Alloc enough space for the filters
    wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL);
    ...
    // Second loop: copy the accepted filters into the new object
    q = wfilter->filters;
    for (i = 0; i < filter.nr_filters; i++) {
        if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
            continue;
        q->type = tf[i].type;
        q->info_filter = tf[i].info_filter;
        q->info_mask = tf[i].info_mask;
        q->subtype_filter[0] = tf[i].subtype_filter[0];
        __set_bit(q->type, wfilter->type_filter);
        q++;
    }
    ...
}
```

Here, tf is a copy of the filter list provided by the user. The first for loop counts the number of valid filters, checking the validity of each filter type with:

```c
if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
```

After counting the valid filters, the function allocates enough memory to store them: kzalloc() allocates a kernel object whose size depends on the value of nr_filter. Since the filters come from user space, we control the number of filters and, consequently, the size of the allocation.

In the second for loop, the filter values are copied into kernel heap memory. This time, the function checks whether the user-provided filter type is valid using:

```c
if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
```

Out-of-bounds bug

This check is the root cause of the out-of-bounds write. The problem is that sizeof(wfilter->type_filter) * BITS_PER_LONG is not equal to sizeof(wfilter->type_filter) * 8. sizeof() returns a size in bytes, so multiplying it by 8 gives the number of bits in the type_filter bitmap (16 bytes on a 64-bit kernel), while multiplying it by BITS_PER_LONG (64) gives a limit eight times larger. More precisely, in the first loop the type is checked to be less than 128, while in the second loop it is checked to be less than 1024. Because of this bug, the second loop can accept a filter type that was not accounted for during the allocation in the first loop.

Here we have two out-of-bounds (OOB) issues: the second loop can write out...
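On the user-space side, the input that drives both loops is a single watch_notification_filter buffer. The following is a minimal sketch, assuming a 64-bit Linux user space, of how such a buffer might be built and handed to the ioctl. The struct names nf_type_filter and nf_filter and the helpers build_filter() and apply_filter() are hypothetical: the structs only mirror the UAPI layout shown above, and the IOC_WATCH_QUEUE_SET_FILTER value _IO('W', 0x61) is assumed from <linux/watch_queue.h>, so check it against your headers. The example only builds a benign filter; it does not construct the out-of-bounds input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical user-space mirrors of the UAPI structures
 * (normally provided by <linux/watch_queue.h>). */
struct nf_type_filter {                 /* mirrors watch_notification_type_filter */
    uint32_t type;
    uint32_t info_filter;
    uint32_t info_mask;
    uint32_t subtype_filter[8];
};

struct nf_filter {                      /* mirrors watch_notification_filter */
    uint32_t nr_filters;
    uint32_t reserved;                  /* __reserved in the UAPI header; must be 0 */
    struct nf_type_filter filters[];
};

#ifndef IOC_WATCH_QUEUE_SET_FILTER
#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61)  /* value assumed from the UAPI header */
#endif

/* Build a filter list with one entry per requested type. The buffer size
 * grows with n, so user space decides how much the kernel duplicates.
 * Returns a calloc()ed buffer the caller must free(), or NULL. */
struct nf_filter *build_filter(const uint32_t *types, uint32_t n, size_t *out_size)
{
    assert(types != NULL || n == 0);
    size_t size = offsetof(struct nf_filter, filters) +
                  (size_t)n * sizeof(struct nf_type_filter);
    struct nf_filter *f = calloc(1, size);   /* zeroes reserved, info_*, subtypes */
    if (f == NULL)
        return NULL;
    f->nr_filters = n;
    for (uint32_t i = 0; i < n; i++) {
        f->filters[i].type = types[i];
        /* info_filter = info_mask = 0 matches any info value. */
        f->filters[i].subtype_filter[0] = UINT32_MAX;  /* every bit of subtype word 0 */
    }
    if (out_size != NULL)
        *out_size = size;
    return f;
}

/* Pass the list to the kernel. fd is assumed to be a notification pipe
 * created with pipe2() and O_NOTIFICATION_PIPE. Returns 0, or -1 with errno set. */
int apply_filter(int fd, const struct nf_filter *f)
{
    return ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, f);
}
```

With this layout, each entry adds 44 bytes after an 8-byte header, so a one-entry list is 52 bytes and a two-entry list is 96 bytes. Only subtype_filter[0] is set because, as the kernel excerpt shows, the second loop copies only that word.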
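The mismatch between the two loops can be reproduced entirely in user space. The sketch below is a model, not kernel code: the model_* structures and the helper names are hypothetical mirrors of the pre-fix layout (the real struct watch_filter places type_filter in a union with an rcu_head of the same size on 64-bit, which is omitted here). It assumes an LP64 target such as x86-64 and the struct_size() definition of that era (size of the header plus n elements). It ignores the other validation the real function performs, and it measures overflow against the requested size only; kmalloc rounds requests up to a slab size, so how much of the overflow reaches a neighbouring object depends on the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the pre-fix kernel structures. */
struct model_type_filter {              /* mirrors struct watch_type_filter */
    uint32_t type;
    uint32_t subtype_filter[1];
    uint32_t info_filter;
    uint32_t info_mask;
};

struct model_filter {                   /* mirrors struct watch_filter */
    unsigned long type_filter[2];       /* bitmap of accepted types */
    uint32_t nr_filters;
    struct model_type_filter filters[];
};

#define TYPE_FILTER_BYTES   sizeof(((struct model_filter *)0)->type_filter)
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * 8)
#define FIRST_LOOP_LIMIT    (TYPE_FILTER_BYTES * 8)                   /* 128 on LP64 */
#define SECOND_LOOP_LIMIT   (TYPE_FILTER_BYTES * MODEL_BITS_PER_LONG) /* 1024 on LP64 */

/* Count entries whose type is below limit (models either loop's check). */
size_t count_below(const uint32_t *types, size_t n, size_t limit)
{
    size_t nr = 0;
    for (size_t i = 0; i < n; i++)
        if (types[i] < limit)
            nr++;
    return nr;
}

/* Bytes requested by kzalloc(struct_size(wfilter, filters, nr)), assuming
 * struct_size() = sizeof(header) + nr * sizeof(element). */
size_t alloc_size(size_t nr)
{
    return sizeof(struct model_filter) + nr * sizeof(struct model_type_filter);
}

/* Offset just past the last entry written by the second loop's q++ writes. */
size_t write_end(size_t nr_written)
{
    return offsetof(struct model_filter, filters) +
           nr_written * sizeof(struct model_type_filter);
}

/* How far the second loop writes past the requested size (0 if it fits). */
size_t overflow_bytes(const uint32_t *types, size_t n)
{
    size_t allocated = alloc_size(count_below(types, n, FIRST_LOOP_LIMIT));
    size_t end = write_end(count_below(types, n, SECOND_LOOP_LIMIT));
    return end > allocated ? end - allocated : 0;
}

/* Byte offset from the start of type_filter (offset 0 of the object) that
 * __set_bit(type, type_filter) modifies on a little-endian machine. */
size_t set_bit_byte_offset(uint32_t type)
{
    return type / 8;
}
```

For example, with types {1, 2, 200, 300}, the first loop counts 2 entries (24 + 2 * 16 = 56 bytes requested), while the second loop writes 4 entries ending at byte 20 + 4 * 16 = 84, which is 28 bytes past the request. Separately, __set_bit() with type 1023 touches byte 127 of the object, far beyond the 16-byte bitmap.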
