Please Stop Naming Vulnerabilities: Exploring 6 Previously Unknown Remote Kernel Bugs Affecting Android Phones
Prelude
In today’s world everyone knows that a security vulnerability isn’t really a security vulnerability unless it has been given a name other than a CVE, a Hype Krew has been hired to promote it, a blog post has been written for it, and a Blakhat talk is delivered. To continue with the tradition of naming and hyping bugs, I present to you “Please Stop Naming Vulnerabilities”, 6 remote (proximal) Kernel memory corruption bugs present in some routers and Android phones, including Google’s Pixel(XL) and Nexus 5x.
Introduction
For the past two years I spent time auditing various Android kernels looking for vulnerabilities. Over these two years I’ve found and reported around fifty bugs. In late 2016 and early 2017 I became burned out from hunting bugs as it became a race between some very talented researchers and me. For example, Derrek and I spent 2 months auditing and coding up PoCs for some bugs in a wifi driver where nearly 95% of our reports were duplicates of reports from the Chinese teams. It was becoming hard to compete with teams who do this 40 hours a week while I do it as a hobby over coffee on Saturday mornings.
After the burnout subsided, I wanted to find some super cool bugs to reignite my interest in hunting, and, of course, find a reason to use this domain. As a result I decided to start looking for remote memory corruption bugs. There are 8 remote bugs to talk about, but as of November 2017 only 6 are released in bulletins. I wrote about each bug and planned to disclose all at once, but I guess we’re doing this in batches of 6. And, for what it’s worth, all the fixes are in the public repos, which means you can go find them.
I would like to thank Steve and Robert for proofreading this and making it coherent.
Technical Introduction
The bugs I found are in the qcacld Qualcomm/Atheros wifi driver which is shipped in at least two phones I am aware of (Pixel’s, Pixel’s Gen2, and 5x’s). Unlike the Broadcom wifi SoC that Gal Beniamini blogged about, or Nitay owned, the Qualcomm SoC is a partial SoftMAC, meaning some Media Access Control (MAC) Sublayer Management Entity (MLME) is handled in host software, not on hardware or in SoC firmware. Because of this, the source code for handling any sort of 802.11 management frames must be in the driver and is thus available to look at.
Knowing where to look turned out to be more difficult than I hoped for. Anyone who has found bugs in the qcacld driver or written code for it, can attest to its confusing interfaces. Just to show you how out of control this driver truly is let’s see how many lines of code it took to implement it:
As you can see above, there are nearly 691k lines of code for this single wifi driver. My plan was to look for some interrupt from the wifi chipset firmware to the host driver notifying it of available packets and frames. From there I planned to follow the call chains until I found a bug. Unfortunately I could not pinpoint the true entry into the host driver and eventually gave up. In a last ditch effort, I started grepping for features I knew a wifi driver might have to implement. At the time of hunting for these bugs (February and March 2017) Gal hadn’t written his blog post, so I didn’t have any of the good tips he provided. I knew there were de-authentication frames, so I started grepping for that.
Luckily I was greeted with the following:
I quickly zoned in on CORE/MAC/src/pe/lim/limProcessDeauthFrame.c:96. From there I decided to see what else was in this directory:
As you can see there is a plethora of files dedicated to management frames. It seemed as if I had found the right location. The last thing I needed to do before auditing was confirm this code was actually used. I put a pr_err("%s: HELLO WORLD!\n", __func__); in the main function handler for 802.11 management frames and confirmed that the code was in use.
Basics for the bug:
Before I can explain the bugs, you need to understand the surrounding code. At some point in the call chain after notification from the firmware of packets, we’ll enter into this function:
In this function we’ll verify the packet is a management frame, we’ll parse the subtype and call the correct management handler:
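A rough sketch of that dispatch step, written from the 802.11 frame layout rather than from the driver: the first byte of the frame control field carries the protocol version (bits 0-1), the type (bits 2-3, where 0 means management) and the subtype (bits 4-7). The names mgmt_subtype and mgmt_subtype_of are made up for this example and are not qcacld identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 802.11 management subtypes (values from the spec). */
enum mgmt_subtype {
    MGMT_ASSOC_REQ = 0,
    MGMT_ASSOC_RSP = 1,
    MGMT_PROBE_REQ = 4,
    MGMT_PROBE_RSP = 5,
    MGMT_BEACON    = 8,
    MGMT_DISASSOC  = 10,
    MGMT_AUTH      = 11,
    MGMT_DEAUTH    = 12,
    MGMT_ACTION    = 13
};

/* Hypothetical helper: returns the subtype (0..15) when the frame is a
 * management frame, or -1 when it is too short or not a management frame. */
int mgmt_subtype_of(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return -1;                      /* no room for frame control */

    uint8_t fc0 = frame[0];

    if ((fc0 & 0x03) != 0)
        return -1;                      /* protocol version must be 0 */
    if (((fc0 >> 2) & 0x03) != 0)
        return -1;                      /* type 0 == management */

    return (fc0 >> 4) & 0x0f;           /* subtype selects the handler */
}
```

A real handler would then switch on the returned value and hand the frame body to the matching routine (deauth, auth, association response and so on).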
Once the correct handler is called, we need to parse the packet from a raw over-the-wire (over-the-air) form to a c-style structure we can use.
This conversion happens in the behemoth file dot11f.c, which is a compiler-generated file that handles the conversions.
An example of the over-the-wire raw bytes to a c-style struct can be seen below:
Interestingly, after we parse from over-the-wire bytes to the c-style structure, we’ll also have to parse it even further from a c-style structure to another c-style structure. The dot11f c-style structures have every field that can be in the 802.11 management frames. So for example, if the management frame can contain a list of capabilities, there will be a struct in the dot11f “out” structure for capabilities. The driver may not want to use these capabilities, so instead of passing around a large structure with every single 802.11 field it won’t need, it will further pull out items from the large structure into smaller structures. That way we’re only allocating memory for things the driver actually will use. This further refinement is handled in the following functions:
After refinement the driver uses the data however it needs to; what really happens after isn’t super relevant because that’s not where the bugs are. Below is a small graphic illustrating the data flow of packets.
The best bug (CVE-2017-11013):
Now that we have a general understanding of the high level and general data flow, let’s jump into the first and best bug.
The first bug is present in the dot11f.c file which, as I’ve alluded to above, is tasked with parsing over-the-wire packets to a c-style structure. When the dot11f.c file gets an over-the-wire packet, the packet is structured as a series of Information Elements (IEs) where each IE is a type-length-value (TLV). The IE consists of a one byte tag, a one byte length and an up to 255 byte value (Image stolen from Gal’s blog post on Project Zero):
The tags are standardized to represent specific types of elements.
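To make the TLV layout concrete, here is a minimal sketch of walking a buffer of IEs. It is not the dot11f.c parser; walk_ies and ie_visitor are names invented for this example, and the only thing taken from the text is the one byte tag, one byte length, up to 255 byte value layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked once per well-formed IE. */
typedef void (*ie_visitor)(uint8_t tag, const uint8_t *value,
                           uint8_t len, void *ctx);

/* Walk a buffer of tag/length/value elements.
 * Returns the number of IEs visited, or -1 if an IE header is truncated
 * or its declared length runs past the end of the buffer. */
int walk_ies(const uint8_t *buf, size_t n, ie_visitor visit, void *ctx)
{
    size_t off = 0;
    int count = 0;

    while (off < n) {
        if (n - off < 2)
            return -1;                  /* truncated tag/length header */

        uint8_t tag = buf[off];
        uint8_t len = buf[off + 1];

        if ((size_t)len > n - off - 2)
            return -1;                  /* value overruns the buffer */

        if (visit)
            visit(tag, buf + off + 2, len, ctx);

        off += 2 + (size_t)len;         /* step over header and value */
        count++;
    }
    return count;
}
```

Note that nothing in this layout limits how many times the same tag may appear in one frame, which is worth keeping in mind for what follows.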
The dot11f.c code is littered with IE definitions which represent what could possibly be contained in a packet. For instance, let’s say we, on our phones, try to connect to our home network. At some point in the protocol chain, the Access Point will send back to us an association response. When the phone gets the association response packet, the driver will eventually call dot11fUnpackAssocResponse which will immediately call UnpackCore:
Let’s decompose the call to UnpackCore a bit. pCtx is the driver’s state structure, pBuf is the over-the-wire list of TLVs, nBuf is the total size of pBuf, FFS_AssocResponse is a list of mandatory IEs that must be in pBuf, and IES_AssocResponse is a list of optional IEs that can be in pBuf. pFrm is our “out” structure, meaning the parsed IEs will be translated into the members of the tDot11fAssocResponse struct.
I’m about to dump a ton of struct definitions here, but they’re all important to understand the bug. Sorry, and try to follow along!
Let’s take a look at the definition of the IES_AssocResponse structure. Remember this is the list of optional IE members the packet can contain:
Now take a look at the definition of the “out” structure tDot11fAssocResponse:
For each member of the tDot11fAssocResponse structure there is a corresponding tIEDefn structure in the IES_AssocResponse array. Now translate the tIEDefn structure to English with respect to the out structure.
For the IES_AssocResponse list, which as a reminder is the optional IEs in the raw packet, it looks like the example below. Each member is a tIEDefn we described above. As a quick example, take a few of the members from below and map them to the definitions above to make sure you understand how it’s set up! (I’ve removed a majority of the definitions because it’s so large):
So now that we understand some of the structure definitions the parser uses, let’s take a look at the relevant parser code in UnpackCore:
And the definition of FindIEDefn translated with comments:
At a high level FindIEDefn will loop over the entire optional IE list, in our case IES_AssocResponse, seeing if the buffer, at the current index, contains the tag for the IE. So the function will start at index 0 of the IES_AssocResponse array and say: Does the buffer contain this eid? No, let’s try index 1. Does it contain this eid? No, let’s go to index 2. Does it contain this eid? Yes, okay, return the tIEDefn structure representing this eid.
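The lookup described above boils down to a linear scan over a table of definitions. The following is a simplified sketch and not the real FindIEDefn: struct ie_defn is a made-up, much smaller stand-in for tIEDefn, the NULL-name terminator is an assumption of this example, and anything beyond the plain tag comparison is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for tIEDefn. */
struct ie_defn {
    uint8_t     eid;         /* element ID, i.e. the TLV tag */
    uint16_t    arraybound;  /* slots in the out struct, 0 = single IE */
    const char *name;        /* a NULL name terminates the table */
};

/* Return the first definition whose eid matches, or NULL if the tag is
 * not one of the optional IEs this frame type knows about. */
const struct ie_defn *find_ie_defn(const struct ie_defn *table, uint8_t eid)
{
    for (const struct ie_defn *d = table; d->name != NULL; d++) {
        if (d->eid == eid)
            return d;
    }
    return NULL;
}
```

The important property is that the lookup is stateless: sending the same tag again returns the same definition again, so any limit on repeats has to be enforced by the caller.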
Continuation of Unpack core immediately following the call to FindIEDefn:
I’ve given enough information now that you should be able to spot the bug. If you’re like me and you like to try and see the bugs in blog posts before they spoil them go back up and keep reading the code. If you need a hint: What happens when we send another WMMTSPEC IE in our Association response, how will it get parsed, what index will it get stored into on the 2nd iteration, what about the 3rd and 4th time we send a WMMTSPEC IE?
The problem with the parsing code is we never validate countOffset\num_ against arraybound. Remember above I said countOffset\num_ represents how many elements we’ve currently parsed, and arraybound represents how many available slots we have. So I as an adversary can send let’s say 15 WMMTSPEC IE’s and the parsing code will happily continue parsing them, storing them into slots 0->14. Remember we only have 4 slots for WMMTSPECS:
So if I send 15 WMMTSPEC IE’s we will overflow into the WscAssocRes, P2PAssocRes, VHTCaps etc until we blow outside of our tDot11fAssocResponse structure.
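Here is a minimal sketch of the check that is missing, under heavy simplification. struct assoc_rsp, struct tspec and store_tspec are invented for this example; MAX_TSPEC plays the role of arraybound and num_tspec plays the role of the count field. This is not the actual fix that landed in the driver.

```c
#include <stdint.h>
#include <string.h>

#define MAX_TSPEC 4                     /* stands in for arraybound */

struct tspec {                          /* hypothetical, simplified IE body */
    uint8_t body[8];
};

struct assoc_rsp {                      /* hypothetical "out" structure */
    uint16_t     num_tspec;             /* how many slots are filled */
    struct tspec tspec[MAX_TSPEC];
};

/* Store one parsed IE into the next free slot.
 * Returns 0 on success, -1 when every slot is already in use.
 * The vulnerable pattern is this same code without the first check. */
int store_tspec(struct assoc_rsp *out, const uint8_t *val, uint8_t len)
{
    if (out->num_tspec >= MAX_TSPEC)
        return -1;                      /* count validated against bound */

    if (len > sizeof out->tspec[0].body)
        len = sizeof out->tspec[0].body;

    memcpy(out->tspec[out->num_tspec].body, val, len);
    out->num_tspec++;
    return 0;
}
```

With the check in place, a frame carrying 15 copies of the IE fills 4 slots and the remaining 11 are rejected instead of being written past the end of the array.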
The reason why this bug is so good is because you can target different types of memory. Since the bug is in the generic parsing code for IEs, each “out” structure is allocated in different locations. Some are allocated on the stack, some are in .bss. So you have a wide variety of locations you can overflow; you just need to find an 802.11 management packet that has an arraybound IE and see where the out struct is allocated. You can even go from .bss to heap, which I’ll show in a bit. This bug would be an excellent target for a true proximal kernel remote code execution, because you have controlled data, and you have a variety of locations you can overflow into.
One of the most “ugh” moments of this bug was when I stumbled across this code in UnpackCore:
I was surprised to see that for ONE IE type they had done a patch to fix memory corruption. I wanted to see under what circumstances this was fixed so I did a git blame and found the offending commit:
On October 14th 2014 QCA had the opportunity to kill this entire bug. Instead no one had the thought to say “Hey if this IE is vuln maybe there are more?”, so the bug lived on. This is a good time for me to point out that ALL developers need to think security while doing patches/writing code. Especially if it’s embedded systems/driver/C/C++ code. Had this developer had more training with security this bug could have been prevented.
The starts of an exploit using the above bug.
I want to thank Joshua J. Drake for helping with rewriting the PoCs so we didn’t have to rely on scapy and FakeAP. Also thanks for helping me with working towards an exploit.
Google Android Rewards program gives you the option to submit a working exploit that will get you up to $150k for remote kernel compromise:
I was very close to taking my 3 weeks of vacation from my job, and spending full time using this bug to get remote code execution. I emailed the Android Security guys to get clarification and it turns out my bug didn’t qualify. First because it’s proximal not remote, and second because I submitted all my bugs prior to June 1st I was only eligible for around $22k. Still a lot of money but that’s not worth me burning 3 weeks of vacation time.
My last thought on this: what they want is essentially a remote kernel compromise from SMS or webpage or something, not BT/WIFI stack. It’s my unsolicited opinion that $150k for remote kernel compromise through a userland vector (requiring a long chain of exploits) on a Google Pixel phone is far too low. Even Zerodium pays too little, not that you should ever sell to those guys. It may be easy when you use some shit-old version of Android where Towel root still works, but when you’re using an up-to-date Google phone, getting kernel execution is no longer an easy task. This is why we’re seeing all these super bad bugs: Remote wifi via FW/Remote wifi via this driver. Google and Apple have successfully locked down kernel intrusion via a local route. Android’s SELinux policy is very strict; that, coupled with forcing least privilege, makes a local kernel compromise very difficult on Pixel phones.
Okay back to the technical details. I said above that this bug is good because there are a few regions of memory we can overflow. We can target BSS, Heap, and stack. Unfortunately for stack it’s not 2016 anymore where we could smash the stack and control the IP, thanks to Jeff’s commit:
And unfortunately there are not any pointers or anything useful between our struct and the stack cookie. There are other locations in .bss which we can overflow, which can be a useful target. Under some BSS allocations there is this gigantic VosContext structure which we can overflow into. Another thing we can do is overflow a bss segment but not deep enough to hit VosContext. This won’t cause any side effects because we don’t smash anything important. Later on there are a few locations where we copy our overflowed structures from .bss to heap:
From above, the parsing code overflowed ar.RICDataDesc[] and we control how many it overflowed, thus we control ar.num_RICDataDesc as well. So we can control how many RicDataDescs we shove into &pAssocRsp->RICData[cnt] where pAssocRsp is our heap allocation. So using this one simple trick we can pivot from a bss overflow to a heap overflow with mostly controlled data.
Unfortunately, at this point we both lost interest and stopped here.
Bug 2 (CVE-2017-9714) and 3 : Remote Kernel DoS (infinite loops)
These bugs probably won’t be exciting for the people looking to make exploits or who love memory corruption. But these ones are super cute to me because they’re integer overflow bugs. When I was attending the University of Utah for my BS and MS I worked for a time for, and took classes from, John Regehr (his blog). This guy is one of the few experts on undefined behavior in C and taught me literally everything I know about integers in C. One of my favorite blog posts, which is now defunct, was “A Quiz About Integers in C”. Luckily I located it after doing some digging on Google. You can see the quiz here. The quiz highlights the type of stuff you need to know when auditing C code. A lot of times you’ll run into really strange scenarios which end up causing exploitable bugs later in the code.
When we put the phone into Access point mode for tethering there are some other management packets that are opened up for the driver to parse. Things like association requests, authentication requests, etc. If the phone is in AP mode and we send an association request we can send along something called “opaque RSN data” or “WPA Opaque data”. I don’t know what this crap is, but you can send it along which is all we care about.
If we do send it, the 802.11 parsing code in UnpackCore will place it nicely for us into the pAssocReq variable and will set pAssocReq->rsnPresent to one. We’ll continue on and eventually get to here in /lim/limProcessAssocReqFrame.c:
When we package along some RSN Data we will call dot11fUnpackIeRSN and pass the length/data to that function. Now the unpack function is tasked with converting the raw bytes to a c-style struct, saving the results into the stack-based Dot11IERSN variable.
Let’s take a look at exactly what dot11fUnpackIeRSN does:
You can see we extract a pwise_cipher_suite_count from the raw buffer via the framesntohs call. Immediately after, we verify that pwise_cipher_suite_count is less than four. If it’s larger we unset the present flag and return DOT11F_SKIPPED_BAD_IE;
So this sorta sucks, they parse it correctly, and return an error if it’s bad. We can’t do anything… or can we?
If we take a step back and look at how we call dot11fUnpackIeRSN you can see we call it the following way:
See anything broken there? The calling function doesn’t ever check the return value of dot11fUnpackIeRSN. Continuing on with the code we will then call limCheckRxRSNIeMatch:
Take a look at the first loop and see if you can reason about why we can infinitely loop there. Here are some hints: What is the type of pwise_cipher_suite_count, (how did we extract pwise_cipher_suite_count)? What is the type of i?
Spoiler:
Since we fully control pwise_cipher_suite_count and it is a u16 (we extracted it with framesntohs (frames-network-to-host-short)) and the count variable (i) in the loop is of type u8, we can cause an infinite loop. If we send pwise_cipher_suite_count as 31337, dot11fUnpackIeRSN will fail and return DOT11F_SKIPPED_BAD_IE, but it never resets pwise_cipher_suite_count to 0. We never check the return value and continue on into limCheckRxRSNIeMatch. We then enter the for loop where pwise_cipher_suite_count is a u16 (31337) and a u8 maxes out at 255, so the u8 (i) keeps wrapping from 255 back to 0 and will obviously never get to 31337 due to width constraints. So the for loop keeps spinning until the watchdogs kick in and nuke the kernel after a while.
From above I said you can send WPA opaque data, which is true. It’s the same bug as above, an unchecked return value leading to a u8/u16 mismatch infinite loop.
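The width mismatch is easy to reproduce outside the kernel. The sketch below is not driver code: count_iterations is a made-up helper that runs the same shape of loop (u8 counter, u16 bound) but bails out after a caller-supplied budget so that it can report non-termination instead of hanging.

```c
#include <stdint.h>

/* Run a loop with a u8 counter against a u16 bound.
 * Returns the number of iterations executed, or -1 if the loop was still
 * running after `budget` iterations (meaning it would never terminate). */
long count_iterations(uint16_t suite_count, long budget)
{
    long iters = 0;

    /* i is promoted to int for the comparison, but i++ wraps 255 -> 0,
     * so i can never reach a bound greater than 255. */
    for (uint8_t i = 0; i < suite_count; i++) {
        if (++iters > budget)
            return -1;
    }
    return iters;
}
```

Any bound up to 255 terminates normally; 256 and above, including 31337, never does. Checking the parser’s return value or giving the counter the same width as the bound would each break the chain.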
Bug 3 Remote Dos part deux
This was one of the bugs on the chopping block for this blog post. The original fix was actually incorrect, so I re-reported the bug and the new fix is in the works.
Bug 4 Heap buffer overflow (CVE-2017-9714)
There is this “new” (circa 2014 it seems) portion of 802.11 management frames which can include some QoS type information. You can send it alone or you can append it onto an 802.11 association response. Specifically, if we append a QoS Mapset element along with our association response we can cause some corruption. Let’s see how.
As you can see in the above example, we set num_dscp_exceptions and then verify that the length is less than 60. Once we verify that, we start walking out of the call chain and end up in sirConvertAssocRespFrame2Struct. At the end of this function we do the following:
We start converting from one struct to another struct by calling ConvertQosMapsetFrame( pMac, &pAssocRsp->QosMapSet, &ar.QosMapSet);
Looking at this function, it doesn’t really make any sense. If you go back up to the “Bug 4 Heap buffer overflow” header and look at the dot11fUnpackIeQosMapSet function again you can see we can put anywhere from 0 -> 60 bytes worth of data into the dscp_exceptions array. Whatever amount of data we memcpy into that array, we will set the size in pDst->num_dscp_exceptions.
Now let’s look again at the ConvertQosMapsetFrame function:
As you can see in the above example, if we have > 58 we set the max, but there is no minimum check. So, what if we send 14 num_dscp_exceptions? Well, we do 14 - 16, which promotes to integer math so we get -2, or 0xfffffffe. When we divide -2 by 2 we get -1, or 0xffffffff. When we stuff a full 32 bit -1 into a u8 we get 255 (we just take the bottom byte of the int).
So by sending 14 num_dscp_exceptions we can trick the code into storing 255 into Qos->num_dscp_exceptions then looping on it.
Above you can see there are 42 slots available for dscp_exceptions, so we blow off the end of this structure and into adjacent heap objects.
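The arithmetic can be checked in isolation. This sketch assumes only what the text states, namely that the count is computed as (num_dscp_exceptions - 16) / 2 and stored in a u8; both function names are invented here, and the checked variant is one possible guard rather than the fix that was shipped.

```c
#include <stdint.h>

/* Same arithmetic as described above, with the result squeezed into a u8. */
uint8_t exceptions_from_len(uint8_t num_dscp_exceptions)
{
    /* The u8 operand is promoted to int, so 14 - 16 is -2 and -2 / 2 is -1. */
    int wide = (num_dscp_exceptions - 16) / 2;

    return (uint8_t)wide;               /* -1 becomes 255: low byte only */
}

/* Hypothetical variant with the missing lower-bound check.
 * Returns 0 and writes the count on success, -1 on a too-short element. */
int exceptions_from_len_checked(uint8_t num_dscp_exceptions, uint8_t *out)
{
    if (num_dscp_exceptions < 16)
        return -1;                      /* would go negative, reject */

    *out = (uint8_t)((num_dscp_exceptions - 16) / 2);
    return 0;
}
```

For a well-formed maximum of 58 the first function yields 21, and for 14 it yields 255, which is the value the copy loop then runs on.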
Bug 5 (CVE-2017-11014) Another Heap overflow.
The Qcacld driver has support for Roaming action frames which, I think, are part of the 802.11k spec. This action frame is designed to assist 802.11 stations with neighbor AP discovery, without them having to do scans themselves. It’s actually a cool idea because it was designed for power-constrained devices, so they don’t have to continuously scan for networks to roam to. Instead they ask the currently associated AP for a list of known neighbors, and the AP will do the scan and return some info for the station.
The interesting thing is that the driver doesn’t even have to ask for this data. The access point can just send the roam action frame and qcacld will happily parse it.
So let’s send a Roam action frame and see what happens!
First we’ll get our way into the limProcessActionFrame() function.
From there, since we’re sending a radio measure request we’ll enter __limProcessRadioMeasureRequest. Like every bug and every packet we’ll hop into the dot11fUnpack function to parse it into our c-style structure. We’ll pass the following optional IEs array into UnpackCore. Remember from above, the IEs array describes the type of elements the parser expects to potentially be in the packet.
In this bug we’ll be focusing on the APChannel report functionality one can send with this packet.
As you can see by the definition there is a num_APChannelReport, so we could use the first bug (CVE-2017-11013) on this structure. But let’s pretend it’s been fixed. Looking at this definition, we can send 2 AP channel reports (the number after the string is the arraybound). Now check out how we parse an AP channel report:
As you can see we bound the amount of channels by 50, which is okay as the pDst structure has 50 slots:
From above, we’re allowed to send 2 AP channel reports. In this scenario let’s send 2 reports of 50 each. Now we’ll unwind the call stack and land back into __limProcessRadioMeasureRequest where we’ll call rrmProcessRadioMeasurementRequest( pMac, pHdr->sa, &frm, psessionEntry ); &frm contains our parsed frame, or pDst as it’s referred to in the parsing core.
Inside of rrmProcessRadioMeasurementRequest we see the following code (I skipped some irrelevant stuff):
You can see we call rrmProcessBeaconReportReq and one of the parameters we pass in is &pRRMReq->MeasurementRequest[i]. The structure we’re passing in contains our 2 AP reports. Now stepping into that function (and skipping some more code):
In the above code, we loop over our 2 AP channel reports and we store the channel list into pChanList. After each list we move the pointer forward, deeper into the destination array. Remember, we sent 50 channels per AP channel report. So after the first copy, we will move the pointer up 50 bytes. Let’s take a look at what this channelNumber array from tANI_U8 *pChanList = pSmeBcnReportReq->channelList.channelNumber; looks like:
So pretty straightforward. Let’s see how they define SIR_ESE_MAX_MEAS_IE_REQS:
So we’re storing 100 bytes (50 per ap report) into an array that was provisioned for 8 bytes… Nonsense.
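A bounded version of that copy loop could look like the sketch below. It is an illustration only: struct chan_list and append_channels are invented names, and DST_SLOTS is set to 8 purely because the text says the destination was provisioned for 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DST_SLOTS 8     /* assumed from the "8 bytes" mentioned above */

struct chan_list {                      /* hypothetical destination */
    uint8_t num;                        /* channels stored so far */
    uint8_t chan[DST_SLOTS];
};

/* Append up to n channels from one AP channel report.
 * Returns how many channels were actually copied. */
size_t append_channels(struct chan_list *dst, const uint8_t *src, size_t n)
{
    size_t room = (size_t)DST_SLOTS - dst->num;

    if (n > room)
        n = room;                       /* clamp instead of overflowing */

    memcpy(dst->chan + dst->num, src, n);
    dst->num = (uint8_t)(dst->num + n);
    return n;
}
```

Feeding it two reports of 50 channels each stores 8 channels from the first and nothing from the second, rather than writing 100 bytes into an 8 byte array.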
Bug 6 (CVE-2017-11015) Another heap overflow
If we send a specially crafted authentication frame to the phone while it’s in AP mode we will eventually land in sirConvertAuthFrame2Struct which will immediately call dot11fUnpackAuthentication.
For an authentication packet we can attach some specific TLVs:
The one of interest to us for our bug is the Challenge text. The challenge text is only useful in the WEP encryption scheme. Regardless if the phone’s AP has WEP enabled or not you can append this to the authentication frame and have the driver parse it. UnpackCore will call the specific parsing function for challenge text and we’ll arrive here:
As you can see they bound the amount of challenge text to 253 bytes. When this function, UnpackCore, and dot11fUnpackAuthentication finish we’ll end up back in sirConvertAuthFrame2Struct where we will do the following:
This all looks pretty sane, but we need to see what this pAuth structure looks like. It comes in as a parameter to our auth convert function:
sirConvertAuthFrame2Struct(tpAniSirGlobal pMac,
tANI_U8 *pFrame,
tANI_U32 nFrame,
tpSirMacAuthFrameBody pAuth)
So it has type tpSirMacAuthFrameBody, which we find here:
Finally, we need to see how large SIR_MAC_AUTH_CHALLENGE_LENGTH really is.
#define SIR_MAC_AUTH_CHALLENGE_LENGTH 128
So challenge text has enough space for 128 bytes… but remember above, in the raw packet parsing we made the max length 253 bytes! So when we do the memcpy:
vos_mem_copy( pAuth->challengeText, auth.ChallengeText.text, auth.ChallengeText.num_text );
we copy 125 more bytes than the structure has allocated.
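The mismatch between the two limits is the whole bug, so a guard only needs to compare the parsed length against the destination size before copying. The sketch below uses the two numbers from the text (253 on the parser side, 128 for SIR_MAC_AUTH_CHALLENGE_LENGTH); copy_challenge and the macro names are made up for this example and this is not the patch that was applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHALLENGE_DST_LEN 128   /* SIR_MAC_AUTH_CHALLENGE_LENGTH in the text */
#define CHALLENGE_SRC_MAX 253   /* parser-side bound described in the text */

/* Copy parsed challenge text into the fixed-size destination.
 * Returns 0 on success, -1 if the parsed text does not fit. */
int copy_challenge(uint8_t dst[CHALLENGE_DST_LEN],
                   const uint8_t *src, size_t num_text)
{
    if (num_text > CHALLENGE_DST_LEN)
        return -1;                      /* reject instead of overflowing */

    memcpy(dst, src, num_text);
    return 0;
}
```

Rejecting the frame is one option; clamping num_text to the destination size would be another, at the cost of silently truncating the challenge.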
Bug 7 (CVE-???-????) Remote kernel stack disclosure
This bug is quite strange. It seems to be a case of a //TODO that never got TODO’d and thus led to a pretty bad bug. Luckily this doesn’t affect the Nexus 5x nor the Pixel phones because while in AP mode, only WPA2 auth can be used. However this will probably affect other phones, and routers. When an AP is set up with WEP encryption (something you should never do), during the authentication phase of the 802.11 protocol the access point will send some “challenge text” to the authenticating station. It’s up to the station to use the encryption key to encrypt the challenge text and send it back to the access point. The access point will encrypt the challenge text as well and compare the response from the station. Assuming they match, the station is allowed on the network, and if not then the station must start over.
Now that we got that out of the way, let’s look at some code. When the driver receives an authentication frame we’ll eventually enter limProcessAuthFrame in limProcessAuthFrame.c.
The prototype and local variables look like the following:
The thing I really want to point out with that image, is the challenge text array. It is a 128 byte char array, which we will in a bug-free world fill with some data for the station to encrypt.
Assuming we are in AP mode with a WEP key enabled we’ll continue into this function, and eventually get here:
Here we’re supposed to get some random bytes. Instead we don’t do that. Since C doesn’t pre-initialize anything on the stack, whatever valuable stack contents are there during this function call will be leaked in an 802.11 frame.
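As a sketch of the defensive pattern, the buffer can be zero-initialized at declaration so that a skipped or failed fill never exposes stale stack bytes. Everything here is hypothetical: build_challenge and rng_fn are invented for the example, and in a kernel the fill would come from the kernel’s own random source rather than a function pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHALLENGE_LEN 128               /* size of the challenge text array */

/* Hypothetical hook that fills a buffer with random bytes. */
typedef void (*rng_fn)(uint8_t *out, size_t n);

/* Build the challenge text that will be sent to the station. */
void build_challenge(uint8_t out[CHALLENGE_LEN], rng_fn rng)
{
    /* Zero-initialize so stale stack contents can never be sent, even if
     * the random fill is skipped (which is the TODO case described above). */
    uint8_t challenge[CHALLENGE_LEN] = {0};

    if (rng)
        rng(challenge, sizeof challenge);

    memcpy(out, challenge, sizeof challenge);
}
```

An all-zero challenge is still useless as a challenge, so the initializer only limits the damage; the real fix is to actually fill the buffer with random data.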
Bug 8 (CVE-???-????) Remote heap overflow
Coming to you sometime in the future – don’t know when.