reynir.dk/posts/2024-08-21-OpenVPN-and-MirageVPN.markdown

241 lines
14 KiB
Markdown
Raw Normal View History

---
title: Discoveries made implementing MirageVPN, a OpenVPN-compatible VPN library
date: 2024-08-21
description: Discoveries made implementing MirageVPN, a OpenVPN-compatible VPN library
---
At [Robur][robur] we have been busy at work implementing our OpenVPN™-compatible MirageVPN software.
Recently we have implemented the [server side][miragevpn-server].
In order to implement this side of the protocol I studied parts of the OpenVPN™ source code and performed experiments to understand what the implementation does at the protocol level.
Studying the OpenVPN™ implementation has lead me to discover two security issues: CVE-2024-28882 and CVE-2024-5594.
In this article I will talk about the relevant parts of the protocol, and describe the security issues in detail.
A VPN establishes a secure tunnel in which (usually) IP packets are sent.
The OpenVPN protocol establishes a TLS tunnel[^openvpn-tls] with which key material and configuration options are negotiated.
Once established the TLS tunnel is used to exchange so-called control channel messages.
They are NUL-terminated (well, more on that later) text messages sent in a single TLS record frame (mostly, more on that later).
I will describe two (groups) of control channel messages (and a bonus control channel message):
* `EXIT`, `RESTART`, and `HALT`
* `PUSH_REQUEST` / `PUSH_REPLY`
* (`AUTH_FAILED`)
The `EXIT`, `RESTART`, and `HALT` messages share similarity.
They are all three used to signal to the client that it should disconnect[^disconnect] from the server.
`HALT` tells the client to disconnect and suggests the client should terminate.
`RESTART` also tells the client to disconnect and suggests the client can reconnect either to the same server or the next server if multiple are configured depending on flags in the message.
`EXIT` tells the *peer* that it is exiting and the *peer* should disconnect.
The last one can be sent by either the server or the client and is useful when the underlying transport is UDP.
It informs the peer that the sender is exiting and will (soon) not be receiving and ACK'ing messages; for UDP the peer would otherwise (re)send messages until a timeout.
Because the underlying transport can either be TCP or UDP the sender may have no guarantees that the message arrives.
OpenVPN's control channel implements a reliable layer with ACKs and retransmissions to work around that.
To accomodate this OpenVPN™ will wait five seconds before disconnecting to allow for retransmission of the exit message.
### The bug
While I was working on implementing more control channel message types I modified a client application that connects to a server and sends pings over the tunnel - instead of ICMPv4 echo requests I modified it to send the `EXIT` control channel message once a second.
In the server logs I saw that the server successfully received the `EXIT` message!
But nothing else happened.
The server just kept receiving `EXIT` messages but for some reason it never disconnected the client.
Curious about this behavior I dived into the OpenVPN™ source code and found that on each `EXIT` message it (re)schedules an exit (disconnect) timer! That is, every time the server receives an `EXIT` message it'll go "OK! I'll shut down this connection in five seconds" forgetting it promised to do so earlier, too.
### Implications
At first this seemed like a relatively benign bug.
What's the worst that could happen if a client says "let's stop in five second! No, five seconds from now! No, five seconds from now!" etc?
Well, it turns out the same timer is used when the server sends an exit message.
Ok, so what?
The client can hold open a resource it *was* authorized to use *longer*.
So we have a somewhat boring potential denial of service attack.
Then I learned more about the management interface.
The management interface is a text protocol to communicate with the OpenVPN server (or client) and query for information or send commands.
One command is the `client-kill` command.
The documentation says to use this command to "[i]mmediately kill a client instance[...]".
In practice it sends an exit message to the client (either a custom one or the default `RESTART`).
I learnt that it shares code paths with the exit control messages to schedule an exit (disconnect)[^kill-immediately].
That is, `client-kill` schedules the same five second timer.
Thus a malicious client can, instead of exiting on receiving an exit or `RESTART` message, send back repeatedly `EXIT` to the server to reset the five second timer.
This way the client can indefinitely delay the exit/disconnect assuming sufficiently stable and responsive network.
This is suddenly not so good.
The application using the management interface might be enforcing a security policy which we can now circumvent!
The client might be a former employee in a company, and the security team might want to revoke access to the internal network for the former employee, and in that process uses `client-kill` to kick off all of his connecting clients.
The former employee, if prepared, can circumvent this by sending back `EXIT` messages repeatedly and thus keep unauthorized access.
Or a commercial VPN service may try to enforce a data transfer limit with the same mechanism which is then rather easily circumvented by sending `EXIT` messages.
Does anyone use the management interface in this way?
I don't know.
If you do or are aware of software that enforces policies this way please do reach out to [me][contact].
It would be interesting to hear and discuss.
The OpenVPN security@ mailing list took it seriously enough to assign it CVE-2024-28882.
## OpenVPN configuration language
Next up we have `PUSH_REQUEST` / `PUSH_REPLY`.
As the names suggest it's a request/response protocol.
It is used to communicate configuration options from the server to the client.
These options include routes, ip address configuration, negotiated cryptographic algorithms.
The client signals it would like to receive configuration options from the server by sending the `PUSH_REQUEST` control channel message[^proto-push-request].
The server then sends a `PUSH_REPLY` message.
The format of a `PUSH_REPLY` message is `PUSH_REPLY,` followed by a comma separated list of OpenVPN configuration directives terminated by a NUL byte as in other control channel messages.
Note that this means pushed configuration directives cannot contain commas.
When implementing the `push` server configuration directive, which tells the server to send the parameter of `push` as a configuration option to the client in the `PUSH_REPLY`, I studied how exactly OpenVPN™ parses configuration options.
I learned some quirks of the configuration language which I find surprising and somewhat hard to implement.
I will not cover all corners of the configuration language.
In some sense you could say the configuration language of OpenVPN™ is line based.
At least, the first step to parsing configuration directives as OpenVPN 2.X does is to read one line at a time and parse it as one configuration directive[^inline-files].
A line is whatever `fgets()` says it is - this includes the newline if not at the end of the file[^configuration-newlines].
This is how it is for configuration files.
However, if it is a `PUSH_REPLY` a *"line"* is the text string up to a comma or the end of file (or, importantly, a NUL byte).
This "line" tokenization is done by repeatedly calling OpenVPN™'s `buf_parse(buf, ',', line, sizeof(line))` function.
```C
/* file: src/openvpn/buffer.c */
bool
buf_parse(struct buffer *buf, const int delim, char *line, const int size)
{
bool eol = false;
int n = 0;
int c;
ASSERT(size > 0);
do
{
c = buf_read_u8(buf);
if (c < 0)
{
eol = true;
}
if (c <= 0 || c == delim)
{
c = 0;
}
if (n >= size)
{
break;
}
line[n++] = c;
}
while (c);
line[size-1] = '\0';
return !(eol && !strlen(line));
}
```
`buf_parse()` takes a `struct buffer*` which is a pointer to a byte array with a offset and length field, a delimiter character (in our case `','`), a destination buffer `line` and its length `size`.
It calls `buf_read_u8()` which returns the first character in the buffer and advances the offset and decrements the length, or returns `-1` if the buffer is empty.
In essence, `buf_parse()` "reads" from the buffer and copies over to `line` until it encounters `delim`, a NUL byte or the end of the buffer.
In that case a NUL byte is written to `line`.
What is interesting is that a NUL byte is effectively considered a delimiter, too, and that it is consumed by `buf_parse()`.
Next, let's look at how incoming control channel messages are handled (modified for brevity):
```C
/* file: src/openvpn/forward.c (before fix) */
/*
* Handle incoming configuration
* messages on the control channel.
*/
static void
check_incoming_control_channel(struct context *c, struct buffer buf)
{
/* force null termination of message */
buf_null_terminate(&buf);
/* enforce character class restrictions */
string_mod(BSTR(&buf), CC_PRINT, CC_CRLF, 0);
if (buf_string_match_head_str(&buf, "AUTH_FAILED"))
{
receive_auth_failed(c, &buf);
}
else if (buf_string_match_head_str(&buf, "PUSH_"))
{
incoming_push_message(c, &buf);
}
/* SNIP */
}
```
First, the buffer is ensured to be NUL terminated by replacing the last byte with a NUL byte.
This is already somewhat questionable as it could make an otherwise invalid message valid.
Next, character class restrictions are "enforced".
What this roughly does is removing non-printable characters and carriage returns and line feeds from the C string.
The macro `BSTR()` returns the underlying buffer behind the `struct buffer` with the offset added.
Notably, `string_mod()` works on (NUL terminated) C strings and not `struct buffer`s.
As an example, the string (with the usual C escape sequences):
"PUSH_REPLY,line \nfeeds\n,are\n,removed\n\000"
becomes
"PUSH_REPLY,line feeds,are,removed\000ed\n\000"
As you can see, if interpreted as a C string we have removed the line feeds.
But what is this at the end?
It is the same last 4 bytes from the original string.
More generally, it is the last N bytes from the original string if the original string has N line feeds (or other disallowed characters).
The whole buffer is still passed to the push reply parsing.
Remember that the "line" parser will not only consume commas as the line delimiter, but also NUL bytes!
This means the configuration directives are parsed as lines:
```C
"line feeds"
"are"
"removed"
"ed\n"
```
With this technique we can now inject (almost; the exception is NUL) arbitrary bytes as configuration directive lines.
This is bad because the configuration directive is printed to the console if it doesn't parse.
As a proof of concept I sent a `PUSH_REPLY` with an embedded BEL character, and the OpenVPN™ client prints to console (abbreviated):
Unrecognized option or missing or extra parameter(s): ^G
The `^G` is how the BEL character is printed in my terminal.
I was also able to hear an audible bell.
A more thorough explanation on how terminal escape sequences can be exploited can be found on [G-Reasearch's blog](https://www.gresearch.com/news/g-research-the-terminal-escapes/).
### The fix
The fix also is also a first step towards decoupling the control channel messaging from the TLS record frames.
First, the data is split on NUL bytes in order to get the control channel message(s), and then messages are rejected if they contain illegal characters.
This solves the vulnerability described previously.
Unfortunately, it turns out that especially for the `AUTH_FAILED` control channel message it is easy to create invalid messages:
If 2FA is implemented using the script mechanism sending custom messages they easily end with a newline asking the client to enter the verification code.
I believe in 2.6.12 the client tolerates trailing newline characters.
2024-08-22 08:05:28 +00:00
## Conclusion
The first bug, the timer rescheduling bug, is at least 20 years old!
It hasn't always been exploitable, but the bug itself goes back as far as the git history does.
I haven't attempted further software archeology to find the exact time of introduction.
Either way, it's old and gone unnoticed for quite a while.
I think this shows that diversity in implementations is a great way to exercise corner cases, push forward (protocol) documentation efforts and get thorough code review by motivated peers.
This work was funded by [the EU NGI Assure Fund through NLnet](https://nlnet.nl/project/MirageVPN/).
In my opinion, this shows that funding one open source project can have a positive impact on other open source projects, too.
<!-- XXX: contact assumes reynir.dk -->
[robur]: https://robur.coop/
[miragevpn-server]: https://blog.robur.coop/articles/miragevpn-server.html
[contact]: /contact.html
[^openvpn-tls]: This is not always the case. It is possible to use static shared secret keys, but it is mostly considered deprecated.
[^disconnect]: I say "disconnect" even when the underlying transport is the connection-less UDP.
[^kill-immediately]: As the alert reader might have realized this is inaccurate. It does not kill the client "immediately" as it will wait five seconds after the exit message is sent before exiting. At best this will kill a cooperating client once it's received the kill message.
[^proto-push-request]: There is another mechanism to request a `PUSH_REPLY` earlier with less roundtrips, but let's ignore that for now. The exact message is `PUSH_REQUEST<NUL-BYTE>` as messages need to be NUL-terminated.
[^inline-files]: An exception being inline files which can span multiple lines. They vaguely resemble XML tags with an open `<tag>` and close `</tag>` each on their own line with the data in between. I doubt these are sent in `PUSH_REPLY`s, but I can't rule out without diving into the source code that it isn't possible to send inline files.
[^configuration-newlines]: This results in the quirk that it is possible to sort-of escape a newline in a configuration directive. But since the line splitting is done *first* it's not possible to continue the directive on the next line! I believe this is mostly useless, but it is a way to inject line feeds in configuration options without modifying the OpenVPN source code.