rpc.mountd can be blocked by a bad client

b***@public.gmane.org

2014-10-09 13:42:28 UTC

-----Original Message-----
From: Strösser, Bodo
Sent: Thursday, September 25, 2014 12:22 PM
To: 'NeilBrown'
Subject: RE: rpc.mountd can be blocked by a bad client

-----Original Message-----
Sent: Thursday, September 25, 2014 2:32 AM
To: Strösser, Bodo
Subject: Re: rpc.mountd can be blocked by a bad client
On Wed, 24 Sep 2014 12:57:09 +0200 "Strösser, Bodo"

Hello,
a few days ago we had some trouble with an NFS server. Most of the time
the clients could no longer mount any shares, but occasionally they
succeeded.
We found that, whenever mounts failed, rpc.mountd was hung in a write()
to a TCP socket. netstat showed that Send-Q was full and Recv-Q was
slowly growing. After a long time the write ended with an error ("TCP
timeout" IIRC) and rpc.mountd worked normally for a short while, until
it hung in write() again for the same reason.
The problem was caused by a misconfigured MTU size. So a single bad
client (or as many clients as rpc.mountd has threads) can block
rpc.mountd entirely.
But what happens if someone intentionally sends RPC requests but never
read()s the answers? I wrote a small tool to test this situation. It
fires DUMP requests at rpc.mountd as fast as possible, but never reads
from the socket. The result is the same as with the problem above:
rpc.mountd hangs in write() and no longer responds to other requests,
and this time no TCP timeout ends the hang.
So it's quite easy to block rpc.mountd intentionally from a remote host.

That's rather nasty.
We could possibly set the socket to be non-blocking, or we could set an alarm
just before handling a request.
Probably rpc_dispatch() in support/nfs/rpcdispatch.c would be the best place
to put the timeout.
catch SIGALRM (don't set SA_RESTART)
alarm(10);
call svc_sendreply
alarm(0);
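A minimal C sketch of the alarm idea above, assuming a plain POSIX environment. write_with_alarm() and make_stalled_pair() are hypothetical names, not nfs-utils functions; a bare write() stands in for svc_sendreply(), and an AF_UNIX socket pair whose peer never reads stands in for the bad client.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt a blocked write(). */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: write() guarded by alarm(secs). SIGALRM is installed
 * WITHOUT SA_RESTART, so a write() still blocked when the alarm fires
 * returns -1/EINTR (or a short count if some bytes were already queued)
 * instead of being restarted by the kernel.
 */
static ssize_t write_with_alarm(int fd, const void *buf, size_t len,
                                unsigned secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(secs);
    n = write(fd, buf, len);         /* stands in for svc_sendreply() */
    saved_errno = errno;
    alarm(0);                        /* cancel the alarm if still pending */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */

    errno = saved_errno;
    return n;
}

/*
 * Demo helper: a connected AF_UNIX pair whose sv[0] send buffer is full
 * because nobody reads sv[1], imitating a client that never reads its
 * replies. sv[0] is switched back to blocking mode afterwards.
 */
static int make_stalled_pair(int sv[2])
{
    char chunk[4096];
    int flags;

    memset(chunk, 0, sizeof(chunk));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    flags = fcntl(sv[0], F_GETFL);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    while (write(sv[0], chunk, sizeof(chunk)) > 0)
        ;                            /* fill the buffer until EAGAIN */
    if (errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return fcntl(sv[0], F_SETFL, flags) < 0 ? -1 : 0;
}
```

With a stalled pair, write_with_alarm(sv[0], &byte, 1, 1) returns -1 with errno set to EINTR after about one second instead of hanging, and the caller would then close the connection. Note that alarm() is per process, which suits mountd's one-request-at-a-time worker processes but would need more care in a truly multi-threaded server.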

I also thought about switching the socket to non-blocking mode. But I'm not sure:
can RPC replies be so big that they don't fit into the socket buffer? If so,
write() would put the first part into the buffer and a second write for the rest
would fail, as the first part probably isn't acked yet, right?
So non-blocking mode needs to be combined with handling of buffer-full situations,
I guess. Such handling, together with a timeout for starving connections, would
be a clean solution.
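One way to combine non-blocking mode with buffer-full handling and a timeout for starving connections, sketched in C. It assumes the code owns the socket directly (it does not hook into the RPC library's XDR write routine) and that SIGPIPE is ignored, as nfs_svc_create() does. set_nonblock() and write_all_deadline() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Switch fd to non-blocking mode, keeping its other file status flags. */
static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Milliseconds elapsed since *t0 on the monotonic clock. */
static long elapsed_ms(const struct timespec *t0)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - t0->tv_sec) * 1000L +
           (now.tv_nsec - t0->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: write all of buf to a NON-BLOCKING socket.
 * A partial write continues where it stopped; when the socket buffer is
 * full, wait in poll() for POLLOUT, but never longer than timeout_ms in
 * total (the "starving connection" case). Returns len on success, or -1
 * with errno == ETIMEDOUT (peer not draining) or write()'s errno.
 */
static ssize_t write_all_deadline(int fd, const char *buf, size_t len,
                                  int timeout_ms)
{
    struct timespec t0;
    size_t done = 0;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;       /* partial write: send the rest */
            continue;
        }
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
            errno != EINTR)
            return -1;               /* real error */

        long left = timeout_ms - elapsed_ms(&t0);
        if (left <= 0) {
            errno = ETIMEDOUT;       /* caller should close the connection */
            return -1;
        }
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, (int)left) < 0 && errno != EINTR)
            return -1;
    }
    return (ssize_t)len;
}
```

A reply larger than the socket buffer is thus sent in several pieces as the client drains it, while a client that stops reading costs the server at most timeout_ms per reply.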
To do that, one would have to replace the TCP write routine of the RPC library,
i.e. change the XDR stream's pointer to the write function. I don't know whether
that can be done portably across the different platforms.
As for the alarm timeout: I'm not sure that rpc_dispatch() is the right place for
it. mountd uses mount_dispatch(), which has an exit path via svcerr_auth() that
also sends a reply. So I think the timeout you suggest should go into
mount_dispatch().
OTOH, a timeout will shorten the hang, but bad clients can still slow down mountd
considerably.
BTW: AFAICS, on Linux with libtirpc the socket can indirectly be set to
non-blocking via the RPC_SVC_CONNMAXREC_SET control of rpc_control(). That seems
to make write_vc() retry write() in a loop for at most 2 seconds before it gives up.

Meanwhile I've found some time to do further investigations.

rpcbind uses the above-mentioned rpc_control(RPC_SVC_CONNMAXREC_SET) to switch
libtirpc to non-blocking mode. So I tested a similar attack against rpcbind.
The nonblocking mode shows two positive effects:
- an attacker sending requests as fast as possible to rpcbind will have no
success. As soon as rpcbind/libtirpc finds more than one request readable
at the socket, it closes the connection.
- if the socket buffer is full, write() fails with EAGAIN. libtirpc
retries the write in a loop for at most 2 seconds, then closes the
connection.

Unfortunately the write retry loop in libtirpc has a bug: on each failed
write() it increments the remaining length and decrements the pointer
into the retry buffer.
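To illustrate the bug, here is a simplified retry loop in C modeled on the behaviour described above. It is a hypothetical sketch (write_retry() is not a libtirpc function) and not the actual write_vc() source; the point is that a loop header advancing by the return value of write() must not see -1.

```c
#include <assert.h>
#include <errno.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch of a write retry loop for a non-blocking socket,
 * modeled on the behaviour described above (not libtirpc's actual code).
 * The loop header does "cnt -= i, buf += i" after every pass. If i were
 * left at -1 after a failed write(), cnt would grow by one and buf would
 * move one byte backwards: the bug reported above.
 */
static ssize_t write_retry(int fd, const char *buf, size_t len, long max_secs)
{
    struct timeval tv0, tv1;
    ssize_t cnt, i;

    gettimeofday(&tv0, NULL);
    for (cnt = (ssize_t)len; cnt > 0; cnt -= i, buf += i) {
        i = write(fd, buf, (size_t)cnt);
        if (i < 0) {
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;           /* hard error */
            i = 0;                   /* the fix: nothing was written */
            gettimeofday(&tv1, NULL);
            if (tv1.tv_sec - tv0.tv_sec >= max_secs)
                return -1;           /* give up; caller closes connection */
        }
    }
    return (ssize_t)len;
}
```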
I sent a patch to libtirpc-devel about 10 days ago, but haven't had a
response yet.

Regarding rpc.mountd, I've found that running multiple processes (e.g. -t 4)
doesn't work well. When libtirpc is used, or when it isn't used but the
-p xxxx option is set, the listening sockets (TCP listener and UDP socket) are
not in non-blocking mode. Thus, when a single connection request comes in, all
threads wake up from select(), but only one accept() succeeds. All other
threads wait in accept() for further connection requests.
If an RPC request comes in via UDP, much the same happens: all threads
wake up, one thread handles the request, and all others wait in read() for
further UDP requests.
As TCP connections are assigned to specific threads, all connections handled
by one thread are blocked as long as that thread waits in accept() or read().
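The effect of O_NONBLOCK on a shared listener can be sketched in C as follows. set_nonblock() and try_accept() are hypothetical helpers, and NO_CONNECTION is a made-up return value; the idea is that a process that lost the race gets EAGAIN from accept() and goes back to select() instead of sleeping. The same reasoning applies to recvfrom() on a shared UDP socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Switch fd to non-blocking mode, as svc_socket() does with fcntl(). */
static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

#define NO_CONNECTION (-2)   /* hypothetical "lost the race" result */

/*
 * Hypothetical accept wrapper for a listener shared by several processes.
 * If select() woke every process but another one already took the
 * connection (or the client reset it before accept() ran), a non-blocking
 * accept() fails at once and we return to select() instead of sleeping
 * in accept() and starving our other connections.
 */
static int try_accept(int lsock)
{
    int fd = accept(lsock, NULL, NULL);

    if (fd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK ||
                   errno == ECONNABORTED || errno == EINTR))
        return NO_CONNECTION;
    return fd;               /* new connection, or -1 on a real error */
}
```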
So I've written two patches (see below) that set all listeners in
support/nfs/* to non-blocking mode.

The third patch below inserts rpc_control(RPC_SVC_CONNMAXREC_SET) into
nfs_svc_create() in support/nfs/svc_create.c for the libtirpc case.
That patch hardens rpc.mountd against DoS attacks (and probably also statd,
as it also uses nfs_svc_create()).

The patches below are for nfs-utils-1.3.1, but this version is untested!
(I couldn't build it because of dependencies, and now I'm running out of time.)

My version of the patches for nfs-utils-1.2.3-18.33.1 is tested on SLES11-SP3.
Please treat the third patch as an RFC only. I'm not sure whether setting
MAXREC might have negative side effects, as I'm not an RPC expert.

Bodo

One other point: AFAICS on Linux with libtirpc the listening socket of mountd is
in blocking mode. Would that be a problem when running multiple "threads"?
The comment in svc_socket.c/svc_socket(), where the listening socket is set to
non-blocking, sounds very reasonable. But AFAICS if libtirpc is used, O_NONBLOCK
currently isn't set.
Bodo Stroesser

If the alarm fires while svc_sendreply is writing to the socket, the write
should fail with an error and the connection should be closed.
This would only fix mountd (as it is the only process that uses rpc_dispatch).
I wonder whether statd needs something similar. It isn't as important there.
NeilBrown

Please CC me, I'm not on the list.
Best regards,
Bodo

---------------------------------------

From: Bodo Stroesser <***@ts.fujitsu.com>
Date: Thu, 09 Oct 2014 13:06:19 +0200
Subject: [PATCH] nfs-util: mountd: set nonblocking mode if no libtirpc

If mountd is built without libtirpc and is started with the "-p XXX" option,
the tcp listeners and the sockets waiting for UDP messages are not in
non-blocking mode. Thus if running with multiple threads (-t XX),
all threads will wake up from select on a connection request or a UDP message,
but only one thread will succeed. All others will wait on accept() or read()
for the next event.

Signed-off-by: Bodo Stroesser <***@ts.fujitsu.com>
---

--- nfs-utils-1.3.1/support/include/nfslib.h 2014-10-09 12:52:30.000000000 +0200
+++ nfs-utils-1.3.1/support/include/nfslib.h 2014-10-09 12:53:37.000000000 +0200
@@ -174,6 +174,7 @@ void closeall(int min);

int svctcp_socket (u_long __number, int __reuse);
int svcudp_socket (u_long __number);
+int svcsock_nonblock (int __sock);

/* Misc shared code prototypes */
size_t strlcat(char *, const char *, size_t);
--- nfs-utils-1.3.1/support/nfs/svc_socket.c 2014-10-09 12:56:14.000000000 +0200
+++ nfs-utils-1.3.1/support/nfs/svc_socket.c 2014-10-09 13:10:44.000000000 +0200
@@ -76,6 +76,39 @@ int getservport(u_long number, const cha
return 0;
}

+int
+svcsock_nonblock(int sock)
+{
+ int flags;
+
+ if (sock < 0)
+ return sock;
+
+ /* This socket might be shared among multiple processes
+ * if mountd is run multi-threaded. So it is safest to
+ * make it non-blocking, else all threads might wake
+ * one will get the data, and the others will block
+ * indefinitely.
+ * In all cases, transaction on this socket are atomic
+ * (accept for TCP, packet-read and packet-write for UDP)
+ * so O_NONBLOCK will not confuse unprepared code causing
+ * it to corrupt messages.
+ * It generally safest to have O_NONBLOCK when doing an accept
+ * as if we get a RST after the SYN and before accept runs,
+ * we can block despite being told there was an acceptable
+ * connection.
+ */
+ if ((flags = fcntl(sock, F_GETFL)) < 0)
+ perror(_("svc_socket: can't get socket flags"));
+ else if (fcntl(sock, F_SETFL, flags|O_NONBLOCK) < 0)
+ perror(_("svc_socket: can't set socket flags"));
+ else
+ return sock;
+
+ (void) __close(sock);
+ return -1;
+}
+
static int
svc_socket (u_long number, int type, int protocol, int reuse)
{
@@ -113,38 +146,7 @@ svc_socket (u_long number, int type, int
sock = -1;
}

- if (sock >= 0)
- {
- /* This socket might be shared among multiple processes
- * if mountd is run multi-threaded. So it is safest to
- * make it non-blocking, else all threads might wake
- * one will get the data, and the others will block
- * indefinitely.
- * In all cases, transaction on this socket are atomic
- * (accept for TCP, packet-read and packet-write for UDP)
- * so O_NONBLOCK will not confuse unprepared code causing
- * it to corrupt messages.
- * It generally safest to have O_NONBLOCK when doing an accept
- * as if we get a RST after the SYN and before accept runs,
- * we can block despite being told there was an acceptable
- * connection.
- */
- int flags;
- if ((flags = fcntl(sock, F_GETFL)) < 0)
- {
- perror (_("svc_socket: can't get socket flags"));
- (void) __close (sock);
- sock = -1;
- }
- else if (fcntl(sock, F_SETFL, flags|O_NONBLOCK) < 0)
- {
- perror (_("svc_socket: can't set socket flags"));
- (void) __close (sock);
- sock = -1;
- }
- }
-
- return sock;
+ return svcsock_nonblock(sock);
}

/*
--- nfs-utils-1.3.1/support/nfs/rpcmisc.c 2014-10-08 21:22:04.000000000 +0200
+++ nfs-utils-1.3.1/support/nfs/rpcmisc.c 2014-10-08 21:22:36.000000000 +0200
@@ -104,7 +104,7 @@ makesock(int port, int proto)
return -1;
}

- return sock;
+ return svcsock_nonblock(sock);
}

void

--------------------------------------------------

From: Bodo Stroesser <***@ts.fujitsu.com>
Date: Thu, 09 Oct 2014 13:07:33 +0200
Subject: [PATCH] nfs-util: mountd: set nonblocking mode with libtirpc

If mountd is built with libtirpc the tcp listeners and the sockets
waiting for UDP messages are not in non-blocking mode. Thus if running
with multiple threads (-t XX), all threads will wake up from select on
a connection request or a UDP message, but only one thread will succeed.
All others will wait on accept() or read() for the next event.

Signed-off-by: Bodo Stroesser <***@ts.fujitsu.com>
---

--- nfs-utils-1.2.3/support/nfs/svc_create.c 2014-10-08 21:39:01.000000000 +0200
+++ nfs-utils-1.2.3/support/nfs/svc_create.c 2014-10-08 22:20:02.000000000 +0200
@@ -277,6 +277,12 @@
"(%s, %u, %s)", name, version, nconf->nc_netid);
return 0;
}
+ if (svcsock_nonblock(xprt->xp_fd) < 0) {
+ /* close() already done by svcsock_nonblock() */
+ xprt->xp_fd = RPC_ANYFD;
+ SVC_DESTROY(xprt);
+ return 0;
+ }

if (!svc_reg(xprt, program, version, dispatch, nconf)) {
/* svc_reg(3) destroys @xprt in this case */
@@ -332,6 +338,7 @@
int fd;

fd = svc_create_sock(ai->ai_addr, ai->ai_addrlen, nconf);
+ fd = svcsock_nonblock(fd);
if (fd == -1)
goto out_free;

--------------------------------------------------

From: Bodo Stroesser <***@ts.fujitsu.com>
Date: Thu, 09 Oct 2014 13:06:19 +0200
Subject: [PATCH] nfs-util: mountd: set libtirpc nonblocking mode to avoid DOS

This patch is experimental. It works fine in that it removes the vulnerability
to a DoS attack: rpc.mountd can be blocked by a bad client that sends
many RPC requests but never reads the responses. This might happen intentionally
or be caused by a wrong network config (MTU).
The patch switches on the non-blocking mode of libtirpc. In that mode writes can
block for a max. of 2 seconds. Attackers are forced to send requests more slowly,
as libtirpc will close a connection if it finds two requests to read at the same
time.
I do not know whether setting MAXREC could cause trouble, e.g. with big replies.

Signed-off-by: Bodo Stroesser <***@ts.fujitsu.com>
---

--- nfs-utils-1.2.3/support/nfs/svc_create.c 2014-10-09 12:09:15.000000000 +0200
+++ nfs-utils-1.2.3/support/nfs/svc_create.c 2014-10-09 12:13:32.000000000 +0200
@@ -49,6 +49,8 @@

#ifdef HAVE_LIBTIRPC

+#include <rpc/rpc_com.h>
+
#define SVC_CREATE_XPRT_CACHE_SIZE (8)
static SVCXPRT *svc_create_xprt_cache[SVC_CREATE_XPRT_CACHE_SIZE] = { NULL, };

@@ -401,6 +403,7 @@
const struct sigaction create_sigaction = {
.sa_handler = SIG_IGN,
};
+ int maxrec = RPC_MAXDATASIZE;
unsigned int visible, up, servport;
struct netconfig *nconf;
void *handlep;
@@ -412,6 +415,20 @@
*/
(void)sigaction(SIGPIPE, &create_sigaction, NULL);

+ /*
+ * Setting MAXREC also enables non-blocking mode for tcp connections.
+ * This avoids DOS attacks by a client sending many requests but never
+ * reading the reply:
+ * - if a second request already is present for reading in the socket,
+ * after the first request just was read, libtirpc will break the
+ * connection. Thus an attacker can't simply send requests as fast as
+ * he can without waiting for the response.
+ * - if the write buffer of the socket is full, the next write() will
+ * fail with EAGAIN. libtirpc will retry the write in a loop for max.
+ * 2 seconds. If write still fails, the connection will be closed.
+ */
+ rpc_control(RPC_SVC_CONNMAXREC_SET, &maxrec);
+
handlep = setnetconfig();
if (handlep == NULL) {
xlog(L_ERROR, "Failed to access local netconfig database: %s",