Could somebody please enlighten me as to what is supposed to happen in this situation?

Discussion:

Trond Myklebust

2014-09-27 15:22:29 UTC

The scenario is this:
Server                          Client
======                          ======
boot (B1)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)
reboot (B2)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
reboot (while GRACE period
 still being enforced) (B3)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)

What should be the server response to the above OPEN(reclaim) from the
client after reboot (B3)?

Cheers
Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jeff Layton

2014-09-27 18:40:55 UTC

On Sat, 27 Sep 2014 11:22:29 -0400
Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/***@public.gmane.org> wrote:

My take (quite possibly wrong, but...)

Post by Trond Myklebust
Server                          Client
======                          ======
boot (B1)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)

At this point, we'd deny reclaims from any client that has not issued a
RECLAIM_COMPLETE. In the case of the Linux server with nfsdcltrack, we
clean out the records of any clients that have not issued a RECLAIM_COMPLETE.

Post by Trond Myklebust
reboot (B2)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
reboot (while GRACE period
 still being enforced) (B3)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
What should be the server response to the above OPEN(reclaim) from the
client after reboot (B3)?

My expectation is that it would be granted. There was a
RECLAIM_COMPLETE issued during the boot where the grace period was last
lifted, and that should be enough to allow the client to issue reclaims
on any subsequent reboot, until the grace period is lifted again.
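
The rule described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the nfsd or nfsdcltrack implementation: the class ReclaimTracker, its methods, and the name "client1" are all hypothetical, and the server is assumed to keep nothing on stable storage except the set of clients allowed to reclaim.

```python
class ReclaimTracker:
    """Toy model of minimal client tracking; all names are hypothetical."""

    def __init__(self, known_clients):
        # Stable storage: clients allowed to reclaim; survives reboots.
        self.stable = set(known_clients)
        # Volatile: clients that sent RECLAIM_COMPLETE during this boot.
        self.completed = set()
        self.in_grace = True

    def reboot(self):
        # Volatile state is lost and a new grace period begins.
        self.completed = set()
        self.in_grace = True

    def open_reclaim(self, client):
        # Grant only during grace, and only to clients on record.
        return self.in_grace and client in self.stable

    def reclaim_complete(self, client):
        self.completed.add(client)
        self.stable.add(client)

    def lift_grace(self):
        # Drop records of clients that never sent RECLAIM_COMPLETE this boot.
        self.stable &= self.completed
        self.in_grace = False


def scenario_b1_b2_b3():
    server = ReclaimTracker(known_clients={"client1"})
    grants = []
    # B1: reclaim, complete, grace lifted.
    grants.append(server.open_reclaim("client1"))
    server.reclaim_complete("client1")
    server.lift_grace()
    # B2: reclaim starts, but the server reboots while still in grace.
    server.reboot()
    grants.append(server.open_reclaim("client1"))
    # B3: grace was never lifted in B2, so the record from B1 still stands.
    server.reboot()
    grants.append(server.open_reclaim("client1"))
    return grants


print(scenario_b1_b2_b3())
```

In this model the record is only pruned in lift_grace(), so a reboot that interrupts the grace period leaves the record from B1 in place and the OPEN(reclaim) after B3 is granted.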

Doing anything else would be a pretty unfriendly way for the server to
behave. In the face of rapid reboots (a not-uncommon occurrence when
patching, etc.), you'd lose state unless the client just happened to get
in there quickly enough to issue a RECLAIM_COMPLETE between each reboot.

That was the situation with the legacy client tracker in knfsd. When
testing, it was trivial to reboot the machine quickly twice and on the
second reboot nothing could be reclaimed.

--
Jeff Layton <jlayton-7I+n7zu2hftEKMMhf/***@public.gmane.org>

Trond Myklebust

2014-09-27 19:25:12 UTC

On Sat, Sep 27, 2014 at 2:40 PM, Jeff Layton

Post by Jeff Layton
On Sat, 27 Sep 2014 11:22:29 -0400
My take (quite possibly wrong, but...)

Post by Trond Myklebust
Server                          Client
======                          ======
boot (B1)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)

My expectation is that it would be granted. There was a
RECLAIM_COMPLETE issued during the boot where the grace period was last
lifted, and that should be enough to allow the client to issue reclaims
on any subsequent reboot, until the grace period is lifted again.
Doing anything else would be a pretty unfriendly way for the server to
behave. In the face of rapid reboots (a not-uncommon occurrence when
patching, etc), you'd lose state unless the client just happened to get
in there quickly enough to issue a RECLAIM_COMPLETE between each reboot.
That was the situation with the legacy client tracker in knfsd. When
testing, it was trivial to reboot the machine quickly twice and on the
second reboot nothing could be reclaimed.

So now, what about the following scenario:

Server                          Client
======                          ======
boot (B1')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)
reboot (B2')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
(lift GRACE period)
reboot (B3')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)

What should happen to the OPEN(reclaim) in (B3')?
--
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust-7I+n7zu2hftEKMMhf/***@public.gmane.org

Jeff Layton

2014-09-27 19:50:45 UTC

On Sat, 27 Sep 2014 15:25:12 -0400

Post by Trond Myklebust
On Sat, Sep 27, 2014 at 2:40 PM, Jeff Layton

Post by Jeff Layton
On Sat, 27 Sep 2014 11:22:29 -0400
My take (quite possibly wrong, but...)

Post by Trond Myklebust
Server                          Client
======                          ======
boot (B1)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)

My expectation is that it would be granted. There was a
RECLAIM_COMPLETE issued during the boot where the grace period was last
lifted, and that should be enough to allow the client to issue reclaims
on any subsequent reboot, until the grace period is lifted again.
Doing anything else would be a pretty unfriendly way for the server to
behave. In the face of rapid reboots (a not-uncommon occurrence when
patching, etc), you'd lose state unless the client just happened to get
in there quickly enough to issue a RECLAIM_COMPLETE between each reboot.
That was the situation with the legacy client tracker in knfsd. When
testing, it was trivial to reboot the machine quickly twice and on the
second reboot nothing could be reclaimed.

Server                          Client
======                          ======
boot (B1')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period (G1))
reboot (B2')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
(lift GRACE period (G2))
reboot (B3')
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
What should happen to the OPEN(reclaim) in (B3')?

(Let's call the lifting of grace periods 'G1' and 'G2'...)

Denied.

There was no RECLAIM_COMPLETE issued between B2' and G2. It's possible
that client2 could creep in between G2 and B3' and acquire locks that
conflict with ones that were not reclaimed by client1 between B2' and
G2. So, we can't allow any reclaims for client1 after B3'.
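
A small sketch of why the reclaim has to be denied, under assumptions that are not in the scenario itself: a single exclusive lock on a hypothetical file "/export/file", which client1 held in B1' but did not reclaim before G2 and which client2 then acquired between G2 and B3'. The helper names reclaim and after_b3 are made up for illustration.

```python
def reclaim(lock_table, allowed, client, path):
    """Grant a reclaim during grace if the client is still on record and
    no lock reclaimed so far in this grace period conflicts with it."""
    if client not in allowed:
        return False
    holder = lock_table.get(path)
    if holder is not None and holder != client:
        return False
    lock_table[path] = client
    return True


def after_b3(allowed):
    # The in-memory lock table is empty after a reboot, so the server cannot
    # tell that client2 took the lock on this file after G2.
    table = {}
    path = "/export/file"  # hypothetical path
    stale = reclaim(table, allowed, "client1", path)  # lock last held in B1'
    legit = reclaim(table, allowed, "client2", path)  # lock acquired after G2
    return stale, legit


# Lenient: client1 is still allowed to reclaim after B3'.
print(after_b3({"client1", "client2"}))
# Strict: client1's record was dropped at G2.
print(after_b3({"client2"}))
```

With the lenient set, client1's stale lock is granted and client2, the real holder, is refused. Dropping client1's record at G2 is what lets client2's reclaim succeed.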

I should add a clarification here too. I'm assuming that the server in
this case just tracks the minimum required to allow state to be
reclaimed. If it (for instance) tracked on stable storage all of the
locks that it ever granted such that it knows that there were no
conflicts, then it could be more lenient about allowing client1 to
reclaim after B3.
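
As a rough sketch of that more lenient variant, assume the server appends every lock grant to a log on stable storage. The log format, the function may_reclaim, and the path are hypothetical, and the sketch only handles exclusive whole-file locks and ignores unlocks.

```python
def may_reclaim(grant_log, client, path):
    """Allow a reclaim only if `client` was the last one granted a lock on `path`.

    grant_log is an ordered list of (client, path) grants that is assumed
    to survive reboots.
    """
    last_holder = None
    for who, locked_path in grant_log:
        if locked_path == path:
            last_holder = who
    return last_holder == client


path = "/export/file"  # hypothetical path
quiet_log = [("client1", path)]                          # nobody else touched it
contended_log = [("client1", path), ("client2", path)]   # client2 locked it after G2

print(may_reclaim(quiet_log, "client1", path))
print(may_reclaim(contended_log, "client1", path))
print(may_reclaim(contended_log, "client2", path))
```

Here client1 could still reclaim after B3' as long as the log shows no later grant on the same file to another client.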

Trond Myklebust

2014-09-27 20:27:15 UTC

On Sat, Sep 27, 2014 at 3:50 PM, Jeff Layton

Post by Jeff Layton
On Sat, 27 Sep 2014 15:25:12 -0400

Post by Trond Myklebust
On Sat, Sep 27, 2014 at 2:40 PM, Jeff Layton

Post by Jeff Layton
On Sat, 27 Sep 2014 11:22:29 -0400
My take (quite possibly wrong, but...)

Post by Trond Myklebust
Server                          Client
======                          ======
boot (B1)
                                EXCHANGE_ID
                                CREATE_SESSION
                                OPEN(reclaim)
                                LOCK(reclaim)
                                RECLAIM_COMPLETE
(lift GRACE period)

My expectation is that it would be granted. There was a
RECLAIM_COMPLETE issued during the boot where the grace period was last
lifted, and that should be enough to allow the client to issue reclaims
on any subsequent reboot, until the grace period is lifted again.
Doing anything else would be a pretty unfriendly way for the server to
behave. In the face of rapid reboots (a not-uncommon occurrence when
patching, etc), you'd lose state unless the client just happened to get
in there quickly enough to issue a RECLAIM_COMPLETE between each reboot.

Where is the evidence that this is a problem for NFS and for NFS
client recovery?