Olga Kornievskaia
2014-10-13 18:51:56 UTC
I'd like to hear community's thought about how to properly handle
failures during delegation recall process and/or thoughts about a
proposed fixed listed at the end.
There are two problems seen during the following situation:
A client get a cb_call for a delegation it currently holds. Consider
the case where the client has a delegated lock for this open. Callback
thread will send an open with delegation_cur, followed by a lock
operation, and finally delegreturn.
Problem#1: the client will send a lock operation regardless of whether
or not the open succeeded. This is a new_owner lock and in nfs4xdr.c,
the lock operation will choose to use the open_stateid. However, when
the open failed, the stateid is 0. Thus, we send an erroneous stateid
of 0.
Problem#2: if the open fails with admin_revoked, bad_stateid errors,
it leads to an infinite loop of sending an open with deleg_cur and
getting a bad_stateid error back.
It seems to me that we shouldn't even be trying to recover if we get a
bad_stateid-type of errors on open with deleg_cur because they are
unrecoverable.
Furthermore, I propose that if we get an error in
nfs4_open_delegation_recall() then we mark any delegation locks as
lost and in nfs4_lock_delegation_recall() don't attempt to recover
lost lock.
I have tested to following code as a fix:
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5aa55c1..523fae0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1682,6 +1682,22 @@ int nfs4_open_delegation_recall(struct
nfs_open_context *ctx, struct nfs4_state
nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
err = nfs4_open_recover(opendata, state);
nfs4_opendata_put(opendata);
+ switch(err) {
+ case -NFS4ERR_DELEG_REVOKED:
+ case -NFS4ERR_ADMIN_REVOKED:
+ case -NFS4ERR_BAD_STATEID:
+ case -NFS4ERR_OPENMODE: {
+ struct nfs4_lock_state *lock;
+ /* go through open locks and mark them lost */
+ spin_lock(&state->state_lock);
+ list_for_each_entry(lock, &state->lock_states,
ls_locks) {
+ if (!test_bit(NFS_LOCK_INITIALIZED,
&lock->ls_flags))
+ set_bit(NFS_LOCK_LOST, &lock->ls_flags);
+ }
+ spin_unlock(&state->state_lock);
+ return 0;
+ }
+ }
return nfs4_handle_delegation_recall_error(server, state, stateid, err);
}
@@ -5957,6 +5973,10 @@ int nfs4_lock_delegation_recall(struct
file_lock *fl, struct nfs4_state *state,
err = nfs4_set_lock_state(state, fl);
if (err != 0)
return err;
+ if (test_bit(NFS_LOCK_LOST, &fl->fl_u.nfs4_fl.owner->ls_flags)) {
+ pr_warn_ratelimited("NFS: %s: Lock reclaim failed!\n",
__func__);
+ return -EIO;
+ }
err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW);
return nfs4_handle_delegation_recall_error(server, state, stateid, err);
}
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
failures during delegation recall process and/or thoughts about a
proposed fixed listed at the end.
There are two problems seen during the following situation:
A client get a cb_call for a delegation it currently holds. Consider
the case where the client has a delegated lock for this open. Callback
thread will send an open with delegation_cur, followed by a lock
operation, and finally delegreturn.
Problem#1: the client will send a lock operation regardless of whether
or not the open succeeded. This is a new_owner lock and in nfs4xdr.c,
the lock operation will choose to use the open_stateid. However, when
the open failed, the stateid is 0. Thus, we send an erroneous stateid
of 0.
Problem#2: if the open fails with admin_revoked, bad_stateid errors,
it leads to an infinite loop of sending an open with deleg_cur and
getting a bad_stateid error back.
It seems to me that we shouldn't even be trying to recover if we get a
bad_stateid-type of errors on open with deleg_cur because they are
unrecoverable.
Furthermore, I propose that if we get an error in
nfs4_open_delegation_recall() then we mark any delegation locks as
lost and in nfs4_lock_delegation_recall() don't attempt to recover
lost lock.
I have tested to following code as a fix:
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5aa55c1..523fae0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1682,6 +1682,22 @@ int nfs4_open_delegation_recall(struct
nfs_open_context *ctx, struct nfs4_state
nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
err = nfs4_open_recover(opendata, state);
nfs4_opendata_put(opendata);
+ switch(err) {
+ case -NFS4ERR_DELEG_REVOKED:
+ case -NFS4ERR_ADMIN_REVOKED:
+ case -NFS4ERR_BAD_STATEID:
+ case -NFS4ERR_OPENMODE: {
+ struct nfs4_lock_state *lock;
+ /* go through open locks and mark them lost */
+ spin_lock(&state->state_lock);
+ list_for_each_entry(lock, &state->lock_states,
ls_locks) {
+ if (!test_bit(NFS_LOCK_INITIALIZED,
&lock->ls_flags))
+ set_bit(NFS_LOCK_LOST, &lock->ls_flags);
+ }
+ spin_unlock(&state->state_lock);
+ return 0;
+ }
+ }
return nfs4_handle_delegation_recall_error(server, state, stateid, err);
}
@@ -5957,6 +5973,10 @@ int nfs4_lock_delegation_recall(struct
file_lock *fl, struct nfs4_state *state,
err = nfs4_set_lock_state(state, fl);
if (err != 0)
return err;
+ if (test_bit(NFS_LOCK_LOST, &fl->fl_u.nfs4_fl.owner->ls_flags)) {
+ pr_warn_ratelimited("NFS: %s: Lock reclaim failed!\n",
__func__);
+ return -EIO;
+ }
err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW);
return nfs4_handle_delegation_recall_error(server, state, stateid, err);
}
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html