mps reinitialization triggered system panic

prateek sethi prateekrootkey at gmail.com
Sun May 29 09:52:44 UTC 2016


I have fixed this issue by retrying mps_request_sync up to five times when
it fails. According to my test results, there is almost a 15% chance of
hitting this issue without the fix, which is certainly not a good ratio.

Please comment if anything is wrong with this patch.
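
As a rough illustration of the idea (this is not part of the patch), the
bounded retry can be sketched as a standalone userland helper. The names
retry_request and flaky_request are made up for this example, and
flaky_request only stands in for a request that fails a few times before it
succeeds; it does not model the real mps_request_sync.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical request callback: returns 0 on success, non-zero on failure. */
typedef int (*request_fn)(void *ctx);

/*
 * Call fn up to max_tries times and stop at the first success.
 * Returns 0 on success, otherwise the error from the last attempt.
 */
static int
retry_request(request_fn fn, void *ctx, int max_tries)
{
	int error = 0, retry;

	assert(max_tries > 0);
	for (retry = 0; retry < max_tries; retry++) {
		error = fn(ctx);
		if (error == 0)
			break;
		fprintf(stderr, "request failed (error %d), retry %d\n",
		    error, retry);
	}
	return (error);
}

/*
 * Hypothetical stand-in for a flaky device: fails while the counter that
 * ctx points to is positive, decrementing it on each failed call.
 */
static int
flaky_request(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return (5);	/* arbitrary non-zero error code */
	}
	return (0);
}
```

With a counter of 2, retry_request(flaky_request, &counter, 5) succeeds on
the third attempt. With a counter of 10 it gives up after five attempts and
returns the last error, which matches what the loops in the patch do.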

Index: mps.c
===================================================================
--- mps.c       (revision 2474)
+++ mps.c       (working copy)

@@ -986,8 +985,8 @@
 {
        MPI2_DEFAULT_REPLY *reply;
        MPI2_IOC_FACTS_REQUEST request;
-       int error, req_sz, reply_sz;
-
+       int error, req_sz, reply_sz, retry = 0;
+
        MPS_FUNCTRACE(sc);

        req_sz = sizeof(MPI2_IOC_FACTS_REQUEST);
@@ -996,8 +995,13 @@

        bzero(&request, req_sz);
        request.Function = MPI2_FUNCTION_IOC_FACTS;
-       error = mps_request_sync(sc, &request, reply, req_sz, reply_sz, 5);
-
+       while(retry < 5){
+               error = mps_request_sync(sc, &request, reply, req_sz, reply_sz, 5);
+               if(!error)
+                       break;
+               mps_dprint(sc, MPS_FAULT, "mps_request_sync is failed retry %d\n",retry);
+                retry++;
+       }
        return (error);
 }

@@ -1006,7 +1010,7 @@
 {
        MPI2_IOC_INIT_REQUEST   init;
        MPI2_DEFAULT_REPLY      reply;
-       int req_sz, reply_sz, error;
+       int req_sz, reply_sz, error, retry = 0;
        struct timeval now;
        uint64_t time_in_msec;

@@ -1041,11 +1045,16 @@
        time_in_msec = (now.tv_sec * 1000 + now.tv_usec/1000);
        init.TimeStamp.High = htole32((time_in_msec >> 32) & 0xFFFFFFFF);
        init.TimeStamp.Low = htole32(time_in_msec & 0xFFFFFFFF);
+       while(retry < 5){
+               error = mps_request_sync(sc, &init, &reply, req_sz, reply_sz, 5);
+               if ((reply.IOCStatus & MPI2_IOCSTATUS_MASK) != MPI2_IOCSTATUS_SUCCESS)
+                       error = ENXIO;
+                if(!error)
+                        break;
+                mps_dprint(sc, MPS_FAULT, " mps_request_sync is failed retry %d\n",retry);
+                retry++;
+        }

-       error = mps_request_sync(sc, &init, &reply, req_sz, reply_sz, 5);
-       if ((reply.IOCStatus & MPI2_IOCSTATUS_MASK) != MPI2_IOCSTATUS_SUCCESS)
-               error = ENXIO;
-
        mps_dprint(sc, MPS_INIT, "IOCInit status= 0x%x\n", reply.IOCStatus);
        return (error);
 }
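
For readers new to this code, here is a small userland sketch of the two
calculations that surround the IOC init request in the hunk above: mapping
the reply status to an error code, and splitting the millisecond timestamp
into two 32-bit halves. The helper names and the IOCSTATUS_* values are
assumptions for illustration (check mpi2.h for the real definitions), and
the htole32 byte-order conversion from the driver is left out. Unlike the
patch, this sketch looks at the status only when the request itself
succeeded, on the assumption that the reply may not be filled in otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; verify against mpi2.h. */
#define IOCSTATUS_MASK		0x7FFF
#define IOCSTATUS_SUCCESS	0x0000

/*
 * Combine the transport result and the IOC status into one error code.
 * A transport error wins; otherwise a non-success status maps to ENXIO.
 */
static int
ioc_reply_to_errno(int transport_error, uint16_t ioc_status)
{
	if (transport_error != 0)
		return (transport_error);
	if ((ioc_status & IOCSTATUS_MASK) != IOCSTATUS_SUCCESS)
		return (ENXIO);
	return (0);
}

/* Split a 64-bit millisecond timestamp into high and low 32-bit words. */
static void
split_timestamp(uint64_t msec, uint32_t *high, uint32_t *low)
{
	assert(high != NULL && low != NULL);
	*high = (uint32_t)((msec >> 32) & 0xFFFFFFFF);
	*low = (uint32_t)(msec & 0xFFFFFFFF);
}
```

For example, a status of 0x8000 has only the bit above the mask set, so it
still counts as success here, while 0x0004 maps to ENXIO.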


On Tue, Apr 12, 2016 at 6:43 PM, prateek sethi <prateekrootkey at gmail.com>
wrote:

> I am using an LSI SAS2308 HBA card with FreeBSD 9.2. My system panicked
> during the mps reinitialization process. I debugged the core and found
> that the driver failed to allocate iocfacts, which triggered the panic. I
> also found a "*Doorbell failed to activate*" message in the core file.
>
> Please help me find answers to the following questions:
>
> 1. What could cause the doorbell activation failure?
> 2. How can I fix this issue?
> 3. If it is a hardware issue, how can it be fixed after a reboot?
> 4. *Why should a driver failure panic the system?*
>
>
> I think a panic is not a good way to recover from an error. There should
> be another way to handle this failure.
>
> I am a beginner with drivers, so if I am assuming or asking something
> wrong, please correct me.
>
>
> Regards
> Prateek
>

