SCSI RDMA Protocol (SRP) Target driver for Linux
=================================================

The SRP target driver has been designed to work on top of the Linux
InfiniBand kernel drivers -- either the InfiniBand drivers included
with a Linux distribution or the OFED InfiniBand drivers. For more
information about using the SRP target driver in combination with
OFED, see also README.ofed.

The SRP target driver has been implemented as an SCST driver. This
makes it possible to support many I/O modes on real and virtual
devices. A few examples of supported device handlers are:

1. scst_disk. This device handler implements transparent pass-through
   of SCSI commands and allows SRP to access and export real
   SCSI devices, e.g. disks, hardware RAID volumes, and tape libraries.

2. scst_vdisk, either in fileio or in blockio mode. This device handler
   allows exporting software RAID volumes, LVM volumes, IDE disks, and
   normal files as SRP LUNs.

3. nullio. The nullio device handler allows measuring the performance
   of the SRP target implementation without performing any actual I/O.

Proceed as follows to compile and install the SRP target driver:

1. To minimize QUEUE_FULL conditions, apply the
   scst_increase_max_tgt_cmds patch as follows:

    patch -p0 < srpt/patches/scst_increase_max_tgt_cmds.patch

   This patch increases SCST's per-device queue size from 48 to 64. This
   helps to avoid QUEUE_FULL conditions because the size of the transmit
   queue in the Linux SRP initiator is also 64.

   Note: the SCSI layer of kernel 2.6.33 will have dynamic queue depth
   adjustment. When using SRP initiator systems with kernel 2.6.33 or later,
   this patch is less important.

2. Now compile and install SRPT:

    make -s scst_clean scst scst_install
    make -s srpt_clean srpt srpt_install
    make -s scstadm scstadm_install

3. Edit the installed file /etc/init.d/scst and add ib_srpt to the
   SCST_MODULES variable.

4. Configure SCST so that it is started during system boot.

The ib_srpt kernel module supports the following parameters:

* srp_max_message_size (unsigned integer)
  Maximum size of an SRP control message in bytes. Examples of SRP control
  messages are: login request, logout request, data transfer request, ...
  The larger this parameter, the more scatter/gather list elements can be
  sent at once. Use the following formula to compute an appropriate value
  for this parameter: 68 + 16 * (max_sg_elem_count). The default value of
  this parameter is 2116, which corresponds to an sg list with 128 elements.
* srp_max_rdma_size (unsigned integer)
  Maximum number of bytes that may be transferred at once via RDMA. Defaults
  to 65536 bytes, which is sufficient to use the full bandwidth of low-latency
  HCAs such as Mellanox's ConnectX series. Increasing this value may decrease
  latency for applications transferring large amounts of data at once via
  RDMA.
* thread (0 or 1)
  Whether incoming SRP requests will be processed in the IB interrupt that
  was triggered by the request (thread=0) or in the context of a separate
  thread (thread=1). The choice thread=0 results in the best performance,
  while thread=1 makes debugging easier. If a kernel oops is triggered inside
  an interrupt handler, the system will be halted. As a result, the call trace
  associated with the kernel oops will not be written to the kernel log in
  /var/log/messages. When using thread=1, however, the SRPT code runs in thread
  context. Any kernel oops generated in thread context will cause the offending
  thread to be killed. Other threads will keep running and call traces will be
  written to the on-disk kernel log.
* trace_flag (unsigned integer, only available in debug builds)
  The individual bits of the trace_flag parameter define which categories of
  trace messages are sent to the kernel log.
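
The sizing formula given for srp_max_message_size can be checked with plain
shell arithmetic. The snippet below is only a sketch; the function name
srp_max_message_size is chosen here for illustration and is not part of the
driver or of SCST.

```shell
#!/bin/sh
# Compute a value for srp_max_message_size from the desired number of
# scatter/gather list elements, using the formula from the parameter
# description: 68 + 16 * max_sg_elem_count.
# (Illustrative helper; not shipped with the driver.)
srp_max_message_size() {
    max_sg_elem_count=$1
    echo $((68 + 16 * max_sg_elem_count))
}

srp_max_message_size 128   # prints 2116, the documented default
srp_max_message_size 255   # prints 4148
```

The result could then be passed as a module parameter when loading the
driver, e.g. "modprobe ib_srpt srp_max_message_size=4148". Whether a given
value is accepted may depend on the driver version.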

Configuring the SRP Target System
---------------------------------

First of all, create the file /etc/scst.conf. Below is an
example of how to create this file using the scstadmin tool:

    /etc/init.d/scst start

    scstadmin -ClearConfig /etc/scst.conf
    scstadmin -adddev disk01 -path /dev/ram0 -handler vdisk -options NV_CACHE
    scstadmin -adddev disk02 -path /dev/ram1 -handler vdisk -options NV_CACHE
    scstadmin -assigndev disk01 -group Default -lun 0
    scstadmin -assigndev disk02 -group Default -lun 1
    scstadmin -assigndev 4:0:0:0 -group Default -lun 2
    scstadmin -WriteConfig /etc/scst.conf

Now load the new configuration:

    /etc/init.d/scst reload

Configuring the SRP Initiator System
------------------------------------

First of all, load the SRP kernel module.

Next, discover the new SRP target by running the ibsrpdm command.

Now let the initiator system log in to the target system:

    ibsrpdm -c | while read target_info; do echo "${target_info}" > /sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target; done

Finally, run lsscsi to display the details of the newly discovered SCSI disks.

SRP targets can be recognized in the output of lsscsi by looking for
the disk names assigned on the SCST target ("disk01" in the example below):

    [8:0:0:0] disk SCST_FIO disk01 102 /dev/sdb

* You can edit /etc/infiniband/openib.conf to load the SRP driver and the SRP
  HA daemon automatically, i.e. set SRP_LOAD=yes and SRPHA_ENABLE=yes.
* To set up and use the high availability feature you need the dm-multipath
  driver.
* Please refer to the OFED-1.x user manual for more detailed instructions
  on how to enable and use the HA feature. See e.g.
  http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_user_manual_1_40_1.pdf
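
The login loop shown earlier in this section can also be wrapped in a small
function, which makes it easier to reuse for several HCA ports. This is a
sketch only: the function name srp_login and the HCA/port name srp-mlx4_0-1
are placeholders chosen for illustration and do not come from the driver.

```shell
#!/bin/sh
# srp_login ADD_TARGET_FILE
# Read target descriptions from stdin (one per line, as printed by
# "ibsrpdm -c") and write each one separately to ADD_TARGET_FILE.
# (Illustrative helper; not shipped with the driver.)
srp_login() {
    add_target=$1
    while read -r target_info; do
        # Skip empty lines so that nothing bogus is written to sysfs.
        [ -n "${target_info}" ] || continue
        echo "${target_info}" > "${add_target}"
    done
}

# Example invocation (as root, with the SRP initiator module loaded);
# srp-mlx4_0-1 is a placeholder for the name of your HCA port:
#   SRP_HCA_NAME=srp-mlx4_0-1
#   ibsrpdm -c | srp_login "/sys/class/infiniband_srp/${SRP_HCA_NAME}/add_target"
```

The names that can be used for SRP_HCA_NAME on a given system can be listed
with "ls /sys/class/infiniband_srp/".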

Performance Notes - Initiator Side
----------------------------------

* For latency-sensitive applications, using the noop scheduler at the initiator
  side can give significantly better results than other schedulers.

* The following parameters have a small but measurable impact on SRP
  performance:
  * /sys/class/block/${dev}/queue/rq_affinity
  * /proc/irq/${ib_int_no}/smp_affinity
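
As a sketch of how the block queue settings mentioned above could be applied
on the initiator, the helper below writes a value to a queue attribute of a
block device. The function name set_queue_attr and its optional SYSFS_ROOT
argument are assumptions made for this example; the latter exists only so
that the function can be tried against a scratch directory.

```shell
#!/bin/sh
# set_queue_attr DEV ATTR VALUE [SYSFS_ROOT]
# Write VALUE to the block queue attribute ATTR of device DEV.
# SYSFS_ROOT defaults to /sys. (Illustrative helper, not part of SCST.)
set_queue_attr() {
    dev=$1
    attr=$2
    value=$3
    sysfs_root=${4:-/sys}
    attr_file="${sysfs_root}/class/block/${dev}/queue/${attr}"
    # Refuse to create files: the attribute must already exist.
    if [ ! -w "${attr_file}" ]; then
        echo "cannot write ${attr_file}" >&2
        return 1
    fi
    echo "${value}" > "${attr_file}"
}

# Example (as root; sdb stands for an SRP disk on the initiator):
#   set_queue_attr sdb scheduler noop
#   set_queue_attr sdb rq_affinity 1
```

On kernels that use the multi-queue block layer, the scheduler that performs
no reordering is called "none" instead of "noop". The best rq_affinity value
is workload dependent, so it is worth measuring before and after the change.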

Performance Notes - Target Side
-------------------------------

* In some cases, for instance when working with SSD devices whose internal
  threads consume 100% of a single CPU for data transfers, it can be
  necessary to assign dedicated CPUs to those threads using the Linux CPU
  affinity facilities in order to maximize IOPS. No IRQ processing should be
  done on those CPUs; check this using /proc/interrupts. See the taskset
  command and Documentation/IRQ-affinity.txt in your kernel's source tree
  for how to assign CPU affinity to tasks and IRQs.

  The reason is that incoming commands may be processed in SIRQ context
  on the same CPUs as the SSD devices' data transfer threads. As a result,
  those threads won't receive all the CPU power they need.

  As an alternative to CPU affinity assignment, you can try enabling the SRP
  target's internal thread. This allows the Linux CPU scheduler to better
  distribute load among the available CPUs. To enable the SRP target driver's
  internal thread, load the ib_srpt module with the parameter thread=1.
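
Both taskset and /proc/irq/${ib_int_no}/smp_affinity accept a hexadecimal
CPU bitmask. The following sketch computes such a mask from a list of CPU
numbers; the function name cpu_mask is made up for this example.

```shell
#!/bin/sh
# cpu_mask CPU...
# Print the hexadecimal affinity mask that has one bit set for each
# CPU number given on the command line.
# (Illustrative helper; not part of SCST or the kernel tools.)
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "${mask}"
}

cpu_mask 2 3    # prints c
cpu_mask 0      # prints 1
```

Assuming ${pid} holds the ID of a data transfer thread, it could be pinned
to CPUs 2 and 3 with "taskset -p $(cpu_mask 2 3) ${pid}", while the
InfiniBand interrupt could be kept on CPUs 0 and 1 with
"echo $(cpu_mask 0 1) > /proc/irq/${ib_int_no}/smp_affinity".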

Send questions about this driver to scst-devel@lists.sourceforge.net, CC:
Vu Pham <vuhuong@mellanox.com> and Bart Van Assche <bart.vanassche@gmail.com>.