Windows OpenFabrics

User's Manual

Release 2.1

08/18/2009

Overview

The Windows OpenFabrics (WinOF) package is composed of software modules intended for use on Microsoft Windows based computer systems connected via an InfiniBand fabric.

The Windows OpenFabrics software package contains the following:

OpenFabrics InfiniBand core drivers and Upper Level Protocols (ULPs):

OpenFabrics utilities:

Documentation

 

WinOF Features

 

 

 

Tools


The OpenFabrics Alliance Windows release contains a set of user-mode tools designed to facilitate the smooth operation of a Windows OpenFabrics installation. These tools are available from a command window (cmd.exe) because the installation path '%SystemDrive%\Program Files\WinOF' is appended to the system-wide search path registry entry. A Start menu shortcut, 'WinOF Cmd Window', is provided to facilitate correct tool operation.

IPoIB Partition Management

InfiniBand Subnet Management

QLogic VNIC Child Device Management

Performance

Diagnostics

OFED Diagnostics


 

User mode micro-benchmarks


The following user-mode test programs are intended as useful micro-benchmarks for HW or SW tuning and/or functional testing.

Tests use CPU cycle counters to get time stamps without a context switch.

Tests measure round-trip time but report half of that as one-way latency
(i.e., the result may not be sufficiently accurate for asymmetrical configurations).

Min/Median/Max result is reported.
The median (vs. average) is less sensitive to extreme scores.
Typically the "Max" value is the first value measured.
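
The reporting scheme described above can be illustrated with a short Python sketch. It is not taken from the benchmark sources; the function names and the sample values are hypothetical and only show why halving the round trip and reporting the median behaves the way the text says.

```python
import statistics


def one_way_latency(round_trip_times):
    """Halve each round-trip sample to estimate one-way latency."""
    return [rtt / 2.0 for rtt in round_trip_times]


def summarize(samples):
    """Return (min, median, max, mean) for a list of latency samples."""
    return (
        min(samples),
        statistics.median(samples),
        max(samples),
        statistics.mean(samples),
    )


# Hypothetical round-trip samples in microseconds. The first sample is a
# warm-up outlier, which is typically where the "Max" value comes from.
round_trips = [50.0, 4.0, 4.2, 3.8, 4.0]
lo, med, hi, mean = summarize(one_way_latency(round_trips))
print(f"min={lo:.2f} median={med:.2f} max={hi:.2f} mean={mean:.2f}")
```

With these made-up samples the single slow first exchange pulls the mean well above the typical value, while the median stays at the typical value.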

Larger samples help only marginally. The default (1000) is usually sufficient.
Note that an array of cycles_t (typically unsigned long) is allocated
once to collect samples and again to store the difference between them.
Really big sample sizes (e.g. 1 million) might expose other problems
with the program.

"-H" option will dump the histogram for additional statistical analysis.
See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
statistical math programs.

Architectures tested: x86, x86_64, ia64

Also see winverbs performance tools.


ib_send_lat.exe      - latency test with send transactions

Usage:

ib_send_lat start a server and wait for connection
ib_send_lat <host> connect to server at <host>

Options:

-p, --port=<port> listen on/connect to port <port> (default 18515)
-c, --connection=<RC/UC> connection type RC/UC (default RC)
-m, --mtu=<mtu> mtu size (default 2048)
-d, --ib-dev=<dev> use IB device <dev> (default first device found)
-i, --ib-port=<port> use port <port> of IB device (default 1)
-s, --size=<size> size of message to exchange (default 1)
-t, --tx-depth=<dep> size of tx queue (default 50)
-l, --signal signal completion on each msg
-a, --all Run sizes from 2 till 2^23
-n, --iters=<iters> number of exchanges (at least 2, default 1000)
-C, --report-cycles report times in cpu cycle units (default microseconds)
-H, --report-histogram print out all results (default print summary only)
-U, --report-unsorted (implies -H) print out unsorted results (default sorted)
-V, --version display version number
-e, --events sleep on CQ events (default poll)
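
As a rough illustration of the -a sweep, the Python sketch below lists the message sizes from 2 to 2^23. It assumes the sizes are successive powers of two, which the option text does not state explicitly; the function name is hypothetical.

```python
def sweep_sizes(min_exp=1, max_exp=23):
    """Message sizes for a sweep from 2**min_exp to 2**max_exp bytes.

    Assumption: each step doubles the previous size.
    """
    return [2 ** exp for exp in range(min_exp, max_exp + 1)]


sizes = sweep_sizes()
print(len(sizes), sizes[0], sizes[-1])
```

Under that assumption a full sweep runs the test once per size, so it takes considerably longer than a single-size run.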


ib_send_bw.exe     - BW (BandWidth) test with send transactions

Usage:

ib_send_bw start a server and wait for connection
ib_send_bw <host> connect to server at <host>

Options:

-p, --port=<port> listen on/connect to port <port> (default 18515)
-d, --ib-dev=<dev> use IB device <dev> (default first device found)
-i, --ib-port=<port> use port <port> of IB device (default 1)
-c, --connection=<RC/UC/UD> connection type RC/UC/UD (default RC)
-m, --mtu=<mtu> mtu size (default 1024)
-s, --size=<size> size of message to exchange (default 65536)
-a, --all Run sizes from 2 till 2^23
-t, --tx-depth=<dep> size of tx queue (default 300)
-n, --iters=<iters> number of exchanges (at least 2, default 1000)
-b, --bidirectional measure bidirectional bandwidth (default unidirectional)
-V, --version display version number
-e, --events sleep on CQ events (default poll)


ib_write_lat.exe      - latency test with RDMA write transactions

Usage:

ib_write_lat start a server and wait for connection
ib_write_lat <host> connect to server at <host>

Options:

-p, --port=<port> listen on/connect to port <port> (default 18515)
-c, --connection=<RC/UC> connection type RC/UC (default RC)
-m, --mtu=<mtu> mtu size (default 1024)
-d, --ib-dev=<dev> use IB device <dev> (default first device found)
-i, --ib-port=<port> use port <port> of IB device (default 1)
-s, --size=<size> size of message to exchange (default 1)
-a, --all Run sizes from 2 till 2^23
-t, --tx-depth=<dep> size of tx queue (default 50)
-n, --iters=<iters> number of exchanges (at least 2, default 1000)
-C, --report-cycles report times in cpu cycle units (default microseconds)
-H, --report-histogram print out all results (default print summary only)
-U, --report-unsorted (implies -H) print out unsorted results (default sorted)
-V, --version display version number


ib_write_bw.exe     - BW test with RDMA write transactions

Usage:

ib_write_bw                # start a server and wait for connection
ib_write_bw <host>    # connect to server at <host>

Options:

-p, --port=<port> listen on/connect to port <port> (default 18515)
-d, --ib-dev=<dev> use IB device <dev> (default first device found)
-i, --ib-port=<port> use port <port> of IB device (default 1)
-c, --connection=<RC/UC> connection type RC/UC (default RC)
-m, --mtu=<mtu> mtu size (default 1024)
-g, --post=<num of posts> number of posts for each qp in the chain (default tx_depth)
-q, --qp=<num of qp's> Num of qp's(default 1)
-s, --size=<size> size of message to exchange (default 65536)
-a, --all Run sizes from 2 till 2^23
-t, --tx-depth=<dep> size of tx queue (default 100)
-n, --iters=<iters> number of exchanges (at least 2, default 5000)
-b, --bidirectional measure bidirectional bandwidth (default unidirectional)
-V, --version display version number



 


ttcp - Test TCP performance

TTCP accesses the Windows socket layer, hence it does not access IB verbs directly. IPoIB or WSD layers are invoked beneath the socket layer depending on configuration. TTCP is included as a quick baseline performance check.

Usage: ttcp -t [-options] host 
       ttcp -r [-options]
Common options:
	-l ##	length of bufs read from or written to network (default 8192)
	-u	use UDP instead of TCP
	-p ##	port number to send to or listen at (default 5001)
	-A	align the start of buffers to this modulus (default 16384)
	-O	start buffers at this offset from the modulus (default 0)
	-d	set SO_DEBUG socket option
	-b ##	set socket buffer size (if supported)
	-f X	format for rate: k,K = kilo{bit,byte}; m,M = mega; g,G = giga
Options specific to -t:
	-n##	number of source bufs written to network (default 2048)
	-D	don't buffer TCP writes (sets TCP_NODELAY socket option)
Options specific to -r:
	-B	for -s, only output full blocks as specified by -l (for TAR)
	-T	"touch": access each byte as it's read

A receiver (server) side and a transmitter (client) side are required. In the example below, host1 and host2 are IPoIB-connected hosts.

at host1 (receiver)        ttcp -r -f M -l 4096

at host2 (transmitter)    ttcp -t -f M -l 4096 -n1000 host1
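
To sanity-check the numbers ttcp reports for a run like the one above, the amount of data sent follows from -l and -n. The Python sketch below is not part of ttcp; the function names are hypothetical, and the 1024-based unit multiplier is an assumption, so compare against the tool's own output before relying on it.

```python
def total_bytes(buf_len, num_bufs):
    """Bytes written by the transmitter: -l buffer length times -n buffers."""
    return buf_len * num_bufs


def format_rate(nbytes, seconds, fmt="K", base=1024):
    """Format a transfer rate using the -f letters k/K, m/M, g/G.

    Lowercase letters report bits, uppercase letters report bytes.
    Assumption: units are multiples of `base` (1024 here).
    """
    exponent = {"k": 1, "m": 2, "g": 3}[fmt.lower()]
    amount = nbytes * 8 if fmt.islower() else nbytes
    unit = "bit" if fmt.islower() else "B"
    return f"{amount / seconds / base ** exponent:.2f} {fmt.upper()}{unit}/sec"


# The transmitter example above uses -l 4096 -n1000.
sent = total_bytes(4096, 1000)
print(sent, format_rate(sent, 2.0, "M"))  # 2.0 s is a made-up elapsed time
```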


 

 

Diagnostics


IBADDR(8) OFED Diagnostics

NAME
ibaddr - query InfiniBand address(es)

SYNOPSIS
ibaddr [-d(ebug)] [-D(irect)] [-G(uid)] [-l(id_show)] [-g(id_show)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] [<lid | dr_path | guid>]

DESCRIPTION
Display the lid (and range) as well as the GID address of the port
specified (by DR path, lid, or GUID) or the local port by default.

Note: this utility can be used as a simple address resolver.

OPTIONS
-G, --Guid
show lid range and gid for GUID address

-l, --lid_show
show lid range only

-L, --Lid_show
show lid range (in decimal) only

-g, --gid_show
show gid address only


COMMON OPTIONS
Most WinOF diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.
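
The two address argument forms above are simple to handle in scripts that wrap the diagnostics. The Python sketch below is only an illustration with hypothetical helper names; it splits a directed path into out ports and converts a GUID argument into the zero-padded 16-digit form used in the ibnetdiscover topology file.

```python
def parse_dr_path(text):
    """Split a directed path such as "0,1,2,1,4" into a list of out ports."""
    return [int(port) for port in text.split(",")]


def parse_guid(text):
    """Parse a hexadecimal GUID argument (with or without 0x) into an int."""
    value = int(text, 16)
    if not 0 <= value < 2 ** 64:
        raise ValueError(f"GUID out of 64-bit range: {text!r}")
    return value


def format_guid(value):
    """Render a GUID as 16 lowercase hex digits, zero padded."""
    return f"{value:016x}"


print(parse_dr_path("0,1,2,1,4"))
print(format_guid(parse_guid("0x08f1040023")))
```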

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.
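
The two selection criteria above amount to a two-pass search. The following Python sketch is a simplified model, not the library code; the Port record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Port:
    ca_name: str
    port_num: int
    state: str      # logical port state, e.g. "ACTIVE", "INIT", "DOWN"
    link_up: bool   # True when the physical link is up


def select_default_port(ports: Sequence[Port]) -> Optional[Port]:
    """Pick the first ACTIVE port, else the first port whose link is up."""
    for port in ports:
        if port.state == "ACTIVE":
            return port
    for port in ports:
        if port.link_up:
            return port
    return None  # nothing usable; the caller decides how to report this


ports = [
    Port("ca0", 1, "DOWN", False),
    Port("ca0", 2, "INIT", True),
    Port("ca1", 1, "ACTIVE", True),
]
print(select_default_port(ports))
```

In this made-up list the ACTIVE port on the second adapter wins over the earlier port that is only physically up.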


EXAMPLES
ibaddr # local port's address

ibaddr 32 # show lid range and gid of lid 32

ibaddr -G 0x8f1040023 # same but using guid address

ibaddr -l 32 # show lid range only

ibaddr -L 32 # show decimal lid range only

ibaddr -g 32 # show gid address only


SEE ALSO
ibroute(8), ibtracert(8)

AUTHOR
Hal Rosenstock
<halr@voltaire.com>


OFED June 18, 2007 IBADDR(8)
 

 

IBLINKINFO(8) OFED Diagnostics
 

NAME
iblinkinfo - report link info for all links in the fabric


SYNOPSIS
iblinkinfo [-Rhcdl -C <ca_name> -P <ca_port> -v <lt,hoq,vlstall> -S <guid> -D<direct_route>]


DESCRIPTION
iblinkinfo reports the link info for each port of each switch active
in the IB fabric.


OPTIONS
-R Recalculate the ibnetdiscover information, i.e., do not use the
cached information. This option is slower but should be used if
the diag tools have not been used for some time or if there are
other reasons to believe the fabric has changed.

-S <guid>
Output only the switch specified by <guid> (hex format)

-D <direct_route>
Output only the switch specified by the direct route path.

-l Print all information for each link on one line. Default is to
print a header with the switch information and then a list for
each port (useful for grep'ing output).

-d Print only switches which have a port in the "Down" state.

-v <lt,hoq,vlstall>
Verify additional switch settings
(<LifeTime>,<HoqLife>,<VLStallCount>)

-c Print port capabilities (enabled and supported values)

-C <ca_name> use the specified ca_name for the search.

-P <ca_port> use the specified ca_port for the search.



AUTHOR
Ira Weiny <weiny2@llnl.gov>


OFED Jan 24, 2008 IBLINKINFO(8)

 

 

IBNETDISCOVER(8) OFED Diagnostics
 

NAME
ibnetdiscover - discover InfiniBand topology


SYNOPSIS
ibnetdiscover [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-s(how)] [-l(ist)]
[-g(rouping)] [-H(ca_list)] [-S(witch_list)] [-R(outer_list)] [-C
ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)]
[--node-name-map <node-name-map>] [-p(orts)] [-h(elp)] [<topology-file>]


DESCRIPTION
ibnetdiscover performs IB subnet discovery and outputs a human readable
topology file. GUIDs, node types, and port numbers are displayed as
well as port LIDs and NodeDescriptions. All nodes (and links) are
displayed (full topology). Optionally, this utility can be used to list
the current connected nodes by nodetype. The output is printed to
standard output unless a topology file is specified.


OPTIONS
-l, --list
List of connected nodes

-g, --grouping
Show grouping. Grouping correlates IB nodes by different vendor
specific schemes. It may also show the switch external ports
correspondence.

-H, --Hca_list
List of connected CAs

-S, --Switch_list
List of connected switches

-R, --Router_list
List of connected routers

-s, --show
Show progress information during discovery.

--node-name-map <node-name-map>
Specify a node name map. The node name map file maps GUIDs to
more user friendly names. See file format below.

-p, --ports
Obtain a ports report which is a list of connected ports with
relevant information (like LID, portnum, GUID, width, speed, and
NodeDescription).


COMMON OPTIONS
Most OpenIB diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


TOPOLOGY FILE FORMAT
The topology file format is human readable and largely intuitive. Most
identifiers are given textual names like vendor ID (vendid), device ID
(devid), GUIDs of various types (sysimgguid, caguid, switchguid,
etc.). PortGUIDs are shown in parentheses (). For switches, this is
shown on the switchguid line. For CA and router ports, it is shown on
the connectivity lines. The IB node is identified followed by the number
of ports and the quoted node GUID. On the right of this line is
a comment (#) followed by the NodeDescription in quotes. If the node
is a switch, this line also contains whether switch port 0 is base or
enhanced, and the LID and LMC of port 0. Subsequent lines pertaining
to this node show the connectivity. On the left is the port number of
the current node. On the right is the peer node (node at other end of
link). It is identified in quotes with nodetype followed by - followed
by NodeGUID with the port number in square brackets. Further on the
right is a comment (#). What follows the comment is dependent on the
node type. If it is a switch node, it is followed by the NodeDescription
in quotes and the LID of the peer node. If it is a CA or router
node, it is followed by the local LID and LMC and then followed by the
NodeDescription in quotes and the LID of the peer node. The active
link width and speed are then appended to the end of this output line.

An example of this is:
#
# Topology file: generated on Tue Jun 5 14:15:10 2007
#
# Max of 3 hops discovered
# Initiated from node 0008f10403960558 port 0008f10403960559

Non-Chassis Nodes

vendid=0x8f1
devid=0x5a06
sysimgguid=0x5442ba00003000
switchguid=0x5442ba00003080(5442ba00003080)
Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0
[22] "H-0008f10403961354"[1](8f10403961355) # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
[10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 4xSDR
[8] "H-0008f10403960558"[2](8f1040396055a) # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
[6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 4xSDR
[12] "H-0008f10403960558"[1](8f10403960559) # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR

vendid=0x8f1
devid=0x5a05
switchguid=0x8f10400410015(8f10400410015)
Switch 8 "S-0008f10400410015" # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0
[6] "H-0008f10403960984"[1](8f10403960985) # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
[4] "H-005442b100004900"[1](5442b100004901) # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
[1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 1xSDR
[3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960984
Ca 2 "H-0008f10403960984" # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403960985) "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x5442b100004900
Ca 2 "H-005442b100004900" # "MT23108 InfiniHost Mellanox Technologies"
[1](5442b100004901) "S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403961354
Ca 2 "H-0008f10403961354" # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403961355) "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960558
Ca 2 "H-0008f10403960558" # "MT23108 InfiniHost Mellanox Technologies"
[2](8f1040396055a) "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
[1](8f10403960559) "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR
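
The connectivity lines in the example above are regular enough to parse with a single pattern. The Python sketch below is an illustration written against this example output only; the regular expression and field names are assumptions and may need adjusting for output produced with grouping or other options.

```python
import re

# One connectivity line: local port (with PortGUID for CA/router ports),
# quoted peer node with its port, then the comment fields.
LINK_RE = re.compile(
    r'^\[(?P<port>\d+)\](?:\((?P<port_guid>[0-9a-f]+)\))?\s+'
    r'"(?P<peer_type>[A-Z])-(?P<peer_guid>[0-9a-f]{16})"\[(?P<peer_port>\d+)\]'
    r'(?:\((?P<peer_port_guid>[0-9a-f]+)\))?\s+#\s+'
    r'(?:lid (?P<lid>\d+) lmc (?P<lmc>\d+) )?'
    r'"(?P<peer_desc>[^"]*)" lid (?P<peer_lid>\d+) (?P<link>\S+)\s*$'
)


def parse_link(line):
    """Return the fields of one connectivity line, or None if it does not match."""
    match = LINK_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("port", "peer_port", "lid", "lmc", "peer_lid"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


switch_line = ('[22] "H-0008f10403961354"[1](8f10403961355) '
               '# "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR')
ca_line = ('[1](8f10403960985) "S-0008f10400410015"[6] '
           '# lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR')
print(parse_link(switch_line))
print(parse_link(ca_line))
```

Lines that are not connectivity lines (vendid=, Switch, Ca, comments) return None, so a caller can feed the whole file through parse_link and keep only the matches.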

When grouping is used, IB nodes are organized into chassis which are
numbered. Nodes which cannot be determined to be in a chassis are
displayed as "Non-Chassis Nodes". External ports are also shown on the
connectivity lines.



NODE NAME MAP FILE FORMAT
The node name map is used to specify user friendly names for nodes in
the output. GUIDs are used to perform the lookup.


Generically:

# comment
<guid> "<name>"


Example:

# IB1
# Line cards
0x0008f104003f125c "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f125d "IB1 (Rack 11 slot 1 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d2 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d3 "IB1 (Rack 11 slot 2 ) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10bf "IB1 (Rack 11 slot 12 ) ISR9288/ISR9096 Voltaire sLB-24D"
# Spines
0x0008f10400400e2d "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2e "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2f "IB1 (Rack 11 spine 1 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e31 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"
0x0008f10400400e32 "IB1 (Rack 11 spine 2 ) ISR9288 Voltaire sFB-12D"
# GUID Node Name
0x0008f10400411a08 "SW1 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a28 "SW2 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a34 "SW3 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f104004119d0 "SW4 (Rack 3) ISR9024 Voltaire 9024D"
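
A node name map in the generic form shown above can be read with a few lines of Python. This is a minimal sketch, not the parser used by the diagnostics; it assumes each entry sits on a single line, and the function name is hypothetical.

```python
import re

# <guid> "<name>" with the GUID written in hexadecimal.
ENTRY_RE = re.compile(r'^(0x[0-9a-fA-F]+)\s+"(.*)"\s*$')


def parse_node_name_map(text):
    """Map each GUID (as an int) to its user friendly name."""
    names = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"malformed node name map line: {raw!r}")
        names[int(match.group(1), 16)] = match.group(2)
    return names


sample = '''
# GUID Node Name
0x0008f10400411a08 "SW1 (Rack 3) ISR9024 Voltaire 9024D"
0x0008f10400411a28 "SW2 (Rack 3) ISR9024 Voltaire 9024D"
'''
for guid, name in parse_node_name_map(sample).items():
    print(f"{guid:016x} -> {name}")
```

Keying the dictionary on the integer value means a lookup succeeds regardless of leading zeros or letter case in the GUID text.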


AUTHORS
Hal Rosenstock    <halr@voltaire.com>

Ira Weiny    <weiny2@llnl.gov>


OFED January 3, 2008 IBNETDISCOVER(8)

 

 

IBPING(8) OFED Diagnostics
 

NAME
ibping - ping an InfiniBand address


SYNOPSIS
ibping [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-G(uid)] [-C ca_name] [-P
ca_port] [-s smlid] [-t(imeout) timeout_ms] [-V(ersion)] [-c
ping_count] [-f(lood)] [-o oui] [-S(erver)] [-h(elp)] <dest lid | guid>


DESCRIPTION
ibping uses vendor mads to validate connectivity between IB nodes. On
exit, (IP) ping-like output is shown. ibping is run as client/server.
Default is to run as client. Note also that a default ping server is
implemented within the kernel.


OPTIONS
-c stop after count packets

-f, --flood
flood destination: send packets back to back without delay

-o, --oui
use specified OUI number to multiplex vendor mads

-S, --Server
start in server mode (do not return)


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


AUTHOR
Hal Rosenstock <halr@voltaire.com>


OFED August 11, 2006 IBPING(8)



 

IBPORTSTATE(8) OFED Diagnostics


NAME
ibportstate - handle port (physical) state and link speed of an InfiniBand port


SYNOPSIS
ibportstate [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-D(irect)] [-G(uid)] [-s smlid] [-V(ersion)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-h(elp)] <dest dr_path|lid|guid> <portnum> [<op>]


DESCRIPTION
ibportstate allows the port state and port physical state of an IB port
to be queried (in addition to link width and speed being validated
relative to the peer port when the port queried is a switch port), or a
switch port to be disabled, enabled, or reset. It also allows the link
speed enabled on any IB port to be adjusted.


OPTIONS
op Port operations allowed
supported ops: enable, disable, reset, speed, query
Default is query

ops enable, disable, and reset are only allowed on switch ports
(An error is indicated if attempted on CA or router ports)
speed op is allowed on any port
speed values are legal values for PortInfo:LinkSpeedEnabled
(An error is indicated if PortInfo:LinkSpeedSupported does not support
this setting)
(NOTE: Speed changes are not effected until the port goes through
link renegotiation)
query also validates port characteristics (link width and speed)
based on the peer port. This checking is done when the port
queried is a switch port as it relies on combined routing
(an initial LID route with directed routing to the peer) which
can only be done on a switch. This peer port validation feature
of query op requires LID routing to be functioning in the subnet.
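
The value passed to the speed op is a PortInfo:LinkSpeedEnabled setting. The Python sketch below decodes such a value under the assumption that it follows the usual InfiniBand encoding (bit 0 = 2.5 Gbps, bit 1 = 5.0 Gbps, bit 2 = 10.0 Gbps, 15 = everything in LinkSpeedSupported); that encoding is not spelled out in this manual, so verify it against the InfiniBand specification for your hardware. The names are hypothetical.

```python
# Assumed bit meanings for PortInfo:LinkSpeedEnabled.
SPEED_BITS = {1: "2.5 Gbps", 2: "5.0 Gbps", 4: "10.0 Gbps"}


def decode_link_speed_enabled(value):
    """List the link speeds enabled by a LinkSpeedEnabled value."""
    if value == 15:
        return ["all supported speeds"]
    if not 1 <= value <= 7:
        raise ValueError(f"unexpected LinkSpeedEnabled value: {value}")
    return [name for bit, name in SPEED_BITS.items() if value & bit]


for setting in (1, 3, 7, 15):
    print(setting, decode_link_speed_enabled(setting))
```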


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
ibportstate 3 1 disable # by lid

ibportstate -G 0x2C9000100D051 1 enable # by guid

ibportstate -D 0 1 # (query) by direct route

ibportstate 3 1 reset # by lid

ibportstate 3 1 speed 1 # by lid


AUTHOR
Hal Rosenstock <halr@voltaire.com>


OFED October 19, 2006 IBPORTSTATE(8)



 

IBQUERYERRORS(8) OFED Diagnostics
 

NAME
ibqueryerrors - query and report non-zero IB port counters


SYNOPSIS
ibqueryerrors [-a -c -r -R -C <ca_name> -P <ca_port> -s
<err1,err2,...> -S <switch_guid> -D <direct_route> -d]


DESCRIPTION
ibqueryerrors reports the port counters of switches. This is similar
to ibcheckerrors with the additional ability to filter out selected
errors, include the optional transmit and receive data counters, report
actions to remedy a non-zero count, and report full link information
for the link reported.


OPTIONS
-a Report an action to take. Some of the counters are not errors
in and of themselves. This reports some more information on
what the counters mean and what actions can/should be taken if
they are non-zero.

-c Suppress some of the common "side effect" counters. These counters
usually do not indicate an error condition and can usually be
safely ignored.

-r Report the port information. This includes LID, port, external
port (if applicable), link speed setting, remote GUID, remote
port, remote external port (if applicable), and remote node
description information.

-R Recalculate the ibnetdiscover information, i.e., do not use the
cached information. This option is slower but should be used if
the diag tools have not been used for some time or if there are
other reasons to believe that the fabric has changed.

-s <err1,err2,...>
Suppress the errors listed in the comma separated list provided.

-S <switch_guid>
Report results only for the switch specified. (hex format)

-D <direct_route>
Report results only for the switch specified by the direct route
path.

-d Include the optional transmit and receive data counters.

-C <ca_name> use the specified ca_name for the search.

-P <ca_port> use the specified ca_port for the search.

AUTHOR
Ira Weiny <weiny2@llnl.gov>


OFED Jan 24, 2008 IBQUERYERRORS(8)



 

IBROUTE(8) OFED Diagnostics
 

NAME
ibroute - query InfiniBand switch forwarding tables


SYNOPSIS
ibroute [-d(ebug)] [-a(ll)] [-n(o_dests)] [-v(erbose)] [-D(irect)]
[-G(uid)] [-M(ulticast)] [-s smlid] [-C ca_name] [-P ca_port]
[-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] [<dest dr_path|lid|guid>
[<startlid> [<endlid>]]]


DESCRIPTION
ibroute uses SMPs to display the forwarding tables (unicast
(LinearForwardingTable or LFT) or multicast (MulticastForwardingTable or MFT))
for the specified switch LID and the optional lid (mlid) range. The
default range is all valid entries in the range 1...FDBTop.


OPTIONS
-a, --all
show all lids in range, even invalid entries

-n, --no_dests
do not try to resolve destinations

-M, --Multicast
show multicast forwarding tables. In this case, the range parameters
specify the mlid range.


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
Unicast examples

ibroute 4 # dump all lids with valid out ports of switch with lid 4

ibroute -a 4 # same, but dump all lids, even with invalid out ports

ibroute -n 4 # simple dump format - no destination resolution

ibroute 4 10 # dump lids starting from 10 (up to FDBTop)

ibroute 4 0x10 0x20 # dump lid range

ibroute -G 0x08f1040023 # resolve switch by GUID

ibroute -D 0,1 # resolve switch by direct path


Multicast examples

ibroute -M 4 # dump all non empty mlids of switch with lid 4

ibroute -M 4 0xc010 0xc020 # same, but with range

ibroute -M -n 4 # simple dump format
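
The unicast and multicast examples above use different LID ranges. The Python sketch below classifies a LID according to the standard InfiniBand address ranges (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE); these ranges come from the InfiniBand architecture rather than from this manual, and the function name is hypothetical.

```python
def classify_lid(lid):
    """Classify a 16-bit LID by the range it falls in."""
    if lid == 0x0000:
        return "reserved"
    if 0x0001 <= lid <= 0xBFFF:
        return "unicast"
    if 0xC000 <= lid <= 0xFFFE:
        return "multicast"
    if lid == 0xFFFF:
        return "permissive"
    raise ValueError(f"LID does not fit in 16 bits: {lid:#x}")


for lid in (4, 0x10, 0xC010, 0xC020):
    print(f"{lid:#06x} {classify_lid(lid)}")
```

A wrapper script could use such a check to decide whether a range argument belongs with -M before calling ibroute.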


SEE ALSO
ibtracert(8)

AUTHOR
Hal Rosenstock <halr@voltaire.com>


OFED July 25, 2006 IBROUTE(8)


 

 


ibv_devinfo - print CA (Channel Adapter) attributes

usage: ibv_devinfo  [options]

Options:
    -d, --ib-dev=<dev> use IB device <dev> (default: first device found)
    -i, --ib-port=<port> use port <port> of IB device (default: all ports)
    -l, --list print only the IB devices names
    -v, --verbose print all the attributes of the IB device(s)

 


IBSTAT(8) OFED Diagnostics

NAME
ibstat - query basic status of InfiniBand device(s)


SYNOPSIS
ibstat [-d(ebug)] [-l(ist_of_cas)] [-s(hort)] [-p(ort_list)] [-V(ersion)] [-h] <ca_name> [portnum]


DESCRIPTION
ibstat is a binary which displays basic information obtained from the
local IB driver. Output includes LID, SMLID, port state, link width
active, and port physical state.

It is similar to the ibstatus utility but implemented as a binary
rather than a script. It has options to list CAs and/or ports and
displays more information than ibstatus.


OPTIONS
-l, --list_of_cas
list all IB devices

-s, --short
short output

-p, --port_list
show port list

ca_name
InfiniBand device name

portnum
port number of InfiniBand device


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
ibstat # display status of all ports on all IB devices

ibstat -l # list all IB devices

ibstat -p # show port guids

ibstat ibv_device0 2 # show status of port 2 of 'ibv_device0'


SEE ALSO
ibstatus(8)


AUTHOR
Hal Rosenstock <halr@voltaire.com>


OFED July 25, 2006 IBSTAT(8)

 

 

IBSYSSTAT(8) OFED Diagnostics
 

NAME
ibsysstat - system status on an InfiniBand address


SYNOPSIS
ibsysstat [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-G(uid)] [-C ca_name]
[-P ca_port] [-s smlid] [-t(imeout) timeout_ms] [-V(ersion)] [-o oui]
[-S(erver)] [-h(elp)] <dest lid | guid> [<op>]


DESCRIPTION
ibsysstat uses vendor mads to validate connectivity between IB nodes
and obtain other information about the IB node. ibsysstat is run as
client/server. Default is to run as client.


OPTIONS
Current supported operations:
ping - verify connectivity to server (default)
host - obtain host information from server
cpu - obtain cpu information from server

-o, --oui
use specified OUI number to multiplex vendor mads

-S, --Server
start in server mode (do not return)



COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


AUTHOR
Hal Rosenstock    <halr@voltaire.com>


OFED August 11, 2006 IBSYSSTAT(8)



IBTRACERT(8) OFED Diagnostics


NAME
ibtracert - trace InfiniBand path


SYNOPSIS
ibtracert [-d(ebug)] [-v(erbose)] [-D(irect)] [-G(uids)] [-n(o_info)]
[-m mlid] [-s smlid] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms]
[-V(ersion)] [--node-name-map <node-name-map>] [-h(elp)] [<dest
dr_path|lid|guid> [<startlid> [<endlid>]]]


DESCRIPTION
ibtracert uses SMPs to trace the path from a source GID/LID to a
destination GID/LID. Each hop along the path is displayed until the
destination is reached or a hop does not respond. By using the -m option,
multicast path tracing can be performed between source and destination
nodes.


OPTIONS
-n, --no_info
simple format; don’t show additional information

-m show the multicast trace of the specified mlid

--node-name-map <node-name-map>
Specify a node name map. The node name map file maps GUIDs to
more user friendly names. See ibnetdiscover(8) for node name
map file format.


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
Unicast examples

ibtracert 4 16 # show path between lids 4 and 16

ibtracert -n 4 16 # same, but using simple output format

ibtracert -G 0x8f1040396522d 0x002c9000100d051 # use guid addresses


Multicast example

ibtracert -m 0xc000 4 16 # show multicast path of mlid 0xc000
between lids 4 and 16
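
The multicast example passes an mlid followed by two unicast lids. A minimal sketch of telling the two kinds apart, assuming the LID ranges defined by the InfiniBand architecture (unicast 0x0001-0xBFFF, multicast 0xC000-0xFFFE); the function name `classify_lid` is invented for this illustration.

```python
def classify_lid(lid: int) -> str:
    """Classify a 16-bit LID by the InfiniBand address ranges."""
    if not 0 <= lid <= 0xFFFF:
        raise ValueError("LID must fit in 16 bits")
    if lid == 0:
        return "reserved"
    if lid <= 0xBFFF:
        return "unicast"
    if lid <= 0xFFFE:
        return "multicast"
    return "permissive"  # 0xFFFF
```

With this check, 0xc000 from the example is the lowest multicast LID, while 4 and 16 are ordinary unicast LIDs.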


SEE ALSO
ibroute(8)


AUTHOR
    Hal Rosenstock    <halr@voltaire.com>

    Ira Weiny    <weiny2@llnl.gov>

OFED April 14, 2007 IBTRACERT(8)


PERFQUERY(8) OFED Diagnostics


NAME
perfquery - query InfiniBand port counters


SYNOPSIS
perfquery [-d(ebug)] [-G(uid)] [-x|--extended] [-X|--xmtsl]
[-S|--rcvsl] [-a(ll_ports)] [-l(oop_ports)] [-r(eset_after_read)]
[-R(eset_only)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms]
[-V(ersion)] [-h(elp)] [<lid|guid> [[port] [reset_mask]]]


DESCRIPTION
perfquery uses PerfMgt GMPs to obtain the PortCounters (basic performance
and error counters), PortExtendedCounters, PortXmitDataSL, or
PortRcvDataSL from the PMA at the node/port specified. Optionally shows
aggregated counters for all ports of the node. Also, optionally, resets
counters after read, or only resets counters.

Note: In PortCounters, PortCountersExtended, PortXmitDataSL, and
PortRcvDataSL, components that represent Data (e.g. PortXmitData and
PortRcvData) indicate octets divided by 4 rather than just octets.
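
Because the data counters are reported in units of 4 octets, a reading has to be multiplied by 4 before it is interpreted as bytes. The following is a small sketch of that arithmetic; the function names are invented here, and counter wrap-around between the two readings is deliberately ignored.

```python
def data_counter_to_octets(counter_value: int) -> int:
    """Convert a PortXmitData/PortRcvData reading (octets / 4) to octets."""
    return counter_value * 4


def throughput_mb_per_s(first: int, second: int, interval_s: float) -> float:
    """Average MB/s (10**6 octets) between two readings interval_s apart.

    Assumes the counter did not wrap or get reset between the readings.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return data_counter_to_octets(second - first) / interval_s / 1e6
```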

Note: Inputting a port of 255 indicates an operation be performed on
all ports.


OPTIONS
-x, --extended
show extended port counters rather than (basic) port counters.
Note that extended port counters attribute is optional.

-X, --xmtsl
show transmit data SL counter. This is an optional counter for
QoS.

-S, --rcvsl
show receive data SL counter. This is an optional counter for
QoS.

-a, --all_ports
show aggregated counters for all ports of the destination lid or
reset all counters for all ports. If the destination lid does
not support the AllPortSelect flag, all ports will be iterated
through to emulate AllPortSelect behavior.

-l, --loop_ports
If all ports are selected by the user (either through the -a
option or port 255) iterate through each port rather than doing
an aggregate operation.

-r, --reset_after_read
reset counters after read

-R, --Reset_only
only reset counters


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
perfquery # read local port performance counters

perfquery 32 1 # read performance counters from lid 32, port 1

perfquery -x 32 1 # read extended performance counters from lid 32, port 1

perfquery -a 32 # read perf counters from lid 32, all ports

perfquery -r 32 1 # read performance counters and reset

perfquery -x -r 32 1 # read extended performance counters and reset

perfquery -R 0x20 1 # reset performance counters of port 1 only

perfquery -x -R 0x20 1 # reset extended performance counters of port 1 only

perfquery -R -a 32 # reset performance counters of all ports

perfquery -R 32 2 0x0fff # reset only error counters of port 2

perfquery -R 32 2 0xf000 # reset only non-error counters of port 2
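
The last two examples split the reset_mask into two groups of bits. A minimal sketch of that split, using only the two mask values shown above; the 16-bit width and the helper name `describe_reset_mask` are assumptions of this illustration, and individual bit assignments are not covered.

```python
ERROR_COUNTERS_MASK = 0x0FFF      # value from the "only error counters" example
NON_ERROR_COUNTERS_MASK = 0xF000  # value from the "only non-error counters" example


def describe_reset_mask(mask: int) -> str:
    """Say which of the two example groups a reset_mask touches."""
    if not 0 <= mask <= 0xFFFF:
        raise ValueError("reset_mask is assumed to be 16 bits wide")
    groups = []
    if mask & ERROR_COUNTERS_MASK:
        groups.append("error")
    if mask & NON_ERROR_COUNTERS_MASK:
        groups.append("non-error")
    return "+".join(groups) if groups else "none"
```

The two masks do not overlap, and together they cover all 16 bits.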


AUTHOR
Hal Rosenstock    <halr@voltaire.com>


OFED March 10, 2009 PERFQUERY(8)



SAQUERY(8) OFED Diagnostics


NAME
saquery - query InfiniBand subnet administration attributes


SYNOPSIS
saquery [-h] [-d] [-p] [-N] [--list | -D] [-S] [-I] [-L] [-l] [-G] [-O]
[-U] [-c] [-s] [-g] [-m] [-x] [-C ca_name] [-P ca_port] [--smkey val]
[-t(imeout) <msec>] [--src-to-dst <src:dst>] [--sgid-to-dgid
<sgid-dgid>] [--node-name-map <node-name-map>] [<name> | <lid> |
<guid>]


DESCRIPTION
saquery issues the selected SA query. Node records are queried by
default.


OPTIONS
-p get PathRecord info

-N get NodeRecord info

--list | -D
get NodeDescriptions of CAs only

-S get ServiceRecord info

-I get InformInfoRecord (subscription) info

-L return the Lids of the name specified

-l return the unique Lid of the name specified

-G return the Guids of the name specified

-O return the name for the Lid specified

-U return the name for the Guid specified

-c get the SA’s class port info

-s return the PortInfoRecords with isSM or isSMdisabled capability
mask bit on

-g get multicast group info

-m get multicast member info. If a group is specified, limit the
output to the group specified and print one line containing only
the GUID and node description for each entry. Example: saquery
-m 0xc000

-x get LinkRecord info

--src-to-dst
get a PathRecord for <src:dst> where src and dst are either node
names or LIDs

--sgid-to-dgid
get a PathRecord for sgid to dgid where both GIDs are in an IPv6
format acceptable to inet_pton(3).

-C <ca_name>
use the specified ca_name.

-P <ca_port>
use the specified ca_port.

--smkey <val>
use SM_Key value for the query. Will be used only with "trusted"
queries. If non-numeric value (like ’x’) is specified then
saquery will prompt for a value.

-t, --timeout <msec>
Specify SA query response timeout in milliseconds. Default is
100 milliseconds. You may want to use this option if IB_TIMEOUT
is indicated.

--node-name-map <node-name-map>
Specify a node name map. The node name map file maps GUIDs to
more user friendly names. See ibnetdiscover(8) for node name
map file format. Only used with the -O and -U options.

Supported query names (and aliases):
ClassPortInfo (CPI)
NodeRecord (NR) [lid]
PortInfoRecord (PIR) [[lid]/[port]]
SL2VLTableRecord (SL2VL) [[lid]/[in_port]/[out_port]]
PKeyTableRecord (PKTR) [[lid]/[port]/[block]]
VLArbitrationTableRecord (VLAR) [[lid]/[port]/[block]]
InformInfoRecord (IIR)
LinkRecord (LR) [[from_lid]/[from_port]] [[to_lid]/[to_port]]
ServiceRecord (SR)
PathRecord (PR)
MCMemberRecord (MCMR)
LFTRecord (LFTR) [[lid]/[block]]
MFTRecord (MFTR) [[mlid]/[position]/[block]]

-d enable debugging

-h show help
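
The --sgid-to-dgid option above expects GIDs written in IPv6 text form. As a hedged illustration, a GID can be composed from a 64-bit subnet prefix and a 64-bit port GUID and printed in that form with Python's standard `ipaddress` module. The default prefix 0xFE80000000000000 and the helper name `gid_string` are assumptions of this sketch; a fabric may be configured with a different prefix.

```python
import ipaddress

DEFAULT_SUBNET_PREFIX = 0xFE80000000000000  # assumed default subnet prefix


def gid_string(port_guid: int, subnet_prefix: int = DEFAULT_SUBNET_PREFIX) -> str:
    """Format a 128-bit GID (64-bit prefix + 64-bit port GUID) as IPv6 text."""
    if not 0 <= port_guid < 2**64 or not 0 <= subnet_prefix < 2**64:
        raise ValueError("prefix and GUID must each fit in 64 bits")
    return str(ipaddress.IPv6Address((subnet_prefix << 64) | port_guid))
```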


AUTHORS
Ira Weiny <weiny2@llnl.gov>

Hal Rosenstock <halr@voltaire.com>


OFED October 19, 2008 SAQUERY(8)


SMINFO(8) OFED Diagnostics


NAME
sminfo - query InfiniBand SMInfo attribute


SYNOPSIS
sminfo [-d(ebug)] [-e(rr_show)] -s state -p prio -a activity
[-D(irect)] [-G(uid)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms]
[-V(ersion)] [-h(elp)] sm_lid | sm_dr_path [modifier]


DESCRIPTION
Optionally set and display the output of a sminfo query in human readable
format. The target SM is the one listed in the local port info, or
the SM specified by the optional SM lid or by the SM direct routed
path.

Note: using sminfo for any purpose other than a simple query may be very
dangerous, and may result in a malfunction of the target SM.


OPTIONS
-s set SM state
0 - not active
1 - discovering
2 - standby
3 - master

-p set priority (0-15)

-a set activity count
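
For scripts that post-process sminfo results, the state numbers and the priority range listed above can be captured as follows. This is only a sketch: the names `SMState` and `check_priority` are invented here, and the code decodes values rather than setting anything on an SM.

```python
from enum import IntEnum


class SMState(IntEnum):
    """SM states as numbered in the -s option list above."""
    NOT_ACTIVE = 0
    DISCOVERING = 1
    STANDBY = 2
    MASTER = 3


def check_priority(priority: int) -> int:
    """Return the priority unchanged if it lies in the documented 0-15 range."""
    if not 0 <= priority <= 15:
        raise ValueError("SM priority must be between 0 and 15")
    return priority
```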


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
sminfo # local port’s sminfo

sminfo 32     # show sminfo of lid 32

sminfo -G 0x8f1040023     # same but using guid address


SEE ALSO
smpdump(8)


AUTHOR
Hal Rosenstock    <halr@voltaire.com>

OFED July 25, 2006 SMINFO(8)



SMPDUMP(8) OFED Diagnostics


NAME
smpdump - dump InfiniBand subnet management attributes


SYNOPSIS
smpdump [-s(tring)] [-D(irect)] [-C ca_name] [-P ca_port] [-t(imeout)
timeout_ms] [-V(ersion)] [-h(elp)] <dlid|dr_path> <attr> [mod]


DESCRIPTION
smpdump is a general purpose SMP utility which gets SM attributes from
a specified SMA. The result is dumped in hex by default.


OPTIONS
attr IBA attribute ID for SM attribute

mod IBA modifier for SM attribute


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
Direct Routed Examples

smpdump -D 0,1,2,3,5 16 # NODE DESC

smpdump -D 0,1,2 0x15 2 # PORT INFO, port 2

LID Routed Examples

smpdump 3 0x15 2 # PORT INFO, lid 3 port 2

smpdump 0xa0 0x11 # NODE INFO, lid 0xa0
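
The examples above give the attribute ID both in decimal (16) and in hex (0x15, 0x11). A small lookup sketch built only from those three IDs; the table is not a complete list of SM attributes, and `parse_attr` and `attr_name` are names made up for this illustration.

```python
# Attribute IDs taken from the examples above (16 decimal == 0x10).
SM_ATTRIBUTES = {
    0x10: "NODE DESC",
    0x11: "NODE INFO",
    0x15: "PORT INFO",
}


def parse_attr(text: str) -> int:
    """Parse an attribute ID given as decimal ("16") or hex ("0x15")."""
    return int(text, 0)


def attr_name(text: str) -> str:
    """Name of the attribute if it is one of the three shown above."""
    return SM_ATTRIBUTES.get(parse_attr(text), "unknown")
```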


SEE ALSO
smpquery(8)


AUTHOR
Hal Rosenstock    <halr@voltaire.com>


OFED July 25, 2006 SMPDUMP(8)



SMPQUERY(8) OFED Diagnostics


NAME
smpquery - query InfiniBand subnet management attributes


SYNOPSIS
smpquery [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-D(irect)] [-G(uid)]
[-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [--node-name-map
node-name-map] [-V(ersion)] [-h(elp)] <op> <dest dr_path|lid|guid> [op
params]


DESCRIPTION
smpquery allows a basic subset of standard SMP queries including the
following: node info, node description, switch info, port info. Fields
are displayed in human readable format.


OPTIONS
Current supported operations and their parameters:
nodeinfo <addr>
nodedesc <addr>
portinfo <addr> [<portnum>] # default port is zero
switchinfo <addr>
pkeys <addr> [<portnum>]
sl2vl <addr> [<portnum>]
vlarb <addr> [<portnum>]
guids <addr>


--node-name-map <node-name-map>
Specify a node name map. The node name map file maps GUIDs to
more user friendly names. See ibnetdiscover(8) for node name
map file format.


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-D use directed path address arguments. The path
is a comma separated list of out ports.
Examples:
"0" # self port
"0,1,2,1,4" # out via port 1, then 2, ...

-c use combined route address arguments. The
address is a combination of a LID and a direct route path.
The LID specified is the DLID and the local LID is used
as the DrSLID.

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
smpquery portinfo 3 1 # portinfo by lid, with port modifier

smpquery -G switchinfo 0x2C9000100D051 1 # switchinfo by guid

smpquery -D nodeinfo 0 # nodeinfo by direct route

smpquery -c nodeinfo 6 0,12 # nodeinfo by combined route
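
The direct route and combined route examples pass a comma separated list of out ports such as "0" or "0,12". A hedged sketch of splitting such a path, assuming every path starts at the local port 0 as in the examples; the function names are invented here and this is not the parser used by the tools.

```python
from typing import List


def parse_directed_path(text: str) -> List[int]:
    """Split a directed path such as "0,1,2,1,4" into a list of out ports."""
    ports = [int(part) for part in text.split(",")]
    if ports[0] != 0:
        raise ValueError("this sketch assumes a path starts at the local port, 0")
    if any(p < 0 or p > 255 for p in ports):
        raise ValueError("port numbers are assumed to fit in 8 bits")
    return ports


def hop_count(text: str) -> int:
    """Number of hops taken after the local port."""
    return len(parse_directed_path(text)) - 1
```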


SEE ALSO
smpdump(8)


AUTHOR
Hal Rosenstock <halr@voltaire.com>


OFED March 14, 2007 SMPQUERY(8)


VENDSTAT(8) OFED Diagnostics

NAME
vendstat - query InfiniBand vendor specific functions


SYNOPSIS
vendstat [-d(ebug)] [-G(uid)] [-N] [-w] [-i] [-c <num,num>] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] <lid|guid>


DESCRIPTION
vendstat uses vendor specific MADs to access vendor specific functionality
beyond the IB spec. Currently, there is support for Mellanox InfiniSwitch-III (IS3) and InfiniSwitch-IV (IS4).


OPTIONS
-N show IS3 general information.

-w show IS3 port xmit wait counters.

-i show IS4 counter group info.

-c <num,num>
configure IS4 counter groups.

Configure IS4 counter groups 0 and 1. Such configuration is not
persistent across IS4 reboot. First number is for counter group
0 and second is for counter group 1.

Group 0 counter config values:
0 - PortXmitDataSL0-7
1 - PortXmitDataSL8-15
2 - PortRcvDataSL0-7

Group 1 counter config values:
1 - PortXmitDataSL8-15
2 - PortRcvDataSL0-7
8 - PortRcvDataSL8-15
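
The -c argument pairs one value for group 0 with one value for group 1. A minimal sketch that checks such a pair against the two tables above before it is passed to vendstat; the function name `parse_counter_groups` is an assumption of this example.

```python
GROUP0_VALUES = {0: "PortXmitDataSL0-7", 1: "PortXmitDataSL8-15", 2: "PortRcvDataSL0-7"}
GROUP1_VALUES = {1: "PortXmitDataSL8-15", 2: "PortRcvDataSL0-7", 8: "PortRcvDataSL8-15"}


def parse_counter_groups(arg: str):
    """Parse "<num,num>" and return the counter names for groups 0 and 1."""
    first, second = (int(part) for part in arg.split(","))
    if first not in GROUP0_VALUES:
        raise ValueError(f"{first} is not listed for group 0")
    if second not in GROUP1_VALUES:
        raise ValueError(f"{second} is not listed for group 1")
    return GROUP0_VALUES[first], GROUP1_VALUES[second]
```

For example, "0,1" selects the two PortXmitDataSL halves and "2,8" the two PortRcvDataSL halves.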


COMMON OPTIONS
Most OFED diagnostics take the following common flags. The exact list
of supported flags per utility can be found in the usage message and
can be shown using the util_name -h syntax.

# Debugging flags

-d raise the IB debugging level.
May be used several times (-ddd or -d -d -d).

-e show send and receive errors (timeouts and others)

-h show the usage message

-v increase the application verbosity level.
May be used several times (-vv or -v -v -v)

-V show the version info.

# Addressing flags

-G use GUID address argument. In most cases, it is the Port GUID.
Example:
"0x08f1040023"

-s <smlid> use ’smlid’ as the target lid for SM/SA queries.

# Other common flags:

-C <ca_name> use the specified ca_name.

-P <ca_port> use the specified ca_port.

-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected by
the following criteria:

1. the first port that is ACTIVE.

2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, the user request is attempted to
be fulfilled, and will fail if it is not possible.


EXAMPLES
vendstat -N 6 # read IS3 general information

vendstat -w 6 # read IS3 port xmit wait counters

vendstat -i 6 12 # read IS4 port 12 counter group info

vendstat -c 0,1 6 12 # configure IS4 port 12 counter groups for PortXmitDataSL

vendstat -c 2,8 6 12 # configure IS4 port 12 counter groups for PortRcvDataSL


AUTHOR
Hal Rosenstock    <halr@voltaire.com>


OFED April 16, 2009 VENDSTAT(8)



ib_limits - Infiniband verbs tests

Usage: ib_limits [options]

Options:
-m or --memory
    Direct ib_limits to test memory registration
-c or --cq
    Direct ib_limits to test CQ creation
-r or --resize_cq
    Direct ib_limits to test CQ resize
-q or --qp
    Direct ib_limits to test QP creation
-v or --verbose
    Enable verbosity level to debug console.
-h or --help
    Display this usage info then exit.



cmtest - Connection Manager Tests

Usage: cmtest [options]

    Options:

 -s --server This option directs cmtest to act as a Server
 -l --local This option specifies the local endpoint.
 -r --remote This option specifies the remote endpoint LID as a hex integer 0x; see vstat command for active port LID hex integer.
 -c --connect This option specifies the number of connections to open. Default of 1.
 -m --msize This option specifies the byte size of each message. Default is 100 bytes.
 -n --nmsgs This option specifies the number of messages to send at a time.
 -p --permsg This option indicates if a separate buffer should be used per message. Default is one buffer for all messages.
 -i --iterate This option specifies the number of times to loop through 'nmsgs'. Default of 1.
 -v --verbose This option enables verbosity level to debug console.
 -h --help Display this usage info then exit.


InfiniBand Partition Management

The part_man.exe application allows creating, deleting and viewing existing host partitions.

Usage : part_man.exe <show|add|rem> <port_guid> <pkey1 pkey2 ...>

show - shows existing partitions

Expected results after executing part_man.exe show

1.      Output has a format 

port_guid1   pkey1  pkey2  pkey3  pkey4  pkey5  pkey6  pkey7  pkey8

port_guid2   pkey1  pkey2  pkey3  pkey4  pkey5  pkey6  pkey7  pkey8

where port_guid is a port guid in hexadecimal format, pkey – values of partition key (in hex format) for this port.

The default partition key (0xFFFF) is not shown and cannot be created by part_man.exe.
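
A script that reads the show output could split each line as sketched below. The exact spelling of the fields printed by part_man.exe is not specified here, so the sample line used with this helper is an assumption; the function name `parse_show_line` is also invented for illustration.

```python
def parse_show_line(line: str):
    """Split one 'port_guid pkey1 pkey2 ...' line into (guid, [pkeys])."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    port_guid = fields[0]
    # int(..., 16) accepts hex digits with or without a leading 0x.
    pkeys = [int(field, 16) for field in fields[1:]]
    return port_guid, pkeys
```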

 

add - create new partition(s) on specified port

part_man.exe add <port_guid> <pkey1> <pkey2>

creates new partition(s) on the port specified by the port_guid parameter (in hexadecimal format); each pkey is a new partition key value in hexadecimal format (e.g. 0xABCD or ABCD).

The port guid is taken from vstat output and has the following format:

XXXX:XXXX:XXXX:XXXX.

vstat prints the node guid, so the user has to add the port number to the node guid value to obtain the port guid. For example, if the node guid is 0008:f104:0397:7ccc, the port guid will be

0008:f104:0397:7ccd – for the first port,

0008:f104:0397:7cce – for the second port.
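
The node guid to port guid arithmetic above can be sketched as follows. It simply applies the rule stated in this section (node guid plus port number) and is not guaranteed to hold for every adapter; the function name is made up for the example.

```python
def port_guid_from_node_guid(node_guid: str, port: int) -> str:
    """Add the port number to a node guid written as XXXX:XXXX:XXXX:XXXX."""
    if port < 1:
        raise ValueError("ports are numbered from 1")
    value = int(node_guid.replace(":", ""), 16) + port
    if value >= 2**64:
        raise ValueError("result does not fit in 64 bits")
    digits = f"{value:016x}"
    # Re-insert the colons every four hex digits.
    return ":".join(digits[i:i + 4] for i in range(0, 16, 4))
```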

 

Expected results of executing part_man.exe add 0x0D99:9703:04f1:0800 0xABCD

1.      part_man.exe output ends up with …Done message.

2.      A new instance of a Network Adapter named “OpenFabrics IPoIB Adapter Partition” will appear in Device manager window. 
If the new adapter appears with yellow label, manual device driver installation is required.

3.      New adapter name ends with “Partition”, e.g. “OpenFabrics IPoIB Adapter Partition”.

 

rem - removes partition key(s) on the specified port.

part_man.exe rem <port_guid> <pkey1>  <pkey2>

Port_guid – in hexadecimal format (same as for add command), identifies port for operation.

Expected results after executing part_man.exe rem <port_guid>  <pkey>

1.      Application prints …Done message.

2.      In device manager window IPoIB network adapter will disappear.

3.      Execution of  part_man.exe show will not show removed adapter.

 



PrintIP - print ip adapters and their addresses

PrintIP is used to print IP adapters and their addresses, or ARP (Address Resolution Protocol) and IP address.

Usage:
    printip <print_ips>
    printip <remoteip> <ip>        (example printip remoteip 10.10.2.20)




vstat - HCA Stats and Counters

Display HCA (Host Channel Adapter) attributes.

Usage: vstat [-v] [-c]
          -v - verbose mode
          -c - HCA error/statistic counters

Includes Node GUID, Subnet Manager and port LIDs.


Subnet Management with OpenSM Rev: openib-1.2.0


A single running process (opensm.exe) is required to configure an InfiniBand subnet and thus make it usable.  In most cases, running InfiniBand Subnet Management as a Windows service is sufficient to correctly configure an InfiniBand fabric.

The Infiniband subnet management process (opensm) may exist on a Windows (WinOF) node or a Linux (OFED) node.

Limit the number of OpenSM processes per IB fabric; one SM is sufficient although redundant SMs are supported. You do not need a Subnet Manager per node/system.

OpenIB Subnet Management as a Windows Service

InfiniBand subnet management (OpenSM), as a Windows service, is installed by default, although it is NOT started by default. There are two ways to enable the InfiniBand Subnet Management service.

  1. Reset the installed OpenSM service "InfiniBand Subnet Management" to start automatically; From a command window type 'services.msc'.
    Locate the InfiniBand Subnet Management view and select the start option; additionally select the startup option 'Automatic' to start the OpenSM service on system startup.
     
  2. Install OpenSM as a 'running' Windows service:
    Select the OpenSM_service_Started install feature. Once the installation has completed, check the running InfiniBand Subnet Management service status via the Windows service manager (see #1).
     
  3. Consult the OpenSM log file @ %SystemRoot%\Temp\osm.log to see what OpenSM thinks is happening.

 

Manual InfiniBand Subnet Management from a command window

Usage: opensm.exe [options]

Options:

-c
--cache-options

Cache the given command line options into the file
/var/cache/osm/opensm.opts for use next invocation
The cache directory can be changed by the environment
variable OSM_CACHE_DIR

-g[=]<GUID in hex>
--guid[=]<GUID in hex>

This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be
bound to 1 port at a time.  If the GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.

-l <LMC>
--lmc <LMC>

This option specifies the subnet's LMC value.
The number of LIDs assigned to each port is 2^LMC.
The LMC value must be in the range 0-7.
LMC values > 0 allow multiple paths between ports.
LMC values > 0 should only be used if the subnet
topology actually provides multiple paths between
ports, i.e. multiple interconnects between switches.
Without -l, OpenSM defaults to LMC = 0, which allows
one path between any two ports.
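
The relationship between LMC and the number of LIDs per port is a simple power of two. A minimal sketch follows; the helper names are invented here, and the alignment check reflects the assumption that a port's base LID is a multiple of 2^LMC.

```python
def lids_per_port(lmc: int) -> int:
    """Number of LIDs assigned to each port for a given LMC (2**LMC)."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0-7")
    return 2 ** lmc


def port_lid_range(base_lid: int, lmc: int) -> range:
    """LIDs covered by a port whose base LID is aligned to 2**LMC."""
    count = lids_per_port(lmc)
    if base_lid % count:
        raise ValueError("base LID is assumed to be aligned to 2**LMC")
    return range(base_lid, base_lid + count)
```

With LMC = 2, for instance, each port answers to four consecutive LIDs.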

-p <PRIORITY>
--priority <PRIORITY>

This option specifies the SM's PRIORITY.
This will affect the handover cases, where the master
is chosen by priority and GUID.

-smkey <SM_Key>

This option specifies the SM's SM_Key (64 bits).
This will affect SM authentication.

-r
--reassign_lids


This option causes OpenSM to reassign LIDs to all end nodes. Specifying -r on a running subnet
may disrupt subnet traffic.  Without -r, OpenSM attempts to preserve existing LID assignments resolving multiple use of same LID.

-u
--updn

This option activates the UPDN algorithm instead of the Min Hop algorithm (default).

-a
--add_guid_file <path to file>

Set the root nodes for the Up/Down routing algorithm to the guids provided in the given file (one per line)

-o
--once

This option causes OpenSM to configure the subnet once, then exit. Ports remain in the ACTIVE state.

-s <interval>
--sweep <interval>

This option specifies the number of seconds between subnet sweeps. Specifying -s 0 disables sweeping.
Without -s, OpenSM defaults to a sweep interval of 10 seconds.

-t <milliseconds>
--timeout <milliseconds>

This option specifies the time in milliseconds
used for transaction timeouts.
Specifying -t 0 disables timeouts.
Without -t, OpenSM defaults to a timeout value of
200 milliseconds.

-maxsmps <number>

This option specifies the number of VL15 SMP MADs allowed on the wire at any one time.
Specifying -maxsmps 0 allows unlimited outstanding SMPs.
Without -maxsmps, OpenSM defaults to a maximum of one outstanding SMP.

-i <equalize-ignore-guids-file>
-ignore-guids <equalize-ignore-guids-file>

This option provides the means to define a set of ports (by guids) that will be ignored by the link load  equalization algorithm.

-x
--honor_guid2lid

This option forces OpenSM to honor the guid2lid file, when it comes out of Standby state, if such file exists under OSM_CACHE_DIR, and is valid. By default this is FALSE.

-f
--log_file

This option names the OpenSM log file. By default the log goes to %SystemRoot%\Temp\osm.log when started as
a Windows service. When OpenSM.exe is run from a command prompt, the default log file is created as '%TEMP%\osm.log'.
For the log to go to standard output use -f stdout.

-e
--erase_log_file

This option deletes the log file if it already exists. By default, the log file is cumulative.

-y
--stay_on_fatal

This option will cause SM not to exit on fatal initialization issues: if SM discovers duplicated guids or 12x link with lane reversal badly configured. By default, the SM will exit on these errors.

-v
--verbose

This option increases the log verbosity level. The -v option may be specified multiple times to further increase the verbosity level.  See the -D option for more information about log verbosity.

-V

This option sets the maximum verbosity level and forces log flushing.
The -V option is equivalent to '-D 0xFF -d 2'. See the -D option for more information about log verbosity.

-D <flags>

This option sets the log verbosity level.  A flags field must follow the -D option.
A bit set/clear in the flags enables/disables a specific log level as follows:
BIT LOG LEVEL ENABLED
---- -----------------
0x01 - ERROR (error messages)
0x02 - INFO (basic messages, low volume)
0x04 - VERBOSE (interesting stuff, moderate volume)
0x08 - DEBUG (diagnostic, high volume)
0x10 - FUNCS (function entry/exit, very high volume)
0x20 - FRAMES (dumps all SMP and GMP frames)
0x40 - ROUTING (dump FDB routing information)
0x80 - currently unused.
Without -D, OpenSM defaults to ERROR + INFO (0x3).
Specifying -D 0 disables all messages.
Specifying -D 0xFF enables all messages (see -V).
High verbosity levels may require increasing the transaction timeout with the -t option.
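
The flags value is a plain bit mask, so it can be decoded with a small table. The sketch below mirrors the table above; the function name `decode_log_flags` is an assumption of this example.

```python
LOG_LEVELS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",
    0x08: "DEBUG",
    0x10: "FUNCS",
    0x20: "FRAMES",
    0x40: "ROUTING",
}


def decode_log_flags(flags: int):
    """Return the names of the log levels enabled by a -D flags value."""
    return [name for bit, name in LOG_LEVELS.items() if flags & bit]
```

The default value 0x3 therefore decodes to ERROR and INFO, and 0 decodes to an empty list.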

-d <number>
--debug <number>

This option specifies a debug option. These options are not normally needed. The number following -d selects the debug option to enable as follows:
OPT Description
--- -----------------
-d0 - Ignore other SM nodes
-d1 - Force single threaded dispatching
-d2 - Force log flushing after each log message
-d3 - Disable multicast support
-d4 - Put OpenSM in memory tracking mode
-d10 - Put OpenSM in testability mode
Without -d, no debug options are enabled

-h
--help

Display this usage info then exit.

-?

Display this usage info then exit.



Osmtest - Subnet Management Tests

Invoke open subnet management tests. osmtest currently cannot run on the same HCA port that OpenSM is using.

 Usage: osmtest [options]

Options:

 -f <c|a|v|s|e|f|m|q|t>
--flow <c|a|v|s|e|f|m|q|t>

This option directs osmtest to run a specific flow:

FLOW DESCRIPTIONS
c = create an inventory file with all nodes, ports & paths.
a = run all validation tests (expecting an input inventory)
v = only validate the given inventory file.
s = run service registration, un-registration and lease.
e = run event forwarding test.
f = flood the SA with queries according to the stress mode.
m = multicast flow.
q = QoS info - VLArb and SLtoVL tables.
t = run trap 64/65 flow; requires running an external tool.
(default is all but QoS).

-w <trap_wait_time>
--wait <trap_wait_time>

This option specifies the wait time for trap 64/65 in seconds.
It is used only when running -f t, the trap 64/65 flow
(default: 10 seconds).

-d <number>
--debug <number>

This option specifies a debug option. These options are not normally needed.
The number following -d selects the debug option to enable as follows:
OPT Description
--- -----------------
-d0 - Unused.
-d1 - Do not scan/compare path records.
-d2 - Force log flushing after each log message.
-d3 - Use mem tracking.
Without -d, no debug options are enabled.

-m <LID in hex>
--max_lid <LID in hex>

This option specifies the maximum LID number to search for while building the inventory file (default: 100).

-g <GUID in hex>
--guid <GUID in hex>

This option specifies the local port GUID value with which osmtest should bind. osmtest may be bound to 1 port at a time. Without -g, osmtest displays a menu of possible port GUIDs and waits for user input.

-h
--help

Display this usage info then exit.

-i <filename>
--inventory <filename>

This option specifies the name of the inventory file. Normally, osmtest expects to find an inventory file, which osmtest uses to validate real-time information received from the SA during testing. If -i is not specified, osmtest defaults to the file 'osmtest.dat'.
See the -f c flow (create an inventory file) for related information.
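
A typical use of the inventory file is a two-step run: create the inventory with the 'c' flow, then validate against it with the 'v' flow. The Python sketch below only assembles the two command lines from the -f and -i options documented here; the helper name osmtest_cmd and the file name fabric.dat are hypothetical, and nothing is executed.

```python
def osmtest_cmd(flow, inventory="osmtest.dat"):
    """Build an osmtest argument list for one flow (-f) and inventory file (-i)."""
    # The flow letters are the ones listed under the -f option.
    if len(flow) != 1 or flow not in "cavsefmqt":
        raise ValueError("unknown flow: %r" % (flow,))
    return ["osmtest", "-f", flow, "-i", inventory]


# Step 1: create the inventory. Step 2: validate the fabric against it.
create = osmtest_cmd("c", "fabric.dat")
validate = osmtest_cmd("v", "fabric.dat")
print(" ".join(create))
print(" ".join(validate))
```

The resulting lists can be handed to a process launcher of your choice once the port GUID prompt (or the -g option) has been taken into account.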

-s
--stress

This option runs the specified stress test instead of the normal test suite.
Stress test options are as follows:
OPT Description
--- -----------------
-s1 - Single-MAD response SA queries.
-s2 - Multi-MAD (RMPP) response SA queries.
-s3 - Multi-MAD (RMPP) Path Record SA queries.
Without -s, stress testing is not performed.

-M
--Multicast_Mode

This option specifies the length of the multicast test:
OPT Description
--- -----------------
-M1 - Short Multicast Flow (default) - single mode.
-M2 - Short Multicast Flow - multiple mode.
-M3 - Long Multicast Flow - single mode.
-M4 - Long Multicast Flow - multiple mode.
Single mode - osmtest runs alone, with no other applications
interacting with OpenSM multicast (MC).
Multiple mode - osmtest may run alongside other applications that use
OpenSM multicast.
Without -M, default flow testing is performed.

-t <milliseconds>

This option specifies the time in milliseconds used for transaction timeouts.
Specifying -t 0 disables timeouts.
Without -t, osmtest defaults to a timeout value of 1 second.

-l
--log_file

This option defines the log to be the given file.
By default the log goes to stdout.

-v

This option increases the log verbosity level. The -v option may be specified multiple times
to further increase the verbosity level. See the -vf option for more information about log verbosity.

-V

This option sets the maximum verbosity level and forces log flushing.
The -V is equivalent to '-vf 0xFF -d 2'.
See the -vf option for more information about log verbosity.

-vf <flags>

This option sets the log verbosity level. A flags field must follow the -vf option.
A bit set/clear in the flags enables/disables a specific log level as follows:
BIT LOG LEVEL ENABLED
---- -----------------
0x01 - ERROR (error messages)
0x02 - INFO (basic messages, low volume)
0x04 - VERBOSE (interesting stuff, moderate volume)
0x08 - DEBUG (diagnostic, high volume)
0x10 - FUNCS (function entry/exit, very high volume)
0x20 - FRAMES (dumps all SMP and GMP frames)
0x40 - currently unused.
0x80 - currently unused.
Without -vf, osmtest defaults to ERROR + INFO (0x3).
Specifying -vf 0 disables all messages.
Specifying -vf 0xFF enables all messages (see -V).
High verbosity levels may require increasing
the transaction timeout with the -t option.

<return-to-top>

 



ibtrapgen - Generate Infiniband subnet management traps

Usage: ibtrapgen -t|--trap_num <TRAP_NUM> -n|--number <NUM_TRAP_CREATIONS>
                          -r|--rate <TRAP_RATE> -l|--lid <LIDADDR>
                          -s|--src_port <SOURCE_PORT> -p|--port_num <PORT_NUM>

Options: one of the following optional flows:

-t <TRAP_NUM>
--trap_num <TRAP_NUM>
          This option specifies the number of the trap to generate. Valid values are 128-131.
-n <NUM_TRAP_CREATIONS>
--number <NUM_TRAP_CREATIONS>
          This option specifies the number of times to generate this trap.
          If not specified - default to 1.
-r <TRAP_RATE>
--rate <TRAP_RATE>
          This option specifies the rate of trap generation, expressed as
          the time period between one trap and the next.
          The value is given in milliseconds.
          If the number of trap creations is 1, this value is ignored.
-l <LIDADDR>
--lid <LIDADDR>
          This option specifies the LID from which the trap should be generated.
-s <SOURCE_PORT>
--src_port <SOURCE_PORT>
          This option specifies the port number from which the trap should
          be generated. If trap number is 128 - this value is ignored (since
          trap 128 is not sent with a specific port number)
-p <PORT_NUM>
--port_num <PORT_NUM>
          This is the port number used for communicating with the SA.
-h
--help
          Display this usage info then exit.
-o
--out_log_file
          This option defines the log to be the given file.
          By default the log goes to stdout.
-v
          This option increases the log verbosity level.
          The -v option may be specified multiple times to further increase the verbosity level.
          See the -x option for more information about log verbosity.
-V
          This option sets the maximum verbosity level and forces log flushing.
          The -V option is equivalent to '-x 0xFF' with log flushing forced.
          See the -x option for more information about log verbosity.
-x <flags>
          This option sets the log verbosity level.
          A flags field must follow the -x option.
          A bit set/clear in the flags enables/disables a
          specific log level as follows:

BIT LOG LEVEL ENABLED
---- -----------------
0x01 - ERROR (error messages)
0x02 - INFO (basic messages, low volume)
0x04 - VERBOSE (interesting stuff, moderate volume)
0x08 - DEBUG (diagnostic, high volume)
0x10 - FUNCS (function entry/exit, very high volume)
0x20 - FRAMES (dumps all SMP and GMP frames)
0x40 - currently unused.
0x80 - currently unused.
Without -x, ibtrapgen defaults to ERROR + INFO (0x3).
Specifying -x 0 disables all messages.
Specifying -x 0xFF enables all messages (see -V).
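
The option rules above (trap numbers limited to 128-131, and the rate being ignored for a single trap) can be sketched as a small Python helper that assembles an ibtrapgen argument list. The helper name and its parameters are hypothetical. Leaving out -r when only one trap is requested is an assumption of this sketch, based on the statement that the value is ignored in that case; the LID text is passed through unchanged because its expected format is not stated here.

```python
def ibtrapgen_cmd(trap_num, lid, src_port, port_num, count=1, rate_ms=None):
    """Assemble an ibtrapgen argument list from the documented options."""
    if not 128 <= trap_num <= 131:
        raise ValueError("trap number must be in the range 128-131")
    if count > 1 and rate_ms is None:
        raise ValueError("a rate (milliseconds) is needed for more than one trap")
    cmd = ["ibtrapgen", "-t", str(trap_num), "-n", str(count)]
    if count > 1:
        # Assumption: -r may be left out for a single trap, since it is ignored.
        cmd += ["-r", str(rate_ms)]
    # -s is always passed; the tool ignores it for trap 128.
    cmd += ["-l", str(lid), "-s", str(src_port), "-p", str(port_num)]
    return cmd


# Five copies of trap 129, 100 milliseconds apart (all values are examples).
print(" ".join(ibtrapgen_cmd(129, "1", 1, 1, count=5, rate_ms=100)))
```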

<return-to-top>

 

 

IPoIB - Internet Protocols over InfiniBand


IPoIB enables Internet Protocol utilities (e.g., ftp, telnet) to function correctly over an InfiniBand fabric. IPoIB is implemented as an NDIS Miniport driver with a WDM lower edge.

The IPoIB Network adapters are located via 'My Computer->Manage->Device Manager->Network adapters->IPoIB'.
'My Network Places->Properties' will display IPoIB Local Area Connection instances and should be used to configure IP addresses for the IPoIB interfaces; one Local Area Connection instance per HCA port. The IP (Internet Protocol) address bound to the IPoIB adapter instance can be assigned by DHCP or as a static IP address via
'My Network Places->Properties->Local Area Connection X->Properties->(General Tab)Internet Protocol(TCP/IP)->Properties'.

When the subnet manager (opensm) configures/sweeps the local Infiniband HCA, the Local Area Connection will become enabled. If you discover the Local Area Connection to be disabled, then likely your subnet manager (opensm) is not running or functioning correctly.

IPoIB Partition Management

<return-to-top>

 

 

Winsock Direct Service Provider


Winsock Direct (WSD) is Microsoft's proprietary protocol that predates SDP (Sockets Direct Protocol) for accelerating TCP/IP applications by using RDMA hardware. Microsoft had a significant role in defining the SDP protocol, hence SDP and WSD are remarkably similar, though unfortunately incompatible.

WSD is made up of two parts, the Winsock Direct switch and the Winsock Direct provider. The WSD switch is in the Winsock DLL that ships in all editions of Windows Server 2003/2008, and is responsible for routing socket traffic over the regular TCP/IP stack or offloading it to a WSD provider. The WSD provider is a hardware-specific DLL that implements connection management and data transfers over particular RDMA hardware.

WinOF WSD is not supported in the Windows XP environment.

The WSD Protocol seamlessly transports TCP data using Infiniband data packets in 'buffered' mode or Infiniband RDMA in 'direct' mode. Either way the user mode socket application sees no behavioral difference in the standard Internet Protocol socket it created other than reduced data transfer times and increased bandwidth.

The Windows OpenFabrics release includes a WSD provider library that has been extensively tested with Microsoft Windows Server 2003.
During testing, bugs were found in the WSD switch that could lead to hangs, crashes, data corruption, and other unwanted behavior. Microsoft released a hotfix to address these issues, which should be installed if using WSD; the Microsoft Windows Server 2003 hotfix can be found here.
Windows Server 2003 (R2) no longer requires this patch, nor does Windows Server 2008.
 

Environment variables can be used to change the behavior of the WSD provider:

IBWSD_NO_READ - Disables RDMA Read operations when set to any value. Note that this variable must be used consistently throughout the cluster or communication will fail.

IBWSD_POLL - Sets the number of times to poll the completion queue after processing completions in response to a CQ event. Reduces latency at the cost of CPU utilization. Default is 500.

IBWSD_SA_RETRY - Sets the number of times to retry SA query requests. Default is 4, can be increased if connection establishment fails.

IBWSD_SA_TIMEOUT - Sets the number of milliseconds to wait before retrying SA query requests. Default is 4, can be increased if connection establishment fails.

IBWSD_NO_IPOIB - SA query timeouts by default allow the connection to be established over IPoIB. Setting this environment variable to any value prevents fall back to IPoIB if SA queries time out.

IBWSD_DBG - Controls debug output when using a debug version of the WSD provider. Takes a hex value, with leading '0x', default value is '0x80000000'

 
0x00000001 DLL
0x00000002 socket info
0x00000004 initialization code
0x00000008 WQ related functions
0x00000010 Endpoints related functions
0x00000020 memory registration
0x00000040 CM (Connection Manager)
0x00000080 connections
0x00000200 socket options
0x00000400 network events
0x00000800 Hardware
0x00001000 Overlapped I/O request
0x00002000 Socket Duplication
0x00004000 Performance Monitoring
0x01000000 More verbose than IBSP_DBG_LEVEL3
0x02000000 More verbose than IBSP_DBG_LEVEL2
0x04000000 More verbose than IBSP_DBG_LEVEL1
0x08000000 Verbose output
0x20000000 Function enter/exit
0x40000000 Warnings
0x80000000 Errors
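
To check which debug categories a given IBWSD_DBG value turns on, the bit table above can be applied mechanically. The Python sketch below does this for a hex string such as the default '0x80000000'; the list and function names are hypothetical and exist only for this example.

```python
# (bit, meaning) pairs copied from the IBWSD_DBG table above.
IBWSD_DBG_BITS = [
    (0x00000001, "DLL"),
    (0x00000002, "socket info"),
    (0x00000004, "initialization code"),
    (0x00000008, "WQ related functions"),
    (0x00000010, "Endpoints related functions"),
    (0x00000020, "memory registration"),
    (0x00000040, "CM (Connection Manager)"),
    (0x00000080, "connections"),
    (0x00000200, "socket options"),
    (0x00000400, "network events"),
    (0x00000800, "Hardware"),
    (0x00001000, "Overlapped I/O request"),
    (0x00002000, "Socket Duplication"),
    (0x00004000, "Performance Monitoring"),
    (0x01000000, "More verbose than IBSP_DBG_LEVEL3"),
    (0x02000000, "More verbose than IBSP_DBG_LEVEL2"),
    (0x04000000, "More verbose than IBSP_DBG_LEVEL1"),
    (0x08000000, "Verbose output"),
    (0x20000000, "Function enter/exit"),
    (0x40000000, "Warnings"),
    (0x80000000, "Errors"),
]


def decode_ibwsd_dbg(text):
    """Return the table entries enabled by a hex string such as '0x80000000'."""
    value = int(text, 16)  # base 16 accepts the leading '0x'
    return [name for bit, name in IBWSD_DBG_BITS if value & bit]


print(decode_ibwsd_dbg("0x80000000"))  # the documented default
print(decode_ibwsd_dbg("0xC0000040"))  # errors, warnings and CM output
```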


See https://wiki.openfabrics.org/tiki-index.php?page=Winsock+Direct for the latest WSD status.

Winsock Direct Service Provider Installation

The WSD service is automatically installed and started as part of the 'default' installation, except on Windows XP systems, where WSD is not supported.
Manual control is performed via the \Program Files\WinOF\installsp.exe utility.

usage: installsp [-i | -r | -l]

-i    Install the Winsock Direct (WSD) service provider
-r    Remove the WSD service provider
-r <name>    Remove the specified service provider
-l    List service providers
 

<return-to-top>

 

NetworkDirect Service Provider


NetworkDirect Service Provider Installation

The ND service is automatically installed and started as part of the 'default' installation for Windows Server 2008, Vista or HPC systems.
Manual control is performed via the %windir%\system32\ndinstall.exe utility.

usage: ndinstall [-l] [-i | -r [ServiceProvider]]

where ServiceProvider is 'ibal' or 'winverbs' or blank [blank implies the default Service Provider 'ibal']

-i <name>    Install (enable) the NetworkDirect (ND) Service Provider 'name'
-r <name>    Remove the specified Service Provider 'name'
-l    List all service providers; same as 'ndinstall' with no args.

The Microsoft Network Direct SDK can be downloaded from here. Once the ND SDK is installed, ND test programs can be found in
%ProgramFiles%\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64\ as nd*.exe.

Known working ND test command invocations (loopback or remote host)

svr: ndrpingpong s IPoIB_IPv4_addr 4096 p1
cli: ndrpingpong c IPoIB_IPv4_addr 4096 p1

svr: ndpingpong s IPoIB_IPv4_addr 4096 b1
cli: ndpingpong c IPoIB_IPv4_addr 4096 b1

See ndping.exe /? for details.

<return-to-top>

 

Usermode Direct Access Transport and Direct Access Programming Libraries


The DAT (Direct Access Transport) API is a C programming interface developed by the DAT Collaborative in order to provide a set of transport-independent, platform-independent Application Programming Interfaces that exploit the RDMA (remote direct memory access) capabilities of next-generation interconnect technologies such as InfiniBand and iWARP.

WinOF DAT and DAPL are based on the 1.1 DAT specification. The DAPL (Direct Access Provider Library) now fully supports InfiniBand RDMA and IPoIB.

WinOF 1.0.1, and future WinOF releases, will include DAT/DAPL version 2.0 runtime libraries along with an optional v2.0 application build environment.
DAT 2.0 is configured with InfiniBand extensions enabled. The IB extensions include


How DAT objects map to equivalent InfiniBand objects:

DAT object                    InfiniBand object
----------------------------  --------------------------
Interface Adapter (IA)        HCA (Host Channel Adapter)
Protection Zone (PZ)          PD (Protection Domain)
Local Memory Region (LMR)     MR (Memory Region)
Remote Memory Region (RMR)    MW (Memory Windows)
Event Dispatcher (EVD)        CQ (Completion Queue)
Endpoint (EP)                 QP (Queue Pair)
Public Service Point (PSP)    connection identifier
Reserved Service Point (RSP)  connection identifier
Connection Request (CR)       connection manager event


DAT ENVIRONMENT:

DAT/DAPL v1.1 (free-build) runtime libraries are installed into %SystemRoot%, with the v1.1 Debug versions located in '%SystemDrive%\%ProgramFiles(x86)%\WinOF'.  Debug libraries are identified as datd.dll and dapld.dll.

IA32 (aka, 32-bit) versions of DAT/DAPL 1.1 runtime libraries, found only on 64-bit systems, are identified in '%SystemDrive%\%ProgramFiles(x86)%\WinOF' as dat32.dll and dapl32.dll.

DAT/DAPL 2.0 (free-build) libraries are identified in %SystemRoot% as dat2.dll and dapl2.dll.  Debug versions of the v2.0 runtime libraries are located in '%SystemDrive%\%ProgramFiles(x86)%\WinOF'.

IA32 (aka, 32-bit) versions of DAT/DAPL 2.0 runtime libraries, found only on 64-bit systems, are identified in '%SystemDrive%\%ProgramFiles(x86)%\WinOF' as dat232.dll and dapl232.dll.

In order for DAT/uDAPL programs to execute correctly, the runtime library files dat.dll and dapl.dll must be present in one of the following locations: the current directory, %SystemRoot%, or the library search path.

The default WinOF installation places the runtime library files dat.dll and dapl.dll in the '%SystemRoot%' folder; symbol files (.pdb) are located in '%SystemDrive%\%ProgramFiles(x86)%\WinOF'.

The default DAPL configuration file is defined as '%SystemDrive%\DAT\dat.conf'. This default specification can be overridden by the environment variable DAT_OVERRIDE; see the environment variable discussion that follows.

Within the dat.conf file, the DAPL library specification can be located as the 5th whitespace separated line argument. By default the DAPL library file is installed as '%SystemRoot%\dapl.dll'.

Should you choose to relocate the DAPL library file to a path that contains whitespace, the full library file specification must be enclosed in double-quotes. A side effect of the double-quotes is that the library specification is treated as a Windows string, in which the '\' (backslash character) is an 'escape' character. Hence all backslashes in the library path must be doubled when enclosed in double-quotes (e.g., "C:\\Program Files\\WinOF\\dapl.dll").
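
The quoting rule can be expressed as a short Python sketch: a path without whitespace is written as-is, while a path with whitespace is wrapped in double-quotes with every backslash doubled. The function name is hypothetical and the paths are examples only.

```python
def dat_conf_library_field(path):
    """Format a DAPL library path for the library field of a dat.conf line."""
    if any(ch.isspace() for ch in path):
        # Quoted form: each backslash must be doubled inside the double-quotes.
        return '"' + path.replace("\\", "\\\\") + '"'
    # No whitespace: the path can be written without quotes or escaping.
    return path


print(dat_conf_library_field(r"C:\Windows\dapl.dll"))
print(dat_conf_library_field(r"C:\Program Files\WinOF\dapl.dll"))
```

The second call prints the quoted, doubled-backslash form described in the paragraph above.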

A sample InfiniBand dat.conf file is installed as '\Program Files\WinOF\dat.conf'.  If dat.conf does not exist in the DAT default configuration folder '%SystemDrive%\DAT\', dat.conf will be copied there.
 

DAPL Providers

DAT 2.0 (free-build) libraries utilize the following user application selectable DAPL providers. Each DAPL provider represents an RDMA hardware interface device type and its Connection Manager.
DAPL providers are listed in the file '%SystemDrive%\DAT\dat.conf'.
The dat.conf InfiniBand DAPL provider names are formatted 'ibnic-HCA#-DAPL_Version-CM_type'.
Example:
    ibnic0v2 - InfiniBand HCA #zero, DAPL version 2.0, (default CM is IBAL).
    ibnic1v2-scm - InfiniBand HCA #one, DAPL version 2.0, CM is 'socket-CM'
    ibnic0v2-cma - InfiniBand HCA #zero, DAPL version 2.0, CM is 'rdma-CM'
    ibnic0-scm   - InfiniBand HCA #zero, DAPL version 1.1, CM is 'IBAL'
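
As a rough aid for reading provider names, the Python sketch below splits a name into its HCA number, DAPL version and optional CM suffix. The regular expression is inferred from the four examples above and is an assumption, not a format definition taken from the DAPL sources; the suffix is returned as written, without interpreting which Connection Manager it selects.

```python
import re

# Pattern inferred from the example names: ibnic<HCA#>[v2][-<cm suffix>]
_PROVIDER_RE = re.compile(r"^ibnic(\d+)(v2)?(?:-([a-z]+))?$")


def parse_provider(name):
    """Split a dat.conf InfiniBand provider name into its visible parts."""
    match = _PROVIDER_RE.match(name)
    if match is None:
        raise ValueError("not an ibnic provider name: %r" % (name,))
    hca, v2, suffix = match.groups()
    return {
        "hca": int(hca),
        # A name without 'v2' is treated as DAPL 1.1, as in the last example.
        "dapl_version": "2.0" if v2 else "1.1",
        "cm_suffix": suffix,  # None when the name carries no suffix
    }


for provider in ("ibnic0v2", "ibnic1v2-scm", "ibnic0v2-cma"):
    print(provider, parse_provider(provider))
```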

Each non-comment line in the dat.conf file describes a DAPL provider interface.
The 2nd to the last field on the right (7th from the left) describes the ia_device_params (Interface Adapter Device Parameters) (aka, RDMA device) in accordance with the specific DAPL provider specified in the 5th field.

 

DAT application build environment:

DAT library header files are selectively installed in the DAT default configuration folder as
'%SystemDrive%\DAT\v1-1' or '%SystemDrive%\DAT\v2-0'. Your C language based DAT 1.1 application compilation command line should include '/I%SystemDrive%\DAT\v1-1' with C code referencing '#include <DAT\udat.h>'.

The 'default' DAT/DAPL C language calling convention is '__stdcall', not the 'normal' Visual Studio C compiler default. __stdcall was chosen because Microsoft recommended it as more efficient. An application can freely mix the default C compiler linkage '__cdecl' with '__stdcall'.

Visual Studio 2005 command window - (nmake) Makefile Fragments:

DAT_PATH=%SystemDrive%\DAT\v1-1
CC = cl
INC_FLAGS = /I $(DAT_PATH)

CC_FLAGS= /nologo /Gy /W3 /Gm- /GR- /GF /O2 /Oi /Oy- /D_CRT_SECURE_NO_WARNINGS \
            /D_WIN64 /D_X64_ /D_AMD64_ $(INC_FLAGS)

LINK = link
LIBS = ws2_32.lib advapi32.lib User32.lib bufferoverflowU.lib dat.lib

LINK_FLAGS = /nologo /subsystem:console /machine:X64 /libpath:$(DAT_PATH) $(LIBS)


When linking a DEBUG/Checked version make sure to use datd.lib or dat2d.lib for DAT v2.0.

DAT library environment variables:

DAT_OVERRIDE
------------
Value used as the static registry configuration file, overriding the
default location, 'C:\DAT\dat.conf'.

Example: set DAT_OVERRIDE=%SystemDrive%\path\to\my\private.conf


DAT_DBG_LEVEL
-------------

Value specifies which parts of the registry will print debugging
information, valid values are 

DAT_OS_DBG_TYPE_ERROR        = 0x1
DAT_OS_DBG_TYPE_GENERIC      = 0x2
DAT_OS_DBG_TYPE_SR           = 0x4
DAT_OS_DBG_TYPE_DR           = 0x8
DAT_OS_DBG_TYPE_PROVIDER_API = 0x10
DAT_OS_DBG_TYPE_CONSUMER_API = 0x20
DAT_OS_DBG_TYPE_ALL          = 0xff

or any combination of these. For example you can use 0xC to get both 
static and dynamic registry output.

Example: set DAT_DBG_LEVEL=0xC

DAT_DBG_DEST
------------ 

Value sets the output destination, valid values are 

DAT_OS_DBG_DEST_STDOUT = 0x1
DAT_OS_DBG_DEST_SYSLOG = 0x2 
DAT_OS_DBG_DEST_ALL    = 0x3 

For example, 0x3 will output to both stdout and the syslog. 
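
When a DAT program is launched from a script, these three variables can be prepared together. The Python sketch below builds an environment mapping using the constant values listed above; the helper name and the configuration file path are hypothetical, and the sketch does not start any program.

```python
import os

# Values copied from the DAT_DBG_LEVEL and DAT_DBG_DEST lists above.
DAT_OS_DBG_TYPE_SR = 0x4
DAT_OS_DBG_TYPE_DR = 0x8
DAT_OS_DBG_DEST_STDOUT = 0x1


def dat_debug_env(level, dest, conf_path=None, base=None):
    """Return a copy of an environment with the DAT debug variables set."""
    env = dict(os.environ if base is None else base)
    if conf_path is not None:
        env["DAT_OVERRIDE"] = conf_path
    env["DAT_DBG_LEVEL"] = "0x%X" % level
    env["DAT_DBG_DEST"] = "0x%X" % dest
    return env


# Static + dynamic registry output (0xC) sent to stdout, private dat.conf.
env = dat_debug_env(
    DAT_OS_DBG_TYPE_SR | DAT_OS_DBG_TYPE_DR,
    DAT_OS_DBG_DEST_STDOUT,
    conf_path=r"C:\DAT\private.conf",
)
print(env["DAT_OVERRIDE"], env["DAT_DBG_LEVEL"], env["DAT_DBG_DEST"])
```

The returned mapping can be passed as the env argument of subprocess.run() when starting the DAT application.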

DAPL Provider library environment variables

DAPL_DBG_TYPE
-------------

Value specifies which parts of the DAPL provider library will print debugging information; valid values are

DAPL_DBG_TYPE_ERR          = 0x0001
DAPL_DBG_TYPE_WARN         = 0x0002
DAPL_DBG_TYPE_EVD          = 0x0004
DAPL_DBG_TYPE_CM           = 0x0008
DAPL_DBG_TYPE_EP           = 0x0010
DAPL_DBG_TYPE_UTIL         = 0x0020
DAPL_DBG_TYPE_CALLBACK     = 0x0040
DAPL_DBG_TYPE_DTO_COMP_ERR = 0x0080
DAPL_DBG_TYPE_API          = 0x0100
DAPL_DBG_TYPE_RTN          = 0x0200
DAPL_DBG_TYPE_EXCEPTION    = 0x0400

or any combination of these. For example you can use 0xC to get both
EVD and CM output.

Example: set DAPL_DBG_TYPE=0xC


DAPL_DBG_DEST
-------------

Value sets the output destination, valid values are

DAPL_DBG_DEST_STDOUT = 0x1
DAPL_DBG_DEST_SYSLOG = 0x2
DAPL_DBG_DEST_ALL    = 0x3

For example, 0x3 will output to both stdout and the syslog.


<return-to-top>


DAPLTEST


    dapltest - test for the Direct Access Provider Library (DAPL)

DESCRIPTION

    Dapltest is a set of tests developed to exercise, characterize,
    and verify the DAPL interfaces during development and porting.
    At least two instantiations of the test must be run.  One acts
    as the server, fielding requests and spawning server-side test
    threads as needed.  Other client invocations connect to the
    server and issue test requests.

    The server side of the test, once invoked, listens continuously
    for client connection requests, until quit or killed.  Upon
    receipt of a connection request, the connection is established,
    the server and client sides swap version numbers to verify that
    they are able to communicate, and the client sends the test
    request to the server.  If the version numbers match, and the
    test request is well-formed, the server spawns the threads
    needed to run the test before awaiting further connections.

USAGE

    dapltest [ -f script_file_name ]
             [ -T S|Q|T|P|L ] [ -D device_name ] [ -d ] [ -R HT|LL|EC|PM|BE ]

    With no arguments, dapltest runs as a server using default values,
    and loops accepting requests from clients.  The -f option allows
    all arguments to be placed in a file, to ease test automation.
    The following arguments are common to all tests:

    [ -T S|Q|T|P|L ]    Test function to be performed:
                            S   - server loop
                            Q   - quit, client requests that server
                                  wait for any outstanding tests to
                                  complete, then clean up and exit
                            T   - transaction test, transfers data between 
                                  client and server
                            P   - performance test, times DTO operations
                            L   - limit test, exhausts various resources,
                                  runs in client w/o server interaction
                        Default: S

    [ -D device_name ]  Specifies the name of the device (interface adapter).
                        Default: host-specific, look for DT_MdepDeviceName
                                 in dapl_mdep.h

    [ -d ]              Enables extra debug verbosity, primarily tracing
                        of the various DAPL operations as they progress.
                        Repeating this parameter increases debug spew.
                        Errors encountered result in the test printing some
                        explanatory text and stopping; this flag provides
                        more detail about what led up to the error.
                        Default: zero

    [ -R HT|LL|EC|PM|BE ]
                        Indicate the quality of service (QoS) desired.
                        Choices are:
                            HT  - high throughput
                            LL  - low latency
                            EC  - economy (neither HT nor LL)
                            PM  - premium
                            BE  - best effort
                        Default: BE

USAGE - Quit test client

    dapltest [Common_Args] [ -s server_name ]

    Quit testing (-T Q) connects to the server to ask it to clean up and
    exit (after it waits for any outstanding test runs to complete).
    In addition to being more polite than simply killing the server,
    this test exercises the DAPL object teardown code paths.
    There is only one argument other than those supported by all tests:

    -s server_name      Specifies the name of the server interface.
                        No default.


USAGE - Transaction test client

    dapltest [Common_Args] [ -s server_name ]
             [ -t threads ] [ -w endpoints ] [ -i iterations ] [ -Q ] 
             [ -V ] [ -P ] OPclient OPserver [ op3, ... ]

    Transaction testing (-T T) transfers a variable amount of data between 
    client and server.  The data transfer can be described as a sequence of 
    individual operations; that entire sequence is transferred 'iterations' 
    times by each thread over all of its endpoint(s).

    The following parameters determine the behavior of the transaction test:

    -s server_name      Specifies the hostname of the dapltest server.
                        No default.

    [ -t threads ]      Specify the number of threads to be used.
                        Default: 1

    [ -w endpoints ]    Specify the number of connected endpoints per thread.
                        Default: 1

    [ -i iterations ]   Specify the number of times the entire sequence
                        of data transfers will be made over each endpoint.
                        Default: 1000

    [ -Q ]              Funnel completion events into a CNO.
                        Default: use EVDs

    [ -V ]              Validate the data being transferred.
                        Default: ignore the data

    [ -P ]              Turn on DTO completion polling.
                        Default: off

    OP1 OP2 [ OP3, ... ]
                        A single transaction (OPx) consists of:

                        server|client   Indicates who initiates the
                                        data transfer.

                        SR|RR|RW        Indicates the type of transfer:
                                        SR  send/recv
                                        RR  RDMA read
                                        RW  RDMA write
                        Defaults: none

                        [ seg_size [ num_segs ] ]
                                        Indicates the amount and format
                                        of the data to be transferred.
                                        Default:  4096  1
                                                  (i.e., 1 4KB buffer)

                        [ -f ]          For SR transfers only, indicates
                                        that a client's send transfer
                                        completion should be reaped when
                                        the next recv completion is reaped.
                                        Sends and receives must be paired
                                        (one client, one server, and in that
                                        order) for this option to be used.

    Restrictions:  
    
    Due to the flow control algorithm used by the transaction test, there 
    must be at least one SR OP for both the client and the server.  

    Requesting data validation (-V) causes the test to automatically append 
    three OPs to those specified. These additional operations provide 
    synchronization points during each iteration, at which all user-specified 
    transaction buffers are checked. These three appended operations satisfy 
    the "one SR in each direction" requirement.

    The transaction OP list is printed out if -d is supplied.
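
    The flow control restriction above can be checked before a test is
    started. The Python sketch below takes the (initiator, transfer type)
    pair of each OP and reports whether the list satisfies the "one SR in
    each direction" requirement. The function name is hypothetical and the
    check is a pre-flight aid only; it is not part of dapltest.

```python
VALID_INITIATORS = ("client", "server")
VALID_TRANSFERS = ("SR", "RR", "RW")


def ops_satisfy_flow_control(ops, validate=False):
    """Check the 'at least one SR OP for both client and server' rule.

    ops is a list of (initiator, transfer) pairs, e.g. ("client", "SR").
    With validate=True (-V), dapltest appends three OPs that satisfy the
    rule by themselves, so any well-formed list is accepted.
    """
    for initiator, transfer in ops:
        if initiator not in VALID_INITIATORS:
            raise ValueError("initiator must be client or server: %r" % (initiator,))
        if transfer not in VALID_TRANSFERS:
            raise ValueError("transfer must be SR, RR or RW: %r" % (transfer,))
    if validate:
        return True
    sr_initiators = {initiator for initiator, transfer in ops if transfer == "SR"}
    return sr_initiators == set(VALID_INITIATORS)


print(ops_satisfy_flow_control([("client", "SR"), ("server", "SR")]))
print(ops_satisfy_flow_control([("client", "RW"), ("server", "RW")]))
```

    The first list has a send/recv OP from each side and passes; the second
    contains only RDMA writes and would need -V or additional SR OPs.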

USAGE - Performance test client

    dapltest [Common_Args] -s server_name [ -m p|b ]
             [ -i iterations ] [ -p pipeline ] OP

    Performance testing (-T P) times the transfer of an operation.
    The operation is posted 'iterations' times.

    The following parameters determine the behavior of the performance test:

    -s server_name      Specifies the hostname of the dapltest server.
                        No default.

    [ -m b|p ]          Choose either blocking (b) or polling (p) mode.
                        Default: blocking (b)

    [ -i iterations ]   Specify the number of times the entire sequence
                        of data transfers will be made over each endpoint.
                        Default: 1000

    [ -p pipeline ]     Specify the pipeline length; valid arguments are in
                        the range [0,MAX_SEND_DTOS]. If a value greater than 
                        MAX_SEND_DTOS is requested the value will be
                        adjusted down to MAX_SEND_DTOS.
                        Default: MAX_SEND_DTOS

    OP
                        An operation consists of:

                        RR|RW           Indicates the type of transfer:
                                        RR  RDMA read
                                        RW  RDMA write
                        Default: none

                        [ seg_size [ num_segs ] ]
                                        Indicates the amount and format
                                        of the data to be transferred.
                                        Default:  4096  1
                                                  (i.e., 1 4KB buffer)

USAGE - Limit test client

    Limit testing (-T L) neither requires nor connects to any server
    instance.  The client runs one or more tests which attempt to
    exhaust various resources to determine DAPL limits and exercise
    DAPL error paths.  If no arguments are given, all tests are run.

    Limit testing creates the sequence of DAT objects needed to
    move data back and forth, attempting to find the limits supported
    for the DAPL object requested.  For example, if the LMR creation
    limit is being examined, the test will create a set of
    {IA, PZ, CNO, EVD, EP} before trying to run dat_lmr_create() to
    failure using that set of DAPL objects.  The 'width' parameter
    can be used to control how many of these parallel DAPL object
    sets are created before beating upon the requested constructor.
    Use of -m limits the number of dat_*_create() calls that will
    be attempted, which can be helpful if the DAPL in use supports
    essentially unlimited numbers of some objects.

    The limit test arguments are:

    [ -m maximum ]      Specify the maximum number of dat_*_create()
                        attempts.
                        Default: run to object creation failure

    [ -w width ]        Specify the number of DAPL object sets to
                        create while initializing.
                        Default: 1

    [ limit_ia ]        Attempt to exhaust dat_ia_open()

    [ limit_pz ]        Attempt to exhaust dat_pz_create()

    [ limit_cno ]       Attempt to exhaust dat_cno_create()

    [ limit_evd ]       Attempt to exhaust dat_evd_create()

    [ limit_ep ]        Attempt to exhaust dat_ep_create()

    [ limit_rsp ]       Attempt to exhaust dat_rsp_create()

    [ limit_psp ]       Attempt to exhaust dat_psp_create()

    [ limit_lmr ]       Attempt to exhaust dat_lmr_create(4KB)

    [ limit_rpost ]     Attempt to exhaust dat_ep_post_recv(4KB)

    [ limit_size_lmr ]  Probe maximum size dat_lmr_create()

                        Default: run all tests


EXAMPLES

    dapltest -T S -d -D ibnic0

                        Starts a local dapltest server process with debug verbosity.
                        Server loops (listen for dapltest request, process request).
    
    dapltest -T T -d -s winIB -D ibnic0 -i 100 client SR 4096 2 server SR 4096 2

                        Runs a transaction test, with both sides
                        sending one buffer with two 4KB segments,
                        one hundred times; dapltest server is on host winIB.

    dapltest -T P -d -s winIB -D ibnic0 -i 100 RW 4096 2

                        Runs a performance test, with the client 
                        RDMA writing one buffer with two 4KB segments,
                        one hundred times.

    dapltest -T Q -s winIB -D ibnic0

                        Asks the dapltest server at host 'winIB' to clean up and exit.

    dapltest -T L -D ibnic0 -d -w 16 -m 1000

                        Runs all of the limit tests, setting up
                        16 complete sets of DAPL objects, and
                        creating at most a thousand instances
                        when trying to exhaust resources.

    dapltest -T T -V -d -t 2 -w 4 -i 55555 -s winIB -D ibnic0 \
       client RW  4096 1    server RW  2048 4    \
       client SR  1024 4    server SR  4096 2    \
       client SR  1024 3 -f server SR  2048 1 -f

                        Runs a more complicated transaction test,
                        with two threads using four EPs each,
                        sending a more complicated buffer pattern
                        for a larger number of iterations,
                        validating the data received.
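
The operation lists in the transaction examples above follow a repeating pattern: a side (client or server), a transfer type, a segment size in bytes, a segment count, and an optional -f. The following Python sketch parses such a list and reports the buffer size of each operation. It is illustrative only and is not part of dapltest; the names TransferOp and parse_ops are hypothetical, the grammar is inferred from the examples rather than taken from the dapltest source, and the -f flag is recorded without being interpreted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferOp:
    side: str       # "client" or "server"
    op: str         # transfer type token as written, e.g. "SR" or "RW"
    seg_size: int   # bytes per segment
    num_segs: int   # segments per buffer
    flagged: bool   # True when the operation is followed by "-f"

    @property
    def buffer_bytes(self) -> int:
        """Total bytes in one buffer (segment size times segment count)."""
        return self.seg_size * self.num_segs


def parse_ops(spec: str) -> List[TransferOp]:
    """Parse 'client SR 4096 2 server SR 4096 2 ...' into TransferOp items.

    Assumed grammar (inferred from the examples): every operation supplies
    both a segment size and a segment count.
    """
    # Treat shell line-continuation backslashes as plain whitespace.
    tokens = spec.replace("\\", " ").split()
    ops: List[TransferOp] = []
    i = 0
    while i < len(tokens):
        side = tokens[i]
        if side not in ("client", "server"):
            raise ValueError(f"expected 'client' or 'server', got {side!r}")
        if i + 3 >= len(tokens):
            raise ValueError(f"incomplete operation after {side!r}")
        op = tokens[i + 1]
        seg_size = int(tokens[i + 2])
        num_segs = int(tokens[i + 3])
        i += 4
        flagged = i < len(tokens) and tokens[i] == "-f"
        if flagged:
            i += 1
        ops.append(TransferOp(side, op, seg_size, num_segs, flagged))
    return ops


if __name__ == "__main__":
    for item in parse_ops("client SR 4096 2 server SR 4096 2"):
        print(item.side, item.op, item.buffer_bytes)
```

For the simple transaction example, each side's buffer works out to two 4096-byte segments, or 8192 bytes per iteration.
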

dt-svr.bat - DAPLtest server script; starts a dapl2test.exe server on the local node.

    dt-svr DAPL-provider [ -D [hex-debug-bitmask] ]

    where DAPL-provider can be one of [ ibal | scm | cma ]:
      • ibal - original InfiniBand Access Layer (pronounced "eye-bal") verbs interface.
      • scm - Socket-CM (Connection Manager); exchanges QP information over an IP socket.
      • cma - rdma CM; uses the OFED RDMA Communications Manager to create the QP connection.
      • or a DAPL-provider name from %SystemDrive%\DAT\dat.conf

dt-cli.bat - DAPLtest client script; drives testing by interacting with the dt-svr.bat script.

    dt-cli DAPL-provider host-IPv4-address testname [ -D [hex-debug-bitmask] ]

    Examples:
        dt-cli ibnic0v2 10.10.2.20 trans
        dt-cli -h              # outputs help text
        dt-svr ibnic0v2        # IBAL on HCA0

Verify that the dt-svr.bat and dt-cli.bat scripts are running the same executable, either dapltest.exe (v1.1) or dapl2test.exe (v2.0).


BUGS  (and To Do List)

    Use of CNOs (-Q) is not yet supported.

    Further limit tests could be added.


SRP (SCSI RDMA) Protocol Driver


The SCSI RDMA Protocol (SRP) is an emerging industry-standard protocol for using block storage devices over an InfiniBand™ fabric. SRP is being defined by the ANSI T10 committee.

WinOF SRP is a storage driver implementation that enables the SCSI RDMA protocol over an InfiniBand fabric.
The implementation conforms to the T10 Working Group draft http://www.t10.org/ftp/t10/drafts/srp/srp-r16a.pdf.

Software Dependencies

The SRP driver depends on the installation of the WinOF stack with a Subnet
Manager running somewhere on the IB fabric.

- Supported Operating Systems and Service Packs:
   o Windows XP SP3 x86 & x64
   o Windows Server 2008/Vista  (x86, x64)
   o Windows Server 2008 HPC (x64)
   o Windows Server 2003 SP2/R2 (x86, x64, IA64)

Testing Levels

The SRP driver has undergone basic testing against Mellanox Technologies' SRP Targets MTD1000 and MTD2000.
Additionally the Linux OFED 1.4 SRP target has been tested.
Testing included SRP target drive format, read, write and dismount/offline operations.
 

Installation

The WinOF installer does not install the SRP driver as part of a default installation.  If the SRP feature is selected in the custom features installation view, an InfiniBand SRP Miniport driver will be installed; see the device manager view under SCSI and RAID controllers.

The 'InfiniBand I/O Unit' (IOU) system device is required for correct SRP operation.  The WinOF installer will install and load the IOU driver if the SRP feature is selected.  See the device manager view System Devices --> InfiniBand I/O Unit for confirmation that the IOU driver loaded correctly.

In order for the SRP miniport driver installation to complete, an SRP target must be detected by a Subnet Manager running somewhere on the InfiniBand fabric; either a local or remote Subnet Manager works.

SRP Driver Uninstall

If the SRP (SCSI RDMA Protocol) driver has been previously installed, then in order to achieve a 'clean' uninstall, the SRP target drive(s) must be released.  Unfortunately the 'offline disk' command is only valid for diskpart (ver 6.0.6001) which is not distributed with Windows Server 2003 or XP.

If the SRP target drive(s) are not released, InfiniBand driver files linger after the WinOF uninstall reboot. These files remain because they are still referenced while the SRP target is active, so the uninstaller's attempt to delete them fails.

SRP supports WPP tracing tools by using the GUID: '5AF07B3C-D119-4233-9C81-C07EF481CBE6'.  The flags and level of debug can be controlled at load-time or run-time; see ib_srp.inf file for details.


QLogic VNIC Configuration


The QLogic VNIC (Virtual Network Interface Card) driver in conjunction with the QLogic Ethernet Virtual I/O Controller (EVIC) provides virtual Ethernet interfaces and transport for Ethernet packets over Infiniband.

Users can modify NIC parameters through the User Interface icon in Network Connections:
( Properties->"Configure..." button -> "Advanced" Tab).

Parameters available:

Vlan Id (802.1Q) 

  values from 0 to 4094 ( default 0, disabled )
  This specifies if VLAN ID-marked packet transmission is enabled and, if so, specifies the ID.

Priority (802.1P)

  values from 0 to 7 ( default 0, feature disabled)
  This specifies if priority-marked packet transmission is enabled.

Payload MTU size 

  values from 1500 to 9500 (default 1500)
  This specifies the maximum transfer unit size, in 100-byte increments.

Recv ChkSum offload 

  (default enabled)
  This specifies if IP protocol checksum calculations for receive are offloaded.

Send ChkSum offload

  (default enabled)
  This specifies if IP protocol checksum calculations for send are offloaded.
 

Secondary Path 

   (default disabled)
   Enabled - If more than one IB path to the IOC exists, a secondary IB instance of the virtual port is created and configured with the same parameters as the primary one. Failover from the primary to the secondary IB path is transparent to user applications sending data through the associated NIC.

   Disabled - Only one path at a time is allowed. If more than one path to the IOC exists, a failed path is destroyed and the next available path is used for the new connection. In this scenario, the new interface instance may be assigned a different MAC address when other hosts compete for EVIC resources.
 

LBFO Bundle Id
   (default disabled) Enables support for OS-provided Load Balancing and Fail Over (LBFO) functionality at the adapter level.
   If enabled, a group ID can be selected from predefined names.

 

Heartbeat interval

   Configures the interval, in milliseconds, for VNIC protocol heartbeat messages.
   0 – heartbeats disabled.

Note:
   To take advantage of the features supported by these options, ensure that the Ethernet gateway is also configured appropriately.  For example, if the Payload MTU for a VNIC interface is set to 4000, the MTU at the EVIC module must also be set to at least 4000 for the setting to take effect.
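
The numeric ranges listed above can be checked before values are entered in the Advanced tab. The following Python sketch is a hypothetical helper, not part of the QLogic VNIC driver or its tools; the function names are invented for illustration. It assumes that "100-byte increments" means the Payload MTU must be a multiple of 100 between 1500 and 9500, and it encodes the gateway rule from the note as a simple comparison.

```python
from typing import List


def check_vnic_settings(vlan_id: int = 0,
                        priority: int = 0,
                        payload_mtu: int = 1500) -> List[str]:
    """Return a list of problems; an empty list means all values are in range."""
    problems: List[str] = []
    if not (0 <= vlan_id <= 4094):
        problems.append(f"Vlan Id {vlan_id} is outside 0..4094")
    if not (0 <= priority <= 7):
        problems.append(f"Priority {priority} is outside 0..7")
    # Assumption: valid MTU values are 1500, 1600, ..., 9500.
    if not (1500 <= payload_mtu <= 9500) or payload_mtu % 100 != 0:
        problems.append(
            f"Payload MTU {payload_mtu} is not a multiple of 100 in 1500..9500")
    return problems


def mtu_takes_effect(vnic_mtu: int, evic_mtu: int) -> bool:
    """The VNIC MTU only takes effect if the EVIC MTU is at least as large."""
    return evic_mtu >= vnic_mtu


if __name__ == "__main__":
    print(check_vnic_settings(vlan_id=100, priority=3, payload_mtu=4000))
    print(mtu_takes_effect(vnic_mtu=4000, evic_mtu=1500))
```
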


QLogic VNIC Child Device Management


Each I/O Controller (IOC) of QLogic's EVIC gateway device can handle 256 connections per host, so a single host can have multiple VNIC interfaces connecting to the same IOC. The qlgcvnic_config tool creates these additional VNIC interfaces, taking the local channel adapter node GUID and the target IOC GUID as input.

Usage:

To create child VNIC devices:

qlgcvnic_config -c {caguid}  {iocguid}  {instanceid}  {interface description}

caguid -- Local HCA node GUID value in hex format (may start with "0x")
iocguid -- Target IOC's GUID value in hex format (may start with "0x")
instanceid -- Distinguishes between the different child devices created by IBAL, so it must be a unique value. InstanceID is a 32-bit value; enter it in decimal format.
interface description -- Description shown in the device manager's device tree for the child device.
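
The argument rules above can be checked before the tool is invoked. The following Python sketch builds an argument list for the create command. It is a hypothetical wrapper, not part of WinOF: the function names are invented, the GUID values in the usage lines are made up, the instance ID is assumed to be an unsigned 32-bit value, and passing the description as a single argument is an assumption about how the tool reads its command line.

```python
import string
from typing import List


def _check_guid(name: str, guid: str) -> None:
    """Accept hex digits with an optional leading '0x'; raise otherwise."""
    digits = guid[2:] if guid.lower().startswith("0x") else guid
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"{name} must be a hex value, got {guid!r}")


def build_create_args(caguid: str, iocguid: str,
                      instance_id: int, description: str) -> List[str]:
    """Return the argv list for 'qlgcvnic_config -c ...'."""
    _check_guid("caguid", caguid)
    _check_guid("iocguid", iocguid)
    # Assumption: the 32-bit InstanceID is unsigned.
    if not (0 <= instance_id <= 0xFFFFFFFF):
        raise ValueError(f"instanceid {instance_id} does not fit in 32 bits")
    # The instance ID is written in decimal, as the tool expects.
    return ["qlgcvnic_config", "-c", caguid, iocguid,
            str(instance_id), description]


if __name__ == "__main__":
    # Made-up GUIDs, for illustration only.
    print(build_create_args("0x0002c90300001234", "0x00066a01e0000abc",
                            2, "Second VNIC"))
```

The resulting list can be handed to subprocess.run() on a host where qlgcvnic_config is on the search path.
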

Listing Channel Adapter to IOC paths

Executing qlgcvnic_config with no options, or with the -l option, lists the IOCs reachable from the host.


InfiniBand Software Development Kit


If selected during a WinOF install, the IB Software Development Kit is installed in '%SystemDrive%\IBSDK'. Under the IBSDK\ folder you will find an include folder 'Inc\', library definition files in 'Lib\', and a 'Samples' folder.

Compilation:

Add the additional include path '%SystemDrive%\IBSDK\Inc'; resource files may also use this path.

Linking:

Add the additional library search path '%SystemDrive%\IBSDK\Lib'.

Include dependent libraries: ibal.lib and complib.lib, or ibal32.lib & complib32.lib for win32 applications on 64-bit platforms.

Samples:

 


WinVerbs


WinVerbs is a userspace verbs and communication management interface optimized
for the Windows operating system. Its lower interface is designed to support
any RDMA based device, including Infiniband and future RDMA devices. Its upper
interface provides a low-latency verbs interface and also supports Microsoft's
NetworkDirect interface, DAPL, and OFED components: libibverbs, libibmad, the
rdma_cm interfaces, and numerous OFED IB diagnostic tools.

The WinVerbs driver loads as an upper filter driver for Infiniband HCA devices.
(Open source iWarp drivers for Windows are not yet available.) A corresponding
userspace library installs as part of the WinVerbs driver installation package.
Additionally, a Windows port of the OFED libibverbs library and several test
programs are also included.

As of the WinOF 2.1 release, WinVerbs and WinMad are fully integrated into the HCA driver stack load;
that is, WinVerbs and WinMad are now integral components of the WinOF stack.

Available libibverbs test programs and their usage are listed
below. Note that not all listed options apply to all applications.

ibv_rc_pingpong, ibv_uc_pingpong, ibv_ud_pingpong
no args       start a server and wait for connection
-h <host>     connect to server at <host>
-p <port>     listen on/connect to port <port> (default 18515)
-d <dev>      use IB device <dev> (default first device found)
-i <port>     use port <port> of IB device (default 1)
-s <size>     size of message to exchange (default 4096)
-m <size>     path MTU (default 1024)
-r <dep>      number of receives to post at a time (default 500)
-n <iters>    number of exchanges (default 1000)
-l <sl>       service level value
-e            sleep on CQ events (default poll)

ibv_send_bw, ibv_send_lat
ibv_read_bw, ibv_read_lat
ibv_write_bw, ibv_write_lat
no args          start a server and wait for connection
-h <host>        connect to server at <host>
-p <port>        listen on/connect to port <port> (default 18515)
-d <dev>         use IB device <dev> (default first device found)
-i <port>        use port <port> of IB device (default 1)
-c <RC/UC/UD>    connection type RC/UC/UD (default RC)
-m <mtu>         MTU size (256 - 4096; default for hermon is 2048)
-s <size>        size of message to exchange (default 65536)
-a               run sizes from 2 to 2^23
-t <dep>         size of tx queue (default 300)
-g               send messages to multicast group (UD only)
-r <dep>         make rx queue bigger than tx (default 600)
-n <iters>       number of exchanges (at least 2, default 1000)
-I <size>        max size of message to be sent in inline mode (default 400)
-b               measure bidirectional bandwidth (default unidirectional)
-V               display version number
-e               sleep on CQ events (default poll)
-N               cancel peak-bw calculation (default with peak-bw)
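
The -a option sweeps message sizes from 2 to 2^23 bytes. The option text gives only the end points; the following Python sketch assumes the sweep doubles the size at each step, which is an assumption and is not confirmed by the listing above. The function name sweep_sizes is hypothetical and is not part of the test programs.

```python
from typing import List


def sweep_sizes(max_exponent: int = 23) -> List[int]:
    """Message sizes 2^1 .. 2^max_exponent, assuming power-of-two steps."""
    return [2 ** e for e in range(1, max_exponent + 1)]


if __name__ == "__main__":
    sizes = sweep_sizes()
    # Under the doubling assumption: 23 sizes, from 2 up to 8388608 bytes.
    print(len(sizes), sizes[0], sizes[-1])
```
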

To verify correct WinVerbs and libibverbs installation, run ibstat or ibv_devinfo.
Either program should report all RDMA devices in the system, along with limited
port attributes. Because of limitations in the WinOF stack in comparison to the
Linux OFED stack, it is normal for the programs to list several values as unknown.
