Using BBCP

bbcp is a point-to-point network file copy application written by Andy Hanushevsky at SLAC as a tool for the BaBar collaboration. It is capable of transferring files at close to line speed in the WAN.

This document gives tips and details of our experience, intended to supplement the official bbcp documentation, which can be found at: http://www.slac.stanford.edu/~abh/bbcp/

(Please see the footnote about root access and System Managers.)

Installation

A very nice feature of bbcp is the ease with which it can be installed and used. Installation basically involves placing the bbcp executable in your path on every system you want to use it on. The standard authentication methods (passwords and certificates) can be used; certificates are the more convenient in most situations.

Versions of bbcp are available for Linux and Solaris.

bbcp is a peer-to-peer application. No server process is required: you simply invoke bbcp on a source machine, and in response a bbcp process is started on the target machine. You can also run transfers as a third party: neither the source nor the target needs to be the machine from which you initiate the transfer.

The version of bbcp we have been using is 05.10.19.01.0. The tool is available here: http://www.slac.stanford.edu/~abh/bbcp/bbcp.tar.Z

After unpacking the tar file, we advise rebuilding the executable from source on each Linux platform you install it on.

Checking your TCP Parameters

Before using bbcp, it is worthwhile checking that the TCP/IP stack parameters that are set on your sender and receiver computers are suitable for high speed transfer.

Issue a "cat /proc/sys/net/ipv4/tcp_rmem" command. The file holds three numbers (the minimum, default and maximum TCP receive buffer sizes, in bytes); verify that they are large, for example:

1610612736      1610612736      1610612736

(and not small, like 4360)
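If you check many machines, the same test can be scripted. A minimal sketch as POSIX shell functions, assuming the three-field minimum/default/maximum format described above; the function names and the 64 MiB threshold are our own illustrative choices, not anything bbcp requires:

```shell
# Print the third (maximum) field of a tcp_rmem-style file.
tcp_rmem_max() {
    local file="${1:-/proc/sys/net/ipv4/tcp_rmem}"
    awk '{ print $3 }' "$file"
}

# Succeed if the maximum receive buffer is at least THRESHOLD bytes.
# The 64 MiB default threshold is an illustrative assumption.
tcp_rmem_ok() {
    local file="${1:-/proc/sys/net/ipv4/tcp_rmem}"
    local threshold="${2:-67108864}"
    local max
    max=$(tcp_rmem_max "$file") || return 2
    [ -n "$max" ] || return 2
    [ "$max" -ge "$threshold" ]
}
```

Running "tcp_rmem_ok || echo small" on each sender and receiver flags machines whose stack still has small default limits.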

Here are some typical settings that will allow high throughput in the WAN:

### IPV4 specific settings
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0

# on systems with a VERY fast bus -> memory interface this is the big gainer
net.ipv4.tcp_rmem = 536870912 536870912 536870912
net.ipv4.tcp_wmem = 536870912 536870912 536870912
# note: tcp_mem is counted in memory pages (usually 4 KiB), not bytes
net.ipv4.tcp_mem = 536870912 536870912 536870912

### CORE settings (mostly for socket and UDP effect)
net.core.rmem_max = 536870912 
net.core.wmem_max = 536870912
net.core.rmem_default = 536870912
net.core.wmem_default = 536870912
net.core.optmem_max = 536870912 
net.core.netdev_max_backlog = 1000000 

On Red Hat Enterprise Linux (RHEL), you can put these in /etc/sysctl.conf and load them with "sysctl -p", or set them one at a time from the command line with "sysctl -w" (either way, you will need to be root).

Note that with large default buffer sizes like these, *every* TCP connection opened by any user is allocated the full window size, which might not be what you want on a multi-user system.
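After loading the settings, it is worth confirming that the kernel really reports the values you asked for. A sketch, assuming your settings are in a file of "key = value" lines like the ones above; check_sysctl_file is a hypothetical helper, and it reads /proc/sys directly rather than calling sysctl:

```shell
# Turn tabs into spaces, squeeze repeated spaces, trim both ends.
normalize_ws() {
    tr '\t' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Compare each "key = value" line of CONF with the live value under ROOT.
# ROOT defaults to /proc/sys; overriding it is only meant for testing.
check_sysctl_file() {
    local conf="$1" root="${2:-/proc/sys}" bad=0 line key want path have
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comments ('#' or ';'), even if indented
        case "$(printf '%s' "$line" | tr -d ' \t')" in
            ''|'#'*|';'*) continue ;;
        esac
        key=$(printf '%s' "${line%%=*}" | tr -d ' \t')
        want=$(printf '%s\n' "${line#*=}" | normalize_ws)
        # net.core.rmem_max -> <root>/net/core/rmem_max
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            have=$(normalize_ws < "$path")
        else
            have="(unreadable)"
        fi
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: want '$want', have '$have'"
            bad=$((bad + 1))
        fi
    done < "$conf"
    [ "$bad" -eq 0 ]
}
```

"check_sysctl_file /etc/sysctl.conf" prints one MISMATCH line for each setting that differs and returns non-zero if any do; no output means everything took effect.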

There are many Web sites which describe how to tune TCP/IP, so we do not cover that here.

First Tests

Check the bbcp installation on a local machine by simply transferring data from "/dev/zero" to "/dev/null":

bbcp -P 2 /dev/zero localhost:/dev/null

The "-P 2" argument tells bbcp to give you a progress report every 2 seconds. The output should look something like this:

[uldemo@nw1 ~]$ bbcp -P 2 /dev/zero localhost:/dev/null
bbcp: Creating /dev/null/zero
bbcp: At 051219 10:33:33 copy 0% complete; 941216.6 KB/s
bbcp: At 051219 10:33:35 copy 0% complete; 942343.6 KB/s
bbcp: At 051219 10:33:37 copy 0% complete; 945524.1 KB/s
bbcp: At 051219 10:33:39 copy 0% complete; 950338.8 KB/s
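To summarise a run, you can average the rates in the progress lines. A sketch, assuming each progress line ends with a rate followed by "KB/s", as in the output above (other bbcp versions may print different units); avg_bbcp_rate is a hypothetical helper:

```shell
# Average the "<rate> KB/s" values in bbcp -P progress lines read on stdin.
# Lines not ending in "KB/s" (e.g. "bbcp: Creating ...") are ignored.
avg_bbcp_rate() {
    awk '$NF == "KB/s" { sum += $(NF - 1); n++ }
         END { if (n == 0) exit 1
               printf "%.1f KB/s over %d samples\n", sum / n, n }'
}
```

For example, capture a test run with "bbcp -P 2 /dev/zero localhost:/dev/null 2>&1 | tee bbcp.log" (both output streams are captured because we have not checked which one bbcp uses for progress), stop it after a while, then run "avg_bbcp_rate < bbcp.log".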

Next, try a transfer between the local machine and another machine that has bbcp installed, checking that bbcp is able to set a large TCP window:

[uldemo@nw1 ~]$ bbcp -P 2 -v -w 2M /dev/zero v10chi.datatag.org:/dev/null
bbcp: nw1.caltech.edu kernel using a send window size of 4194368 not 2097184

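In the above, we asked for a TCP window size of 2MBytes (the "-w 2M" argument), which bbcp turned into a request for 2097184 Bytes. The "-v" (verbose) argument makes bbcp report the size the kernel actually granted: 4194368 Bytes, exactly twice the request, because the Linux kernel doubles the requested socket buffer size to leave room for bookkeeping overhead. The 2MByte window was therefore granted in full.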
 

Checks To Make

Check Interfaces

First, make sure that the IP addresses you are using correspond to the network adapters you want to use. Machines often have a low-speed "management" interface (a 10/100 Mbit or 1 Gbit device) as well as a separate, faster interface, which is the one you want to use for file transfer. You can list the interfaces on a machine with the ifconfig command:

[uldemo@nw1 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:FC:00:04:28
          inet addr:140.221.214.67  Bcast:140.221.214.95  Mask:255.255.255.224
          inet6 addr: fe80::20c:fcff:fe00:428/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:50000
          RX bytes:0 (0.0 b)  TX bytes:1156 (1.1 KiB)
          Interrupt:32 Base address:0x8000

eth1      Link encap:Ethernet  HWaddr 00:09:3D:00:99:A7
          inet addr:131.215.207.33  Bcast:131.215.207.255  Mask:255.255.255.0
          inet6 addr: fe80::209:3dff:fe00:99a7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:277640 errors:0 dropped:0 overruns:0 frame:0
          TX packets:76929 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21962494 (20.9 MiB)  TX bytes:81542088 (77.7 MiB)
          Interrupt:25

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1063412 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1063412 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11907813241 (11.0 GiB)  TX bytes:11907813241 (11.0 GiB)

In the example above, eth0 is the fast 10Gbit interface (with a 9000 Byte MTU), and eth1 is the management interface. Ensure that your fast adapters have an MTU of 9000 Bytes or more. A 1500 Byte MTU will significantly restrict the maximum speed at which file transfers run, due to the high interrupt load on the servers: an interrupt is generated for every frame of incoming data, so at high data rates the interrupt rate can overwhelm the system.
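On Linux you can also read each interface's MTU from sysfs instead of scanning the ifconfig output. A sketch, assuming the standard /sys/class/net/<interface>/mtu files; the helper names are hypothetical and the 9000 Byte cut-off is the one recommended above:

```shell
# Print "<interface> <mtu>" for every interface under ROOT.
# ROOT defaults to /sys/class/net; overriding it is only meant for testing.
list_mtus() {
    local root="${1:-/sys/class/net}" dir
    for dir in "$root"/*; do
        if [ -r "$dir/mtu" ]; then
            printf '%s %s\n' "${dir##*/}" "$(cat "$dir/mtu")"
        fi
    done
}

# Succeed only if interface $1 has a jumbo-frame MTU (9000 Bytes or more).
has_jumbo_mtu() {
    local root="${2:-/sys/class/net}" mtu
    mtu=$(cat "$root/$1/mtu" 2>/dev/null) || return 2
    [ "$mtu" -ge 9000 ]
}
```

On the machine above, "has_jumbo_mtu eth0" succeeds and "has_jumbo_mtu eth1" fails. If an adapter needs changing, root can set the MTU with "ip link set dev eth0 mtu 9000", which does not survive a reboot.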

Check Routes

To confirm that your file transfer will leave through the correct interface and arrive at the correct interface on the target machine, use the traceroute command. If the routing has changed recently, or you want to be sure the latest route changes are in effect, flush the route cache (as root):

$ sysctl -w net.ipv4.route.flush=1

Sometimes you will need to configure a route manually. To make such a route persist across reboots, add it to the file

/etc/sysconfig/network-scripts/route-ethX

(replacing X with the index of the adapter in question).
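As a quick complement to traceroute (a substitution of ours, not part of the procedure above), the iproute2 command "ip route get ADDRESS" prints the route the kernel would choose for ADDRESS, including the outgoing device ("dev ethX"). A sketch that extracts that device name; the helper names are hypothetical:

```shell
# Print the device that follows "dev" in `ip route get` output on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeed if traffic to address $1 would leave through interface $2.
# Requires the iproute2 "ip" command.
uses_iface() {
    [ "$(ip route get "$1" | route_dev)" = "$2" ]
}
```

For example, "uses_iface 140.221.214.94 eth0 || echo wrong interface" (the address is hypothetical) warns if traffic to that host would go out of the management interface instead.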

If In Doubt: Reboot!

If things look strange, reboot. This old trick is as fresh today as it was in the days of the ENIAC.

Transferring Files/Data

Calculating The Window Size

When choosing the window size for the -w option of bbcp, we have found that the best value is substantially less than the one suggested by the usual Bandwidth*Delay product. To get the Bandwidth*Delay product, use the "ping" command to measure the Round Trip Time (RTT) between the machines you are transferring files to and from, multiply the RTT by the capacity of the link, and divide by 8 to convert bits to bytes. For example, if the ping time is 100 milliseconds and the network is a 10 Gbits/sec infrastructure:

Window Size = 10 Gbits/sec * 100 msecs = 1 Gbit = 125 MBytes

This is the value you would use with a tool such as iperf. For bbcp, use roughly half of it:

Window Size (bbcp) = 0.5 * Window Size (iperf)

and then experiment ...
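The arithmetic above, as a sketch in shell and awk. Bandwidth is given in Mbits/sec and the RTT in milliseconds, results are in bytes, and the helper names are hypothetical; the 0.5 factor is only the starting point suggested above:

```shell
# Bandwidth*Delay product in bytes: bits/sec * seconds / 8 bits per byte.
# $1 = bandwidth in Mbits/sec, $2 = RTT in milliseconds.
bdp_bytes() {
    awk -v mbps="$1" -v rtt_ms="$2" \
        'BEGIN { printf "%.0f\n", mbps * 1000000 / 8 * rtt_ms / 1000 }'
}

# Suggested starting window for bbcp -w: half the Bandwidth*Delay product.
bbcp_window_bytes() {
    awk -v bdp="$(bdp_bytes "$1" "$2")" 'BEGIN { printf "%.0f\n", bdp / 2 }'
}

# Average RTT (ms) from the summary line of ping output on stdin, e.g.
# "rtt min/avg/max/mdev = 99.1/100.2/101.0/0.4 ms" (Linux) or
# "round-trip min/avg/max/stddev = ..." (BSD-style).
ping_avg_ms() {
    awk -F/ '/^(rtt|round-trip)/ { print $5; exit }'
}
```

For example, 'rtt=$(ping -c 10 remotehost | ping_avg_ms)' followed by 'bbcp_window_bytes 10000 "$rtt"'. For the 10 Gbits/sec, 100 ms case above this prints 62500000, roughly "-w 60M" in the units of the "-w 2M" example, as a starting point for experiment.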

Avoiding name lookups

When working with temporary network setups (for demonstrations, or on experimental networks), the high-speed adapter's IP address often has no DNS entry. By default, like most network applications, bbcp converts numeric IP addresses given on the command line to names using DNS, and then back to numbers again. You don't necessarily want that; to avoid it, use the "-n" option.

Forcing File Creation

You may want to copy your file to the target machine regardless of whether it already exists there. By default, bbcp will not copy a file that already exists on the target; use the "-f" option to force the copy.

Avoiding the Free Space Check

Sometimes you need to send files to a remote device that does not correctly report the amount of free space available for new files; a ramfs filesystem is one example. By default, bbcp checks that the remote device has sufficient space before starting to send the file. For ramfs and similar devices, insufficient space will appear to be available, and the bbcp transfer will not start. To skip the check and force the transfer anyway, use the "-F" option.
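The three options above are often needed together on temporary demonstration networks. A sketch of a small wrapper that adds them on request through environment variables; bbcp_demo_copy and its variable names are hypothetical, and only the bbcp options described in this document (-P, -n, -f, -F) are used:

```shell
# Hypothetical wrapper: run bbcp with progress reports every 2 seconds,
# optionally adding:
#   NO_DNS=1          -> -n (no DNS name lookups)
#   FORCE=1           -> -f (overwrite files that already exist)
#   NO_SPACE_CHECK=1  -> -F (skip the free-space check)
bbcp_demo_copy() {
    # Prepend in reverse order so the final order is: -P 2 -n -f -F ...
    if [ -n "${NO_SPACE_CHECK:-}" ]; then set -- -F "$@"; fi
    if [ -n "${FORCE:-}" ]; then set -- -f "$@"; fi
    if [ -n "${NO_DNS:-}" ]; then set -- -n "$@"; fi
    set -- -P 2 "$@"
    bbcp "$@"
}
```

For example, "NO_DNS=1 FORCE=1 NO_SPACE_CHECK=1 bbcp_demo_copy bigfile 10.0.0.2:/ramdisk/" runs "bbcp -P 2 -n -f -F bigfile 10.0.0.2:/ramdisk/" (the address and paths are hypothetical). Using environment variables leaves the wrapper's own arguments identical to bbcp's.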

Multiple Streams

bbcp uses four TCP/IP streams by default, sharing the data from the file being transferred between them. This can give a big performance improvement. bbcp uses some clever algorithms to adjust the rate at which each stream sends, throttling and accelerating streams according to the rates being achieved during the transfer.

Sometimes you want more than four streams, sometimes fewer. We have found that in the WAN it is often better to have only one or two streams, and in the LAN it is better to have 16 or 32. For example, to specify 16 streams, use the "-s 16" argument. Note that the more streams bbcp uses, the more system memory is required on both the source and target servers, and with large TCP/IP windows this can quickly become a problem in the WAN. So even if you would like to try 32 streams over a 10Gbit network with a 100 millisecond RTT, you may well find that your servers have insufficient memory.
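A back-of-the-envelope check before trying many streams, assuming each stream's socket buffer is the full -w window (our assumption; bbcp's own internal buffers are not counted, so treat the result as a lower bound). The helper names are hypothetical, and MemAvailable only appears in /proc/meminfo on Linux 3.14 and later:

```shell
# Socket-buffer memory estimate for one side of a transfer:
# $1 = streams, $2 = window in bytes, $3 = optional factor (e.g. 2 to
# allow for the kernel doubling the requested size, as in the -v output).
stream_memory_bytes() {
    local streams="$1" window="$2" factor="${3:-1}"
    echo $((streams * window * factor))
}

# MemAvailable from /proc/meminfo (reported in kB), converted to bytes.
mem_available_bytes() {
    awk '/^MemAvailable:/ { printf "%.0f\n", $2 * 1024 }' "${1:-/proc/meminfo}"
}
```

For 32 streams with a 128 MiB (134217728 Byte) window, "stream_memory_bytes 32 134217728" gives 4294967296 Bytes (4 GiB) per server, or 8 GiB with a factor of 2; compare that with mem_available_bytes on both the source and the target.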

Sending Multiple Files

When testing high speed links, it is sometimes difficult to find a data file that is big enough: the transfer happens so quickly that you have no time to establish a benchmark or check system performance while it is occurring. In more usual situations, you may have several files to transfer across the network, and rather than start a new bbcp process for each one, you would like bbcp to transfer them all, one after the other, unattended.

Both needs are met by the "-I" argument. Create a text file containing the name of each file to be transferred, one per line, and pass the name of that text file as the argument to -I:

$ bbcp -P 2 -I filelist remote:/target_dir
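A sketch for building such a list from a directory tree, one path per line. make_filelist is a hypothetical helper; it writes absolute paths so that the list does not depend on the directory you later start bbcp from (our precaution, not a documented bbcp requirement):

```shell
# Write every regular file under directory $1 to list file $2,
# one absolute path per line, sorted for a predictable transfer order.
# Note: file names containing newlines would break this format.
make_filelist() {
    local dir="$1" out="$2"
    find "$(cd "$dir" && pwd)" -type f | sort > "$out"
}
```

For example, "make_filelist /data/run42 filelist" (the directory name is hypothetical), followed by the bbcp command above.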
 

Example

The graph below shows some bbcp transfer rates between a 10Gbit server at Caltech, and a similar server across the WAN at the StarLight facility in Chicago. You can see the effects of altering the number of streams, and the send window size. You will notice that the maximum rate of around 800 MBytes/sec was achieved with 6 streams and a 256MByte window (pink curve), although this was not stable. The 4 stream transfer (blue curve) was stable, and could have continued indefinitely at ~500MBytes/sec.

 

Fastest bbcp Rate?

In the LAN, we have achieved bbcp rates of >980MBytes/sec between two IBM servers equipped with Intel 10Gbit cards. This is essentially line rate.

Footnote:

To do some of the things described here, you will need to be root.  Beware! System managers hate giving users root access. Simple: get them to do the work instead. If they try to fob you off with security concerns, you need to get yourself a new system manager. Computers and networks are there to be used, not policed to extinction :-)