Careful analysis of your environment, both from the client and from the server point of view, is the first step necessary for optimal NFS performance. The first sections will address issues that are generally important to the client. Later (Section 5.3 and beyond), server side issues will be discussed. In both cases, these issues will not be limited exclusively to one side or the other, but it is useful to separate the two in order to get a clearer picture of cause and effect.
Aside from the general network configuration - appropriate network capacity, faster NICs, full duplex settings in order to reduce collisions, agreement in network speed among the switches and hubs, etc. - one of the most important client optimizations is the choice of NFS data transfer buffer sizes, specified by the mount command options rsize and wsize.
5.1. Setting Block Size to Optimize Transfer Speeds
The mount command options rsize and wsize specify the size of the chunks of data that the client and server pass back and forth to each other. If no rsize and wsize options are specified, the default varies by which version of NFS we are using. The most common default is 4K (4096 bytes), although for TCP-based mounts in 2.2 kernels, and for all mounts beginning with 2.4 kernels, the server specifies the default block size.
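For example, a minimal mount command setting both buffer sizes explicitly might look like the following sketch (the server name and paths are placeholders):

    # mount -o rsize=8192,wsize=8192 server:/export /mnt/nfs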
The theoretical limit for the NFS V2 protocol is 8K. For the V3 protocol, the limit is specific to the server. On the Linux server, the maximum block size is defined by the value of the kernel constant NFSSVC_MAXBLKSIZE, found in the Linux kernel source file ./include/linux/nfsd/const.h. The current maximum block size for the kernel, as of 2.4.17, is 8K (8192 bytes), but the patch set implementing NFS over TCP/IP transport in the 2.4 series, as of this writing, uses a value of 32K (defined in the patch as 32*1024) for the maximum block size.
All 2.4 clients currently support up to 32K block transfer sizes, allowing the standard 32K block transfers across NFS mounts from other servers, such as Solaris, without client modification.
The defaults may be too big or too small, depending on the specific combination of hardware and kernels. On the one hand, some combinations of Linux kernels and network cards (largely on older machines) cannot handle blocks that large. On the other hand, if they can handle larger blocks, a bigger size might be faster.
You will want to experiment and find an rsize and wsize that works and is as fast as possible. You can test the speed of your options with some simple commands, if your network environment is not heavily used. Note that your results may vary widely unless you resort to using more complex benchmarks, such as Bonnie, Bonnie++, or IOzone.
The first of these commands transfers 16384 blocks of 16k each from the special file /dev/zero (which if you read it just spits out zeros really fast) to the mounted partition. We will time it to see how long it takes. So, from the client machine, type:
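A sketch of this command, assuming the NFS file system is mounted at /mnt/home (a placeholder path; adjust to your own mount point):

    # time dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384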
This creates a 256MB file of zeroed bytes. In general, you should create a file that's at least twice as large as the system RAM on the server, but make sure you have enough disk space! Then read back the file into the great black hole on the client machine (/dev/null) by typing the following:
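Again as a sketch, with the same placeholder path:

    # time dd if=/mnt/home/testfile of=/dev/null bs=16k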
Repeat this a few times and average how long it takes. Be sure to unmount and remount the filesystem each time (both on the client and, if you are zealous, locally on the server as well), which should clear out any caches.
Then unmount, and mount again with a larger and smaller block size. They should be multiples of 1024, and not larger than the maximum block size allowed by your system. Note that NFS Version 2 is limited to a maximum of 8K, regardless of the maximum block size defined by NFSSVC_MAXBLKSIZE; Version 3 will support up to 64K, if permitted. The block size should be a power of two since most of the parameters that would constrain it (such as file system block sizes and network packet size) are also powers of two. However, some users have reported better success with block sizes that are not powers of two but are still multiples of the file system block size and the network packet size.
Directly after mounting with a larger size, cd into the mounted file system and do things like ls, and explore the filesystem a bit to make sure everything is as it should be. If the rsize/wsize is too large the symptoms are very odd and not 100% obvious. A typical symptom is incomplete file lists when doing ls, with no error messages, or reading files failing mysteriously with no error messages. After establishing that the given rsize/wsize works you can do the speed tests again. Different server platforms are likely to have different optimal sizes.
Remember to edit /etc/fstab to reflect the rsize/wsize you found to be the most desirable.
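For example, a hypothetical /etc/fstab entry recording 8K buffers (server name and paths are placeholders):

    server:/export  /mnt/nfs  nfs  rsize=8192,wsize=8192  0 0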
If your results seem inconsistent, or doubtful, you may need to analyze your network more extensively while varying the rsize and wsize values. In that case, here are several pointers to benchmarks that may prove useful:
- Bonnie http://www.textuality.com/bonnie/
- Bonnie++ http://www.coker.com.au/bonnie++/
- IOzone file system benchmark http://www.iozone.org/
- The official NFS benchmark, SPECsfs97 http://www.spec.org/osg/sfs97/
The easiest benchmark with the widest coverage, including an extensive spread of file sizes, and of IO types - reads & writes, rereads & rewrites, random access, etc. - seems to be IOzone. A recommended invocation of IOzone (for which you must have root privileges) includes unmounting and remounting the directory under test, in order to clear out the caches between tests, and including the file close time in the measurements. Assuming you've already exported /tmp to everyone from the server foo, and that you've installed IOzone in the local directory, this should work:
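A sketch of such an invocation, assuming the export is mounted at /mnt/foo and that 32K buffer sizes are the ones under test (both are placeholders):

    # mount -o rsize=32768,wsize=32768 foo:/tmp /mnt/foo
    # ./iozone -a -R -c -U /mnt/foo -f /mnt/foo/testfile > logfile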
The benchmark should take 2-3 hours at most, but of course you will need to run it for each value of rsize and wsize that is of interest. The web site gives full documentation of the parameters, but the specific options used above are:
- -a  Full automatic mode, which tests file sizes of 64K to 512M, using record sizes of 4K to 16M
- -R  Generate report in excel spreadsheet form (The 'surface plot' option for graphs is best)
- -c  Include the file close time in the tests, which will pick up the NFS version 3 commit time
- -U  Use the given mount point to unmount and remount between tests; it clears out caches
- -f  When using unmount, you have to locate the test file in the mounted file system
5.2. Packet Size and Network Drivers
While many Linux network card drivers are excellent, some are quite shoddy, including a few drivers for some fairly standard cards. It is worth experimenting with your network card directly to find out how it can best handle traffic.
Try pinging back and forth between the two machines with large packets using the -f and -s options with ping (see ping(8) for more details) and see if a lot of packets get dropped, or if they take a long time for a reply. If so, you may have a problem with the performance of your network card.
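For example, a flood ping with large packets, bounded to 1000 packets (requires root; the host name is a placeholder):

    # ping -c 1000 -f -s 4096 server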
For a more extensive analysis of NFS behavior in particular, use the nfsstat command to look at nfs transactions, client and server statistics, network statistics, and so forth. The '-o net' option will show you the number of dropped packets in relation to the total number of transactions. In UDP transactions, the most important statistic is the number of retransmissions, due to dropped packets, socket buffer overflows, general server congestion, timeouts, etc. This will have a tremendously important effect on NFS performance, and should be carefully monitored. Note that nfsstat does not yet implement the -z option, which would zero out all counters, so you must look at the current nfsstat counter values prior to running the benchmarks.
To correct network problems, you may wish to reconfigure the packet size that your network card uses. Very often there is a constraint somewhere else in the network (such as a router) that causes a smaller maximum packet size between two machines than what the network cards on the machines are actually capable of. TCP should autodiscover the appropriate packet size for a network, but UDP will simply stay at a default value. So determining the appropriate packet size is especially important if you are using NFS over UDP.
You can test for the network packet size using the tracepath command: from the client machine, just type tracepath server/2049 and the path MTU should be reported at the bottom. You can then set the MTU on your network card equal to the path MTU, by using the MTU option to ifconfig, and see if fewer packets get dropped. See the ifconfig man pages for details on how to reset the MTU.
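A hypothetical session (the interface name and the reported MTU value are illustrative):

    # tracepath server/2049
    ...
     Resume: pmtu 1492
    # ifconfig eth0 mtu 1492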
In addition, netstat -s will give the statistics collected for traffic across all supported protocols. You may also look at /proc/net/snmp for information about current network behavior; see the next section for more details.
5.3. Overflow of Fragmented Packets
Using an rsize or wsize larger than your network's MTU (often set to 1500, in many networks) will cause IP packet fragmentation when using NFS over UDP. IP packet fragmentation and reassembly require a significant amount of CPU resource at both ends of a network connection. In addition, packet fragmentation also exposes your network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason. Any increase of RPC retransmissions, along with the possibility of increased timeouts, is the single worst impediment to performance for NFS over UDP.
Packets may be dropped for many reasons. If your network topology is complex, fragment routes may differ, and may not all arrive at the Server for reassembly. NFS Server capacity may also be an issue, since the kernel has a limit of how many fragments it can buffer before it starts throwing away packets. With kernels that support the /proc filesystem, you can monitor the files /proc/sys/net/ipv4/ipfrag_high_thresh and /proc/sys/net/ipv4/ipfrag_low_thresh. Once the number of unprocessed, fragmented packets reaches the number specified by ipfrag_high_thresh (in bytes), the kernel will simply start throwing away fragmented packets until the number of incomplete packets reaches the number specified by ipfrag_low_thresh.
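You can inspect and, if need be, raise these thresholds as follows (the values shown are common 2.4-era defaults, in bytes, and may differ on your system):

    # cat /proc/sys/net/ipv4/ipfrag_high_thresh
    262144
    # cat /proc/sys/net/ipv4/ipfrag_low_thresh
    196608
    # echo 524288 > /proc/sys/net/ipv4/ipfrag_high_thresh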
Another counter to monitor is IP: ReasmFails in the file /proc/net/snmp; this is the number of fragment reassembly failures. If it goes up too quickly during heavy file activity, you may have a problem.
5.4. NFS over TCP
A new feature, available for both 2.4 and 2.5 kernels but not yet integrated into the mainstream kernel at the time of this writing, is NFS over TCP. Using TCP has a distinct advantage and a distinct disadvantage over UDP. The advantage is that it works far better than UDP on lossy networks. When using TCP, a single dropped packet can be retransmitted, without the retransmission of the entire RPC request, resulting in better performance on lossy networks. In addition, TCP will handle network speed differences better than UDP, due to the underlying flow control at the network level.
The disadvantage of using TCP is that it is not a stateless protocol like UDP. If your server crashes in the middle of a packet transmission, the client will hang and any shares will need to be unmounted and remounted.
The overhead incurred by the TCP protocol will result in somewhat slower performance than UDP under ideal network conditions, but the cost is not severe, and is often not noticeable without careful measurement. If you are using gigabit ethernet from end to end, you might also investigate the usage of jumbo frames, since the high speed network may allow the larger frame sizes without encountering increased collision rates, particularly if you have set the network to full duplex.
5.5. Timeout and Retransmission Values
Two mount command options, timeo and retrans, control the behavior of UDP requests when encountering client timeouts due to dropped packets, network congestion, and so forth. The -o timeo option allows designation of the length of time, in tenths of seconds, that the client will wait until it decides it will not get a reply from the server, and must try to send the request again. The default value is 7 tenths of a second. The -o retrans option allows designation of the number of timeouts allowed before the client gives up, and displays the Server not responding message. The default value is 3 attempts. Once the client displays this message, it will continue to try to send the request, but only once before displaying the error message if another timeout occurs. When the client reestablishes contact, it will fall back to using the correct retrans value, and will display the Server OK message.
If you are already encountering excessive retransmissions (see the output of the nfsstat command), or want to increase the block transfer size without encountering timeouts and retransmissions, you may want to adjust these values. The specific adjustment will depend upon your environment, and in most cases, the current defaults are appropriate.
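As a sketch, a mount doubling both defaults (a 1.4 second timeout and 6 retries; the server name and paths are placeholders):

    # mount -o timeo=14,retrans=6 server:/export /mnt/nfs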
5.6. Number of Instances of the NFSD Server Daemon
Most startup scripts, Linux and otherwise, start 8 instances of nfsd. In the early days of NFS, Sun decided on this number as a rule of thumb, and everyone else copied. There are no good measures of how many instances are optimal, but a more heavily-trafficked server may require more. You should use at the very least one daemon per processor, but four to eight per processor may be a better rule of thumb. If you are using a 2.4 or higher kernel and you want to see how heavily each nfsd thread is being used, you can look at the file /proc/net/rpc/nfsd. The last ten numbers on the th line in that file indicate the number of seconds that the thread usage was at that percentage of the maximum allowable. If you have a large number in the top three deciles, you may wish to increase the number of nfsd instances. This is done upon starting nfsd using the number of instances as the command line option, and is specified in the NFS startup script (/etc/rc.d/init.d/nfs on Red Hat) as RPCNFSDCOUNT. See the nfsd(8) man page for more information.
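For instance, a quick look at the thread statistics (the output line here is illustrative, not from a real server):

    # grep th /proc/net/rpc/nfsd
    th 8 594 3733.140 83.850 96.660 0.000 73.510 30.560 16.330 2.380 0.000 2.150

If the last three of the ten histogram numbers were large, you would raise RPCNFSDCOUNT (say, to 16) in the startup script and restart nfsd.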
5.7. Memory Limits on the Input Queue
On 2.2 and 2.4 kernels, the socket input queue, where requests sit while they are currently being processed, has a small default size limit (rmem_default) of 64k. This queue is important for clients with heavy read loads, and servers with heavy write loads. As an example, if you are running 8 instances of nfsd on the server, each will only have 8k to store write requests while it processes them. In addition, the socket output queue - important for clients with heavy write loads and servers with heavy read loads - also has a small default size (wmem_default).
Several published runs of the NFS benchmark SPECsfs specify usage of a much higher value for both the read and write value sets, [rw]mem_default and [rw]mem_max. You might consider increasing these values to at least 256k. The read and write limits are set in the proc file system using (for example) the files /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max. The rmem_default value can be increased in three steps; the following method is a bit of a hack but should work and should not cause any problems (a consolidated sketch of the three steps follows the list):
- Increase the size listed in the file:
- Restart NFS. For example, on Red Hat systems,
- You might return the size limits to their normal size in case other kernel systems depend on it:
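A consolidated sketch of the three steps (256k while testing, restoring the 64k default afterwards; the Red Hat restart path is taken from the discussion above):

    # echo 262144 > /proc/sys/net/core/rmem_default
    # echo 262144 > /proc/sys/net/core/rmem_max
    # /etc/rc.d/init.d/nfs restart
    # echo 65536 > /proc/sys/net/core/rmem_default
    # echo 65536 > /proc/sys/net/core/rmem_max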
This last step may be necessary because machines have been reported to crash if these values are left changed for long periods of time.
5.8. Turning Off Autonegotiation of NICs and Hubs
If network cards auto-negotiate badly with hubs and switches, and ports run at different speeds, or with different duplex configurations, performance will be severely impacted due to excessive collisions, dropped packets, etc. If you see excessive numbers of dropped packets in the nfsstat output, or poor network performance in general, try playing around with the network speed and duplex settings. If possible, concentrate on establishing a 100BaseT full duplex subnet; the virtual elimination of collisions in full duplex will remove the most severe performance inhibitor for NFS over UDP. Be careful when turning off autonegotiation on a card: the hub or switch that the card is attached to will then resort to other mechanisms (such as parallel detection) to determine the duplex settings, and some cards default to half duplex because it is more likely to be supported by an old hub. The best solution, if the driver supports it, is to force the card to negotiate 100BaseT full duplex.
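As a sketch, on kernels of this era the mii-tool utility can force the setting, assuming the driver supports the MII interface and the card is eth0 (both assumptions):

    # mii-tool -F 100baseTx-FD eth0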
5.9. Synchronous vs. Asynchronous Behavior in NFS
The default export behavior for both NFS Version 2 and Version 3 protocols, used by exportfs in nfs-utils versions prior to Version 1.11 (the latter is in the CVS tree, but not yet released in a package, as of January, 2002) is 'asynchronous'. This default permits the server to reply to client requests as soon as it has processed the request and handed it off to the local file system, without waiting for the data to be written to stable storage. This is indicated by the async option denoted in the server's export list. It yields better performance at the cost of possible data corruption if the server reboots while still holding unwritten data and/or metadata in its caches. This possible data corruption is not detectable at the time of occurrence, since the async option instructs the server to lie to the client, telling the client that all data has indeed been written to the stable storage, regardless of the protocol used.
In order to conform with 'synchronous' behavior, used as the default for most proprietary systems supporting NFS (Solaris, HP-UX, RS/6000, etc.), and now used as the default in the latest version of exportfs, the Linux Server's file system must be exported with the sync option. Note that specifying synchronous exports will result in no option being seen in the server's export list:
- Export a couple of file systems to everyone, using slightly different options (both steps are sketched after this list):
- Now we can see what the exported file system parameters look like:
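A hypothetical session illustrating both steps (paths are placeholders; note that the sync export shows no async option in the listing):

    # /usr/sbin/exportfs -o rw,sync *:/usr/local
    # /usr/sbin/exportfs -o rw *:/tmp
    # /usr/sbin/exportfs -v
    /usr/local    *(rw)
    /tmp          *(rw,async)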
If your kernel is compiled with the /proc filesystem, then the file /proc/fs/nfs/exports will also show the full list of export options.
When synchronous behavior is specified, the server will not complete (that is, reply to the client) an NFS version 2 protocol request until the local file system has written all data/metadata to the disk. The server will complete a synchronous NFS version 3 protocol request without this delay, and will return the status of the data in order to inform the client as to what data should be maintained in its caches, and what data is safe to discard. There are 3 possible status values, defined by an enumerated type, nfs3_stable_how, in include/linux/nfs.h. The values, along with the subsequent actions taken due to these results, are as follows:
- NFS_UNSTABLE - Data/Metadata was not committed to stable storage on the server, and must be cached on the client until a subsequent client commit request assures that the server does send data to stable storage.
- NFS_DATA_SYNC - Metadata was not sent to stable storage, and must be cached on the client. A subsequent commit is necessary, as is required above.
- NFS_FILE_SYNC - No data/metadata need be cached, and a subsequent commit need not be sent for the range covered by this request.
In addition to the above definition of synchronous behavior, the client may explicitly insist on total synchronous behavior, regardless of the protocol, by opening all files with the O_SYNC option. In this case, all replies to client requests will wait until the data has hit the server's disk, regardless of the protocol used (meaning that, in NFS version 3, all requests will be NFS_FILE_SYNC requests, and will require that the Server returns this status). In that case, the performance of NFS Version 2 and NFS Version 3 will be virtually identical.
If, however, the old default async behavior is used, the O_SYNC option has no effect at all in either version of NFS, since the server will reply to the client without waiting for the write to complete. In that case the performance differences between versions will also disappear.
Finally, note that, for NFS version 3 protocol requests, a subsequent commit request from the NFS client at file close time, or at fsync() time, will force the server to write any previously unwritten data/metadata to the disk, and the server will not reply to the client until this has been completed, as long as sync behavior is followed. If async is used, the commit is essentially a no-op, since the server once again lies to the client, telling the client that the data has been sent to stable storage. This again exposes the client and server to data corruption, since cached data may be discarded on the client due to its belief that the server now has the data maintained in stable storage.
5.10. Non-NFS-Related Means of Enhancing Server Performance
In general, server performance and server disk access speed will have an important effect on NFS performance. Offering general guidelines for setting up a well-functioning file server is outside the scope of this document, but a few hints may be worth mentioning:
- If you have access to RAID arrays, use RAID 1/0 for both write speed and redundancy; RAID 5 gives you good read speeds but lousy write speeds.
- A journalling filesystem will drastically reduce your reboot time in the event of a system crash. Currently, ext3 will work correctly with NFS version 3. In addition, Reiserfs version 3.6 will work with NFS version 3 on 2.4.7 or later kernels (patches are available for previous kernels). Earlier versions of Reiserfs did not include room for generation numbers in the inode, exposing the possibility of undetected data corruption during a server reboot.
- Additionally, journalled file systems can be configured to maximize performance by taking advantage of the fact that journal updates are all that is necessary for data protection. One example is using ext3 with data=journal so that all updates go first to the journal, and later to the main file system. Once the journal has been updated, the NFS server can safely issue the reply to the clients, and the main file system update can occur at the server's leisure. The journal in a journalling file system may also reside on a separate device such as a flash memory card so that journal updates normally require no seeking. With only rotational delay imposing a cost, this gives reasonably good synchronous IO performance. Note that ext3 currently supports journal relocation, and ReiserFS will (officially) support it soon. The Reiserfs tool package found at ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.x.0k.tar.gz contains the reiserfstune tool, which will allow journal relocation. It does, however, require a kernel patch which has not yet been officially released as of January, 2002. (A hypothetical fstab entry using data=journal is sketched at the end of this list.)
- Using an automounter (such as autofs or amd) may prevent hangs if you cross-mount files on your machines (whether on purpose or by oversight) and one of those machines goes down. See the Automount Mini-HOWTO for details.
- Some manufacturers (Network Appliance, Hewlett Packard, and others) provide NFS accelerators in the form of Non-Volatile RAM. NVRAM will boost access speed to stable storage up to the equivalent of async access.
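As a sketch, an /etc/fstab entry for an exported ext3 file system using the journalled data mode might look like this (the device and mount point are placeholders):

    /dev/sdb1  /export  ext3  defaults,data=journal  1 2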