Introduction

On Linux systems, unlike most other 'workstation-type' operating systems, there is no single company or consortium enforcing 'the standard way(TM)' of organizing the file system layout and configuring system services (although there are recurring attempts to establish one). There is thus no simple way for programs like CPMD to provide a one-size-fits-all configuration (this is already quite difficult for some of the more 'controlled' operating systems). Matters are complicated further by the fact that several (commercial) Fortran compilers are available for Linux and that these compilers are mostly mutually incompatible. Since the default Linux Fortran compiler, the GNU g77 compiler, is not sufficient to compile CPMD, yet most ready-to-use precompiled libraries are configured for g77, compiling CPMD under Linux can become pretty tricky (especially when you want to run CPMD in parallel).

This webpage tries to collect pieces of information and helpful hints for those poor suckers who - like me - have to deal with compiling CPMD on Linux machines. If you notice any incorrect statements or have additional tips that you think should appear here, please contact me at axel.kohlmeyer@theochem.ruhr-uni-bochum.de.


Disclaimer:
The information and files on this webpage are provided in the hope that they will be useful but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Source code is available on request.


Linux Fortran Compilers for CPMD

CPMD makes heavy use of the 'Cray pointer' extension to Fortran 77 for dynamic memory management. Also, for some platforms, or when the Gromos/Amber QM/MM interface is used, a Fortran 90/95 compiler is required. The g77 compiler, part of the GNU Compiler Collection and the default Fortran compiler on Linux machines, is therefore not sufficient to compile CPMD. The same is (currently) true for the G95 compiler, which is still under heavy development.

The most popular alternatives (in no particular order) are the Intel Fortran Compiler, the Portland Group Fortran Compiler, the Absoft Fortran Compiler, and the Lahey/Fujitsu Fortran Compiler. For the x86_64 (Athlon64/Opteron) platform there is also the Pathscale Compiler. All of these are commercial, license-managed compilers. For most of them, however, one can get a trial or evaluation license, so that you can check whether everything works before you have to pay the (quite sizeable) license/subscription fee(s). For the Intel Fortran Compiler you can also get a license for the non-commercial unsupported version free of charge. This version is identical to the commercial version, but adds the restriction that it is a non-transferable personal license and that you are not allowed to sell the compiled executables. As of version 8.1 it also supports the EM64T instruction set, which makes it usable for the x86_64 (Athlon64/Opteron) platform as well.

A nice overview of compiling and running Fortran programs under Linux can be found at http://www.nikhef.nl/~templon/fortran.html. A nice paper on how to get the most out of your compiler is at http://www.fortran-2000.com/ArnaudRecipes/CompilerTricks.html.


Linux Configuration Files

As stated above, there are many different Linux installations, so compiling CPMD for your local machines can become a tricky hurdle to cross before you can actually start to use CPMD. The default CPMD distribution provides a selection of configurations that - most of the time - have to be adapted to the local setup. Starting with CPMD version 3.9.x, new configurations can be added without changing the Configure script; you only need to add files with the proper definitions to the CONFIGURE subdirectory. The following additional configurations, adapted to our local installation, are available for download and may help to get you started.

download file: BOCHUM-ATHLON:

This is a configuration to compile a serial CPMD executable for an Athlon processor Linux machine with the latest Intel Fortran Compiler (8.1-020).

download file: BOCHUM-P4:

This is a configuration to compile a serial CPMD executable for a Pentium 4 or Xeon processor Linux machine with the latest Intel Fortran Compiler (8.1-020).

download file: BOCHUM-GIGA:

This is a configuration to compile a parallel CPMD executable for a cluster of Athlon processor Linux machines connected via Ethernet, using the Intel Fortran Compiler and a LAM/MPI installation that was compiled for a different Fortran compiler (PGI in this case).

download file: BOCHUM-AMD64-PGI:

This is a configuration to compile a serial CPMD executable for an Opteron/Athlon64/AthlonFX processor Linux machine with the Portland Group Fortran Compiler.

download file: BOCHUM-AMD64-IFORT:

This is a configuration to compile a serial CPMD executable for an Opteron/Athlon64/AthlonFX/Xeon+EM64T processor Linux machine with the Intel Fortran Compiler EM64T Edition.

download file: BOCHUM-SCALI:

This is a configuration to compile a parallel CPMD executable for a cluster of Athlon processor Linux machines connected via an SCI/Dolphin high-speed interconnect using the Scali ScaMPI library. Note how the MPI libraries are linked dynamically (as required), but the compiler runtime is linked statically, so there is no need to install the compiler runtime on all cluster nodes.


Optimized LAPACK/BLAS/ATLAS Library Binaries

These are unified LAPACK and BLAS libraries, based on the ATLAS library and the LAPACK/BLAS sources from netlib, that should give close to optimal CPMD performance on the platforms they were tuned for.

Special care has been taken that none of the libraries here require any other library to be installed (besides the ubiquitous libc and libm), especially not the Fortran compiler runtime libraries. All routines that need access to the Fortran compiler runtime were replaced with Fortran-compatible counterparts written in plain C. As a consequence they should be compatible with all currently available Linux Fortran compilers.

To use them, simply copy (or symlink) them into your compilation directory under the name libatlas.a and use '-L. -latlas' as linker flags (and delete all other linker flags related to LAPACK, BLAS, MKL, etc.). The libraries are also available as RPM packages that emulate a 'normal' BLAS/LAPACK/ATLAS installation.
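A minimal sketch, assuming you downloaded the Athlon library into your CPMD compilation directory (the makefile variable name LFLAGS is just an example; use whatever your configuration defines for the linker flags):

  ln -s libatlas_athlon.a libatlas.a   # make the download visible under the generic name
  # then, in the Makefile:
  LFLAGS = -L. -latlas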

Please drop me a note at axel.kohlmeyer@theochem.ruhr-uni-bochum.de if you want to be notified (via email) when I update the libraries (which happens rather infrequently).

Recent changes:

Added the ACML fast math lib hack on 2004/12/17.

Repackaged the binaries for RPM-based distributions with several compatibility symlinks and additional documentation packages on 2004/11/02.

Added a multi-threaded ATLAS-3.6.0 for dual Athlon and Opteron on 2004/08/13.

Updated to ATLAS-3.6.0 on 2004/06/02. The new libraries also feature an enlarged internal buffer for BLAS level 3 calls to gain maximum CPMD performance at the cost of a slightly increased memory footprint (up to 25MB on 32-bit platforms and up to 50MB on 64-bit platforms). The performance gain for CPMD jobs over the old (3.4.1) libraries is on average about 3 percent.

download file: libatlas_x86_64.a  (21.694 MByte):
ATLAS version 3.6.0, LAPACK3.0-20011027, tuned for AMD Opteron 2.0GHz with PC400 DDR memory and for AMD Athlon64 3000+ 2.0GHz with PC333 DDR memory. Compiled on SuSE 9.0. Tested on SuSE 9.0, RedHat EL-3. Should be compatible with all AMD Athlon64, AthlonFX and Opteron (64-bit) CPUs. May also work on Intel Xeon processors with the EM64T instruction set (please tell me if it does).
A small additional speedup can be gained by using the 'fast math library' distributed with ACML as of version 2.5: copy or symlink libacml_mv.a into the compilation directory and add '-L. -lacml_mv' directly after the ATLAS linker flags.
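A sketch of this hack (the ACML installation path is an assumption and will differ on your machine):

  ln -s /opt/acml/gnu64/lib/libacml_mv.a .   # fast math library from ACML >= 2.5
  # linker flags in the Makefile; note -lacml_mv goes directly after -latlas:
  LFLAGS = -L. -latlas -lacml_mv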

Please note that for CPMD there is on average no significant speed gain from using a version tuned for only a single x86_64 CPU type. Alternatively, you may want to try out the ACML library by AMD, but you need at least ACML version 2.0 to get correct results with CPMD.

download file: libatlas_x86_64_mt.a  (21.983 MByte):

ATLAS version 3.6.0, LAPACK3.0-20011027, 2-cpu multi-threaded version tuned for Dual-AMD Opteron 2.0GHz with PC400 DDR memory. Compiled on SuSE 9.0. Tested on SuSE 9.0. For use in an MPI/OpenMP hybrid parallel CPMD compilation.
NOTES:
- this library is independent of the use of OpenMP when compiling CPMD and vice versa, but only the combination will give the maximum speed gain.
- for CPMD the speed gain from multi-threading is usually inferior to the MPI parallelization, so unless you use a very large number of nodes there is no point in using this library and/or OpenMP.
- the additional speed gain from using this library instead of the single-threaded one will be between 0 and 30%, depending on the size of the system (mostly 5-10%).

download file: libatlas_athlon.a  (18.775 MByte):

ATLAS version 3.6.0, LAPACK3.0-20011027, tuned for AMD Athlon XP 1600+ with PC133 memory, Dual AMD Athlon MP 1600+ with PC266 ECC-DDR memory, and AMD Athlon XP 2500+ with PC333 DDR memory. Compiled on RedHat 7.1. Tested on RedHat 7.x/8.0/9, SuSE 9.0, Mandrake 10.0. Compatible with all AMD Athlon and Opteron CPUs (32-bit).
Please note: on AMD Athlon (and Opteron) CPUs the timings depend somewhat on memory alignment and fragmentation and are therefore not 100% reproducible between tuning runs. For this library the results of several tuning runs were compared and combined to yield a library that should give close to optimal performance on all Athlon platforms and suboptimal performance on none of them.

download file: libatlas_athlon_mt.a  (18.980 MByte):

ATLAS version 3.6.0, LAPACK3.0-20011027, 2-cpu multi-threaded version tuned for Dual AMD Athlon MP 1600+ with PC266 ECC-DDR memory. Compiled on RedHat 7.1. Tested on RedHat 7.x/8.0/9, SuSE 9.0, Mandrake 10.0. Compatible with all AMD Athlon MP and Opteron CPUs (32-bit).
NOTES:
- this library is independent of the use of OpenMP when compiling CPMD and vice versa, but only the combination will give the maximum speed gain.
- for CPMD the speed gain from multi-threading is usually inferior to the MPI parallelization, so unless you use a very large number of nodes there is no point in using this library and/or OpenMP.
- the additional speed gain from using this library instead of the single-threaded one will be between 0 and 30%, depending on the size of the system (mostly 5-10%).

download file: libatlas_p4.a  (14.783 MByte):

ATLAS version 3.6.0, LAPACK3.0-20011027, tuned for 2.4GHz Intel Pentium 4 Xeon/512kB-L2 with PC200 DDR memory. Compiled on RedHat 7.3. Tested on RedHat 7.x/8.0/9, SuSE 9.0. Compatible with all Intel Pentium 4 CPUs.

download file: libatlas_p3.a  (12.880 MByte):

ATLAS version 3.6.0, LAPACK3.0-20011027, tuned for 600MHz Intel Pentium III with PC100 memory. Compiled on RedHat 7.1. Tested on RedHat 7.x/8.0/9. Compatible with all Intel Pentium III CPUs.

download file: libatlas_p2.a  (12.804 MByte):

ATLAS version 3.4.1, LAPACK3.0-20011027, tuned for 350MHz Intel Pentium II with PC100 memory. Compiled on RedHat 7.1. Tested on RedHat 7.x/8.0/9, SuSE 9.0. Compatible with all Intel Pentium II, III, 4 and AMD Athlon CPUs.
Please note that using this ATLAS library already gives a large speed gain over standard BLAS/LAPACK on all 32-bit x86 platforms, so it is worth a try, especially if you want to generate an executable that can be used on all current 32-bit x86 platforms.

NOTES: The libraries utilize the special instructions of the respective CPUs, so the resulting binaries are not portable between platforms.

If your platform does not fully match the architectures the binaries were tuned for, you will usually still get very good performance with the generic (p2) library.

RPM Packages

The following RPMs contain the ATLAS binaries from above, repackaged as RPMs. They were packaged with the oldest RPM version available to me, so they should be compatible with all current RPM-based distributions. The libraries themselves are identical to the ones above, so there is no need to download both.
There are also a few companion RPMs with documentation and manpages.

ATLAS/LAPACK RPM, generic x86 version    download file: atlas-3.6.0-1ak.i386.rpm  (3.376 MByte)
ATLAS/LAPACK RPM, Pentium 3 version    download file: atlas-3.6.0-1ak.i686.rpm  (3.390 MByte)
ATLAS/LAPACK RPM, Pentium 4 version    download file: atlas-3.6.0-1ak.i786.rpm  (3.370 MByte)
ATLAS/LAPACK RPM, Athlon version    download file: atlas-3.6.0-1ak.athlon.rpm  (3.366 MByte)
ATLAS/LAPACK RPM, x86_64 version    download file: atlas-3.6.0-1ak.x86_64.rpm  (3.942 MByte)
ATLAS Documentation and header files: download file: atlas-doc-3.6.0-1ak.noarch.rpm  (0.554 MByte)
BLAS manpages (applies to ATLAS as well): download file: blas-man-3.0-9.noarch.rpm  (0.121 MByte)
LAPACK manpages (applies to ATLAS as well): download file: lapack-man-3.0-9.noarch.rpm  (1.673 MByte)

CPMD compatible LAM-MPI RPMs for Linux

These RPMs provide binaries of the LAM-MPI message passing library that can be used to run CPMD in parallel on shared memory SMP computers, on clusters of networked Linux PCs over TCP/IP, or on a combination thereof. Contrary to the standard MPI RPMs provided by RedHat, SuSE and other distributions, these RPMs were configured to be compatible with g77 as well as Intel ifc and PGI's pgf77/pgf90.

To compile with the different compilers, just use mpif77 (for g77), mpiifc (for Intel ifc), mpiifort (for Intel ifort), mpipgf77 (for PGI's pgf77), mpipgf90 (for PGI's pgf90), or mpifort (for the Compaq Fortran Alpha Linux compiler). You can easily adapt the MPI wrapper scripts to other compilers as well (as long as they have the same underscoring conventions).
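For example, the same source compiles with any of the wrappers; only the wrapper name changes (the file name here is a placeholder):

  mpif77   -c -O2 somefile.f    # wrapper for GNU g77
  mpiifort -c -O2 somefile.f    # wrapper for Intel ifort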

I have not tested the RPMs on other distributions, so you have to figure out which version matches yours. Alternatively, you can download the source RPM and build a matching binary RPM yourself (by running rpmbuild --rebuild on the source RPM).
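For example:

  rpmbuild --rebuild lam-7.0.6-2ak6.src.rpm
  # the resulting binary RPM is placed in the RPMS subdirectory of your rpm build tree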

NOTE: different Fortran compilers usually produce mutually incompatible object files, so you should compile all Fortran sources with the same compiler (or compiler wrapper script) to get a usable executable.

Please drop me a note at axel.kohlmeyer@theochem.ruhr-uni-bochum.de if you want to be notified (via email) when I update the RPMs (which happens rather infrequently).

Updated to lam-7.0.6 on 2004/06/03
RPM for RedHat 7.x, x86    download file: lam-7.0.6-2ak6.i386.rpm  (2.924 MByte)
RPM for RedHat 7.x, alpha    download file: lam-7.0.6-2ak6.alpha.rpm  (3.421 MByte)
RPM for RedHat 9, x86    download file: lam-7.0.6-2ak6.i386.rpm  (8.012 MByte)
RPM for Fedora Core 2, x86    download file: lam-7.0.6-2ak6.i386.rpm  (8.294 MByte)
RPM for SuSE 9.0, x86_64    download file: lam-7.0.6-2ak6.x86_64.rpm  (8.132 MByte)
RPM for SuSE 9.1, x86    download file: lam-7.0.6-2ak6.i586.rpm  (7.823 MByte)
RPM for Mandrake 10.0, x86    download file: lam-7.0.6-2ak6.i586.rpm  (2.907 MByte)
Source RPM:  download file: lam-7.0.6-2ak6.src.rpm  (7.729 MByte)
In case you need to rebuild the RPM for yourself.
Shell script wrappers to emulate rsh access with ssh:
Arch-independent RPM: download file: rsh-sshwrap-0.1-2ak.noarch.rpm  (0.003 MByte)
Contains no binaries, should work everywhere.
Source RPM:  download file: rsh-sshwrap-0.1-2ak.src.rpm  (0.003 MByte)
In case you need to rebuild the RPM for yourself.


Using the Intel Fortran Compiler with/without the Intel MKL

With version 8.0 Intel has switched the Fortran frontend. It now uses (almost) the same frontend as the DEC/Compaq/HP Alpha compiler, which has two major consequences: a) all write statements are now synchronous, which makes following output files easier but incurs a performance hit, especially on large networked file systems, and b) the resulting binaries need a lot more stack memory (technical explanation: local automatic variables are allocated via alloca(3) instead of malloc(3)). As a consequence you have to raise the stack size limit in your shell or CPMD will crash with a segmentation fault. ulimit -a (in a bourne/korn shell) or limit (in a (t)csh) will show you the current settings. You should set the limit to at least 320000 kbytes. Note that this can be especially crucial (and difficult to debug) for parallel jobs.
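For example, to raise the stack size limit before starting CPMD:

  ulimit -s 320000          # bourne/korn shell
  limit stacksize 320000    # (t)csh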

The Intel Fortran compiler for Linux uses a large number of shared libraries. This makes the resulting binaries highly non-portable unless you link them statically, so using '-i-static', '-static-libcxa' or '-static' as a link option is highly recommended. If this does not work for one reason or another, you should at least try to link all compiler-provided libraries statically. This gets even more complicated if you use the Intel Math Kernel Library (MKL). With the following set of flags you link every Intel library statically, but pthread and libc dynamically (tested with Intel IFC v7.1 and MKL v5.2):

-static-libcxa -Xlinker -Bstatic -lsvml         \
 -L/opt/intel/mkl/lib/32/ -lmkl_lapack -lmkl_p4 \
 -lguide -Vaxlib -Xlinker -Bdynamic -lpthread

If you are using RedHat 9 (or any newer distribution that uses the new native POSIX threads library) together with an older Intel Fortran Compiler (up to version 7.1), you have to link everything dynamically; upgrading to version 8 of the Intel compilers removes this limitation. The dynamic linking can be achieved by using ifc -i_dynamic instead of plain ifc. To avoid having to set LD_LIBRARY_PATH every time, you can use the following set of flags to hardcode the default path of those shared libraries into the binary:

-lsvml -Xlinker -rpath=/opt/intel/mkl/lib/32/   \
 -Xlinker -rpath=/opt/intel/compiler70/ia32/lib \
 -L/opt/intel/mkl/lib/32/ -lmkl_lapack -lmkl_p4 \
 -lguide -Vaxlib

If you run an MKL-linked CPMD binary on a multiprocessor Intel Xeon or a Hyperthreading-enabled desktop machine, you should be aware that this may prompt the library to use multiple threads internally. Depending on your configuration this may interfere with the intended use of the machine and lead to suboptimal performance. Setting the environment variable OMP_NUM_THREADS to 1 will disable this 'feature':
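  export OMP_NUM_THREADS=1    # bourne/korn shell
  setenv OMP_NUM_THREADS 1    # (t)csh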

Of course you can avoid some of this trouble by using the combined LAPACK/ATLAS libraries from the section above. So far the performance loss for real-world applications compared to MKL seems to be quite small (~5%).


Compiling CPMD for OpenMP

To compile CPMD for OpenMP, you start by creating a regular makefile, as if you wanted to compile a standard serial or MPI-parallel executable. Next you need to tell the compiler to look for OpenMP directives. With the Portland Group compilers (pgf77/pgf90) this is done by adding the flag '-mp' to the FC and LD makefile variables, e.g.:

  CC = gcc -O2 -Wall -D_REENTRANT
  FC = pgf77 -c -fast -mp -tp athlon -D_REENTRANT
  LD = pgf77 -fast -mp -tp athlon -D_REENTRANT

For the Intel Linux Fortran compiler the equivalent compiler flags are -fpp -openmp. As of 08/2004 I have been able to compile a working OpenMP executable with the Intel compiler version 8.0 build 20040412Z; version 8.1-020 works as well. You need to register with Intel Premier Support (currently at no extra cost, even for the non-commercial version) to get access to the updated binaries (the stock v8.0 compiler does not work). You then have to compile everything without the OpenMP flags, then delete the files util.o, mltfft.o, fftnew.o, and fftutil.o, change the Makefile for OpenMP, and type 'make'. This will recompile only those files for OpenMP (they also provide the largest part of the OpenMP speed gain) and link appropriately (see the sketch below). As of CPMD version 3.9.2 OpenMP compilation of the whole package should work (again).
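The partial recompilation described above, as shell commands (the Makefile edit is shown as a comment):

  make                                      # full build without OpenMP flags
  rm -f util.o mltfft.o fftnew.o fftutil.o
  # now add '-fpp -openmp' to the FC and LD variables in the Makefile
  make                                      # recompiles only the deleted files with OpenMP and relinks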

To enable OpenMP parallelization during a CPMD run, you have to set the environment variable OMP_NUM_THREADS to the number of CPUs you want to use for OpenMP (usually 2 on SMP Linux PCs). So you have to type either

  OMP_NUM_THREADS=2
  export OMP_NUM_THREADS
if you have a Bourne shell or
  setenv OMP_NUM_THREADS 2
if you have a c-shell.

You then start the cpmd job and should see multiple cpmd threads/processes instead of a single one. The number of active OpenMP threads should also be visible in the CPMD output, e.g.:

 OPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPEN
 NUMBER OF CPUS PER TASK                      2
 OPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPENMPOPEN

Due to changes in the threads interface in Linux, running OpenMP applications may require telling the dynamic linker which interface to use, by setting the environment variable LD_ASSUME_KERNEL. So you have to type either

  LD_ASSUME_KERNEL=2.4.20
  export LD_ASSUME_KERNEL
if you have a Bourne shell or
  setenv LD_ASSUME_KERNEL 2.4.20
if you have a c-shell.

The tricky part is to make that work in a parallel run. With LAM-MPI this can be done by adding the flag '-x OMP_NUM_THREADS=2' to the mpirun command line (see the example below). For Scali MPI you just have to set the environment as in the serial case. I have not yet tried any other Linux MPI implementation with OpenMP.
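For example, with LAM-MPI (the process count and input file name are placeholders):

  mpirun -np 4 -x OMP_NUM_THREADS=2 /path/to/cpmd-omp.x water.inp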

If all else fails, you can write a short shell wrapper for your parallel CPMD executable, which could look like this:

#!/bin/sh
# set the OpenMP environment before launching the real CPMD binary
OMP_NUM_THREADS=2        # number of OpenMP threads per MPI task
LD_ASSUME_KERNEL=2.4.20  # select the matching threads interface (see above)

export OMP_NUM_THREADS LD_ASSUME_KERNEL

# replace this wrapper process with the real executable, passing all arguments on
exec /path/to/my/real/cpmd-omp.x "$@"

Be prepared for depressing results on PC-style hardware. For a small number of tasks, the MPI parallelization in CPMD is significantly better than the OpenMP parallelization, and the overhead of spawning multiple threads frequently outweighs the speed gain. And although a combination of MPI and OpenMP seems to be a smart choice for running a CPMD job on a large number of CPUs (particularly if you do not have a high-speed interconnect), so far I have not found any significant gain from that kind of parallel configuration on current PC hardware. In fact, I was usually better off either using MPI alone for all CPUs (connecting SMP CPUs via local shared memory) or not using the second CPU at all (the latter especially on dual Pentium 4/Xeon machines).


Reading binary files from other platforms

CPMD stores its restart information in a so-called 'unformatted' (i.e. binary) file format. While this greatly reduces the size of the file compared to a (formatted) text file while retaining full precision, it poses a problem when you want to continue (or restart) a run on a different platform. The same is true if you are using Vanderbilt ultra-soft pseudopotential files in CPMD, which are also read 'unformatted' (there is a tool in the uspp distribution to convert the binary file to text and back, so you can move the text version to the new platform and create a new unformatted file from it). The unformatted output is mostly a 1:1 copy of the memory contents with block size indicators added, and since the organization of multi-byte variables (like real*8) in memory differs between platforms, these files are not generally interchangeable between platforms.

There is some hope, though. Most current CPUs and operating systems support the IEEE-754-1985 standard for floating point numbers and use one of two ways to store the data internally: big-endian or little-endian. With conforming machines you can interchange the binary files as long as the machines have the same endianness. Some compilers even allow you to compile binaries for the opposite endianness (see below) or support on-the-fly conversion.

  Big-endian machines           Little-endian machines
  -------------------           ----------------------
  IBM Power / AIX               DEC/Compaq/HP Alpha / Tru64/Linux
  SGI MIPS / IRIX               x86 / Linux-PC
  HP-PA / HP-UX                 Itanium / Linux
  Sun Sparc / Solaris
  PowerMac (G5) / Mac OS X

(if you know of others, please tell me).

How to enable on-the-fly endian conversion under Linux:

If you have the Portland Group compiler (pgf77/pgf90), you can use the -Mbyteswapio or the -byteswapio flag when compiling CPMD. Your CPMD (or other) binary will then read and write only big-endian unformatted data files. Keep that in mind when you are using Vanderbilt USPPs, because the pseudopotential files need to have the same endianness. An error message like PGFIO-F-219/unformatted read/unit=22/attempt to read/write past end of record is usually an indicator of such an endianness mismatch.
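For example, in the Makefile (a sketch based on the serial Athlon configuration shown earlier):

  FC = pgf77 -c -fast -byteswapio -tp athlon
  LD = pgf77 -fast -byteswapio -tp athlon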

For the Intel Fortran compiler for Linux (ifc/efc) you don't have to recompile. Simply set the environment variable F_UFMTENDIAN to 'big' before running (see the example below). Check your compiler documentation for more details (search for 'endian').
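For example:

  export F_UFMTENDIAN=big    # bourne/korn shell
  setenv F_UFMTENDIAN big    # (t)csh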

If you have a DEC/Compaq/HP Alpha machine and use the Compaq Fortran Compiler for Alpha, then you can use the compiler flag -convert big_endian to compile an executable that is able to read and write big-endian unformatted binary files. Again, check your compiler documentation for more details (search for 'endian').


Disclaimer   /   Author of this page: Axel.Kohlmeyer@theochem.ruhr-uni-bochum.de