All OpenMS files use a tab width of two. Use the command set tabstop=2 in vi or set-variable tab-width 2 if you are using emacs. For those two editors, the indentation behavior should be set automatically through the standard file headers (see below). Because of these ugly issues with setting the tab width in the editor, it is perfectly ok not to use tabs at all. In emacs, you can replace all tabs with the right number of spaces by typing the following keys: C-x h (to mark the whole buffer), then M-x untabify RET.
All lines in ASCII files (.C, .h, .cmake, ...) should have the svn property svn:eol-style set to native, allowing native line endings on each platform. This is desirable as Visual Studio for example will always insert CRLF even if the file is LF only, leading to a mixed line ending style for this file. Native eol style avoids this problem.
Matching pairs of opening and closing curly braces should be set to the same column:
The main reason for this rule is to avoid constructions like:
which might later be changed to something like
The resulting errors are hard to find. There are two ways to avoid these problems: (a) always use braces around a block or (b) write everything in a single line. We recommend method (a). However, this is mainly a question of personal style, so no explicit checking is performed to enforce this rule. If an else follows the if statement, however, the braces are mandatory! One exception is that several if/else statements can be written as
which is safe, because each else branch consists of a single if statement, which in turn has braces around its own branch.
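The brace rules above can be sketched as follows. This is only an illustration: the function clampToRange is a hypothetical helper, not part of OpenMS, and the pitfall is shown as a comment so that the example itself stays free of it.

```cpp
#include <cassert>

// Pitfall the rule guards against (shown as a comment on purpose):
//
//   if (condition)
//     doFirst();
//
// is easily changed later to
//
//   if (condition)
//     doFirst();
//     doSecond();   // always executed, despite the indentation
//
// Hypothetical helper: matching braces are set to the same column, every
// branch is braced, and the if/else chain is written without an extra
// pair of braces around each nested if.
int clampToRange(int value, int lower, int upper)
{
  if (value < lower)
  {
    return lower;
  }
  else if (value > upper)
  {
    return upper;
  }
  else
  {
    return value;
  }
}
```

Because every branch carries its own braces, adding a second statement to any branch later cannot silently change the control flow.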
Every .h file must be accompanied by a .C file, even if it is just a ``dummy''. This way a global make will stumble across errors.
For template classes, default instances with common template arguments should be put into the .C file. The variable names of these instances start with default_. Here is an example for the DPeak class:
The compiler then instantiates the template and detects errors at compile time. Doing this saves your time! Otherwise the error is detected much later, when the test is compiled.
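A minimal sketch of such default instances is given here. DPeakSketch is a hypothetical stand-in with an invented interface; the real DPeak class differs. In a real project the class template would live in the .h file and only the two default_ lines would go into the .C file.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a class template such as DPeak.
template <std::size_t D>
class DPeakSketch
{
public:
  DPeakSketch()
    : intensity_(0.0)
  {
    for (std::size_t i = 0; i < D; ++i)
    {
      position_[i] = 0.0;
    }
  }

  double getIntensity() const
  {
    return intensity_;
  }

  std::size_t getDimension() const
  {
    return D;
  }

protected:
  double position_[D];
  double intensity_;
};

// Content of the .C file: default instances with common template
// arguments, named with the default_ prefix. Compiling this file makes
// the compiler instantiate the class for D = 1 and D = 2.
DPeakSketch<1> default_dpeaksketch_1;
DPeakSketch<2> default_dpeaksketch_2;
```

Note that a default instance only forces the member functions it actually uses (here the constructor) to be compiled. An explicit instantiation such as template class DPeakSketch<1>; would compile all member functions, which may be worth considering as a complement.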
Simply speaking, _impl.h files are for templates what .C files are for ordinary classes. Remember that the definition of a class or function template has to be known at its point of instantiation. Therefore the implementation of a template is normally contained in the .h file. (No problem so far, things are even easier than for ordinary classes, because declaration and definition are given in the same file. You may like this or not.) Things get more complicated when certain design patterns (e.g., the factory pattern) are used which lead to "circular dependencies". Of course this is only a dependency of names, but it has to be resolved by separating declarations from definitions, at least for some of the member functions. In this case, a .h file can be written that contains most of the definitions as well as the declarations of the peculiar functions. Their definition is deferred to the _impl.h file ("impl" for "implementation"). The _impl.h file is included only if the peculiar member functions have to be instantiated. Otherwise the .h file should be sufficient. No .h file should include an _impl.h file.
Each OpenMS class should provide the following interface:
There are however circumstances that allow these methods to be omitted:
operator delete invocation on a pointer to a base class will fail badly.
The OPENMS_DLLAPI macro on the first line is required for correctly building the DLL. The correct usage of this macro is explained in the OpenMS C++ guide!
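A class interface of this kind might look like the following sketch. It assumes that the required methods are a default constructor, a copy constructor, a virtual destructor and an assignment operator. The class name InterfaceSketch and its member are hypothetical, and OPENMS_DLLAPI is defined as an empty macro here only so that the example compiles outside of the OpenMS build.

```cpp
#include <cassert>
#include <string>

// Assumption for this sketch: outside of the OpenMS build the export
// macro is not available, so it is defined as empty here.
#ifndef OPENMS_DLLAPI
#define OPENMS_DLLAPI
#endif

class OPENMS_DLLAPI InterfaceSketch
{
public:
  /// Default constructor
  InterfaceSketch()
    : name_()
  {
  }

  /// Copy constructor
  InterfaceSketch(const InterfaceSketch& source)
    : name_(source.name_)
  {
  }

  /// Destructor (virtual, so that deleting through a base class pointer works)
  virtual ~InterfaceSketch()
  {
  }

  /// Assignment operator
  InterfaceSketch& operator=(const InterfaceSketch& source)
  {
    if (&source != this)
    {
      name_ = source.name_;
    }
    return *this;
  }

  const std::string& getName() const
  {
    return name_;
  }

  void setName(const std::string& name)
  {
    name_ = name;
  }

protected:
  std::string name_;
};
```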
OpenMS uses its own type names for primitive types. Use only the types defined in OpenMS/include/OpenMS/CONCEPT/Types.h!
The main OpenMS classes are implemented in the namespace OpenMS. Auxiliary classes are implemented in OpenMS::Internal. There are some other namespaces, e.g. for constants and exceptions.
Importing a whole namespace in a header file is forbidden, e.g. by writing a directive such as using namespace std; at file scope.
This could lead to name clashes when OpenMS is used together with other libraries. In source files (.C) it is however allowed.
Accessors to protected or private members of a class are implemented as a pair of get-method and set-method. This is necessary as accessors that return mutable references to a member cannot be wrapped with Python!
For members that are too large to be read with the get-method, modified and written back with the set-method, an additional non-const get-method can be implemented!
For primitive types a non-const get-method is strictly forbidden! For more complex types it should be present only when really necessary!
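The accessor rules above can be sketched as follows. The class SpectrumSketch and its members are hypothetical, and plain C++ types are used instead of the OpenMS type names only to keep the example self-contained.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class illustrating the accessor conventions.
class SpectrumSketch
{
public:
  SpectrumSketch()
    : charge_(0),
      intensities_()
  {
  }

  // Primitive member: a get/set pair only, no non-const get-method.
  int getCharge() const
  {
    return charge_;
  }

  void setCharge(int charge)
  {
    charge_ = charge;
  }

  // Large member: get/set pair ...
  const std::vector<double>& getIntensities() const
  {
    return intensities_;
  }

  void setIntensities(const std::vector<double>& intensities)
  {
    intensities_ = intensities;
  }

  // ... plus a non-const get-method, because copying the container out,
  // modifying it and writing it back would be too expensive.
  std::vector<double>& getIntensities()
  {
    return intensities_;
  }

protected:
  int charge_;
  std::vector<double> intensities_;
};
```

Only the get/set pair would be exposed to Python; the mutable accessor remains a C++-only convenience.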
Many OpenMS classes are based on STL classes. However, only the C++ Standard Library part of the STL may be used. This means that SGI extensions like hash_set, hash_multiset, hash_map and hash_multimap are not allowed!
No OpenMS program should dump a core if an error occurs. Instead, it should attempt to die as gracefully as possible. Furthermore, as OpenMS is a framework rather than an application, it should give the programmer ways to catch and correct errors. The recommended procedure to handle errors, even fatal ones, is to throw an exception. Uncaught exceptions will result in a call to abort, thereby terminating the program.
All exceptions used in OpenMS are derived from Exception::Base defined in CONCEPT/Exception.h. A default constructor should not be implemented for these exceptions. Instead, the constructor of every derived exception should take the file name, the line number and the function name of the throw site as its first three arguments (cf. getFile, getLine and getFunction).
Additional arguments are possible but should provide default values (see IndexOverflow for an example).
The throw directive for each exception should pass __FILE__, __LINE__ and __PRETTY_FUNCTION__ as the first arguments, as in throw Foo(__FILE__, __LINE__, __PRETTY_FUNCTION__); this simplifies debugging. __FILE__ and __LINE__ are standard-defined preprocessor macros. The symbol __PRETTY_FUNCTION__ works similarly to a char* and contains the type signature of the function as well as its bare name, if the GNU compiler is being used. It is defined to <unknown> on other platforms. Exception::Base provides methods (getFile, getLine, getFunction) that allow the localization of the exception's cause.
As usual with C++, the standard way to catch an exception should be by reference (and not by value).
Exceptions are not declared in function signatures using throw specifications, as this forces the compiler to check that only the specified exceptions are thrown. This check not only increases the runtime, but may prevent efficient optimization of the code by the compiler.
However, thrown exceptions must be documented to tell the user which exceptions can be caught.
/**
  @brief Silly function

  @exception Exception::Foo is always thrown
*/
void myFunction()
{
  throw Foo(__FILE__, __LINE__, __PRETTY_FUNCTION__);
}
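The exception conventions of this section can be put together in a small self-contained sketch. The namespace Sketch and its classes are simplified, hypothetical stand-ins and not the actual Exception::Base interface; the constructor signature and the index argument are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// On compilers other than GNU-compatible ones the symbol is defined to
// "<unknown>", as described in the text.
#ifndef __GNUC__
#define __PRETTY_FUNCTION__ "<unknown>"
#endif

namespace Sketch
{
  // Simplified stand-in for Exception::Base.
  class Base
  {
  public:
    Base(const char* file, int line, const char* function)
      : file_(file),
        line_(line),
        function_(function)
    {
    }

    virtual ~Base()
    {
    }

    const char* getFile() const
    {
      return file_.c_str();
    }

    int getLine() const
    {
      return line_;
    }

    const char* getFunction() const
    {
      return function_.c_str();
    }

  protected:
    std::string file_;
    int line_;
    std::string function_;
  };

  // Derived exception: no default constructor; the additional argument
  // provides a default value.
  class Foo : public Base
  {
  public:
    Foo(const char* file, int line, const char* function, int index = 0)
      : Base(file, line, function),
        index_(index)
    {
    }

    int getIndex() const
    {
      return index_;
    }

  protected:
    int index_;
  };
}

/**
  @brief Silly function

  @exception Sketch::Foo is always thrown
*/
void throwFoo()
{
  throw Sketch::Foo(__FILE__, __LINE__, __PRETTY_FUNCTION__);
}

// Catches by reference and returns the line where the exception was thrown.
int lineOfFailure()
{
  try
  {
    throwFoo();
  }
  catch (const Sketch::Base& e)
  {
    return e.getLine();
  }
  return -1;
}
```

Catching by reference to the base class keeps the dynamic type intact, so a handler can still query the location information of any derived exception.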
Reserved words of the C++ language and symbols defined e.g. in the STL or in the standard C library must not be used as names for classes or class members. Even if the compiler accepts it, such words typically mess up the syntax highlighting and are confusing for other developers, to say the least. Bad examples include: set, map, exp, log. (All developers: Add your favorites to this list whenever you stumble upon them!)
Header files and source files should be named as the classes they contain. Source files end in ".C", while header files end in ".h". File names should be capitalized exactly as the class they contain (see below). Each header/source file should contain one class only, although exceptions are possible for light-weight classes.
Usage of underscores in names has two different meanings: A trailing ``_'' at the end indicates that something is protected or private to a class. Apart from that, different parts of a name are sometimes separated by an underscore, and sometimes separated by capital letters. (The details are explained below.)
Note that according to the C++ standard, names that start with an underscore are reserved for internal purposes of the language and its standard library (roughly speaking), so you should never use them.
Class names and type names always start with a capital letter. Different parts of the name are separated by capital letters at the beginning of the word. No underscores are allowed in type names and class names, except for the names of protected types and classes in classes, which are suffixed by an underscore. The same conventions apply for namespaces.
Variable names are all lower case letters. Distinguished parts of the name are separated using underscores ``_''. If parts of the name are derived from common acronyms (e.g. MS) they should be in upper case. Private or protected member variables of classes are suffixed by an underscore.
No prefixing or suffixing is allowed to identify the variable type - this leads to completely illegible documentation and overly long variable names.
Function names (including class method names) always start with a lower case letter. Parts of the name are separated using capital letters (as are types and class names). They should be comprehensible, but as short as possible. The same variable names must be used in the declaration and in the definition. Arguments that are not actually used in the implementation of a function have to be commented out; this avoids compiler warnings. The void argument of functions with an empty argument list must be omitted in both the declaration and the definition. If function arguments are pointers or references, the pointer or reference qualifier is appended to the variable type. It should not prefix the variable name.
Enumerated values and preprocessor constants are all upper case letters. Parts of the name are separated by underscores.
(You should avoid using the preprocessor anyway. Normally, const and enum will suffice unless something very special is needed.)
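The naming rules of this section can be summarized in one sketch. All names below (NamingSketch, PeakList, PeakType and so on) are hypothetical and chosen only to show the conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Namespace, class and type names: capitalized parts, no underscores.
namespace NamingSketch
{
  // Enumerated values: upper case, parts separated by underscores.
  enum PeakType
  {
    RAW_PEAK,
    PICKED_PEAK
  };

  class PeakList
  {
  public:
    PeakList()
      : peak_type_(RAW_PEAK),
        intensities_()
    {
    }

    // Function names: lower case start, parts separated by capitals.
    // The reference qualifier is attached to the type, not to the name.
    void addPeaks(const std::vector<double>& new_intensities)
    {
      intensities_.insert(intensities_.end(), new_intensities.begin(), new_intensities.end());
    }

    std::size_t countPeaksAbove(double min_intensity) const
    {
      std::size_t peak_count = 0; // variable names: lower case with underscores
      for (std::size_t i = 0; i < intensities_.size(); ++i)
      {
        if (intensities_[i] > min_intensity)
        {
          ++peak_count;
        }
      }
      return peak_count;
    }

    // Empty argument list: no void. Unused argument: commented out.
    PeakType getPeakType() const
    {
      return peak_type_;
    }

    void setPeakType(PeakType peak_type, bool /* validate */)
    {
      peak_type_ = peak_type;
    }

  protected:
    // Protected members carry a trailing underscore.
    PeakType peak_type_;
    std::vector<double> intensities_;
  };
}
```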
Parameters in .ini files and elsewhere follow these conventions:
This rule applies to all kinds of parameter strings, both keys and string-values.
To generate UML diagrams use yEd and export the diagrams in png format. Do not forget to save also the corresponding .yed file.
Each OpenMS class has to be documented using Doxygen. The documentation is inserted in Doxygen format in the header file where the class is defined. Documentation includes the description of the class, of each method, type declaration, enum declaration, each constant, and each member variable.
Longer pieces of documentation start with a brief description, followed by an empty line and a detailed description. The empty line is needed to separate the brief from the detailed description.
Descriptions of classes always have a brief section!
Please use the doxygen style of the following example for OpenMS:
/**
  @defgroup DummyClasses Dummy classes

  @brief This class contains dummy classes

  Add classes by using the '@ingroup' command.
*/

/**
  @brief Demonstration class.

  A demonstration class for teaching doxygen

  @note All classes need brief description!

  @ingroup DummyClasses
*/
class Test
{
public:
  /**
    @brief An enum type.

    The documentation block cannot be put after the enum!
  */
  enum EnumType
  {
    EVal1, ///< Enum value 1.
    EVal2  ///< Enum value 2.
  };

  /**
    @brief constructor.

    A more elaborate description of the constructor.
  */
  Test();

  /**
    @brief Dummy function.

    A normal member taking two arguments and returning an integer value.
    The parameter @p dummy_a is an integer.

    @param dummy_a an integer argument.
    @param dummy_s a constant character pointer.
    @see Test()
    @return The dummy results.
  */
  int dummy(int dummy_a, const char* dummy_s);

  /// Brief description in one line.
  int isDummy();

  /**
    @name Group of members.

    Description of the group.
  */
  //@{
  /// Dummy 2.
  void dummy2();
  /// Dummy 3.
  void dummy3();
  //@}

protected:
  int value; ///< An integer value.
};
The defgroup command indicates that a comment block contains documentation for a group of classes, files or namespaces. This can be used to categorize classes, files or namespaces, and document those categories. You can also use groups as members of other groups, thus building a hierarchy of groups. Using the ingroup command a comment block of a class, file or namespace will be added to the group or groups.
The groups (or modules as doxygen calls them) defined by the ingroup command should contain only the classes of special interest to the OpenMS user. Helper classes and such must be omitted.
Documentation which does not belong to a specific .C or .h file can be written into a separate Doxygen file (with the ending .doxygen). This file will also be parsed by Doxygen.
Open tasks are noted in the documentation of a header or a group using the todo command. The ToDo list is then shown in the doxygen menu under 'Related pages'. Each ToDo should be followed by a name in parentheses to indicate who is going to handle it.
These commands should be used as well:
Doxygen is not hard to learn, have a look at the manual :-)
The code for each .C file has to be commented. Each piece of code in OpenMS has to contain at least 5% of comments. The use of
// Comment text
instead of C style comments
/* Comment text */
is recommended to avoid problems arising from nested comments. Comments should be written in plain English and describe the functionality of the next few lines.
Instructive programming examples can be provided in the source/EXAMPLES directory.
OpenMS uses Subversion to manage different versions of the source files. For easier identification of the responsible person each OpenMS file contains the $Maintainer:$ string in the preamble.
Examples of .h and .C files have been given above. In non-C++ files (Makefiles, (La)TeX files, etc.) the C++ comments are replaced by the respective comment characters (e.g. ``#'' for Makefiles, ``%'' for (La)TeX). TeX will switch to math mode after a $, but you can work around this by writing something like
Latest SVN $ $Date:$ $
if you want to use it in texts; the one here expands to ``Latest SVN Date: 2007-01-19 13:47:36 +0100 (Fri, 19 Jan 2007) ''. Subversion does not turn on keyword substitution by default. See svn -h propset and svn -h proplist for details.
Each OpenMS class has to provide a test program. This test program has to check each method of the class. The test programs reside in the directory source/TEST and are usually named <classname>_test.C. The test program has to be coded using the class test macros as described in the OpenMS online reference. Special care should be taken to cover all special cases (e.g. what happens if a method is called with empty strings, negative values, zero, null pointers etc.). Please activate the keyword substitution of '$Id$' for all tests with the following command: svn propset svn:keywords Id <file>.
If a test needs supplementary files, put these files in the source/TEST/data/ folder. The name of supplementary files has to begin with the name of the tested class.
START_TEST(class_name, version)
END_TEST()
START_SECTION(name)
END_SECTION()
STATUS(message)
ABORT_IF(condition)
TEST_EQUAL(a, b)
TEST_NOT_EQUAL(a, b)
TEST_REAL_SIMILAR(a, b)
TEST_STRING_EQUAL(a, b)
TEST_STRING_SIMILAR(a, b)
TOLERANCE_ABSOLUTE(double)
TOLERANCE_RELATIVE(double)
TEST_EXCEPTION(exception, expression)
TEST_EXCEPTION_WITH_MESSAGE(exception, expression, message)
TEST_FILE_EQUAL(file, template_file)
TEST_FILE_SIMILAR(file, template_file)
Do not use expressions with side effects inside the comparison macros, e.g. *(it++). The expressions in the macro are evaluated several times, so the side effect is triggered several times as well.
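The following sketch shows why side effects inside a comparison macro are dangerous. SKETCH_TEST_EQUAL is a hypothetical macro written for this illustration and is not the OpenMS implementation; like many test macros, it mentions its first argument more than once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical comparison macro, NOT the OpenMS one. The first argument
// appears twice: once for the comparison, once to remember the value
// for a report.
#define SKETCH_TEST_EQUAL(a, b, outcome) \
  do                                     \
  {                                      \
    (outcome).equal = ((a) == (b));      \
    (outcome).reported = (a);            \
  } while (false)

struct Outcome
{
  bool equal;   // result of the comparison
  int reported; // value the macro would print
  int steps;    // how far the iterator moved
};

// Side effect inside the macro: the iterator is advanced twice.
Outcome compareWithSideEffect()
{
  std::vector<int> values;
  values.push_back(1);
  values.push_back(2);
  values.push_back(3);
  std::vector<int>::const_iterator it = values.begin();
  Outcome outcome = {false, 0, 0};
  SKETCH_TEST_EQUAL(*(it++), 1, outcome);
  outcome.steps = static_cast<int>(it - values.begin());
  return outcome;
}

// Safe variant: perform the side effect first, then compare a plain value.
Outcome compareWithoutSideEffect()
{
  std::vector<int> values;
  values.push_back(1);
  values.push_back(2);
  values.push_back(3);
  std::vector<int>::const_iterator it = values.begin();
  Outcome outcome = {false, 0, 0};
  const int current = *it;
  ++it;
  SKETCH_TEST_EQUAL(current, 1, outcome);
  outcome.steps = static_cast<int>(it - values.begin());
  return outcome;
}
```

In the first variant the comparison succeeds, but the reported value belongs to the next element and the iterator has skipped one entry, which can make every later check in the test misleading.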
You might want to create temporary files during the tests. The following macro puts a temporary filename into the string argument. The file is automatically deleted after the test.
All temporary files are validated using the XML schema, if the type of file can be determined by FileHandler. Therefore NEW_TMP_FILE should be called for each file written in a test. Otherwise only the last written file is checked.
NEW_TMP_FILE(string)
There are also some PHP tools for testing other tasks in the tools/ directory. See tools/README for details!
The abbreviation TOPP stands for The OpenMS Proteomics Pipeline, a collection of tools based upon the C++ classes in OpenMS. The TOPP tools are located in source/APPLICATIONS/TOPP.
The tests for a TOPP tool are simple commands which can be found in source/TEST/TOPP/CMakeLists.txt. To add a new test simply follow the examples given in that file. If a test needs supplementary input files, put these files in the same folder. The name of supplementary files has to begin with the name of the tested tool. All extensions but .tmp are possible.
In order to build the tests, execute the target "tests_build" (in Visual Studio based solution files, this target is available in the source/TEST/OpenMS_tests.sln solution, not in the OpenMS.sln). This will build the TOPP tools, UTILS and unit tests. Building the TOPP tools alone is not sufficient (you need FuzzyDiff, a UTIL, to run the tests).
OpenMS uses CTest to run its tests. You can invoke the ctest executable in the OpenMS binary directory and it will run all tests (including TOPP tests). To run a specific test use ctest -R <testname>, e.g. ctest -R TOPP_FileMerger to run all FileMerger tests. You can add -V or -VV to ctest to make the output more verbose.
The TOPP tests will be run on 32 bit and 64 bit platforms. Therefore a purely character-based comparison of computed and expected result files might fail although the results are in fact numerically correct; think of cases like 9.999e+3 vs. 1.0001e+4. Instead we provide a small program FuzzyDiff as a UTIL. This program steps through both inputs simultaneously and classifies each position into 3 categories: numbers, characters, whitespace. Within each line of input, numbers are compared with respect to their ratio (i.e., relative error), characters must match exactly (e.g. case is significant) and all whitespace is considered equal. Empty lines or lines containing only whitespace are skipped, but extra linebreaks 'within' lines will result in error messages. You can also define a "whitelist" of terms, which makes FuzzyDiff ignore lines where these terms occur (useful for hardcoded filepaths etc). For more details and verbosity options, see the built-in help message and the source code.
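To make the idea of a ratio-based comparison concrete, here is a minimal sketch. It is not the FuzzyDiff implementation; the function name similarByRatio, the handling of zero and of differing signs, and the threshold values are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumed acceptance rule for this sketch: two numbers are similar if the
// ratio of the larger to the smaller magnitude does not exceed max_ratio.
bool similarByRatio(double a, double b, double max_ratio)
{
  if (a == b)
  {
    return true; // also covers 0 vs. 0
  }
  if (a == 0.0 || b == 0.0)
  {
    return false; // the ratio is undefined if only one value is zero
  }
  if ((a < 0.0) != (b < 0.0))
  {
    return false; // numbers with different signs are never similar here
  }
  const double abs_a = std::fabs(a);
  const double abs_b = std::fabs(b);
  const double ratio = (abs_a > abs_b) ? abs_a / abs_b : abs_b / abs_a;
  return ratio <= max_ratio;
}
```

For the example from the text, 9.999e+3 and 1.0001e+4 have a ratio of roughly 1.0002, so they pass with a hypothetical limit of 1.001 although the two strings share hardly any characters.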
Each test relies on a number of files. These files should be named source/TEST/TOPP/<toolname>_<number>_<name>.<extension>, where
<toolname> has the form [A-Z][a-zA-Z]*; this is the name of the TOPP tool
<number> has the form [0-9]+; this is the running number of the test
<name> has the form [-_a-zA-Z0-9]+; this should be a descriptive name (characters _ and - are ok here, since <toolname> and <number> must not contain them)
<extension>; this is the extension expressing the type of the data.
The data files should be as small as possible, but not totally trivial.
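The naming scheme can be checked mechanically, for example with a regular expression. The sketch below is not part of OpenMS; the function name isValidTOPPTestFileName is hypothetical, it requires a compiler with C++11 support for std::regex, and the pattern for the extension (letters and digits only) is an assumption, since the text does not restrict it beyond excluding .tmp.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Checks a bare file name (without directory) against
// <toolname>_<number>_<name>.<extension>.
bool isValidTOPPTestFileName(const std::string& file_name)
{
  // Groups: 1 = toolname, 2 = number, 3 = name, 4 = extension (assumed form).
  static const std::regex pattern(
    "([A-Z][a-zA-Z]*)_([0-9]+)_([-_a-zA-Z0-9]+)\\.([a-zA-Z0-9]+)");
  std::smatch match;
  if (!std::regex_match(file_name, match, pattern))
  {
    return false;
  }
  // All extensions but .tmp are possible.
  return match[4].str() != "tmp";
}
```

Because neither the tool name nor the running number may contain an underscore, the first two underscores split the name unambiguously, and any further underscores or hyphens belong to the descriptive part.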
Yes. Testing is crucial to verify the correctness of the library, especially when using C++. But why does it have to be so complicated, using all these macros and stuff? One of the biggest problems when building large class frameworks is portability. C++ compilers are strange beasts and there is not a single one that accepts the same code as any other compiler. Since one of the main concerns of OpenMS is portability, we have to ensure that every single line of code compiles on all platforms. Due to the long compilation times and the (hopefully in future) large number of different platforms, tests to verify the correct behaviour of all classes have to be carried out automatically. This implies a well defined interface for all tests, which is the reason for all these strange macros. This fixed format also enforces the writing of complete class tests. Usually a programmer writes a few lines of code to test the parts of the code he wrote for correctness. Of the methods tested after the introduction of the test macros, about a tenth of all functions/methods showed severe errors after thorough testing. Most of these errors didn't occur on all platforms or didn't show up on trivial input.
Writing tests for each method of a class also ensures that each line is compiled. When using class templates the compiler only compiles the methods called. Thus it is possible that a code segment contains syntactical errors but the compiler accepts the code happily; it simply ignores most of the code. This is quickly discovered in a complete test of all methods. The same is true for configuration-dependent preprocessor directives that stem from platform dependencies. Often untested code also hides inside the const version of a method, when there is a non-const method with the same name and arguments (for example most of the getName methods in OpenMS). In most cases, the non-const version is preferred by the compiler and it is usually not clear to the user which version is taken. Again, explicit testing of each single method provides help for this problem. The ideal method to tackle the problem of untested code is the complete coverage analysis of a class. Unfortunately this is only supported for very few compilers, so it is not used for testing OpenMS.
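The point about const and non-const overloads can be demonstrated with a short sketch. NamedSketch and returnsMutable are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical class with a const and a non-const getName method.
class NamedSketch
{
public:
  NamedSketch()
    : name_("sketch")
  {
  }

  /// Non-const version: chosen for non-const objects.
  std::string& getName()
  {
    return name_;
  }

  /// Const version: only chosen for const objects or const references.
  const std::string& getName() const
  {
    return name_;
  }

protected:
  std::string name_;
};

// Helper overloads that reveal which getName version was selected.
inline bool returnsMutable(std::string&)
{
  return true;
}

inline bool returnsMutable(const std::string&)
{
  return false;
}
```

A test that only calls getName on a non-const object never executes the const version. To cover it, the test has to go through a const reference, e.g. const NamedSketch& const_ref = object; followed by const_ref.getName().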
One last point: writing the test program is a wonderful opportunity to verify and complete the documentation! Often enough implementation details are not clear at the time the documentation is written. A lot of side effects or special cases that were added later do not appear in the documentation. Going through the documentation and the implementation in parallel is the best way to verify the documentation for consistency and (strange coincidence?!) the best way to implement a test program, too!
OpenMS / TOPP release 1.9.0 | Documentation generated on Sun Oct 27 2013 01:11:37 using doxygen 1.8.4 |