4 Usage

4.1 Search Control Heuristics

Search control heuristics define the order in which the prover considers newly generated clauses. A heuristic is defined by a set of clause evaluation functions and a selection scheme which defines how many clauses are selected according to each evaluation function. A clause evaluation function consists of a priority function and an instance of a generic weight function.

4.1.1 Priority functions

Priority functions define a partition on the set of clauses. A single clause evaluation consists of a priority (which is the first selection criteria) and an evaluation. Priorities are usually not suitable to encode heuristical control knowledge, but rather are used to express certain elements of a search strategy, or to restrict the effect of heuristic evaluation functions to certain classes of clauses. It is quite trivial to add a new priority function to E, so at any time there probably exist a few not yet documented here.

Syntactically, a large subset of currently available priority functions is described by the following rule:

<prio-fun> ::= PreferGroundGoals ||  
               PreferUnitGroundGoals ||  
               PreferGround ||  
               PreferNonGround ||  
               PreferProcessed ||  
               PreferNew ||  
               PreferGoals ||  
               PreferNonGoals ||  
               PreferUnits ||  
               PreferNonUnits ||  
               PreferHorn ||  
               PreferNonHorn ||  
               ConstPrio ||  
               ByLiteralNumber ||  
               ByDerivationDepth ||  
               ByDerivationSize ||  
               ByNegLitDist ||  
               ByGoalDifficulty ||  
               SimulateSOS||  
               PreferHorn||  
               PreferNonHorn||  
               PreferUnitAndNonEq||  
               DeferNonUnitMaxEq||  
               ByCreationDate||  
               PreferWatchlist||  
               DeferWatchlist

The priority functions are interpreted as follows:

PreferGroundGoals:
Always prefer ground goals (all negative clauses without variables), do not differentiate between all other clauses.
PreferUnitGroundGoals:
Prefer unit ground goals.
PreferGround:
Prefer clauses without variables.
PreferNonGround:
Prefer clauses with variables.
PreferProcessed:
Prefer clauses that have already been processed once and have been eliminated from the set of processed clauses due to interreduction (forward contraction).
PreferNew:
Prefer new clauses, i.e. clauses that are processed for the first time.
PreferGoals:
Prefer goals (all negative clauses).
PreferNonGoals:
Prefer non goals, i.e. facts with at least one positive literal.
PreferUnits:
Prefer unit clauses (clauses with one literal).
PreferNonUnits:
Prefer non-unit clauses.
PreferHorn:
Prefer Horn clauses (clauses with no more than one positive literals).
PreferNonHorn:
Prefer non-Horn clauses.
ConstPrio:
Assign the same priority to all clauses.
ByLiteralNumber:
Give a priority according to the number of literals, i.e. always prefer a clause with fewer literals to one with more literals.
ByDerivationDepth:
Prefer clauses which have a short derivation depth, i.e. give a priority based on the length of the longest path from the clause to an axiom in the derivation tree. Counts generating inferences only.
ByDerivationSize:
Prefer clauses which have been derived with a small number of (generating) inferences.
ByNegLitDist:
Prefer goals to non-goals. Among goals, prefer goals with fewer literals and goals with ground literals (more exactly: the priority is increased by 1 for a ground literal and by 3 for a non-ground literal. Clauses with lower values are selected before clauses with higher values).
ByGoalDifficulty:
Prefer goals to non-goals. Select goals based on a simple estimate of their difficulty: First unit ground goals, then unit goals, then ground goals, then other goals.
SimulateSOS:
Use the priority system to simulate Set-Of-Support. This prefers all initial clauses and all Set-Of-Support clauses. Some non-SOS-clauses will be generated, but not selected for processing. This is neither well tested nor a particularly good fit with E’s calculus, but can be used as one among many heuristics. If you try a pure SOS strategy, you also should set --restrict-literal-comparisons and run the prover without literal selection enabled.
PreferHorn:
Prefer Horn clauses (note: includes units).
PreferNonHorn:
Prefer non-Horn clauses.
PreferUnitAndNonEq:
Prefer all unit clauses and all clauses without equational literal. This was an attempt to model some restricted calculi used e.g. in Gandalf [Tam97], but did not quite work out.
DeferNonUnitMaxEq:
Prefer everything except for non-unit clauses with a maximal equational literal (“Don’t paramodulate if its to expensive”). See above, same result.
ByCreationDate:
Return the creation date of the clause as priority. This imposes a FIFO equivalence class on clauses. Clauses generated from the same given clause are grouped together (and can be ordered with any evaluation function among each other).
PreferWatchlist
Prefer clauses on the watchlist (see 4.4).
DeferWatchlist
Defer clauses on the watchlist (see above).

Please note that careless use of certain priority functions can make the prover incomplete for the general case.

4.1.2 Generic Weight Functions

Generic weight functions are templates for functions taking a clause and returning a weight (i.e. an estimate of the usefulness) for it, where a lower weight means that the corresponding clause should be processed before a clause with a higher weight. A generic weight function is combined with a priority function and instantiated with a set of parameters to yield a clause evaluation function.

You can specify an instantiated generic weight function as described in this rule8:

<weight-fun> ::= Clauseweight ’(’ <prio-fun> ’, <int>, <int>,  
                                  <float> ’)’                    ||  
                 Refinedweight ’(’ <prio-fun> ’, <int>, <int>,  
                                   <float>, <float>, <float> ’)’ ||  
                 Orientweight ’(’ <prio-fun>, <int>, <int>,  
                                  <float>, <float>, <float> ’)’  ||  
                 Simweight ’(’ <prio-fun>, <float>, <float>,  
                               <float>, <float> ’)’              ||  
                 FIFOWeight ’(’ <prio-fun> ’)’                   ||  
                 LIFOWeight ’(’ <prio-fun> ’)’

Clauseweight(prio, fweight, vweight, pos_mult): This is the basic symbol counting heuristic. Variables are counted with weight vweight, function symbols with weight fweight. The weight of positive literals is multiplied by pos_mult before being added into the final weight.

Refinedweight(prio, fweight, vweight, term_pen, lit_pen, pos_mult): This weight function is very similar to the first one. It differs only in that it takes the effect of the term ordering into account. In particular, the weight of a term that is maximal in its literal is multiplied by term_pen, and the weight of maximal literals is multiplied by lit_pen.

Orientweight(prio, fweight, vweight, term_pen, lit_pen, pos_mult): This weight function is a slight variation of Refinedweight(). In this case, the weight of both terms of an unorientable literal is multiplied by a penalty term_pen.

Simweight(prio, equal_weight, vv_clash, vt_clash, tt_clash): This weight function is intended to return a low weight for literals in which the two terms are very similar. It does not currently work very well even for unit clauses – RTFS (in <che_simweight.c>) to find out more.

FIFOWeight(prio): This weight function assigns weights that increase in a strictly monotonic manner, i.e. it realizes a first-in/first-out strategy if used all by itself. This is the most obviously fair strategy.

LIFOWeight(prio): This weight function assigns weights that decrease in a strictly monotonic manner, i.e. it realizes a last-in/first-out strategy if used all by itself (which, of course, would be unfair and result in an extremely incomplete prover).

4.1.3 Clause Evaluation Functions

A clause evaluation function is constructed by instantiating a generic weight function. It can either be specified directly, or specified and given a name for later reference at once:

<eval-fun>          ::= <ident>         ||  
                        <weight-fun>    ||  
                        <eval-fun-def>  
<eval-fun-def>      ::= <ident> = <weight-fun>  
<eval-fun-def-list> ::= <eval-fun-def>*

Of course a single identifier is only a valid evaluation function if it has been previously defined in a <eval-fun-def>. It is possible to define the value of an identifier more than once, in which case later definitions take precedence to former ones.

Clause evaluation functions can be be defined on the command line with the -D (--define-weight-function) option, followed by a <eval-fun-def-list>.

Example:
    eprover -D"ex1=Clauseweight(ConstPrio,2,1,1) \  
               ex2=FIFOWeight(PreferGoals)" ...

sets up the prover to know about two evaluation function ex1 and ex2 (which supposedly will be used later on the command line to define one or more heuristics). The double quotes are necessary because the brackets and the commas are special characters for most shells

There are a variety of clause evaluation functions predefined in the variable DefaultWeightFunctions, which can be found in che_proofcontrol.c. See also sections 4.4 and  4.5, which cover some of the more complex weight functions of E.

4.1.4 Heuristics

A heuristic defines how many selections are to be made according to one of several clause evaluation functions. Syntactically,

<heu-element>   ::= <int> ’*’ <eval-fun>  
<heuristic>     ::= ’(’ <heu-element> (,<heu-element>)* ’)’ ||  
                    <ident>  
<heuristic-def> ::= <ident> = <heuristic> ||  
                    <heuristic>

As above, a single identifier is only a valid heuristic if it has been defined in <heuristic-def> previously. A <heuristic-def> which degenerates to a simple heuristic defines a heuristic with name Default (which the prover will automatically choose if no other heuristic is selected with the -x (--expert-heuristic).

Example: To continue the above example,
    eprover -D"ex1=Clauseweight(ConstPrio,2,1,1) \  
               ex2=FIFOWeight(PreferGoals)"  
            -H"new=(3*ex1,1*ex2)" \  
            -x new LUSK3.lop

will run the prover on a problem file named LUSK3.lop with a heuristic that chooses 3 out of every 4 clauses according to a simple symbol counting heuristic and the last clause first among goals and then among other clauses, selecting by order of creation in each of these two classes.

4.2 Term Orderings

E currently supports two families of orderings: The Knuth-Bendix-Ordering (KBO), which is used by default, and the Lexicographical Path Ordering (LPO). The KBO is weight-based and uses a precedence on function symbols to break ties. Consequently, to specify a concrete KBO, we need a weight function that assigns a weight to all function symbols, and a precedence on those symbols.

The LPO is based on a lexicographic comparison of symbols and subterms, and is fully specified by giving just a precedence.

Currently it is possible to explicitly specify an arbitrary (including incomplete or empty) precedence, or to use one of several precedence generating schemes. Similarly, there is a number of predefined weight function and the ability to assign arbitrary weights to function and predicate symbols.

The simplest way to get a reasonable term ordering is to specify automatic ordering selection using the -tAuto option.

Options controlling the choice of term ordering:

-term-ordering=<arg>
-t<arg> Select a term ordering class (or automatic selection). Supported arguments are at least LPO, LPO4 (for a much faster new implementation of LPO), KBO, and Auto. If Auto is selected, all aspects of the term ordering are fixed, additional options will be (or at least should be) silently ignored.
--order-precedence-generation=<arg>
-G <arg>Select a precedence generation scheme (see below).
--order-weight-generation=<arg>
-w <arg>Select a symbol weight function (see below).
--order-constant-weight=<arg>
-c <arg>Modify any symbol weight function by assigning a special weight to constant function symbols.
--precedence[=<arg>]
Describe a (partial) precedence for the term ordering. The argument is a comma-separated list of precedence chains, where a precedence chain is a list of function symbols (which all have to appear in the proof problem), connected by >, <, or = (to denote equivalent symbols). If this option is used in connection with --order-precedence-generation, the partial ordering will be completed using the selected method, otherwise the prover runs with a non-ground-total ordering. The option without the optional argument is equivalent to --precedence= (the empty precedence). There is a drawback to using --precedence: Normally, total precedences are represented by mapping symbols to a totally ordered set (small integers) which can be compared using standard machine instructions. The used data structure is linear in the number n of function symbols. However, if --precedence is used, the prover allocates (and completes) a n × n lookup table to efficiently represent an arbitrary partial ordering. If n is very big, this matrix takes up significant space, and takes a long time to compute in the first place. This is unlikely to be a problem unless there are at least hundreds of symbols.
--order-weights=<arg>
Give explicit weights to function symbols. The argument syntax is a comma-separated list of items of the form f:w, where f is a symbol from the specification, and w is a non-negative integer. Note that at best very simple checks are performed, so you can specify weights that do not obey the KBO weight constraints. Behaviour in this case is undefined. If all your weights are positive, this is unlikely to happen. Since KBO needs a total weight function, E always uses a weight generation scheme in addition to the user-defined options. You may want to use -wconstant for predictable behaviour.
--lpo-recursion-limit[=<arg>]
Limits the recursion depth of LPO comparison. This is useful in rare cases where very large term comparisons can lead to stack overflow issues. It does not change completeness, but may lead to unnecessary inferences in rare cases (Note: By default, recursion depth is limited to 1000. To get effectively unlimited recursion depth, use this option with an outrageously large argument. Don’t forget to increase process stack size with limit/ulimit from your favourite shell).
4.2.1 Precedence Generation Schemes

Precedence generation schemes are based on syntactic features of the symbol and the input clause set, like symbol arity or number of occurrences in the formula. At least the following options are supported as argument to --order-precedence-generation:

unary_first:
Sort symbols by arity, with the exception that unary symbols come first. Frequency is used as a tie breaker (rarer symbols are greater).
unary_freq:
Sort symbols by frequency (rarer symbols are bigger), with the exception that unary symbols come first. Yes, this should better be named unary_invfreq for consistency, but is not…
arity:
Sort symbols by arity (symbols with higher arity are larger).
invarity:
Sort symbols by arity (symbols with higher arity are smaller).
const_max:
Sort symbols by arity (symbols with higher arity are larger), but make constants the largest symbols. This is allegedly used by SPASS [Wei01] in some configurations.
const_min:
Sort symbols by arity (symbols with higher arity are smaller), but make constants the smallest symbols. Provided for reasons of symmetry.
freq:
Sort symbols by frequency (frequently occurring symbols are larger). Arity is used as a tie breaker.
invfreq:
Sort symbols by frequency (frequently occurring symbols are smaller). In our experience, this is one of the best general-purpose precedence generation schemes.
invfreqconstmin:
Same as invfreq, but make constants always smaller than everything else.
invfreqhack:
As invfreqconstmin, but unary symbols with maximal frequency become largest.

4.2.2 Weight Generation Schemes

Weight generation schemes are based on syntactic features of the symbol and the input clause set, or on the predefined precedence. The following options are available for --order-weight-generation.

firstmaximal0:
Give the same arbitrary (positive) weight to all function symbols except to the first maximal one encountered (order is arbitrary), which is given weight 0.
arity:
Weight of a function symbol f|n is n + 1, i.e. its arity plus one.
aritymax0:
As arity, except that the first maximal symbol is given weight 0.
modarity:
Weight of a function symbol f|n is n+c, where c is a positive constant (W_TO_BASEWEIGHT, which has been 4 since the dawn of time).
modaritymax0:
As modarity, except that the first maximal symbol is given weight 0.
aritysquared:
Weight of a symbol f|n is n2 + 1.
aritysquaredmax0:
As aritysquared, except that the first maximal symbol is given weight 0.
invarity:
Let m be the largest arity of any symbol in the signature. Weight of a symbol f|n is m - n + 1.
invaritymax0:
As invarity, except that the first maximal symbol is given weight 0.
invaritysquared:
Let m be the largest arity of any symbol in the signature. Weight of a symbol f|n is m2 - n2 + 1.
invaritysquaredmax0:
As invaritysquared, except that the first maximal symbol is given weight 0.
precedence:
Let < be the (pre-determined) precedence on function symbols F in the problem. Then the weight of f is given by |g|g < f|+1 (the number of symbols smaller than f in the precedence increased by one).
invprecedence:
Let < be the (pre-determined) precedence on function symbols F in the problem. Then the weight of f is given by |g|f < g| + 1 (the number of symbols larger than f in the precedence increased by one).
freqcount:
Make the weight of a symbol the number of occurrences of that symbol in the (potentially preprocessed) input problem.
invfreqcount:
Let m be the number of occurrences of the most frequent symbol in the input problem. The weight of f is m minus he number of occurrences of f in the input problem.
freqrank:
Sort all function symbols by frequency of occurrence (which induces a total quasi-ordering). The weight of a symbol is the rank of it’s equivalence class, with less frequent symbols getting lower weights.
invfreqrank:
Sort all function symbols by frequency of occurrence (which induces a total quasi-ordering). The weight of a symbol is the rank of its equivalence class, with less frequent symbols getting higher weights.
freqranksquare:
As freqrank, but weight is the square of the rank.
invfreqranksquare:
As invfreqrank, but weight is the square of the rank.
invmodfreqrank:
Sort all function symbols by frequency of occurrence (which induces a total quasi-ordering). The weight of an equivalence class is the sum of the cardinality of all smaller classes (+1). The weight of a symbol is the weight of its equivalence classes. Less frequent symbols get higher weights.
invmodfreqrankmax0:
As invmodfreqrank, except that the first maximal symbol is given weight 0.
constant:
Give the same arbitrary positive weight to all function symbols.

4.3 Literal Selection Strategies

The superposition calculus allows the selection of arbitrary negative literals in a clause and only requires generating inferences to be performed on these literals. E supports this feature and implements it via manipulations of the literal ordering. Additionally, E implements strategies that allow inferences into maximal positive literals and selected negative literals. A selection strategy is selected with the option --literal-selection-strategy. Currently, at least the following strategies are implemented:

NoSelection:
Perform ordinary superposition without selection.
NoGeneration:
Do not perform any generating inferences. This strategy is not complete, but applying it to a formula generates a normal form that does not contain any tautologies or redundant clauses.
SelectNegativeLiterals:
Select all negative literals. For Horn clauses, this implements the maximal literal positive unit strategy [Der91] previously realized separately in E.
SelectPureVarNegLiterals:
Select the first negative literal of the form XY .
SelectLargestNegLit:
Select the largest negative literal (by symbol counting, function symbols count as 2, variables as 1).
SelectSmallestNegLit:
As above, but select the smallest literal.
SelectDiffNegLit:
Select the negative literal in which both terms have the largest size difference.
SelectGroundNegLit:
Select the first negative ground literal for which the size difference between both terms is maximal.
SelectOptimalLit:
If there is a ground negative literal, select as in the case of SelectGroundNegLit, otherwise as in SelectDiffNegLit.

Each of the strategies that do actually select negative literals has a corresponding counterpart starting with P that additionally allows paramodulation into maximal positive literals9.

Example: Some problems become a lot simpler with the correct strategy. Try e.g.
eprover --literal-selection-strategy=NoSelection \  
          GRP001-1+rm_eq_rstfp.lop  
eprover --literal-selection-strategy=SelectLargestNegLit \  
          GRP001-1+rm_eq_rstfp.lop

You will find the file GRP001-1+rm_eq_rstfp.lop in the E/PROVER directory.

As we aim at replacing the vast number of individual literal selection functions with a more abstract mechanism, we refrain from describing all of the currently implemented functions in detail. If you need information about the set of implemented functions, run eprover -W none. The individual functions are implemented and somewhat described in E/HEURISTICS/che_litselection.h.

4.4 The Watchlist Feature

Since public release 0.81, E supports a watchlist. A watchlist is a user-defined set of clauses. Whenever the prover encounters10 a clause that subsumes one or more clauses from the watchlist, those clauses are removed from it. The saturation process terminates if the watchlist is empty (or, of course, if a saturated state or the empty clause have been reached).

There are two uses for a watchlist: To guide the proof search (using a heuristic that prefers clauses on the watchlist), or to find purely constructive proofs for clauses on the watchlist.

If you want to guide the proof search, place clauses you believe to be important lemmata onto the watchlist. Also include the empty clause to make sure that the prover will not terminate prematurely. You can then use a clause selection heuristic that will give special consideration to clauses on the watchlist. This is currently supported via the priority functions PreferWatchlist and DeferWatchlist. A clause evaluation function using PreferWatchlist will always select clauses which subsume watchlist clauses first. Similarly, using DeferWatchlist can be used to put the processing of watchlist clauses off.

There is a predefined clause selection heuristic UseWatchlist (select it with -xUseWatchlist) that will make sure that watchlist clauses are selected relatively early. It is a strong general purpose heuristic, and will maintain completeness of the prover. This should allow easy access to the watchlist feature even if you don’t yet feel comfortable with specifying your own heuristics.

To generate constructive proofs of clauses, just place them on the watch list and select output level 4 or greater (see section 6.3). Steps effecting the watch list will be marked in the PCL2 output file. If you use the eproof script for proof output or run epclextract of your own, subproof for watchlist steps will be automatically extracted.

Note that this forward reasoning is not complete, i.e. the prover may never generate a given watchlist clause, even if it would be trivial to prove it via refutation.

Options controlling the use of the watch list:

--watchlist=<arg>

Select a file containing the watch list clauses. Syntax should be the same syntax as your proof problem (E-LOP, TPTP or TSTP). Just write down a list of clauses.

--no-watchlist-simplification

By default, watch list clauses are simplified with respect to the current set P. Use this option to disable the feature.

4.5 Learning Clause Evaluation Functions

E can use a knowledge base generated by analyzing many successful proof attempts to guide its search, i.e. it can learn what kinds of clauses are likely to be useful for a proof and which ones are likely superfluous. The details of the learning mechanism can be found in [Sch00Sch01]. Essentially, an inference protocol is analyzed, useful and useless clauses are identified and generalized into clause patterns, and the resulting information is stored in a knowledge base. Later, new clauses that match a pattern are evaluated accordingly.

4.5.1 Creating Knowledge Bases

An E knowledge base is a directory containing a number of files, storing both the knowledge and configuration information. Knowledge bases are generated with the tool ekb_create. If no argument is given, ekb_create will create a knowledge base called E_KNOWLEDGE in the current directory.

You can run ekb_create -h for more information about the configuration. However, the defaults are usually quite sufficient.

4.5.2 Populating Knowledge Bases

The knowledge base contains information gained from clausal PCL2 protocols of E. In a first step, information from the protocol is abstracted into a more compact form. A number of clauses is selected as training examples, and annotations about there role are computed. The result is a list of annotated clauses and a list of the axioms (initial clauses) of the problem. This step can be performed using the program direct_examples11.

In a second step, the collected information is integrated into the knowledge base. For this purpose, the program ekb_insert can be used. However, it is probably more convenient to use the single program ekb_ginsert, which directly extracts all pertinent information from a PCL2 protocol and inserts it into a designated knowledge base.

The program ekb_delete will delete an example from a knowledge base. This process is not particularly efficient, as the whole knowledge base is first parsed.

4.5.3 Using Learned Knowledge

The knowledge in a knowledge base can be utilized by the two clause evaluation functions TSMWeight() and TSMRWeight(). Both compute a modification weight based on the learned knowledge, and apply it to a conventional symbol-counting base weight (similar to Clauseweight() for TSMWeight() and Refinedweight() for TSMWeight(). An example command line is:

eprover -x’(1*TSMWeight(ConstPrio, 1, 1, 2, flat, E_KNOWLEDGE,

100000,1.0,1.0,Flat,IndexIdentity,100000,-20,20,-2,-1,0,2))’

There are also two fully predefined learning clause selection heuristics. Select them with -xUseTSM1 (for some influence of the learned knowledge) or -xUseTSM2 (for a lot of influence of the learned knowledge).

4.6 Other Options

8Note that there now are many additional generic weight functions not yet documented.

9Except for SelectOptimalLit, where the resulting strategy, PSelectOptimalLit will allow paramodulation into positive literals only if no ground literal has been selected.

10Clauses are checked against the watchlist after normalization, both when they are inserted into U or if they are selected for processing.

11The name is an historical accident and has no significance anymore