Data Mining in Finance
Evgenii Vityaev
Boris Kovalerchuk
Last updated 01/02/2018
Argument constraints and skipping useless hypotheses
Background knowledge fulfills a variety of functions in the data mining process. One of the most important is reducing the number of hypotheses to be tested, which speeds up learning and makes the process tractable. There are several approaches to reducing the size of the hypothesis space; two of them are presented below. Both use constraints on the arguments of predicates from background knowledge B. The difference is that the first approach uses constraints on a single argument, while the second uses constraints on several arguments of a predicate defined in B.
The first approach, called the typing approach, is based on information about the individual data types of a predicate's arguments. For instance, suppose only an integer can be the first argument of predicate P and only a date (M/D/Y) can be its second argument. It would be wasteful to test hypotheses with the typings P(date, integer), P(integer, integer), and P(date, date); the only correct typing here is P(integer, date).
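The pruning effect of typing can be sketched as follows; the predicate name, type names, and helper functions are illustrative, not part of the book's notation.

```python
from itertools import product

# Declared argument types for each background predicate (assumed example).
PREDICATE_TYPES = {"P": ("integer", "date")}

def candidate_typings(arity, types):
    """Enumerate every possible type assignment for a predicate of the given arity."""
    return list(product(types, repeat=arity))

def admissible(predicate, typing):
    """Keep only typings that match the declared signature."""
    return PREDICATE_TYPES[predicate] == typing

all_typings = candidate_typings(2, ("integer", "date"))
kept = [t for t in all_typings if admissible("P", t)]
print(len(all_typings))  # 4 candidate typings
print(kept)              # only P(integer, date) survives
```

Three of the four candidate typings are discarded before any examples are examined, which is exactly the saving the typing approach provides.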
The second approach is called the inter-argument constraints approach. For example, the predicate Equal(x,y) is always true when both arguments are the same. Similarly, it is possible that for some predicate P, P(x,x)=0 for all x. Therefore, testing hypotheses extended by adding Equal(x,x) or P(x,x) should be avoided, which reduces the size of the hypothesis space explored.
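A minimal sketch of how such a constraint prunes candidate literals; the constraint table and variable names are assumed for illustration only.

```python
# Predicates declared (by assumption) trivial or always-false when their
# arguments coincide, so literals like Equal(x,x) are never generated.
DISTINCT_ARGS = {"Equal", "P"}

def violates_constraint(predicate, args):
    """True when a predicate requiring distinct arguments gets repeated ones."""
    return predicate in DISTINCT_ARGS and len(set(args)) < len(args)

variables = ("x", "y")
candidates = [("Equal", (a, b)) for a in variables for b in variables]
pruned = [lit for lit in candidates if not violates_constraint(*lit)]
print(pruned)  # Equal(x,x) and Equal(y,y) were skipped
```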
The value of inter-argument constraints is illustrated by the experimental fact that the FOCL algorithm, using both typing and inter-argument constraints, tested 2.3 times fewer literals and examples than with typing alone [Pazzani, Kibler, 1992]. Table 4.12 summarizes the properties of the two discussed approaches for reducing the number of hypotheses.
Table 4.12. Approaches for reducing the hypothesis space
Initial rules and improving the search for hypotheses
This section considers another useful sort of background knowledge: a (possibly incorrect) partial initial rule that approximates the concept (rule) to be learned. There are two basic forms of this initial rule:
- extensional form;
- intensional form.
If a predicate is defined by other predicates, the definition is called intensional. Otherwise, a predicate given by examples is called extensional. It is also possible that background knowledge B contains a predicate in a mixed way: partially by examples and partially by other predicates. In general, background knowledge presented in a mixed way reduces the search [Pazzani, Kibler, 1992].
Learning using an initial extensional rule. An expert or another learning system can provide an initial extensional rule [Widmer, 1990]. Then this rule (initial concept) is refined by adding clauses [Pazzani, Kibler, 1992]:
1. An algorithm computes the criterion of optimality (usually information gain) of each clause in the initial concept;
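The optimality criterion mentioned in step 1 can be sketched with a FOIL-style information gain; the counts below are toy values, and approximating t (positives still covered) by p1 is a simplification.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL-style gain of adding a literal to a clause.
    p0/n0: positive/negative examples covered before the literal is added;
    p1/n1: the same counts afterwards; t (positives still covered)
    is approximated here by p1."""
    info_before = -log2(p0 / (p0 + n0))
    info_after = -log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

# A literal that keeps 6 of 8 positives while cutting negatives from 8 to 2
# yields a clearly positive gain.
gain = foil_gain(8, 8, 6, 2)
print(round(gain, 3))
```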
Learning using initial intensional rules. Next, consider domain knowledge defined in terms of extensional and intensional initial predicates. Systems such as CIGOL [Muggleton & Buntine, 1988] make use of (or invent) background knowledge of this form. For example, if an extensional definition of the predicate GrowingStock(x,y,z) is not given, it could be defined in terms of the intensional predicate GreaterPrice by:
GrowingStock(x,y,z) <- GreaterPrice(x,y), GreaterPrice(y,z),
where x, y, and z are prices of the stock for days t, t+1, and t+2, respectively.
It is possible that the intensional predicate GrowingStock(x,y,z), added to the hypothesis, improves it, while neither GreaterPrice(x,y) nor GreaterPrice(y,z) alone does. Therefore, common search methods may fail to discover a valuable stock regularity.
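The clause above can be sketched directly in code; here GreaterPrice(a,b) is read as "the price b exceeds the price a", which is one plausible orientation (an assumption, not the book's exact convention), so that the conjunction describes prices rising over days t, t+1, and t+2.

```python
def greater_price(a, b):
    # Assumed reading: the second price is greater than the first.
    return b > a

def growing_stock(x, y, z):
    # GrowingStock(x,y,z) <- GreaterPrice(x,y), GreaterPrice(y,z)
    return greater_price(x, y) and greater_price(y, z)

print(growing_stock(10.0, 10.5, 11.2))  # True: the price rose on both days
print(growing_stock(10.0, 9.8, 11.2))   # False: the middle day dipped
```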
Pazzani and Kibler suggested that if the literal with the maximum value of the optimality criterion (gain) is intensional, then the literal is made extensional and the extensional definition is added to the clause under construction. Table 4.13 shows this idea more specifically. The process is called operationalization.
Note that computation of the optimality criterion, which guides the search, is different for extensional and intensional predicates. For intensional predicates it usually involves a Prolog proof.
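Operationalization can be sketched as substituting an intensional literal's defining body for the literal itself; the string encoding of literals and the definition table below are assumptions made purely for illustration.

```python
# Definitions of intensional predicates in terms of extensional ones (assumed).
INTENSIONAL_DEFS = {
    "GrowingStock(x,y,z)": ["GreaterPrice(x,y)", "GreaterPrice(y,z)"],
}

def operationalize(clause_body, best_literal):
    """Add the best-scoring literal to the clause; if it is intensional,
    expand it into its extensional definition instead."""
    if best_literal in INTENSIONAL_DEFS:
        return clause_body + INTENSIONAL_DEFS[best_literal]
    return clause_body + [best_literal]

body = operationalize([], "GrowingStock(x,y,z)")
print(body)  # ['GreaterPrice(x,y)', 'GreaterPrice(y,z)']
```

Repeated expansions of this kind are what can make the resulting rules very long, as noted below.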
Table 4.13. Operationalization [Pazzani, Kibler, 1992]
Potentially, operationalization can generate very long rules, similar to the large decision trees discussed in Chapter 3, where this extensional approach was contrasted with short intensional formulas.
Table 4.14. Partial background knowledge for the stock market
Learning using initial intensional and extensional rules. The previous discussion has shown that adding background knowledge can increase the ability of algorithms to find solutions. Table 4.14 shows an example of partial background knowledge for a stock market forecast. It consists of:
- a definition of the target predicate UP(x,y,w,z) with four arguments to be learned;
In addition, the following table provides an initial intensional rule for the target concept UP(x,y,w,z).
Table Intensional initial rule for the target concept
This rule assumes that if growth accelerated from date t to t+2, then the stock will continue to grow on date t+3.
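One plausible reading of this initial rule as executable code; the acceleration test and the interpretation of x, y, w as prices on days t, t+1, t+2 are assumptions for illustration, not the book's exact definition.

```python
def up_initial_rule(x, y, w):
    """Predict that the stock grows on day t+3 when growth accelerated:
    the gain from t+1 to t+2 exceeds the (positive) gain from t to t+1."""
    return (w - y) > (y - x) > 0

print(up_initial_rule(10.0, 10.2, 10.6))  # True: gains of 0.2 then 0.4
print(up_initial_rule(10.0, 10.4, 10.6))  # False: growth slowed down
```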
Background knowledge is called extended background knowledge if it includes:
- extensional knowledge (training examples and extensional predicates);
- intensional knowledge (intensional predicates, possibly including an initial definition of the target concept).
Pazzani and Kibler found in experiments that extended background knowledge with a correct intensional target definition avoids exhaustively testing every variable of every predicate and increases the speed of the search. In their experiments, a correct extensional definition of the target concept was found by testing only 2.35% of the literals needed for rule discovery when the target concept is not provided. However, the same research has shown that extended background knowledge
- can increase the search space;
These observations show the need for balancing initial intensional and extensional predicates in background knowledge. One of them can be more accurate and can speed up the search for a regularity more than the other. Therefore, the following procedure will be more efficient:
1) Compare the accuracy of intensional and extensional knowledge;
A modification of this mechanism uses probabilities assigned to all types of background knowledge.
The extensional predicates are compared to the induced literal with the maximum information gain. This approach is used in the FOIL algorithm.
In the explanation-based learning approach, the target concept is assumed to be a correct intensional definition of the concept to be learned, and the domain knowledge is assumed correct as well. A more realistic approach is implemented in algorithms such as FOCL and MMDR. These methods relax the assumption that the target concept and the domain knowledge are correct.