Last updated 07/05/2020

COMPARISONS of the DM Tool "Discovery"
with other well-known DM methods

Financial forecasting

The results of comparing the "Discovery" tool with Neural Networks (NN), Decision Trees (SIPINA), rules extracted from NN, first-order logic methods (FOIL), and other benchmark methods are presented in the table and figure.

Breast cancer diagnostic system

The results of comparisons with Neural Networks, Decision Trees ("SIPINA" software), and Linear Discriminant Analysis ("SIGAMD" software) are described below.

The figure presents results for another selection criterion: the level of conditional probability. We studied three levels: 0.7, 0.85 and 0.95. A higher level of conditional probability decreases the number of rules and the number of diagnosed patients, but increases the accuracy of diagnosis. The results for these levels are marked MMDR1, MMDR2 and MMDR3.

We extracted 44 statistically significant diagnostic rules at the 0.05 level of the F-criterion with a conditional probability no less than 0.75 (MMDR1). There were 30 rules with a conditional probability no less than 0.85 (MMDR2) and 18 rules with a conditional probability no less than 0.95 (MMDR3). For the full set of 44 rules, the total accuracy of diagnosis was 82%. The false negative rate was 6.5% (9 malignant cases were diagnosed as benign) and the false positive rate was 11.9% (16 benign cases were diagnosed as malignant). The 30 most reliable rules delivered a total accuracy of 90%, and the 18 most reliable rules performed with 96.6% accuracy, with only 3 false positive cases (3.4%).

The neural network software (“Brainmaker”, California Scientific Software) gave 100% accuracy on training data, but on the Round-Robin test the total accuracy fell to 66%. The main reason for this low accuracy is that Neural Networks (NN) do not evaluate the statistical significance of their perfect (100%) performance on training data. Poor results (76% on the training data test) were also obtained with Linear Discriminant Analysis (“SIGAMD” software, StatDialogue software, Moscow). The Decision Tree approach (“SIPINA” software, Universite Lumiere, Lyon, France) performed with an accuracy of 76%-82% on training data. This is worse than what we obtained for the MMDR method on the much more difficult Round-Robin test (fig. 8).

The number of false negatives, which is especially important in diagnosis, was 3-8 cases (MMDR), 8-9 cases (Decision Tree), 19 cases (Linear Discriminant Analysis) and 26 cases (NN).

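
The trade-off between the conditional probability level, the number of rules, the number of diagnosed cases, and accuracy can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Rule` structure, the feature names, the three toy rules, and the five toy cases are all hypothetical and are not the MMDR implementation or data from the study. The sketch also assumes that, when several rules fire, the one with the highest conditional probability decides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Case], bool]  # True when the rule applies to a case
    prediction: str                    # "malignant" or "benign"
    cond_prob: float                   # estimated P(prediction | condition)


def select_rules(rules: List[Rule], min_cond_prob: float) -> List[Rule]:
    """Keep only rules whose conditional probability reaches the level."""
    return [r for r in rules if r.cond_prob >= min_cond_prob]


def diagnose(case: Case, rules: List[Rule]) -> Optional[str]:
    """Return the prediction of the most reliable firing rule, or None."""
    fired = [r for r in rules if r.condition(case)]
    if not fired:
        return None  # the case stays undiagnosed
    return max(fired, key=lambda r: r.cond_prob).prediction


def evaluate(cases: List[Case], labels: List[str],
             rules: List[Rule]) -> Tuple[int, float]:
    """Return (number of diagnosed cases, accuracy on diagnosed cases)."""
    diagnosed = correct = 0
    for case, label in zip(cases, labels):
        pred = diagnose(case, rules)
        if pred is None:
            continue
        diagnosed += 1
        correct += int(pred == label)
    accuracy = correct / diagnosed if diagnosed else float("nan")
    return diagnosed, accuracy


# Hypothetical rules and cases, invented purely for illustration.
RULES = [
    Rule("R1", lambda c: c["calc_count"] > 20, "malignant", 0.96),
    Rule("R2", lambda c: c["volume"] < 1.0, "benign", 0.88),
    Rule("R3", lambda c: c["calc_count"] > 10, "malignant", 0.76),
]
CASES = [
    {"calc_count": 25, "volume": 3.0},
    {"calc_count": 5, "volume": 0.5},
    {"calc_count": 12, "volume": 2.0},
    {"calc_count": 15, "volume": 0.8},
    {"calc_count": 3, "volume": 4.0},
]
LABELS = ["malignant", "benign", "benign", "malignant", "benign"]

summary = {}
for level in (0.75, 0.85, 0.95):
    kept = select_rules(RULES, level)
    diagnosed, accuracy = evaluate(CASES, LABELS, kept)
    summary[level] = (len(kept), diagnosed, accuracy)
    print(f"level>={level}: rules={len(kept)}, "
          f"diagnosed={diagnosed}, accuracy={accuracy:.3f}")
```

On this toy data, raising the level shrinks both the rule set and the set of diagnosed cases while accuracy on the diagnosed cases rises, which is the same qualitative pattern reported for MMDR1, MMDR2 and MMDR3.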
In these experiments, rule-based methods (MMDR and decision trees) outperformed the other methods. Note also that only MMDR and decision trees produce diagnostic rules. These rules make the computer-aided diagnostic decision process visible and transparent to radiologists. With these methods, radiologists can control and evaluate the decision-making process. Linear discriminant analysis gives an equation that separates the benign and malignant classes. For example, 0.0670x1 - 0.9653x2 + … represents a case. How would one interpret a weighted number of calcifications/cm² (0.0670x1) plus a weighted volume in cm³ (0.9653x2)? There is no direct medical sense in this arithmetic.
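
To make the interpretability contrast concrete, the sketch below places a partial linear discriminant score next to an if-then rule. Only the two coefficients quoted above come from the text. The remaining terms of the equation are elided in the source and are left out here, so the score is not the full discriminant function. The rule, its thresholds, and all function and parameter names are hypothetical and carry no medical meaning.

```python
from typing import Optional


def discriminant_partial_score(calcifications_per_cm2: float,
                               volume_cm3: float) -> float:
    """Sum of the two discriminant terms quoted in the text.

    The source elides the remaining terms, so this is only a partial score.
    """
    return 0.0670 * calcifications_per_cm2 - 0.9653 * volume_cm3


def example_rule(calcifications_per_cm2: float,
                 volume_cm3: float) -> Optional[str]:
    """Hypothetical if-then rule; thresholds are invented for illustration."""
    if calcifications_per_cm2 > 20 and volume_cm3 > 5:
        return "malignant"
    return None  # the rule does not fire for this case


# A single unitless number that mixes counts per cm^2 with cm^3.
score = discriminant_partial_score(10.0, 2.0)

# A statement a radiologist can read and check, or no statement at all.
fired = example_rule(25.0, 6.0)
not_fired = example_rule(10.0, 2.0)
```

The score blends quantities with different units into one number, whereas the rule states its conditions in terms a radiologist can inspect directly.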