Technology:
Knowledge Discovery

Introduction to Knowledge-based Knowledge Discovery (KDK)

A White Paper
Zaptron Systems, Inc.
January 16, 1999

I. Summary

This white paper reviews data-based knowledge discovery (KDD) and reports on an innovative knowledge discovery scheme: knowledge-based knowledge discovery (KDK). By introducing an interceptive coordinator, a directive coordinator, a knowledge synthesizer, a knowledge deductor and a knowledge inductor, we have obtained a knowledge discovery system that is time-space variant. An old (existing) knowledge base is used to create a new knowledge base, which is then synthesized with a basic knowledge base to produce an extended knowledge base.

II. Data-based Knowledge Discovery – KDD

Most current development in knowledge discovery concerns so-called data-based knowledge discovery (KDD). KDD systems use artificial intelligence methodologies such as fuzzy logic, neural networks, rough set theory and chaotic modeling. They are designed to work with various kinds of databases, including relational databases, object databases and spatial-temporal databases. They have found applications in many different fields, from financial services to the banking industry, from chemical plant control to automobile manufacturing, and from insurance to agriculture.

A general system diagram of a KDD system is shown in Fig. 1. In this figure, blocks are used to represent the various functions of a general KDD system, as described below:

[Fig. 1: General system diagram of a KDD system]

DB – the DB block is the database that contains the original (historical) data.

Focus – the Focus mechanism searches the database according to the interests of the user or of experts. Only those categories of data that are of interest are extracted from the database and used in building a knowledge base.

Rule Discover – the Rule Discover block is where hypothetical rules are constructed from the focused data. They are hypothetical because they have not yet been evaluated. Many techniques are available for finding rules in data, including fuzzy logic, neural networks, chaotic models, rough set theory and evidence theory. In our system, a proprietary causal quantitative reasoning method is used.

Evaluation – Evaluation is the time-consuming process in which each hypothetical rule is either retained or discarded according to some criterion. This can be done by human-machine interaction, where a human expert is the judge, or by statistical inference and computer induction, where positive and negative rules are separated by thresholding on the strength of the evidence. In our system, an extended Cohen method is used for rule evaluation.

Derived KB – the results of rule evaluation are stored in the derived knowledge base, which can be used by an expert system.
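
The flow through the blocks described above (DB, Focus, Rule Discover, Evaluation, Derived KB) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record fields, the function names and the thresholds are hypothetical, rule discovery is replaced by simple co-occurrence counting, and evaluation by a confidence and support threshold. It does not implement the proprietary causal quantitative reasoning method or the extended Cohen method mentioned in the text.

```python
from collections import Counter

# Hypothetical toy "DB": plant records invented purely for illustration.
RECORDS = [
    {"temp": "high", "pressure": "high", "operator": "A", "fault": "yes"},
    {"temp": "high", "pressure": "low",  "operator": "B", "fault": "yes"},
    {"temp": "low",  "pressure": "high", "operator": "A", "fault": "no"},
    {"temp": "low",  "pressure": "low",  "operator": "B", "fault": "no"},
    {"temp": "high", "pressure": "high", "operator": "B", "fault": "yes"},
]

def focus(records, fields):
    """Focus: keep only the categories of data that are of interest."""
    return [{f: r[f] for f in fields if f in r} for r in records]

def discover_rules(records, target):
    """Rule Discover: propose 'if attr=value then target=outcome' hypotheses.

    Assumption: plain co-occurrence counting stands in for the rule-finding
    technique, which the source does not describe in detail.
    """
    support = Counter()  # how often each (attr, value) condition occurs
    hits = Counter()     # how often it co-occurs with each outcome
    for r in records:
        for attr, value in r.items():
            if attr == target:
                continue
            support[(attr, value)] += 1
            hits[(attr, value, r[target])] += 1
    rules = []
    for (attr, value, outcome), n in hits.items():
        rules.append({
            "if": (attr, value),
            "then": (target, outcome),
            "support": support[(attr, value)],
            "confidence": n / support[(attr, value)],
        })
    return rules

def evaluate(rules, min_confidence=0.8, min_support=2):
    """Evaluation: retain a hypothesis only if its evidence passes the thresholds."""
    return [r for r in rules
            if r["confidence"] >= min_confidence and r["support"] >= min_support]

def run_kdd(records, fields, target):
    """DB -> Focus -> Rule Discover -> Evaluation -> Derived KB (returned as a list)."""
    focused = focus(records, fields)
    hypotheses = discover_rules(focused, target)
    return evaluate(hypotheses)

if __name__ == "__main__":
    for rule in run_kdd(RECORDS, ["temp", "pressure", "fault"], "fault"):
        print(rule)
```

On this toy data, only the two temperature rules survive evaluation; the pressure rules are discarded because their confidence falls below the threshold.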

III. Knowledge-based Knowledge Discovery (KDK) – the Double-base Coordination Mechanism

To improve on the KDD scheme above, a double-base coordination mechanism is introduced: a basic knowledge base, an Interceptive Coordinator (I-type) and a Directive Coordinator (D-type) are added to the KDD diagram, as shown in Fig. 2. They are described below.

[Fig. 2: KDD system diagram extended with the double-base coordination mechanism]

Basic KB – this is the prior knowledge base obtained by a knowledge acquisition facility from experts’ knowledge and experience and from related theories in books.

Interceptive Coordinator – It is used in conjunction with the Basic KB to detect and remove those hypothetical rules produced by the Rule Discover block that repeat or contradict rules already in the Basic KB. The Interceptive Coordinator thus prevents repetitive or conflicting rules from being evaluated and stored in the Derived Knowledge Base, saving both processing time and storage space.
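
A minimal sketch of the interception step described above, written in Python. The rule representation (a condition pair and a conclusion pair) and the function name are assumptions made for illustration, as is the definition of "contradictory" used here: same condition, same conclusion attribute, different conclusion value. The source does not specify how the actual system detects repetition or conflict.

```python
def intercept(hypotheses, basic_kb):
    """Split hypothetical rules into those passed on to Evaluation and those dropped.

    Each rule is a pair (condition, conclusion), where both parts are
    (attribute, value) tuples. This representation is hypothetical.
    """
    # Index the Basic KB by (condition, conclusion attribute) for fast lookup.
    known = {}
    for cond, (attr, value) in basic_kb:
        known[(cond, attr)] = value

    passed, dropped = [], []
    for rule in hypotheses:
        cond, (attr, value) = rule
        key = (cond, attr)
        if key not in known:
            passed.append(rule)                      # new knowledge: evaluate it
        elif known[key] == value:
            dropped.append((rule, "repetitive"))     # already in the Basic KB
        else:
            dropped.append((rule, "contradictory"))  # conflicts with the Basic KB
    return passed, dropped


if __name__ == "__main__":
    # Hypothetical rules, invented for illustration.
    basic_kb = [
        (("temp", "high"), ("fault", "yes")),
        (("pressure", "low"), ("fault", "no")),
    ]
    hypotheses = [
        (("temp", "high"), ("fault", "yes")),       # repeats a Basic KB rule
        (("pressure", "low"), ("fault", "yes")),    # contradicts a Basic KB rule
        (("vibration", "high"), ("fault", "yes")),  # genuinely new
    ]
    passed, dropped = intercept(hypotheses, basic_kb)
    print("to evaluation:", passed)
    print("intercepted:", dropped)
```

Because intercepted rules never reach the Evaluation block, the saving grows with the share of hypotheses that the Basic KB already covers.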

Directive Coordinator – It is added to provide an effective solution to the knowledge shortage problem in a knowledge base. In most cases, a knowledge base contains a large number of unrelated rules (or independent nodes, if it is represented as a decision tree), and it is desirable to discover additional knowledge from the given knowledge base by finding out whether some of these rules are related (or, in a decision tree, whether the nodes are connected). Given the huge amount of data, it is impossible for a human being to evaluate these rule pairs (node pairs) one by one. The Directive Coordinator lets the computer perform this evaluation automatically, using a directed data mining technique. In directed data mining (or heuristic search), only those categories of data in the DB that are highly related to the two rules (or two nodes) under evaluation are used to mine new knowledge, which significantly reduces the time needed to discover it.
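
The idea of directed data mining over rule pairs can be sketched as follows in Python. This is a sketch under assumptions, not the system's actual technique: nodes are represented as hypothetical (attribute, value) pairs, "highly related" data is taken to mean just the two columns named by the pair, and relatedness is tested with a simple conditional frequency and a threshold.

```python
from itertools import permutations

# Hypothetical toy "DB", invented for illustration.
SAMPLE_DB = [
    {"temp": "high", "operator": "A", "fault": "yes"},
    {"temp": "high", "operator": "B", "fault": "yes"},
    {"temp": "low",  "operator": "A", "fault": "no"},
    {"temp": "low",  "operator": "B", "fault": "no"},
    {"temp": "high", "operator": "B", "fault": "yes"},
]

def directed_mine(db, nodes, min_confidence=0.8, min_support=2):
    """Test every ordered pair of nodes and return the links the data supports.

    For a pair (a, b), only the two columns named by a and b are read from
    the DB, which is the 'directed' part of the search.
    """
    links = []
    for a, b in permutations(nodes, 2):
        (attr_a, val_a), (attr_b, val_b) = a, b
        if attr_a == attr_b:
            continue  # two values of one attribute cannot imply each other here
        # Directed focus: project the DB onto the two relevant columns only.
        focused = [(r.get(attr_a), r.get(attr_b)) for r in db]
        with_a = [vb for va, vb in focused if va == val_a]
        if len(with_a) < min_support:
            continue  # too little evidence to judge this pair
        confidence = sum(vb == val_b for vb in with_a) / len(with_a)
        if confidence >= min_confidence:
            links.append((a, b, confidence))  # candidate new knowledge: a -> b
    return links


if __name__ == "__main__":
    nodes = [("temp", "high"), ("fault", "yes"), ("operator", "A")]
    for a, b, conf in directed_mine(SAMPLE_DB, nodes):
        print(f"{a} -> {b} (confidence {conf:.2f})")
```

Each pair touches two columns instead of the whole record, so the per-pair cost stays small even when the DB has many categories of data; the number of pairs still grows quadratically with the number of nodes.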

Synthetic Knowledge Base –

-- to be continued --

Copyright © 1997-2000, Zaptron Systems, Inc.