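Data mining, also known as Knowledge Discovery in Databases (KDD), is the advanced process of extracting credible, novel, useful, and human-understandable patterns from large volumes of data. Classification is one of the most important data mining methods. It learns a classification function, or builds a classification model, from existing data; the function or model maps each record in the database to one of a given set of classes and can therefore be applied to prediction. Most data mining tools use rule discovery or decision tree classification techniques to find patterns and rules in data, and at their core is some kind of inductive algorithm. Such tools typically mine the data in a database, generate rules and decision trees, and then analyze and predict new data. This paper focuses on the ID3 and C4.5 decision tree algorithms and studies their implementation and application.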
Keywords: classification, decision tree, ID3 algorithm, C4.5 algorithm
Abstract
Data mining, also known as KDD (Knowledge Discovery in Databases), is an advanced process for extracting trustworthy, novel, useful, and understandable patterns from very large amounts of data. Classification is one of the most important branches of data mining research. Its goal is to learn a classification function or build a classification model from existing data. The model maps each record in a database to one of a set of predefined classes, so classification can be used for prediction. Most data mining toolkits use rule discovery or decision tree classification techniques to find patterns and rules in data, and at their core is some form of inductive algorithm. Such toolkits typically mine the data in a database, produce rules and decision trees, and then analyze and predict new data. This paper studies the implementation and application of two decision tree classification algorithms, ID3 and C4.5.