CMH

 

Data Mining Software

A brief review of the major players - September 1999

Clementine - ISL/SPSS

(www.isl.co.uk / www.spss.com)

Advertising highlights:

  • open data mining: add nodes to the toolkit that run external applications (executables only?) and build simple dialogs to control their parameters.
  • new client-server approach

Feature highlights:

  • resourcing: ODBC, Clementine cache files, text files
  • pre-processing: filter, sample, merge, balance, sort, aggregate, distinct, append, type, derive, history, plot
  • mining tools: neural nets, Kohonen nets, rule induction, C5.0, GRI (generalised rule induction), Apriori nodes, regression, k-means clustering
  • visualisation: distribution, histogram, web, multiplot, confusion matrix, analysis, statistics, report, text/ODBC files
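Clementine's confusion matrix cross-tabulates the actual class of each record against the class a model predicted. The following is a minimal Python sketch of that tabulation, using made-up "yes"/"no" labels. The function names are hypothetical, and this is not Clementine's own implementation.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested dict: matrix[actual_label][predicted_label] -> count."""
    counts = Counter(zip(actual, predicted))  # count each (actual, predicted) pair
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def print_matrix(matrix, labels):
    # Header row of predicted labels, then one row per actual label
    print("actual\\pred", *labels, sep="\t")
    for a in labels:
        print(a, *(matrix[a][p] for p in labels), sep="\t")

# Hypothetical example data
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]
labels = ["yes", "no"]
m = confusion_matrix(actual, predicted, labels)
print_matrix(m, labels)
```

The diagonal cells count correct predictions. The off-diagonal cells show which classes the model confuses with each other.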

Enterprise Miner - SAS

(www.sas.com)

Advertising highlights:

  • SEMMA process (Sample, Explore, Modify, Model, Assess)
  • recent focus on CRM (customer relationship management): finding new customers, keeping existing ones, etc.

Feature highlights:

  • mining tools: clustering, decision trees, linear and logistic regression, and neural networks
  • pre-processing: outlier detection, variable transformations, random sampling, and partitioning of data sets (into training, test and validation sets)
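Enterprise Miner's partitioning of data into training, validation and test sets can be pictured as a seeded random split. Below is a minimal sketch. The `partition` function name, the 60/20/20 proportions and the seed are illustrative assumptions, not SAS defaults.

```python
import random

def partition(records, train=0.6, validate=0.2, seed=42):
    """Randomly split records into (train, validate, test) lists.

    The test fraction is whatever remains after train and validate.
    """
    if train + validate > 1:
        raise ValueError("train + validate must not exceed 1")
    shuffled = list(records)                # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    n = len(shuffled)
    n_train = int(n * train)
    n_validate = int(n * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validate],
            shuffled[n_train + n_validate:])

train_set, validate_set, test_set = partition(range(100))
print(len(train_set), len(validate_set), len(test_set))  # 60 20 20
```

The model is fitted on the training set and tuned against the validation set. The test set is held back for a final, unbiased assessment, which matches the "Assess" step of SEMMA.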

Intelligent Miner for Data - IBM

(www.software.ibm.com/data/iminer/fordata)

Feature highlights:

  • pre-processing: a large number of routines
  • mining tools: associations, clustering (demographic/nearest neighbour & neural nets), sequential patterns, time sequence analysis, classification (decision trees and neural nets), prediction (RBF & neural nets), statistics
  • visualisation: factor analysis, linear regression charts, residual charts (lift), cluster charts (pie & bar)

KnowledgeSeeker/Studio/Excellerator - Angoss

(www.angoss.com)

Feature highlights:

  • Excellerator: profile, analyse and predict from data within an Excel spreadsheet
  • KnowledgeSeeker: CART/CHAID and exhaustive tree induction, imports from text/ODBC, graphical tree representation
  • Studio: features of KnowledgeSeeker plus five decision tree algorithms, three neural nets and a clustering algorithm. Windows-style GUI. Data mining techniques can be integrated into other packages using ActiveX technology.
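CART, one of the tree inducers KnowledgeSeeker offers, picks each split by minimising an impurity measure. For classification, the usual measure is the Gini index. The following is a minimal sketch of scoring one candidate binary split with it, using invented data. It is not Angoss's code, and it does not cover CHAID, which chooses splits with chi-square tests instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k squared (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical parent node with 5 buyers and 5 non-buyers
parent = ["buy"] * 5 + ["no"] * 5
left, right = ["buy"] * 4 + ["no"], ["buy"] + ["no"] * 4
print(gini(parent))                       # 0.5
print(round(split_gini(left, right), 2))  # 0.32
```

A tree inducer would evaluate many candidate splits this way and keep the one with the largest drop in impurity from the parent node.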

MineSet 3.0 - Silicon Graphics

(www.sgi.com/software/mineset)

Advertising highlights:

  • client-server, and the product is now available under Windows (SGI hardware running Windows NT)
  • ActiveX controls available as an option

Feature highlights:

  • pre-processing: transformation history, binning (discretisation), aggregation, transpose, feature construction using expressions, statistics.
  • mining: decision and option trees, evidence (Bayes), regression trees, association rules, k-means clustering, attribute selection (feature selection). Advanced options allow boosting, pruning, Laplace correction, holdout, cross-validation, lift curves and loss matrices.
  • visualisation: mapping, scatter plots, splat graphs, rule and tree visualisation.
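Laplace correction smooths the class-probability estimates at a tree leaf, so that a leaf with only a few examples does not claim certainty. The standard form is (n_c + 1) / (N + k) for a leaf holding N examples, n_c of them in class c, with k classes. The sketch below uses that form. Whether MineSet applies exactly this variant is an assumption, and `laplace_estimate` is a hypothetical name.

```python
def laplace_estimate(class_count, total, num_classes):
    """Laplace-corrected probability estimate: (n_c + 1) / (N + k)."""
    return (class_count + 1) / (total + num_classes)

# A leaf holding 3 examples, all of class "yes", in a 2-class problem
print(laplace_estimate(3, 3, 2))  # 0.8 instead of the raw 1.0
print(laplace_estimate(0, 3, 2))  # 0.2 instead of the raw 0.0
```

The corrected estimates still sum to 1 across the classes. As N grows, they converge to the raw frequencies.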

Darwin - Thinking Machines Corp.

(www.think.com)

Advertising highlights:

  • parallel, scalable data mining architecture for extremely large databases
  • multialgorithmic approach for more accurate results
  • easy-to-use client/server architecture
  • exportable C/C++/Java business models
  • up-to-date Windows-style interface

Feature highlights:

  • resourcing: text, SQL and ODBC access
  • cleansing: handling missing values
  • pre-processing: sampling, randomization, feature construction (100 types), append, merge, select, rename, replace and auto-partition.
  • mining: neural nets, linear and logistic regression, CART decision trees, match models (nearest neighbour), Bayesian learning, clustering (k-means and SOM).
  • visualisation: lift charts, ROI charts, sensitivity charts, visual tree display, prediction and error charts.
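A lift chart, such as those offered by Darwin, ranks cases by model score. For each top fraction of that ranking, it compares the response rate with the overall response rate. The following is a minimal sketch of the cumulative lift calculation. The scores, outcomes and the `lift_table` name are invented for illustration, and this is not Darwin's implementation.

```python
def lift_table(scores, outcomes, bins=10):
    """Cumulative lift for the top 1/bins, 2/bins, ... of cases ranked by score.

    outcomes are 1 (responded) or 0. Lift is the top-fraction response rate
    divided by the overall response rate, so random targeting averages about 1.0.
    """
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    overall_rate = sum(ranked) / n
    if overall_rate == 0:
        raise ValueError("need at least one positive outcome")
    table = []
    for b in range(1, bins + 1):
        top = ranked[:max(1, round(n * b / bins))]  # at least one case per bin
        table.append((b / bins, (sum(top) / len(top)) / overall_rate))
    return table

# Hypothetical model scores and actual responses for ten customers
scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]
table = lift_table(scores, outcomes, bins=5)
for fraction, lift in table:
    print(f"top {fraction:.0%}: lift {lift:.2f}")
```

Plotting lift against the fraction contacted gives the chart. A curve that starts high and falls towards 1.0 indicates that the model concentrates responders near the top of the ranking.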

WinViz

(www.wizsoft.com)

Visualisation software that uses parallel coordinates to visualise rules. Integrates with business software such as Lotus and Excel. Can be used for clustering and machine learning.
REFERENCES: "Exploiting Vizualisation Techniques in Knowledge Discovery", H. Lee, H. Ong and L. Quek, KDD-95, pp. 198-203
LINK: http://jsaic.krdl.org.sg
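In parallel coordinates, each attribute gets its own vertical axis. Each record becomes a polyline that crosses every axis at its scaled value. The following is a minimal sketch of that mapping. It assumes numeric attributes and per-axis min-max scaling. The field names and the `parallel_coordinates` function are hypothetical, and this is not WinViz's code.

```python
def parallel_coordinates(records, attributes):
    """Map each record to a polyline of (axis_index, y) points with y in [0, 1].

    Values are min-max scaled per axis so that attributes measured in
    different units can share one chart.
    """
    lows = {a: min(r[a] for r in records) for a in attributes}
    highs = {a: max(r[a] for r in records) for a in attributes}
    lines = []
    for r in records:
        points = []
        for i, a in enumerate(attributes):
            span = highs[a] - lows[a]
            y = (r[a] - lows[a]) / span if span else 0.5  # constant attribute: mid-axis
            points.append((i, y))
        lines.append(points)
    return lines

# Hypothetical customer records
customers = [
    {"age": 25, "income": 30000, "visits": 2},
    {"age": 45, "income": 90000, "visits": 8},
    {"age": 35, "income": 60000, "visits": 5},
]
lines = parallel_coordinates(customers, ["age", "income", "visits"])
for line in lines:
    print(line)
```

Records that satisfy the same rule trace similar paths across the axes. This makes clusters and rule boundaries visible as bundles of lines.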