What is Data Mining?

Data Mining is the process of AUTOMATICALLY collecting large volumes of data and analyzing the relationships among many types of data in order to find HIDDEN PATTERNS and develop PREDICTIVE models. A typical example is the widespread use of loyalty cards, which identify customers in retail stores and gather data about them. Millions of customers unwittingly share information about their purchases, which is collected as bar codes are read at checkout points and is accumulated in data warehouses. Retail stores look at parameters such as RECENCY (how recently a customer last purchased), FREQUENCY (how often the customer purchases) and MONETARY value (how much the customer spends) to estimate the likelihood that customers will remain loyal. In addition, location information embedded in loyalty cards helps to correlate demographic and psychographic information, provided by companies like Claritas and ESRI, with purchase data. Companies use such data to identify relatively homogeneous groups of customers who demonstrate similar buying behavior. Once these segments are demarcated, predictive or statistical models can be developed to forecast their purchase behavior. Each group then receives offers for products and services relevant to its profile, which saves the cost of mailing catalogues to uninterested consumers.
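The recency, frequency and monetary parameters mentioned above can be derived directly from raw purchase records. The following is a minimal sketch in Python, not a description of any retailer's actual system: the purchase records, the customer IDs, the reference date and the function name rfm_table are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount)
purchases = [
    ("C1", date(2024, 1, 5), 40.0),
    ("C1", date(2024, 3, 20), 60.0),
    ("C2", date(2023, 11, 2), 15.0),
    ("C3", date(2024, 3, 28), 200.0),
    ("C3", date(2024, 2, 14), 120.0),
    ("C3", date(2024, 1, 9), 80.0),
]


def rfm_table(records, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    summary = {}
    for customer, day, amount in records:
        # Track the latest purchase date, the purchase count and the total spend.
        last, count, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), count + 1, total + amount)
    return {
        customer: ((as_of - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


# The reference date is an assumption; in practice it would be the analysis date.
rfm = rfm_table(purchases, as_of=date(2024, 4, 1))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

A customer with low recency, high frequency and high monetary value (C3 in this made-up data) would typically be treated as more likely to remain loyal, although how the three values are weighted or binned is a modelling choice that the text above does not specify.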

Data mining is increasingly used as a tool in management decision making.

Companies analyze data to offer services in proportion to the revenue earned from customers, to price financial products to match customers' risk profiles, and to support customer acquisition and retention strategies, inventory management and fraud detection, among other uses.

The technological centerpiece of well-developed data mining is the data warehouse. In the past, data was gathered by transactional or operational systems such as those used for finance, order booking, sales or production data management. Each of these operational systems serves a specific function, whereas a data warehouse aggregates multi-dimensional information, which means that it allows cross-referencing. Data hosted on operational systems cannot be analyzed efficiently because analysis takes processing time away from routine business functions. In addition, operational systems store dynamic data, such as orders placed, which is updated at short intervals. A data warehouse, on the other hand, stores historical information which is not modified after it is transferred from an operational system.
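The cross-referencing that a warehouse affords can be illustrated with a small example. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names (dim_store, dim_product, fact_sales), the columns and the rows are hypothetical, and a real warehouse would run on a dedicated platform rather than SQLite. It shows historical sales rows being summarized along two dimensions at once, region and product category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table of historical sales plus two lookup tables.
conn.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_date TEXT, store_id INTEGER, product_id INTEGER, amount REAL
    );
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Grocery"), (20, "Apparel")])
# Rows are only appended here, mirroring historical data that is not modified.
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    ("2024-01-05", 1, 10, 30.0),
    ("2024-01-06", 1, 20, 55.0),
    ("2024-01-06", 2, 10, 20.0),
    ("2024-02-01", 2, 10, 25.0),
])

# Cross-reference sales by region and by product category in one query.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for region, category, total in rows:
    print(region, category, total)
```

Running a query like this against a separate copy of the data, rather than against the order-booking system itself, is what keeps the analysis from competing with routine business functions.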

Data stored in data warehouses inevitably grows in volume and eventually outgrows the disks attached to individual servers. Instead, data warehouses use storage area networks, where disk capacity can be increased incrementally as demand grows, unlike servers, whose disk capacity can be increased only in discrete steps. An added advantage of storage area networks is that they are accessible by all departments or subsidiaries of the company, since they are managed from a single GUI. A single view of the data also means that companies can use it for strategic planning.

The final technological piece in data mining is the set of analytical applications. These range from simple SQL queries, to tables constructed with OLAP tools such as Business Objects and Cognos, to more sophisticated statistical analysis tools such as SAS, S-Plus, R or SPSS. The analytical tools look for patterns in the data or test hypotheses. They use methodologies like CHAID (Chi-square Automatic Interaction Detector) to find patterns, or multivariate statistics to segment customers.
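The chi-square test at the heart of CHAID can be sketched in a few lines of Python. This is a simplified illustration only: full CHAID also merges similar categories and compares adjusted p-values, which is omitted here. The predictor names (age_band, region) and the response counts are hypothetical. Because both example tables have the same shape, ranking them by the raw statistic is equivalent to ranking them by p-value.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the predictor and the response were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = predictor category,
# columns = (responded to offer, did not respond).
candidates = {
    "age_band": [[30, 70], [50, 50]],
    "region": [[40, 60], [40, 60]],
}

scores = {name: chi_square(table) for name, table in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this made-up data the response rate differs between age bands but not between regions, so age_band would be the first variable used to split customers into segments.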