Matching Patterns from Historical Data Using PCA and Distance Similarity Factors


		Ashish Singhal
		Ph.D. Candidate
		Department of Chemical Engineering, University of California, Santa Barbara, CA 93106
		ashishs@engineering.ucsb.edu


Abstract

In this research, the term “abnormal situation” refers to an unanticipated situation in an industrial plant that could have serious consequences but does not warrant drastic action such as an emergency shutdown. After the abnormal situation ends, plant personnel need to identify its root cause and determine how to avoid future occurrences. Although a wide variety of process monitoring and fault diagnosis techniques are available, a valuable resource, historical plant data, has largely been overlooked. A new strategy is proposed for this pattern-matching problem: locate previous periods of process behavior that are similar, but not necessarily identical, to the abnormal situation. Neither a process model nor training data for previous abnormal situations are required. The methodology uses principal component analysis (PCA) and the distance between the current and historical datasets to provide a preliminary screening of large amounts of historical data, generating a candidate pool of similar periods of operation. Someone familiar with the process can then further evaluate this much smaller number of records. Similarity factors characterize the degree of similarity between the current abnormal operation and historical data. A new Distance Similarity Factor is proposed that complements the standard PCA similarity factor, and the two similarity factors together provide the basis for an unsupervised pattern matching technique.
The proposed pattern matching methodology has been evaluated in a simulation case study for a controlled continuous stirred tank reactor (14 measured variables, more than 474,000 data points for each measured variable, and 28 operating conditions). It located previous occurrences of abnormal situations even when their start and end were unknown, and it located similar patterns with over 79% accuracy. The approach has also been applied to a batch fermentation example (9 process variables, 900,000 measurements per variable, and 5 operating conditions) in which the process variables are sampled at two different rates. A simple data pre-processing technique is employed to account for multiple sampling rates and missing data. The proposed technique locates batches similar to the current abnormal batch with over 95% efficiency. Future research will include extending the methodology to a large eight-year historical database generated for the Tennessee Eastman challenge problem. Data pre-processing issues such as multiple data sampling rates, missing or corrupted data, and data compression will also be addressed.
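The abstract does not give formulas for the two similarity factors, so the following is only a rough sketch under stated assumptions. It assumes the PCA similarity factor takes the commonly used Krzanowski form, trace(L'MM'L)/k, where L and M hold the first k principal directions of the two datasets, and it assumes the distance factor is the two-sided Gaussian tail probability of the Mahalanobis distance between the dataset centers. The function names, the fixed window step, the value of k, the weight alpha, and the weighted-sum combination are all hypothetical choices made for illustration, and the variables are assumed to be scaled already.

```python
import math
import numpy as np


def pca_loadings(X, k):
    """First k principal directions of X (rows are samples), as columns."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def pca_similarity(S, H, k):
    """Assumed Krzanowski-form PCA similarity factor, between 0 and 1."""
    L, M = pca_loadings(S, k), pca_loadings(H, k)
    return float(np.trace(L.T @ M @ M.T @ L) / k)


def distance_similarity(S, H):
    """Assumed form: Gaussian tail probability of the Mahalanobis
    distance between the centers of S and H (covariance taken from S)."""
    d = H.mean(axis=0) - S.mean(axis=0)
    cov_pinv = np.linalg.pinv(np.cov(S, rowvar=False))
    phi = math.sqrt(max(float(d @ cov_pinv @ d), 0.0))
    # 2 * (1 - standard normal CDF at phi) equals erfc(phi / sqrt(2)).
    return math.erfc(phi / math.sqrt(2.0))


def scan_history(snapshot, history, k=2, alpha=0.5, step=10):
    """Slide a snapshot-sized window over the history and rank windows
    by a weighted sum of the two factors (hypothetical combination)."""
    w = len(snapshot)
    ranked = []
    for start in range(0, len(history) - w + 1, step):
        window = history[start:start + w]
        sf = (alpha * pca_similarity(snapshot, window, k)
              + (1.0 - alpha) * distance_similarity(snapshot, window))
        ranked.append((start, sf))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Synthetic illustration only: random "history" with one shifted episode.
rng = np.random.default_rng(0)
history = rng.normal(size=(300, 3))
history[100:150, 0] += 5.0
snapshot = history[100:150].copy()
ranked = scan_history(snapshot, history)
print(ranked[:3])
```

The top-ranked windows form the candidate pool that a person familiar with the process would then inspect. Because the window slides over the history, the scan does not need to know where past abnormal episodes start or end.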

Publications

1.	Singhal, A. and D. E. Seborg, "Matching Patterns from Historical Data Using PCA and Distance Similarity Factors," in Proc. 2001 American Control Conference (ACC2001), Arlington, VA, pp. 1759-1764 (2001).