|
||
Matching Patterns from Historical Data Using PCA and Distance Similarity Factors |
||
Ashish Singhal |
||
Ph.D. Candidate |
||
Department of Chemical Engineering University of California, Santa Barbara, CA, 93106 |
||
Abstract |
||
In this research, the term “abnormal situation” refers to an unanticipated situation in an industrial plant that could have serious consequences but does not warrant drastic action such as an emergency shutdown. After the abnormal situation ends, plant personnel need to identify its root cause and determine how to avoid future occurrences. Although a wide variety of process monitoring and fault diagnosis techniques are available, a valuable resource, historical plant data, has largely been overlooked. A new strategy is proposed to provide a preliminary screening of historical data. The objective is to locate previous periods of process behavior that are similar, but not necessarily identical, to the abnormal situation. Neither a process model nor training data for previous abnormal situations are required. A novel methodology is proposed for this pattern-matching problem; it uses principal component analysis (PCA) and the distance between the current and historical datasets. The new approach provides a preliminary screening of large amounts of historical data in order to generate a candidate pool of similar periods of operation. Someone familiar with the process can then further evaluate this much smaller number of records. Similarity factors are used to characterize the degree of similarity between the current abnormal operation and historical data. A new Distance Similarity Factor is proposed that complements the standard PCA similarity factor. The two similarity factors provide the basis for an unsupervised pattern-matching technique. The proposed pattern-matching methodology has been evaluated in a simulation case study for a controlled continuous stirred tank reactor (14 measured variables, more than 474,000 data points for each measured variable, and 28 operating conditions). The methodology was able to locate previous occurrences of “abnormal situations” when the start and end of the abnormal situations were unknown, and it located similar patterns with over 79% accuracy. The pattern-matching approach was also applied to a batch fermentation example (9 process variables, 900,000 measurements per variable, and 5 operating conditions). The process variables are sampled at two different rates, so a simple data pre-processing technique is employed to account for multiple sampling rates and missing data. The proposed technique locates batches similar to the current abnormal batch with over 95% efficiency. Future research will include extending the methodology to a large eight-year historical database generated for the Tennessee Eastman challenge problem. Data pre-processing issues such as multiple data sampling rates, missing or corrupted data, and data compression will also be addressed. |
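The abstract names two similarity factors but gives no formulas, so the following is only a minimal sketch under stated assumptions. The PCA similarity factor is written in its usual form, trace(LᵀMMᵀL)/k, where L and M hold the first k principal directions of the two datasets. The distance factor is an assumed form: the Mahalanobis distance between the two dataset means, mapped to the range 0 to 1 through a two-sided Gaussian tail probability. The function names, the choice of `k`, and the use of a pseudo-inverse are illustrative and are not taken from the abstract.

```python
import math
import numpy as np


def pca_loadings(X, k):
    """Return the first k principal directions (as columns) of dataset X."""
    Xc = X - X.mean(axis=0)  # mean-center each measured variable
    # The right singular vectors of the centered data are the PCA loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def pca_similarity(X_s, X_h, k):
    """PCA similarity factor: 1 for identical subspaces, 0 for orthogonal ones."""
    L = pca_loadings(X_s, k)
    M = pca_loadings(X_h, k)
    return float(np.trace(L.T @ M @ M.T @ L) / k)


def distance_similarity(X_s, X_h):
    """Assumed distance factor based on the Mahalanobis distance between means."""
    diff = X_h.mean(axis=0) - X_s.mean(axis=0)
    cov = np.cov(X_s, rowvar=False)
    # Pseudo-inverse guards against a singular covariance matrix (assumption).
    d2 = float(diff @ np.linalg.pinv(cov) @ diff)
    d = math.sqrt(max(d2, 0.0))
    # Two-sided Gaussian tail probability, 2 * (1 - Phi(d)), equals erfc(d / sqrt(2)).
    return math.erfc(d / math.sqrt(2.0))
```

In this sketch, a historical window that merely shifts the operating point keeps a PCA similarity near 1 while the distance factor drops toward 0, which illustrates why the two factors complement each other in the screening step.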
||
Publications | ||
1. Singhal, A. and D. E. Seborg, "Matching Patterns From Historical Data Using PCA and Distance Similarity Factors," in Proc. 2001 American Control Conference (ACC2001), Arlington, VA, pp. 1759-1764 (2001). |