6. DATA ANALYSIS ________________________________________________________________________________________________


Pentti Paatero, Philip K. Hopke* and Sirkka Juntto**

Matrix factorization methods for the physical sciences ("Factor Analysis") are applicable to many problems in which a number of "spectra" have been measured in similar situations, or on similar samples consisting of the same (perhaps unknown) constituents in different proportions. Examples include chromatographic "spectra", aerosol size distributions, compositions of environmental samples, and Auger spectra measured after various heat treatments of the same sample.

A new method, "PMF" or "Positive Matrix Factorization", is developed and studied in the present work. The essential features of PMF are:

- utilization of error information of the measured data matrix
- implementation of strict non-negativity constraints for the factor matrices
- production of meaningful error estimates for the computed factors.

The method has been developed both for two-dimensional and for three-dimensional data arrays. The 3-way model is often called PARAFAC. The present 3-way solution is more efficient than the customary solutions of the PARAFAC problem and produces error estimates for the results.
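The first two features listed above (error weighting and non-negativity) can be illustrated with a minimal two-way sketch. It is not the PMF algorithm of this work: it substitutes a simple multiplicative-update scheme for weighted non-negative factorization, and it produces no error estimates for the factors. The function names, the synthetic data and the assumed uncertainties are hypothetical and serve only as an illustration. The quantity minimized is Q, the sum of squared residuals, each divided by the uncertainty of the corresponding data value.

```python
import random

def matmul(A, B):
    """Matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def hadamard(A, B):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def q_value(X, S, G, F):
    """Q = sum over all elements of ((x - fitted) / uncertainty)^2."""
    fit = matmul(G, F)
    return sum(((x - y) / s) ** 2
               for rx, ry, rs in zip(X, fit, S)
               for x, y, s in zip(rx, ry, rs))

def weighted_nonneg_factorize(X, S, rank, iters=500, seed=0, eps=1e-12):
    """Fit X ~ G F with G, F >= 0, weighting each element by 1 / s^2.

    Uses multiplicative updates (an assumption of this sketch, not the
    method of the report). Positive starting values stay non-negative.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[1.0 / (s * s) for s in row] for row in S]
    WX = hadamard(W, X)
    G = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    F = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Ft = transpose(F)
        num = matmul(WX, Ft)
        den = matmul(hadamard(W, matmul(G, F)), Ft)
        G = [[g * a / (b + eps) for g, a, b in zip(rg, ra, rb)]
             for rg, ra, rb in zip(G, num, den)]
        Gt = transpose(G)
        num = matmul(Gt, WX)
        den = matmul(Gt, hadamard(W, matmul(G, F)))
        F = [[f * a / (b + eps) for f, a, b in zip(rf, ra, rb)]
             for rf, ra, rb in zip(F, num, den)]
    return G, F

# Synthetic example (not measured data): 4 samples, 4 channels, 2 constituents.
G_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
F_true = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]]
X = matmul(G_true, F_true)
S = [[0.1 + 0.05 * x for x in row] for row in X]  # assumed uncertainties

G0, F0 = weighted_nonneg_factorize(X, S, rank=2, iters=0)   # starting point only
G, F = weighted_nonneg_factorize(X, S, rank=2, iters=500)   # fitted factors
print("Q at start:", q_value(X, S, G0, F0))
print("Q after fit:", q_value(X, S, G, F))
```

Elements with large uncertainty contribute little to Q, so unreliable measurements have little influence on the computed factors.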

In 1997, journal articles describing the theoretical and computational aspects of the method were published. Various measurements of pollution in Arctic air have been analyzed in order to determine the sources of the pollution. Several articles have been submitted for publication.

Development of a general "Multilinear" program has led to the release of a first "beta test" version of the program ME-1. This table-driven program allows individual users to formulate and compute different mathematical models of data analysis with the same program. The individual features of each model are described by a large "structure table"; reprogramming of the fitting algorithm is not needed when a new model is to be solved.
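The table-driven idea can be sketched as follows. The actual layout of the ME-1 structure table is not described here, so the format below is purely hypothetical: each table row names one data value and the indices of the unknowns whose product contributes to it. The same evaluation routine then serves a bilinear and a trilinear model without modification.

```python
from math import prod

def evaluate_model(structure_table, variables, n_data):
    """Fitted value of each data point = sum of the products listed for it."""
    fitted = [0.0] * n_data
    for data_index, factor_indices in structure_table:
        fitted[data_index] += prod(variables[i] for i in factor_indices)
    return fitted

# Bilinear model x[i][j] = g[i] * f[j] for a 2 x 2 matrix with one factor.
# Unknowns are stored in one flat list: [g0, g1, f0, f1].
bilinear_vars = [2.0, 3.0, 5.0, 7.0]
bilinear_table = [(i * 2 + j, (i, 2 + j)) for i in range(2) for j in range(2)]
bilinear_fit = evaluate_model(bilinear_table, bilinear_vars, n_data=4)

# Trilinear model x[i][j][k] = a[i] * b[j] * c[k] for a 2 x 2 x 2 array.
# Unknowns: [a0, a1, b0, b1, c0, c1]. Only the table changes, not the routine.
trilinear_vars = [2.0, 3.0, 5.0, 7.0, 1.0, 10.0]
trilinear_table = [((i * 2 + j) * 2 + k, (i, 2 + j, 4 + k))
                   for i in range(2) for j in range(2) for k in range(2)]
trilinear_fit = evaluate_model(trilinear_table, trilinear_vars, n_data=8)

print(bilinear_fit)
print(trilinear_fit)
```

A fitting algorithm written against such a table needs only the fitted values (and their derivatives, which follow from the same rows), which is why a new model requires a new table rather than new code.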

A new application of factor analytic models is studied: the analysis of almost-periodic time series. Many environmental data sets have this property. A demonstration analysis of carbon monoxide data has been performed. Daily and weekly patterns are observed; these patterns are related to traffic intensity at different times. The analyses of the Arctic pollution measurements utilize this technique.
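One way to bring an almost-periodic series into a form suitable for factor analysis is to fold it into a matrix with one row per period. The sketch below assumes hourly sampling and a 24-hour period. The series is synthetic (a made-up daily pattern with two traffic peaks, scaled down on weekends) and is not the carbon monoxide data of this work; all names are hypothetical.

```python
import math

def fold_series(values, period):
    """Cut a series into consecutive rows of `period` samples.

    A trailing incomplete period is dropped.
    """
    n_rows = len(values) // period
    return [values[r * period:(r + 1) * period] for r in range(n_rows)]

# Synthetic daily pattern: background plus morning and afternoon peaks.
daily_pattern = [1.0
                 + 2.0 * math.exp(-((h - 8) / 1.5) ** 2)
                 + 1.5 * math.exp(-((h - 17) / 2.0) ** 2)
                 for h in range(24)]

# Synthetic weekly modulation: days 5 and 6 of each week carry half the load.
day_strength = [0.5 if d % 7 in (5, 6) else 1.0 for d in range(14)]

# Two weeks of hourly values, stored as one long series.
series = [day_strength[d] * daily_pattern[h]
          for d in range(14) for h in range(24)]

matrix = fold_series(series, 24)  # 14 rows (days) x 24 columns (hours)
print(len(matrix), "days x", len(matrix[0]), "hours")
```

In the folded matrix each row is a "spectrum" over the hours of one day, so a factorization of the matrix separates the daily pattern (columns) from its day-to-day strength (rows), including a weekly rhythm.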

* Clarkson Univ., NY, USA
** Finnish Meteorological Institute