## 6. DATA ANALYSIS

FACTOR ANALYTIC METHODS FOR PHYSICAL SCIENCES

Pentti Paatero, Philip K. Hopke* and Sirkka Juntto**

Matrix factorization methods for physical sciences ("Factor Analysis")
are applicable to many problems where a number of "spectra" have
been measured in similar situations or from similar samples consisting of
the same (perhaps unknown) constituents in different proportions. Examples
include chromatographic "spectra", aerosol size distributions, compositions
of environmental samples, and Auger spectra measured after various heat
treatments of the same sample.

A new method, "PMF" or "Positive Matrix Factorization", has been
developed and is studied in the present work. The essential features of
PMF are:

- utilization of error information of the measured data matrix
- implementation of strict non-negativity constraints for the factor matrices
- production of meaningful error estimates for the computed factors

The method has been developed for both two-dimensional and three-dimensional
data arrays. The 3-way model is often called PARAFAC. The present 3-way
solution is more efficient than the customary solutions of the PARAFAC problem
and produces error estimates for the results.
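
To make the first two features concrete, the following is a minimal sketch of a two-way non-negative factorization X ≈ GF that weights each residual by the reciprocal of its measurement uncertainty. It is not the PMF algorithm itself: it substitutes the simpler weighted multiplicative-update scheme, and it does not compute error estimates for the factors. The function name `weighted_nmf` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def weighted_nmf(X, sigma, rank, n_iter=300, seed=0, eps=1e-12):
    """Fit X ~= G @ F with G, F >= 0 by weighted multiplicative updates.

    Illustrative stand-in, NOT the PMF algorithm. It reduces
    Q = sum(((X - G @ F) / sigma) ** 2), so entries with a large
    uncertainty sigma influence the fit less. X must be non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / sigma**2                   # weights from the error information
    G = rng.random((n, rank)) + 0.1      # strictly positive starting values
    F = rng.random((rank, m)) + 0.1
    WX = W * X
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so the
        # factors can never become negative.
        G *= (WX @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (W * (G @ F)) + eps)
    Q = float(np.sum(W * (X - G @ F) ** 2))
    return G, F, Q

if __name__ == "__main__":
    # Synthetic example: 3 hypothetical "sources" mixed into 20 samples.
    rng = np.random.default_rng(1)
    X = rng.random((20, 3)) @ rng.random((3, 8))
    sigma = 0.05 + 0.1 * X               # assumed uncertainty model
    G, F, Q = weighted_nmf(X, sigma, rank=3)
    print("weighted sum of squares Q =", Q)
```

The multiplicative form is chosen here only because it keeps the factors non-negative without a separate constraint step; it is typically slower to converge than a dedicated least-squares solver.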

In 1997, journal articles describing the theoretical and computational
aspects of the method were published. Various measurements of pollution
in the Arctic air have been analyzed in order to determine the sources
of pollution. Several articles have been submitted for publication.

Development of a general "Multilinear" program has led to
the release of a first "beta test" version of the program ME-1.
This table-driven program allows individual users to formulate and
compute different mathematical models of data analysis with
the same program. The individual features of each model are described by
a large "structure table"; reprogramming of the fitting algorithm
is not needed when a new model is to be solved.
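
The table-driven idea can be illustrated with a small sketch. The actual format of the ME-1 structure table is not described here, so the layout below is purely hypothetical: each table row names one data value and the factor elements whose product contributes to its fitted value. The single routine `evaluate` then serves both a bilinear and a trilinear model without modification. All function names are assumptions made for illustration.

```python
import numpy as np

def bilinear_table(n, m, rank):
    """Rows for X[i, j] = sum_k G[i, k] * F[k, j].

    Parameters are stored flat: G.ravel() first, then F.ravel().
    """
    table = []
    for i in range(n):
        for j in range(m):
            for k in range(rank):
                g = i * rank + k
                f = n * rank + k * m + j
                table.append((i * m + j, (g, f)))
    return table

def trilinear_table(n1, n2, n3, rank):
    """Rows for X[i, j, k] = sum_p A[i, p] * B[j, p] * C[k, p].

    Parameters are stored flat: A.ravel(), then B.ravel(), then C.ravel().
    """
    off_b = n1 * rank
    off_c = off_b + n2 * rank
    table = []
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):
                for p in range(rank):
                    idx = (i * rank + p, off_b + j * rank + p, off_c + k * rank + p)
                    table.append(((i * n2 + j) * n3 + k, idx))
    return table

def evaluate(table, params, n_data):
    """Model-independent evaluation: sum the products listed in the table."""
    fitted = np.zeros(n_data)
    for data_idx, factor_idxs in table:
        fitted[data_idx] += np.prod(params[list(factor_idxs)])
    return fitted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, F = rng.random((4, 2)), rng.random((2, 5))
    params = np.concatenate([G.ravel(), F.ravel()])
    fitted = evaluate(bilinear_table(4, 5, 2), params, 4 * 5)
    print(np.allclose(fitted.reshape(4, 5), G @ F))
```

In this sketch a new model only requires a new table builder; a fitting routine written against `evaluate` would stay unchanged, which is the property the paragraph above describes.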

A new application of factor analytic models is studied: analysis of
almost-periodic time series. Many environmental data sets have this property.
A demonstration analysis of carbon monoxide data has been performed. Daily
and weekly patterns are observed; these patterns are related to traffic
intensity at different times. The analyses of the Arctic pollution measurements
utilize this technique.
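
One plausible way to bring an almost-periodic series into a form suitable for factor analysis is to fold it into a matrix with one row per cycle and one column per position within the cycle. This arrangement is an assumption made for illustration; the text above does not specify how the series are arranged. The helper name `fold_series` is hypothetical.

```python
import numpy as np

def fold_series(series, period):
    """Reshape a 1-D series into (n_cycles, period); an incomplete tail is dropped."""
    series = np.asarray(series, dtype=float)
    n_cycles = series.size // period
    return series[: n_cycles * period].reshape(n_cycles, period)

if __name__ == "__main__":
    # Synthetic hourly data: one fixed daily profile scaled by a daily amplitude.
    hours = np.arange(24)
    profile = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24)
    amplitude = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 0.4, 0.3])  # 7 days
    series = np.outer(amplitude, profile).ravel()
    M = fold_series(series, period=24)
    print(M.shape)
```

If every day followed exactly the same profile up to a scale factor, as in this synthetic case, the folded matrix would be described by a single factor; real data would need several factors plus a residual.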

* Clarkson Univ., NY, USA

** Finnish Meteorological Institute