Friday, October 19, 2007

Component analysis, causal inference, and general intelligence

The aim of astronomy is astrophysics - we observe the Universe with the hope of using the resulting data to understand the fundamental physical processes that give rise to its observed properties.

As in many sciences, the data obtained from observation (experimentation, in other sciences) do not by themselves uniquely tell you the physics or what caused what. Instead one normally looks for correlations between different aspects of the data.

For example, it is known that the surface brightness, effective radius and velocity dispersion of the stars in elliptical galaxies are strongly correlated, a result now called the fundamental plane. Another example is that in starburst galaxies the soft X-ray luminosity is linearly proportional to the galaxy's far-IR luminosity because, causally, the FIR traces the formation rate of massive stars, the same stars that very rapidly die and whose supernovae heat the ISM to X-ray-emitting temperatures.

Various methods of investigating correlations between multiple variables exist (e.g. principal component analysis), now often referred to as "data mining." The problem is that these methods, while useful for recasting the data in ways that aid visualization of any correlations among the variables, do not necessarily tell you what caused what.
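To make that last point concrete, here is a minimal sketch in Python (NumPy assumed). The two toy models, the 0.8 coupling and the 0.6 noise level are all invented for illustration and are not taken from any real data set. In one model x drives y, in the other y drives x, yet both are built to have the same covariance, so a principal component analysis has no way of telling them apart.

```python
import numpy as np


def first_pc_fraction(data):
    """Fraction of the total variance carried by the first principal component."""
    cov = np.cov(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum()


def simulate(n=200_000, seed=0):
    """Return two samples with opposite causal direction but the same covariance."""
    rng = np.random.default_rng(seed)
    noise_sd = 0.6  # hypothetical value, chosen so every variable has unit variance

    # Toy model A: x is the cause, y responds to it.
    x_a = rng.normal(size=n)
    y_a = 0.8 * x_a + noise_sd * rng.normal(size=n)

    # Toy model B: y is the cause, x responds to it.
    y_b = rng.normal(size=n)
    x_b = 0.8 * y_b + noise_sd * rng.normal(size=n)

    return np.column_stack([x_a, y_a]), np.column_stack([x_b, y_b])


model_a, model_b = simulate()
frac_a = first_pc_fraction(model_a)
frac_b = first_pc_fraction(model_b)
print(f"x causes y: first PC carries {frac_a:.3f} of the variance")
print(f"y causes x: first PC carries {frac_b:.3f} of the variance")
```

Both fractions should come out close to 0.9, since each model has population covariance [[1, 0.8], [0.8, 1]]. The decomposition summarizes the correlation equally well in both cases and says nothing about which variable is doing the causing.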

An interesting discussion of these often-forgotten issues and complexities, one that is applicable even to astrophysics, can be found in Cosma Shalizi's article on the myth of g, the so-called general factor of intelligence. Indeed, he argues that while factor analysis is perfectly valid for data exploration or model testing, as a method for finding causal structure it is not reliable (it can be right, but often it's completely wrong and can fool you).
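A toy simulation shows how such a method can fool you. This is my own sketch rather than code from the article, and every number in it is made up. Each simulated person has many statistically independent "abilities," and each test score is simply the sum of a random subset of them. There is no single common cause anywhere in the model. As a stand-in for a full factor analysis I just take the leading eigenvalue of the correlation matrix of the test scores.

```python
import numpy as np


def simulate_scores(n_people=5000, n_abilities=200, n_tests=8, per_test=100, seed=1):
    """Test scores built from many independent abilities, with no general factor."""
    rng = np.random.default_rng(seed)
    # Independent by construction: no shared cause links the abilities.
    abilities = rng.normal(size=(n_people, n_abilities))
    scores = np.empty((n_people, n_tests))
    for t in range(n_tests):
        # Each hypothetical test draws on its own random subset of abilities.
        used = rng.choice(n_abilities, size=per_test, replace=False)
        scores[:, t] = abilities[:, used].sum(axis=1)
    return scores


scores = simulate_scores()
corr = np.corrcoef(scores, rowvar=False)

# Correlations between different tests (everything off the diagonal).
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]

# Share of the total variance carried by the leading component.
eigvals = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
top_share = eigvals[-1] / eigvals.sum()

print(f"smallest between-test correlation: {off_diag.min():.2f}")
print(f"share of variance in the leading component: {top_share:.2f}")
```

Every pair of tests ends up positively correlated, simply because their subsets overlap, and one component dominates. That looks just like the signature of a single underlying factor, even though the model was built without one.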

All very interesting, and rather important to understand in the wake of a certain elderly Nobel-prize winner's recent counter-factual comments.
