2006 AIChE Annual Meeting
(582h) Evolution and Changes in the BLOSUM Matrix and Blocks Database
We begin by noting inconsistencies between the algorithm intended for deriving the BLOSUM matrices and its actual implementation. These inconsistencies lead to subtle yet important differences between the BLOSUM matrices still in use today and the matrices that should have been derived and published originally. We analyze the impact of these differences using structurally aligned proteins from the SCOP database as a gold standard, and find statistically significant differences in performance between the BLOSUM matrices used today and those that should originally have been derived.
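For readers unfamiliar with how a BLOSUM-style matrix is built, the following is a minimal sketch of the commonly described log-odds construction from aligned-pair counts. It is not the code analyzed in this work and makes no attempt to reproduce either the intended or the implemented variant discussed above. The function name `log_odds_half_bits`, the two-letter alphabet, and the toy counts are hypothetical and chosen only for illustration. Sequence clustering and weighting are omitted.

```python
import math
from collections import Counter


def log_odds_half_bits(pair_counts):
    """Return rounded half-bit log-odds scores from unordered pair counts.

    pair_counts maps (a, b) with a <= b to the number of times residues
    a and b were observed aligned in the same column (assumed input format).
    """
    total = sum(pair_counts.values())
    # Observed frequency q_ij of each unordered pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequency: p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        # Expected frequency under independence: p_i^2 or 2 * p_i * p_j.
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        # Score in half-bit units, rounded to the nearest integer.
        scores[(a, b)] = round(2 * math.log2(f / e))
    return scores


# Toy counts, invented for illustration only.
toy_counts = {("A", "A"): 6, ("A", "B"): 2, ("B", "B"): 2}
toy_scores = log_odds_half_bits(toy_counts)
```

In this toy case identical pairs receive positive scores and the mismatched pair a negative one, because the mismatch is observed less often than independence would predict.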
Next, we show that updated BLOSUM matrices computed from successive releases of the Blocks database deviate from the original BLOSUM matrices. At a constant re-clustering percentage, later releases of the Blocks database give rise to matrices with decreasing relative entropy, or information content. We show that this decrease in entropy is due to the addition of large, diverse families to the Blocks database. Using two separate tests, we demonstrate that isentropic matrices derived from later Blocks releases are less effective at detecting remote homologs, and that these differences are statistically significant. Finally, we show that removing the largest, most diverse 1% of blocks largely recovers the performance of the matrices.
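The relative entropy referred to above is commonly defined as the expected log-odds score per aligned pair, H = sum over pairs of q_ij * log2(q_ij / e_ij). The sketch below computes it from pair counts and shows, on invented toy data, how pooling counts from a diverse family can lower it. The function name `relative_entropy` and all counts are hypothetical. None of the numbers come from the Blocks database.

```python
import math
from collections import Counter


def relative_entropy(pair_counts):
    """Relative entropy (bits per aligned pair) of unordered pair counts.

    pair_counts maps (a, b) with a <= b to an observed count
    (assumed input format; zero-count pairs should be left out).
    """
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequencies implied by the pair frequencies.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    h = 0.0
    for (a, b), f in q.items():
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        h += f * math.log2(f / e)
    return h


# Invented counts: a conserved family and a large, diverse one.
conserved = Counter({("A", "A"): 6, ("A", "B"): 2, ("B", "B"): 2})
diverse = Counter({("A", "A"): 10, ("A", "B"): 20, ("B", "B"): 10})

h_conserved = relative_entropy(conserved)
h_diverse = relative_entropy(diverse)
h_pooled = relative_entropy(conserved + diverse)
```

Here the diverse family's pair frequencies match what independence predicts, so it contributes no information on its own, and pooling it with the conserved family pulls the combined entropy toward zero.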