![]() |
A Software Package for the Analysis, Rebuilding, and Visualization of
Three-dimensional Nucleic Acid Structures Xiang-Jun Lu & Wilma K. Olson |
3DNA was created in the hope that it
will be useful, but without any guarantees. Nothing is perfect, and we
strive to improve its functionalities. Any comments, constructive
suggestions, and bug reports are welcome.
|
This section provides technical details of some of the algorithms used in 3DNA. It is meant for those who want really want to know "what's inside that black box". Least-squares fitting procedures with illustrated examplesIntroductionThe least-squares fitting procedures presented below make use of well known mathematics. Indeed, the methods are so well known and widely used that it is somewhat difficult to locate the original references. As part of our comparison of nucleic acid structure analysis programs (1, 2), we came across a variety of least-squares fitting procedures. This webpage gives a detailed description, with step-by-step examples, of our implementation of two least-squares fitting algorithms based on a covariance matrix and its eigensystem. Standard vs experimental basesThree analysis schemes-CompDNA (3), Curves (4, 5), and RNA (6-8), use least-squares procedures to fit a standard base with an embedded reference frame to an observed base structure. CompDNA (3) and Curves (4, 5) take advantage of the conventional approach of McLachlan (9), while RNA (6-8) implements a closed-form solution of absolute orientation using unit quaternions first introduced by Horn (10). The two algorithms are mathematically equivalent since the unit quaternion can be transformed to the rotation matrix given by McLachlan (9). The Horn method (10), however, is more straightforward and generally applicable. The latter algorithm can be applied even when one or both of the structures are perfectly planar, whereas the McLachlan approach (9) fails. Here we use the ideal adenine geometry derived from the high resolution crystal structures of model nucleosides, nucleotides, and bases (11). The x-, y-, and z- coordinates of the standard base, taken from the Nucleic Acid Database (NDB), are listed below in the columns labeled sx, sy, and sz, respectively. s_(average) is the geometric center of the base.
sx sy sz
1 N9 0.213 0.660 1.287
2 C4 0.250 2.016 1.509
3 N3 0.016 2.995 0.619
4 C2 0.142 4.189 1.194
5 N1 0.451 4.493 2.459
6 C6 0.681 3.485 3.329
7 N6 0.990 3.787 4.592
8 C5 0.579 2.170 2.844
9 N7 0.747 0.934 3.454
10 C8 0.520 0.074 2.491
------------------------------------
s_(average): 0.4589 2.4803 2.3778
We similarly describe the coordinates of one of the adenine bases (the fifth residue in the sequence strand) from the high resolution (1.4 Å) self-complementary d(CGCGAATTCGCG) dodecamer duplex determined by Williams and co-workers (12). The experimental coordinates (NDB entry: bdl084) are listed below in the columns labeled ex, ey, and ez. The geometric center is e_(average). Note that the atomic serial numbers from the NDB (first column) have been rearranged so that the atoms are in the same order as the above ideal base.
ex ey ez
91 N9 16.461 17.015 14.676
100 C4 15.775 18.188 14.459
99 N3 14.489 18.449 14.756
98 C2 14.171 19.699 14.406
97 N1 14.933 20.644 13.839
95 C6 16.223 20.352 13.555
96 N6 16.984 21.297 12.994
94 C5 16.683 19.056 13.875
93 N7 17.918 18.439 13.718
92 C8 17.734 17.239 14.207
------------------------------------
e_(average):16.1371 19.0378 14.0485
We collect the two sets of coordinates in the 10 x 3 matrices S and E corresponding respectively to the standard and experimental bases. We then construct the 3 x 3 covariance matrix (C) (13) between S and E using the following formula:
1 1
C = ------- [S' E - --- S' i i' E]
N - 1 N
=
0.2782 0.2139 -0.1601
-1.4028 1.9619 -0.2744
1.0443 0.9712 -0.6610
Here N, the number of atoms in each base, is 10, and i is an N x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively. From the nine elements of C, we subsequently generate the 4 x 4 real symmetric matrix M using the expression:
- -
| c11+c22+c33 c23-c32 c31-c13 c12-c21 |
M = | c23-c32 c11-c22-c33 c12+c21 c31+c13 |
| c31-c13 c12+c21 -c11+c22-c33 c23+c32 |
| c12-c21 c31+c13 c23+c32 -c11-c22+c33 |
- -
=
1.5792 -1.2456 1.2044 1.6167
-1.2456 -1.0228 -1.1890 0.8842
1.2044 -1.1890 2.3447 0.6968
1.6167 0.8842 0.6968 -2.9011
The largest eigenvalue of matrix M is 4.0335, and its corresponding unit eigenvector (qi, i=0--3) is: [ q0 q1 q2 q3 ] = [ 0.6135 -0.2878 0.7135 0.1780 ] The rotation matrix R is deduced from the qi as:
- -
| q0q0+q1q1-q2q2-q3q3 2(q1q2-q0q3) 2(q1q3+q0q2) |
R = | 2(q2q1+q0q3) q0q0-q1q1+q2q2-q3q3 2(q2q3-q0q1) |
| 2(q3q1-q0q2) 2(q3q2+q0q1) q0q0-q1q1-q2q2+q3q3 |
- -
=
-0.0817 -0.6291 0.7730
-0.1923 0.7710 0.6072
-0.9779 -0.0990 -0.1839
Following coordinate transformation with matrix R, the origin of the standard base is found to be displaced from the experimental structure by: o = e_(average) - s_(average) R' = [15.8969 15.7701 15.1802] The least-squares fitted coordinates (F) of the standard base atoms on the experimental structure are then given by:
F = S R' + i o
=
16.4592 17.0194 14.6699
15.7747 18.1925 14.4586
14.4899 18.4519 14.7542
14.1729 19.6974 14.4070
14.9343 20.6404 13.8420
16.2222 20.3472 13.5569
16.9832 21.2875 12.9925
16.6829 19.0585 13.8760
17.9183 18.4437 13.7219
17.7335 17.2396 14.2062
Here S is the (N x 3) matrix of original coordinates of the standard base, and as noted above, i is an N x 1 column vector consisting of only ones. The difference matrix (D) between F and E, the (N x 3) matrix of original coordinates of the experimental base, and the root-mean-square (RMS) deviation between the two structures are found to be:
D = E - F
=
0.0018 -0.0044 0.0061
0.0003 -0.0045 0.0004
-0.0009 -0.0029 0.0018
-0.0019 0.0016 -0.0010
-0.0013 0.0036 -0.0030
0.0008 0.0048 -0.0019
0.0008 0.0095 0.0015
0.0001 -0.0025 -0.0010
-0.0003 -0.0047 -0.0039
0.0005 -0.0006 0.0008
RMS deviation = 0.0054
It should be noted that if the standard base is already defined in terms of its reference frame, as in the RNA (6-8) analysis, the vector o and matrix R represent the best-fitted coordinate frame of the experimental base. Moreover, the three axes of the frame given by R are guaranteed to be orthogonal. Base normalRather than fit a standard base to experimental coordinates, the CEHS (14, 15), FREEHELIX (16), and NUPARM (17, 18) analyses perform a least-squares fitting of a plane to a set of atoms (19) in order to define the base and base-pair normals. The covariance matrix based on the N x 3 matrix of experimental Cartesian coordinates E is diagonalized to find the vector normal to the best plane. Specifically, C is obtained using the above formula with S substituted by E. The normal vector then lies along the eigenvector that corresponds to the smallest eigenvalue. Note that the coefficient 1/(N-1) in the formula for calculating C has no effect on the eigenvectors and only determines the magnitudes of the eigenvalues. The well known procedure from Blow (20) corresponds to a covariance matrix without this coefficient. Using the above adenine base from the high resolution dodecamer duplex as an example, the covariance matrix C is found to be:
C =
1.6680 -0.5015 -0.3253
-0.5015 2.0670 -0.5840
-0.3253 -0.5840 0.3061
The smallest eigenvalue of C, 8.26e-5, indicates that the base is almost perfectly planar. The corresponding unit eigenvector corresponding to the base normal is: Base normal: 0.2737 0.3224 0.9062 It is also worth noting that the best-fitted global helical axis can be found with the same algorithm, although FREEHELIX implements two subroutines with identical functionalities. References
|
|
|
Created and maintained by Dr. Xiang-Jun Lu © 2000-2008, Rutgers-the State University of New Jersey. |