PROTEIN-DNA COMPLEXATION: CONTACT PROFILES IN DNA GROOVES

Background: Investigating the mechanisms of specific protein-DNA complexation makes it possible to establish general principles of molecular recognition, which must be taken into account when developing artificial DNA-based nanostructures, and to improve the efficiency of predicting protein binding sites on DNA. One of the main characteristics of protein-DNA complexes is the number and type of contacts in the binding sites of DNA and proteins. Conformational changes in the DNA double helix can alter these characteristics.
Objectives: The purpose of our study is to establish the features of the interactions between nucleotides and amino acid residues in the binding sites of protein-DNA complexes and their dependence on the conformation of deoxyribose and the angle γ of the polynucleotide chain.
Materials and methods: To study the protein-DNA recognition process, we analyzed the contacts between amino acids and nucleotides in 128 protein-DNA complexes from structural databases. Conformational parameters of the DNA backbone were calculated using the 3DNA/CompDNA program. The number of contacts was determined using a geometric criterion: a protein atom and a DNA atom were considered to be in contact if the distance between their centers was less than 4.5 Å. Amino acid residues were classified according to a hydrophobicity scale as hydrophobic (nonpolar) or polar.
Results: The analysis of contacts between polar and hydrophobic residues and nucleotides with different conformations of the sugar-phosphate backbone showed that nucleotides form more contacts with polar amino acids than with hydrophobic ones in both grooves, regardless of nucleotide conformation. However, the profile of such contacts differs between the minor and major grooves and depends on the conformation of both the deoxyribose and the γ angle. The contact profiles are characterized by sequence specificity, that is, by the different propensity of nucleotides to form contacts with residues in both grooves.
Conclusions: Our analysis has shown that the number and type of protein-nucleic acid contacts and their distribution in the grooves depend on the conformation of the sugar-phosphate backbone, the nucleotide sequence and the type of amino acids in the binding sites.

KEY WORDS: protein-nucleic acid recognition; protein-nucleic acid contacts; DNA structure; binding site prediction; structural diversity.

Molecular recognition of DNA by proteins is a fundamental problem in molecular biology and drug design. Investigating the mechanisms of specific protein-DNA complexation allows us to establish general principles of molecular recognition, which must be taken into account when developing artificial DNA-based nanostructures, and also to improve the efficiency of predicting protein-DNA binding sites. The prediction of DNA binding sequences for proteins is an unsolved problem, but significant progress has been made in recent years [1]. To improve the reliability of DNA-binding site prediction, several problems must be solved. Above all, structural and physicochemical complementarity between DNA sequences and proteins, or between the DNA nucleotides and the interacting amino acid residues, is the principal factor of specificity. That is, binding-site information depends on both protein and DNA structure [2]. For most protein-DNA complexes the location and sequences of the binding sites are not defined, and their experimental determination requires expensive methods. However, if information about the spatial structures of the complexes is available, protein-DNA binding sites can be predicted in silico [3-5].
One of the characteristics of protein-DNA complexation is the number and type of contacts in protein-DNA binding sites. A detailed analysis of protein-DNA interactions is presented in Refs. [6,7]. The authors determined all protein-nucleic acid contacts and classified them according to the types of interactions and the possible places of their formation in the complexes. They found that the most common types of protein-nucleic acid interactions are hydrogen bonds and van der Waals contacts. As for the primary binding sites, it turned out that more than two-thirds of the protein contacts are formed with the DNA sugar-phosphate backbone.
At the same time, conformational changes in DNA can lead to changes in the protein-DNA contact profiles, which are formed, as a rule, in the grooves of the double helix [8,9]. Since DNA shape can contribute to protein-DNA binding specificity [9], the peculiar properties of DNA shape must also be examined.
It should be noted that A-like conformers and mixed A/B conformers are enriched in protein-DNA binding sites [10]. The tendency of A-like nucleotides to appear at the protein-DNA interface is accompanied by a transition of the deoxyribose from the C2'-endo (B-like) to the C3'-endo (A-like) sugar pucker. This effect is known as 'sugar switching' and is facilitated by hydrophobic interactions in the minor groove [8,11]. Unusual DNA conformers are often concentrated in the DNA binding sites of protein-DNA complexes. Evidently, the propensity of DNA to adopt unusual conformations is essential for its recognition by proteins. The occurrence of these non-canonical forms of the DNA double helix may also explain the role of such DNA structures in evolution [12].
Typical rearrangements of the DNA sugar-phosphate backbone in complexes with proteins involve B→A sugar switching and rotations around four torsion angles (α, γ, ε, ζ), each with the following possible conformations: gauche+ (g+), trans (t) and gauche- (g-) [8,13,14]. Earlier [15] we examined how the transition of the γ angle (O5'-C5'-C4'-C3') in the DNA sugar-phosphate backbone from the classical g+ conformation to the alternative t and g- conformations affects the local DNA interface accessible for interaction with proteins in the minor and major grooves, and evaluated the sequence dependence of such transitions.
The purpose of this study is to establish the features of the interactions between nucleotides and amino acid residues in protein-DNA binding sites depending on the conformation of deoxyribose and the angle γ of the polynucleotide chain. The results are based on the statistical analysis of crystallographic structures of protein-DNA complexes from the NDB database [16].

Data set of crystallographic structures
A data set of 128 protein-DNA complexes was generated for comprehensive analysis. For the structure selection we used crystallographic coordinates of protein-DNA complexes extracted from the Nucleic Acid Database (NDB) [16].
The resolution of the X-ray crystal structures is better than 1.9 Å. This resolution has previously been identified as one that ensures accurate determination of the sugar puckers and backbone torsion angles, as well as statistical analysis of their distribution [13]. A complex was defined as any structure containing one or more protein chains and at least one double-stranded DNA with more than 4 nucleotides in each chain. Modified and terminal unpaired nucleotides were excluded from this set. To avoid end effects, the terminal base pairs at both ends of each DNA structure were also excluded from the analysis. Moreover, the data set is non-redundant, i.e. only one structure was selected from each group of complexes containing an identical protein or its mutants. Preference was given to complexes with wild-type proteins and better resolution.
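The selection criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Entry` record, its field names and the entry identifiers are hypothetical and do not correspond to the NDB data format or to the procedure actually used to build the data set.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical summary record for one crystal structure."""
    entry_id: str
    resolution: float          # in Å; smaller means better
    n_protein_chains: int
    dna_chain_lengths: tuple   # nucleotides in each strand of the duplex
    protein_group: str         # identical protein or its mutants share a group
    wild_type: bool

def passes_filters(e, max_resolution=1.9, min_len=5):
    """Resolution better than 1.9 Å, at least one protein chain, and a
    double-stranded DNA with more than 4 nucleotides in each chain."""
    return (e.resolution < max_resolution
            and e.n_protein_chains >= 1
            and len(e.dna_chain_lengths) >= 2
            and all(n >= min_len for n in e.dna_chain_lengths))

def select_nonredundant(entries):
    """Keep one entry per protein group, preferring wild type, then resolution."""
    best = {}
    for e in filter(passes_filters, entries):
        key = (not e.wild_type, e.resolution)
        cur = best.get(e.protein_group)
        if cur is None or key < (not cur.wild_type, cur.resolution):
            best[e.protein_group] = e
    return sorted(best.values(), key=lambda e: e.entry_id)

# Made-up entries for illustration only.
entries = [
    Entry("X001", 1.5, 1, (12, 12), "g1", True),
    Entry("X002", 1.2, 1, (12, 12), "g1", False),  # mutant of g1
    Entry("X003", 2.5, 2, (10, 10), "g2", True),   # resolution too low
    Entry("X004", 1.8, 1, (4, 4), "g3", True),     # DNA too short
    Entry("X005", 1.7, 1, (8, 8), "g4", True),
]
selected = [e.entry_id for e in select_nonredundant(entries)]
print(selected)
```

In this sketch the wild-type preference takes priority over resolution; the text does not state how the two preferences were weighed, so that ordering is an assumption.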

Calculation of the DNA structural parameters
The values of the dihedral angle γ of the DNA backbone (O5′-C5′-C4′-C3′) and the angle of pseudorotation (P) of the sugar ring for each nucleotide are calculated by means of the 3DNA/CompDNA analyzer [17].
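For readers who want to see what these parameters mean numerically, the following minimal sketch computes a dihedral angle such as γ from four atomic positions and assigns conformational classes. It is not the 3DNA/CompDNA implementation. The function names, the 120°-wide bins for g+, t and g-, and the split of the pseudorotation wheel into a northern (A-like, C3'-endo) and a southern (B-like, C2'-endo) half are assumptions made for illustration; the thresholds used in the study may differ.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (e.g. O5', C5', C4', C3')."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)              # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def classify_gamma(angle_deg):
    """Assumed bins: g+ in [0, 120), t in [120, 240), g- in [240, 360)."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def classify_pucker(p_deg):
    """Assumed split: northern half of the pseudorotation wheel is A-like."""
    p = p_deg % 360.0
    return "A-like" if (p < 90.0 or p >= 270.0) else "B-like"

# Toy coordinates (not real atoms): a +90 degree dihedral about the z axis.
gamma = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(gamma, 1), classify_gamma(gamma), classify_pucker(162.0))
```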

Revealing the atom-atom contacts in the protein-DNA binding sites
An amino acid residue is considered to form a contact if the distance between one of its atoms and an atom of a nucleotide is less than a cutoff radius. Different studies have used different definitions of protein-DNA contacts in binding sites [1,18]. In Refs. [7,8], a residue atom and a nucleotide atom were considered to be in contact if the distance between their centers was less than 4.5 Å; subsequently, the authors of Ref. [19] reported that this cutoff distance gave the best discrimination between binding and nonbinding residues for binding site prediction.
We also use the cutoff value of 4.5 Å in our investigation. In this way, we find atoms that interact through any of the mechanisms (hydrogen bonds, electrostatic interactions, and van der Waals interactions). We identify the contacting pairs of atoms in the protein-DNA binding sites using a modified version of the software package proposed in [8], which distinguishes contacts in the major and minor grooves. The base atoms are divided into groups exposed in the major groove (N4, N6, N7, O4, O6, C5, C6, C7, C8, and C4 in Thymine and Cytosine) and in the minor groove (N2, N3, O2, C2, and C4 in Adenine and Guanine). Nucleotides containing atoms in contact with the protein are classified as interacting nucleotides and are further subdivided into those interacting in the minor and major grooves.
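The geometric criterion and the assignment of base atoms to grooves can be illustrated with a short sketch. This is not the software package of Ref. [8]. The function names and the coordinates are invented for illustration, and the assignment of sugar-phosphate backbone atoms to a groove is not reproduced here because the text does not describe it.

```python
import numpy as np

CUTOFF = 4.5  # Å, the cutoff stated in the text

# Base atoms exposed in each groove, as listed in the text.
MAJOR = {"N4", "N6", "N7", "O4", "O6", "C5", "C6", "C7", "C8"}
MINOR = {"N2", "N3", "O2", "C2"}

def groove_of_base_atom(base, atom):
    """Return 'major', 'minor' or None for a base atom of A, G, T or C."""
    if atom == "C4":
        # C4 faces the major groove in pyrimidines and the minor groove in purines.
        if base in ("T", "C"):
            return "major"
        if base in ("A", "G"):
            return "minor"
        return None
    if atom in MAJOR:
        return "major"
    if atom in MINOR:
        return "minor"
    return None

def find_contacts(protein_xyz, dna_xyz, cutoff=CUTOFF):
    """Index pairs (protein atom, DNA atom) with center distance below the cutoff."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    dna_xyz = np.asarray(dna_xyz, dtype=float)
    diff = protein_xyz[:, None, :] - dna_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.nonzero(dist < cutoff)
    return list(zip(i.tolist(), j.tolist()))

# Toy coordinates in Å (not taken from any structure).
protein = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
dna = [[3.0, 0.0, 0.0], [0.0, 4.5, 0.0], [10.0, 0.0, 4.0]]
contacts = find_contacts(protein, dna)
print(contacts)
```

The all-pairs distance matrix is adequate for a single complex; for large structures a spatial index such as a k-d tree would be the usual choice.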
The twenty amino acids differ in their hydrophobic and polar properties [20,21]. In our analysis all amino acid residues are classified as hydrophobic (nonpolar) residues IFVLMAGC or polar residues WYTPSHENQDKR (denoted by one-letter codes) according to the hydrophobicity scale [22].
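The two residue classes can be written down directly and used to tally contacts. The sketch below is illustrative: the helper names and the example list of contacting residues are made up, and only the two groups of one-letter codes come from the text.

```python
from collections import Counter

HYDROPHOBIC = set("IFVLMAGC")   # hydrophobic (nonpolar) residues from the text
POLAR = set("WYTPSHENQDKR")     # polar residues from the text

def residue_class(code):
    """Map a one-letter amino acid code to 'hydrophobic' or 'polar'."""
    code = code.upper()
    if code in HYDROPHOBIC:
        return "hydrophobic"
    if code in POLAR:
        return "polar"
    raise ValueError(f"unknown residue code: {code!r}")

def class_percentages(residue_codes):
    """Percentage of contacts made by each residue class."""
    counts = Counter(residue_class(c) for c in residue_codes)
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

# Hypothetical list of residues, one entry per contact.
example = list("RRKKSTAV")
print(class_percentages(example))
```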

Distribution of the protein-DNA contacts
In a previous investigation [15] we showed that the ability of nucleotides to participate in contacts with protein residues in both grooves is connected with the conformations of their sugar moiety and torsion angle γ. The more detailed analysis presented here allows us to make some clarifications. The protein-DNA interactions were calculated for 128 protein-DNA complex structures. These complexes include 1765 nucleotides: 230 A-like nucleotides and 1535 B-like ones. Nucleotides in the classical conformation (1352 nucleotides with the B-like conformation of deoxyribose and the g+ conformation of the γ angle) are the most frequent, while the other conformations are much less frequent. Among A-like nucleotides there are 152 classical g+ and 71 alternative t nucleotides; among B-like nucleotides there are 1352 g+, 53 g- and 38 t. However, as previously reported [8,9,13-15], such alternative conformations can play an important role in the indirect recognition by proteins of their binding sites on the DNA sequence. A more detailed scheme of the structure selection and base composition can be found in Ref. [15].
In total we analyzed 26293 amino acid-DNA contacts. The numbers of contacts between amino acids and DNA in both grooves were counted separately for bases and the sugar-phosphate backbone. Then all nucleotides were divided according to their deoxyribose conformation (A-like or B-like) and the conformation of the γ angle (classical g+ and alternative t or g-).
We then examine the location of contacts in the grooves and find that about 3/4 of the protein-DNA contacts are formed with the sugar-phosphate backbone (19483 contacts, or 74%), and the remainder (6810 contacts, or 26%) are formed with bases. At the same time, the distribution of contacts differs between the grooves. In the minor groove only 10% of interactions are contacts between residues and bases, and 90% of residue contacts are formed with the sugar-phosphate backbone. In the major groove the share of contacts with bases rises to 39%. These results agree with the well-known idea that specific protein contacts with DNA bases are realized in the major groove, whereas nonspecific interactions with the sugar-phosphate backbone occur in the minor groove [2,6,8].
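The overall shares quoted above follow directly from the reported counts, as the short check below shows. Only the three totals are taken from the text; the variable names are illustrative, and the per-groove counts are not given in the text, so they are not reproduced.

```python
# Contact counts reported in the text.
backbone_contacts = 19483
base_contacts = 6810
total_contacts = 26293

# The two categories account for every contact.
assert backbone_contacts + base_contacts == total_contacts

backbone_pct = 100.0 * backbone_contacts / total_contacts
base_pct = 100.0 * base_contacts / total_contacts

print(f"backbone: {backbone_pct:.1f}%  bases: {base_pct:.1f}%")
```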
However, the percentage of contacts between the protein and base or backbone atoms in the grooves differs for nucleotides with different conformations of the sugar-phosphate backbone. In the minor groove, the transition to the alternative t conformation of the angle γ increases the participation of backbone atoms in contacts with proteins, especially for A-like nucleotides: from 88% for the g+ state to 91.4% for the t state. In the major groove, only the transition of B-like nucleotides to the t conformation of the γ angle alters the participation of nucleotides in the contacts, increasing the share of backbone atom contacts (from 62% to 71%).
That is, transitions to alternative conformations of the angle γ can influence the features of protein-nucleic interactions primarily in the minor groove.
We can also note some common patterns in the sequence specificity of contacts for nucleotides with different sugar-phosphate backbone conformations.
Overall, nucleotides containing Guanine form the greatest number of contacts with amino acid residues, while the other three types of nucleotides form practically equal numbers of contacts. The decreasing order of contact propensity is: Guanine > Thymine > Cytosine ≥ Adenine. This result is explained by the fact that Guanine exposes the greatest number of potential hydrogen-bonding atoms on the base edges [6] and that only Guanine can serve as a hydrogen bond donor in the minor groove [23].
Conformational angle γ transitions and sugar switching lead to a different participation of all four types of nucleotides in the protein-DNA contacts.
It should be noted that B-like alternative nucleotides containing Guanine are involved in contacts between Guanine and amino acid residues in the minor groove, whereas B-like alternative nucleotides containing Adenine do not form any contacts between Adenine and amino acid residues. Interestingly, Thymine nucleotides with B-like deoxyribose and the g- conformation of the γ angle take part in protein-DNA contacts more often than those in other conformations. Evidently, this conformation promotes the interaction of Thymine bases with amino acids, which may be one of the ways in which the indirect recognition mechanism is realized.
For A-like nucleotides in both grooves, the transition to the alternative t conformation of the angle γ leads to an even greater increase in the number of contacts with Guanine and to a decrease in contacts with the other bases, especially Cytosine. A similar tendency is observed for amino acid contacts with the sugar-phosphate backbone of A-like nucleotides: the number of contacts increases for nucleotides containing Guanine and decreases for the other types of nucleotides.
In the major groove, B-like nucleotides in the t state of the γ angle that contain Adenine form more backbone contacts than other types of nucleotides with the same sugar-phosphate conformation. At the same time, Adenines form fewer base contacts in the major groove, as do Cytosines in the minor groove (both with alternative γ angle conformations).
Thus the contact profiles are characterized by the sequence-specificity, that is, the different propensity of certain nucleotides to form contacts with amino acids in the grooves.

Contacts with polar and hydrophobic amino acid residues
The analysis of contacts between polar and hydrophobic amino acid residues and nucleotides with different conformations of the sugar-phosphate backbone shows that nucleotides form more contacts with polar amino acids in both grooves than with nonpolar residues regardless of the nucleotide conformation (Table 2, Total contacts).
However, the profiles of such contacts differ between the minor and major grooves and depend on the conformation of both the deoxyribose and the angle γ (Fig. 1).
Nucleotides with the classical g+ conformation of the γ angle are characterized by similar contact profiles in both grooves, independently of the deoxyribose conformation.
There are slight changes in the contact profiles for A-like nucleotides with the alternative t conformation of the γ angle (Table 2). The percentage of contacts with polar amino acids in the minor groove increases, and that with hydrophobic amino acids decreases. In the major groove, on the contrary, the percentage of contacts with polar amino acids decreases, and that with hydrophobic amino acids increases. These facts agree with the change in the polar/nonpolar profile of the DNA surface accessible in the major and minor grooves [15]. A closer evaluation of the polar/hydrophobic contact profile shows that, for A-like nucleotides with the alternative t conformation of the γ angle, the percentage of residue contacts differs for bases and backbone atoms (Table 2). For B-like nucleotides with the alternative t conformation of the γ angle, the percentage of contacts with polar amino acids increases in both grooves, and that with hydrophobic amino acids decreases, especially in the major groove. In contrast, the transition of B-like nucleotides to the alternative g- conformation reduces the percentage of contacts with polar amino acids and increases that with nonpolar ones in both grooves (Table 2, Total contacts). Thus we observe significant changes in their contact profiles (Fig. 1). Verification of the results suggests that the high propensity of hydrophobic amino acids to contact the bases of nucleotides in this conformation (B-like sugar pucker, g- conformation) should be interpreted with care, because there are only 25 residue-base contacts in total in the whole data set. Nevertheless, such minor groove contacts may play a role for proteins that bind only in the minor groove (such as architectural proteins), where binding is often associated with dramatic groove widening and extensive hydrophobic contacts [24], and may contribute to specificity.
The analysis of contacts shows that significant changes are observed for base contacts rather than for sugar-phosphate backbone contacts, and primarily for classical nucleotides, independently of the deoxyribose conformation (Table 2). For alternative nucleotides this tendency is less pronounced. The only exception is observed in the major groove for B-like nucleotides with the t conformation, which show significant changes in the sugar-phosphate backbone contact profiles. The most dramatic changes, observed for the base contact profiles in the minor groove for B-like nucleotides with the g- conformation of the angle γ, are discussed above.
Conformational transitions of sugar-phosphate DNA backbone can affect the propensity for DNA-binding of amino acids (Fig. 1).
Our determination of the propensity of amino acids to interact with nucleotides shows that the positively charged Arginine (R) and Lysine (K) residues form protein-DNA contacts most often (Fig. 1); these data are consistent with the results of many studies [6,7,23]. Tryptophan (W) has the largest propensity to form contacts with DNA in the major groove for B-like nucleotides in alternative γ angle conformations. Although Tryptophan is classified as a polar amino acid [22], the heterocyclic system in its side chain allows it to participate in hydrophobic interactions. The observed effect can therefore be explained by the increase in nonpolar surface in the major groove upon the transition of the γ angle into alternative conformations [15], which facilitates the formation of specific hydrophobic contacts with nucleotides [25].
The number of Aspartate (D) contacts in the DNA major groove is high for nucleotides with any sugar pucker and the t conformation of the γ angle (Fig. 1). Moreover, almost half of the contacts in the minor groove for nucleotides with B-like sugar and the t conformation of the γ angle are interactions with Cytosine (Table 1). Recently Corona et al. [23] have shown that negatively charged Aspartate is enriched in base interactions for highly specific DNA-binding proteins and predominantly binds to Cytosine in the major groove through a single hydrogen bond, or to two consecutive Cytosines through bidentate hydrogen bonds. Evidently, transitions into the t conformation of the angle γ contribute to binding specificity through the indirect readout mechanism.
The transition of B-like nucleotides into the g- conformation of the angle γ reduces the number of Arginine (R) contacts in the minor groove (Fig. 1). This effect can be associated with sequence specificity. Arginine prefers to bind to Guanine [6], whereas we have shown that in this conformation Thymine takes part in protein-DNA contacts more often (Table 1).

CONCLUSIONS
Prediction of protein binding sites on DNA will allow us to determine the functions of proteins and understand regulatory processes in molecular biological systems as well as to develop pharmaceutical drugs that can prevent the expression of target genes. For solving this problem some general principles concerning the frequency of formation of the specific amino acid-base pairs in binding sites should be formulated.
In this investigation we try to identify common features of DNA backbone rearrangements that can affect the ability of nucleotides to participate in contacts with proteins. According to the results of our study, certain preferences exist in the formation of protein-DNA contacts, depending on the conformation of the DNA sugar-phosphate backbone.
A-like nucleotides interact with proteins more often in the minor groove, while B-like nucleotides make more contacts in the major groove, regardless of the γ angle conformation. At the same time, alternative A-like nucleotides form more contacts in the grooves than B-like nucleotides with either the t or the g- alternative conformation of the angle γ.
About 3/4 of the protein-DNA contacts are formed with the sugar-phosphate backbone. Such interactions predominate in the minor groove, especially for A-like alternative nucleotides. In the major groove the number of contacts with bases is significantly higher, but the transition of B-like nucleotides to the alternative t conformation of the angle γ increases the interaction with backbone atoms.
The nucleotides form more contacts with polar amino acids in both grooves than with nonpolar residues regardless of the nucleotide conformation, but the profiles of such contacts differ in minor and major grooves and depend on the conformation of both deoxyribose and the angle γ.
In the minor groove the number of interactions between alternative nucleotides and polar amino acids increases, and that with hydrophobic amino acids decreases. In the major groove, on the contrary, alternative nucleotides form fewer contacts with polar amino acids than with hydrophobic ones. The polar/hydrophobic contact profile for alternative A-like nucleotides differs for bases and backbone atoms.
We have shown that contact profiles are characterized by sequence specificity, that is, by the different propensity of certain nucleotides to form contacts with amino acids in both grooves. In particular, transitions to alternative conformations of the angle γ can influence the specificity of protein-DNA interactions, primarily in the minor groove. However, the correct determination of preferred amino acid-nucleotide pairs in the binding sites of protein-DNA complexes requires additional studies.
Statistical analysis of the presented protein-DNA structures allows us to make the following concluding remark. It is well known that there are no simple rules in protein-DNA recognition [9], and the prediction of binding sites is extremely difficult, since proteins and DNA have significant conformational variability. Most of the methods for predicting protein binding sites on DNA are based on the sequence preferences of residues and the protein structure in the binding site [1] and take into account only the DNA sequence (i.e., direct readout) [26-31]. We assume that information about conformational rearrangements of DNA (indirect or shape recognition) will make it possible to supplement the results of these methods and to take into consideration not only protein features but also the structure of DNA. Such data can substantially improve the quality of binding site prediction.