Advances in understanding the initiation of HIV-1 reverse transcription

Abstract

Many viruses, including Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Human Immunodeficiency Virus (HIV), use RNA as their genetic material. How viruses harness RNA structure and RNA–protein interactions to control their replication remains obscure. Recent advances in the characterization of HIV-1 reverse transcriptase, the enzyme that converts its single-stranded RNA genome into a double-stranded DNA copy, reveal how the reverse transcription complex evolves during initiation. Here we highlight these advances in HIV-1 structural biology and discuss how they are furthering our understanding of HIV and related ribonucleoprotein complexes implicated in viral disease.

Current Opinion in Structural Biology 2020, 65:175–183

This review comes from a themed issue on Protein nucleic acid interactions

Edited by Guillermo Montoya and Teresa Carlomagno

Available online 8th September 2020

0959-440X/© 2020 Published by Elsevier Ltd.

Introduction

Emerging zoonotic viruses pose a serious threat to modern society [1]. The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and the Human Immunodeficiency Virus (HIV) cause severe, and often fatal, diseases around the world. SARS-CoV-2 virus has currently infected more than 3.9 million people and killed more than 270 000 worldwide during the ongoing pandemic [2]. HIV is responsible for 76 million infections and about 33 million deaths worldwide to date [3]. These and other zoonotic viruses use a ribonucleic acid (RNA) genome [4]. To replicate, viruses like SARS-CoV-2 use an RNA-dependent RNA polymerase (RdRp), which creates RNA copies of its genome for inclusion in virions [5,6 • ]. Retroviruses, like HIV, replicate through a deoxyribonucleic acid (DNA) intermediate that allows them to integrate into the host genome before producing more viral particles [7,8]. This process, known as reverse transcription, is performed by a specialized viral enzyme called reverse transcriptase.

HIV reverse transcription has been investigated for decades, culminating in the development of antiretroviral therapies that are unfortunately increasingly ineffective due to resistance [9]. Thus, there remains a pressing need for deeper molecular understanding of the HIV life cycle. Here we focus on recent structural and biochemical advances in understanding how HIV reverse transcription initiates and the role of viral components such as reverse transcriptase (RT), the viral RNA genome (vRNA), and a human transfer RNA (tRNA Lys 3) in this process ( Figure 1 ). These advances have the potential to highlight better drug targets for antiretroviral therapies and further our understanding of related systems that replicate RNAs.

Figure 1

Overview of reverse transcription mechanism. Reverse transcription begins with tRNA Lys 3 annealing to the viral RNA (1). RT initiates minus-strand cDNA synthesis and continues until it reaches the repetitive region R at the 5′ end of the vRNA template (2). The first strand transfer occurs, resulting in the extended primer annealed to the complementary repetitive region R at the 3′ end of the template (3). cDNA synthesis proceeds again and the vRNA template is degraded except for the polypurine tract (PPT) (4). Plus-strand synthesis begins with RT using the remaining PPT as a primer (5) and extending the primer until it reaches the PBS region (6). All remaining RNA is degraded (7) and the complementary PBS sequences are used to facilitate the second strand transfer so plus-strand cDNA synthesis can continue (8). The PBS region of the minus strand is extended to copy the U3, R, and U5 regions (9). The final product is double-stranded DNA with U3, R, and U5 regions flanking the protein coding region of the genome.

HIV-1 reverse transcription

Structural snapshots of reverse transcription initiation

As in many retroviruses, HIV reverse transcription initiates using the terminal 3′OH of a host primer tRNA that has been previously annealed to the genomic RNA during viral packaging. The global mechanism of reverse transcription of genomic single-stranded RNA to double-stranded DNA has been established over the past 30 years ( Figure 1 ) [10]. Upon HIV cellular entry and release of the capsid core into the cytoplasm, a packaged copy of RT initiates reverse transcription on a complex formed between the tRNA Lys 3 primer and the ∼9200 nt genomic RNA. Reverse transcription begins at the primer binding site (PBS) located near the 5′ end of the genomic RNA. When reverse transcription of the 5′ end is completed, the newly synthesized negative-sense strand is transferred to the 3′ end of the genomic RNA. The synthesis of negative-strand DNA creates a primer-template DNA–RNA hybrid, which is recognized and cleaved by the RNase H domain of RT. As reverse transcription proceeds, the genomic RNA is digested, leaving RNase H-resistant polypurine tract (PPT) fragments to serve as primers for positive-strand cDNA synthesis. The initiation stage of reverse transcription, where RT must recognize the tRNA-genomic RNA complex, is slow and non-processive compared to later-stage elongation steps [11,12].
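The first step described above, extension of an annealed primer from the PBS to the 5′ end of the template, can be sketched as a toy string model. This is only an illustration under stated assumptions: the sequences are short placeholders rather than HIV-1 or tRNA Lys 3 sequences, the function names are hypothetical, and the model ignores strand transfer, RNase H cleavage, pausing and all kinetics.

```python
# Toy model of primer annealing at the PBS and minus-strand cDNA synthesis.
# Sequences and function names are illustrative assumptions, not real data.

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}    # RNA-RNA base pairing
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # template RNA -> cDNA


def reverse_complement_rna(seq: str) -> str:
    """Return the RNA reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(seq))


def find_pbs(template: str, primer_3prime: str) -> int:
    """Return the 0-based start of the site that pairs with the primer 3' end.

    Returns -1 if the template has no fully complementary site.
    """
    return template.find(reverse_complement_rna(primer_3prime))


def minus_strand_cdna(template: str, primer_3prime: str) -> str:
    """Return the cDNA (5'->3') made by extending the primer to the template 5' end.

    The polymerase reads the template 3'->5', so the region upstream of the
    PBS is copied in reverse order.
    """
    start = find_pbs(template, primer_3prime)
    if start < 0:
        raise ValueError("primer does not anneal to the template")
    upstream = template[:start]  # template region 5' of the PBS
    return "".join(RNA_TO_DNA[base] for base in reversed(upstream))


if __name__ == "__main__":
    toy_template = "GGUCUC" + "UGGCGCC" + "AAAG"  # upstream + toy PBS + downstream
    toy_primer_end = "GGCGCCA"                     # pairs with the toy PBS
    print(find_pbs(toy_template, toy_primer_end))
    print(minus_strand_cdna(toy_template, toy_primer_end))
```

In this sketch the cDNA is only the newly synthesized extension; in the virus it remains covalently attached to the tRNA primer.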

The reverse transcriptase initiation complex (RTIC) involves an 18 base-pair duplex formed between the 3′ end of the tRNA primer and genomic RNA, which must be bound by RT. Decades of structural attempts failed to capture this complex due to its dynamic nature, as RT dissociates rapidly from RNA–RNA complexes. To circumvent this challenge, Larsen et al. solved the first RTIC structure by implementing a covalent cross link between RT and full length tRNA Lys 3 annealed to 101 nucleotides of the viral 5′ UTR [13 •• ]. This cryogenic electron microscopy (cryoEM) structure has a global resolution of 8 Å and a core resolution of 4.5 Å. Despite the high flexibility of the complex, the higher resolution of the core allowed for visualization of the RTIC in a pre-translocation state after one round of nucleotide incorporation, not primed for the next round of catalysis ( Figure 2 a). The complex provides the first view of the primer binding site (PBS), vRNA helix1 (H1), and vRNA helix2 (H2) and reveals large RNA rearrangements induced by RT binding ( Figure 2 b). In particular, the conformation of tRNA Lys 3 in the RTIC has shifted from a three-way junction structure observed in solution to an elongated helix involving refolding of the 5′ portion of the tRNA [14]. This structure underscored the importance of RNA structural rearrangements in the RT initiation process, and provided an initial snapshot, albeit at low resolution, of the process.

Figure 2

The structure of RTIC. (a) The RTIC cryoEM map by Larsen et al. [13 •• ] reveals density of the vRNA/tRNA PBS duplex beyond the nucleic acid binding cleft within the RT (PDB ID: 6B19). The p66 and p51 subunits are depicted in purple and grey. RT is bound to a vRNA/tRNA duplex depicted in gold and maroon. (b) In addition to the PBS duplex accommodated within the RT, the vRNA forms two helices, H1 and H2, while the tRNA forms an extended helix conformation, as shown in the schematic representation of the RTIC secondary structure. (c) The polymerase active site is further divided into the fingers, thumb, palm, and connection subdomains. Encircling RT’s active site is the canonical reverse transcription catalysis cycle. (d) Helical geometries are shown for DNA/DNA (PDB ID: 4C64, [65]), DNA/RNA (PDB ID: 1EFS, [66]), and RNA/RNA (PDB ID: 1RNA, [67]), representative of the duplexes involved in reverse transcription.

Another view of the RTIC core was solved by Das et al., presenting a crystal structure of the RT in complex with an RNA homoduplex [15 •• ]. The complex was formed by crosslinking RT and 23 nucleotides of vRNA annealed to 17 nucleotides of tRNA Lys 3, forming the PBS helix that spans RT’s cleft. Both the cryoEM and crystal structures capture RT with a hyperextended thumb and open fingers bound to an RNA duplex ( Figure 3 a). Relative to elongation structures, the duplex is shifted away from the active site and toward the fingers. While both independent RTIC complexes are catalytically active, neither structure captures the RTIC poised for nucleotide incorporation ( Figure 2 c). Instead the 3′ primer terminus is displaced from the active site by 5–7 Å ( Figure 3 b), and the fingers domain of RT remains open. This is consistent with the slow rates of nucleotide incorporation during the initiation phase [11,12]. Difficulties in capturing the 3′ terminus poised for nucleotide incorporation are likely caused by the absence of dNTP or dNTP analogues in these structures, the low affinity of RT for RNA–RNA duplexes, and the high energetic cost of properly positioning the 3′ end at the active site.

Figure 3

RT complexes in initiation and elongation, focusing on the core. (a) Both RTIC structures have open fingers and a hyperextended thumb: the Larsen et al. [13 •• ] cryoEM structure in red (PDB ID: 6B19) and the Das et al. [15 •• ] crystal structure in green (PDB ID: 6HAK), compared to an elongation complex solved by Huang et al. [68] that is an RT-DNA/DNA structure in pink (PDB ID: 1RTD). (b) The 3′ termini in initiation structures are displaced by ∼6 Å from the active site with respect to the elongation complex; same color scheme as above. (c) Top view of the RNase H active site shows that template and primer are flexible to engage with the RNase H region in different RT-nucleic acid complexes; in the Tian et al. [17 •• ] structure, RNA is in green and DNA is in cyan (PDB ID: 6BSH), compared to the Larsen et al. [13 •• ] structure where vRNA is shown in yellow and tRNA is shown in red (PDB ID: 6B19). (d) Closeup view of the RNase H region demonstrates how the RNase H grip structure pushes the DNA strand away while the RNA strand is pulled into the active site in an active DNA–RNA hybrid (PDB ID: 6BSH), whereas the vRNA and tRNA remain rigid and do not engage with the RNase H region in the RTIC (PDB ID: 6B19).

The conformation of RT in both RTIC structures most closely resembles that of RT in complex with double-stranded DNA bound to Nevirapine, a non-nucleoside reverse transcriptase inhibitor (NNRTI) [16]. In this structure the drug is bound distal to the polymerase active site of RT and allosterically locks the primer grip in a displaced position. NNRTI binding to RT decreases its activity by preventing the 3′ terminus from positioning at the polymerase active site for nucleotide catalysis, similar to what is observed in RTIC structures. Although the two RTIC structures were captured with a displaced 3′ end, the primer grip must be flexible to accommodate the 3′ primer terminus for catalysis.

The RNase H site is not engaged during initiation

The RT RNase H site cleaves the RNA strand of RNA/DNA heteroduplexes during reverse transcription by recognizing their distinctive helical geometry and covalent backbone chemistry. In accordance with biochemical data, the RNase H active site is not engaged in either RTIC structure, where the PBS RNA homoduplex adopts A-form geometry ( Figure 2 d). In contrast, the Yang group [17 •• ] published the only structure of HIV-1 RT engaging in RNA cleavage of an RNA/DNA hybrid. In this 2.65 Å crystal structure, the DNA strand is pushed away by the RNase H primer grip and the RNA strand is pulled into the active site by a trio of catalytic aspartic acid residues ( Figure 3 c,d). Between the catalytic carboxylates and the phosphate backbone, two divalent cations are coordinated and poised for catalysis. On a larger scale, the RT fingers open in the polymerase active site and the overall p66 and p51 subunits are farther apart, conformational changes that are believed to be a consequence of RT’s degradation-competent mode in this structure.

In contrast, Sarafianos et al. solved a crystal structure of RT in complex with the PPT RNA/DNA duplex, demonstrating why this nucleic acid structure is resistant to RNase H cleavage [18]. The PPT sequence contains two stretches of four or more consecutive rA/dTs with a narrower minor groove than a normal DNA–RNA hybrid. These regions, known as A-tracts, are rigid, but the region immediately following can bend, inducing base pair mismatching at the RNase H site. Contacts to RT stabilize this irregular structure, blocking access of RNase H catalytic residues to the RNA.
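The definition of an A-tract used above (a run of four or more consecutive rA residues) is simple enough to express as a sequence scan. The sketch below is a minimal illustration; the function name is hypothetical, and the example sequence is a placeholder chosen to contain two such runs rather than a sequence taken from the cited structure.

```python
import re


def find_a_tracts(rna: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) spans of runs of at least min_len consecutive A.

    Positions are 0-based and the end index is exclusive.
    """
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(rna.upper())]


if __name__ == "__main__":
    placeholder_ppt = "AAAAGAAAAGGGGGG"  # illustrative purine-rich sequence
    for start, end in find_a_tracts(placeholder_ppt):
        print(start, end, placeholder_ppt[start:end])
```

A scan like this only locates candidate A-tracts in a sequence; it says nothing about groove width or bending, which require structural data.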

vRNA plasticity in reverse transcription initiation

vRNA plays a regulatory role in many stages of viral replication: reverse transcription, RNA transcription and transport, viral assembly and maturation [19, 20, 21]. The vRNA folds into secondary and tertiary structures with potential long-range interactions [22,23 • ]. The polymerase active site can only accommodate single-stranded RNA or DNA. Thus, by forming a network of complex secondary and tertiary structures along its full length, the vRNA may create energetic barriers that regulate reverse transcription.

The 5′ UTR of HIV genomic RNA is highly conserved and harbors elements central to viral replication, including the PBS sequence where tRNA Lys 3 hybridizes and the dimerization initiation site (DIS) [24,25]. Elegant biochemical, biophysical and structural work has defined the conformations of regions of the 5′ UTR around the DIS [26], which can transition between at least two conformational states [27, 28, 29]. In a monomeric state, the DIS is sequestered by base pairing to an upstream region called U5 [30,31 •• ]. In the second state, the DIS palindromic sequence is exposed, leading to a kissing loop dimerization captured via cryoEM by Zhang et al. [32 • ]. Brigham et al. showed that host tRNA Lys 3 stabilizes kissing loop dimerization, but not the extended dimer [33 •• ]. The addition of nucleocapsid (NC), a viral RNA chaperone that melts RNA structures, stabilized the formation of an extended dimer independent of tRNA Lys 3 or heat annealing.
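A palindromic loop sequence can pair with an identical copy of itself because it equals its own reverse complement, which is what allows two copies of the same RNA to form a kissing loop. The check below is a minimal sketch of that property; the function name and the test sequences are illustrative assumptions, not values from the cited studies.

```python
# Watson-Crick pairing only; G-U wobble pairs are deliberately ignored here.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_self_complementary(rna: str) -> bool:
    """Return True if the RNA sequence equals its own reverse complement."""
    reverse_complement = "".join(RNA_PAIR[base] for base in reversed(rna))
    return rna == reverse_complement


if __name__ == "__main__":
    for loop in ("GCGCGC", "GUGCAC", "GGGAAA"):  # illustrative six-mers
        print(loop, is_self_complementary(loop))
```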

As discussed above, RT transiently binds the template/primer duplex at the PBS site. Using single-molecule Förster Resonance Energy Transfer (smFRET), Coey et al. showed that the tRNA 5′ end stabilizes RT on the PBS helix and decreases RT-RNA dissociation rates [34 •• ]. Another contributor to the non-processive nature of initiation is H2, which poses a significant energetic barrier and must melt to single-stranded RNA for reverse transcription to proceed. Coey et al. showed that RT binding stabilizes H1 and reduces global structural heterogeneity, potentially destabilizing H2 as well ( Figure 4 a–c). Specifically, H2 is hypothesized to contribute to the characteristic pause after incorporation of three deoxynucleotides during reverse transcription initiation [35,36]. Larsen et al. captured the paused RTIC with a +3 elongated primer, elucidating conformational heterogeneity of the vRNA-tRNA Lys 3 complex by both cryoEM and FRET studies [37 •• ]. The H2 helix is lowered into the RT polymerase active site, shifting the PBS helix along the nucleic acid binding cleft into three distinct conformations stabilized by RNase H domain contacts. These results demonstrated how H2 creates a steric barrier to rapid polymerization by pushing the 3′ end of the primer strand away from the RT active site. Furthermore, RT has been shown to adopt productive or unproductive flipped binding orientations, with the flipped orientation dominating until RT incorporates the sixth nucleotide [38]. The tendency of RT to bind in a flipped orientation likely contributes to the slowness of reverse transcription initiation.

Figure 4

vRNA structure modulation by RT binding at initiation. (a) Schematics for smFRET experimental setup to measure the H1 helix formation by fluorescently labeling the vRNA construct at 3′ and 5′ of H1. Histograms of population densities show that H1 (high FRET) is stabilized in the presence of RT (adapted from Coey et al. [34 •• ]). (b) Schematics for smFRET experimental setup to measure the global fold of the initiation complex, with the vRNA construct labeled at 5′ of H1 and at the 5′ of tRNA. Histograms of population densities show that upon RT binding the global conformation shifted towards a lower FRET state, indicating a stabilization of a global conformation for the initiation complex. (c) Schematics for smFRET experimental setup to measure the global fold of the initiation complex, switching the vRNA construct label to the 3′ of H1. The results using this labeling scheme showed again that upon RT binding there is a stabilization of the initiation complex formation.
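The "high FRET" and "low FRET" states in these histograms report on the distance between the two dyes through the standard Förster relation, E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor distance and R0 is the Förster radius of the dye pair. The sketch below evaluates this relation; the default R0 of 5.0 nm is an assumed round number for illustration, not a value from Coey et al., and the function names are hypothetical.

```python
def fret_efficiency(distance_nm: float, r0_nm: float = 5.0) -> float:
    """Return FRET efficiency for a donor-acceptor distance.

    r0_nm is the Forster radius; 5.0 nm is an assumed placeholder value.
    """
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float = 5.0) -> float:
    """Invert the Forster relation: return the distance for a given efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    for r in (3.0, 5.0, 7.0):
        print(f"r = {r:.1f} nm -> E = {fret_efficiency(r):.3f}")
```

Because efficiency falls off with the sixth power of distance, dyes brought together by a formed helix give high FRET, while a more extended arrangement gives a lower FRET state. Measured efficiencies also depend on dye orientation and photophysics, so converting histogram peaks to distances in this way is approximate.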

The role of tRNA Lys 3 in initiation

tRNA Lys 3 plays a central role in reverse transcription. The cryoEM RTIC structure presents an elongated tRNA Lys 3 helix, contrary to its canonical three-way junction structure. The PBS sequence of the vRNA is specific to tRNA Lys 3, yet HIV-1 selectively packages exactly two copies of vRNA and an estimated 20–770 tRNA copies into encapsulated viral particles [39,40 • ,41 • ,42]. The function of the tRNA high copy number is unknown; however, in all HIV virions, specific tRNA Lys 3 annealing to the PBS is essential for reverse transcription initiation. Mutated viruses can use other tRNAs to prime reverse transcription if the HIV PBS is mutated to be complementary to the new tRNA’s 3′ end. However, these virions have slower rates of viral maturation and their PBS sequences eventually revert to complement tRNA Lys 3, unless the A-rich loop is also mutated to be complementary to the tRNA anticodon loop, suggesting additional important vRNA-tRNA Lys 3 interactions outside of the PBS [43]. Biochemical evidence also supports potential tRNA Lys 3 interactions beyond the A-rich loop, including with the C-rich region and the primer activation signal (PAS) of the vRNA, that are important for reverse transcription efficiency [44, 45, 46]. These interactions likely rely on the tRNA Lys 3 post-transcriptional base modifications mcm 5 s 2 U34 and ms 2 t 6 A37 for improved stability, infectivity, and transcript production [44,47,48]. In the RTIC structure, the tRNA Lys 3 refolds into an extended conformation; however, natural base-modified tRNA Lys 3 in the RTIC could stabilize alternate conformations and promote long-range intermolecular interactions with the vRNA, highlighting a role for these post-transcriptional modifications.

tRNA participates in both the first (−) and the second (+) strand transfer. In the first strand transfer tRNA has been proposed to bind to the 3′UTR of vRNA and facilitate the first (−) strand transfer [49]. Disrupting these long-range RNA–RNA interactions by mutagenesis reduces the efficiency of strand transfer both in vitro and in vivo [49,50]. In addition, tRNA Lys 3 base modifications are important for the efficiency and fidelity of (+) DNA strand transfer [51]. The dynamic role of tRNA throughout the process of HIV reverse transcription remains a fascinating topic for further investigation.

Open questions

HIV-1 reverse transcription structural biology

Although the recent RTIC structures illuminate reverse transcription initiation, the details are obscured by the current resolution limits of these structures. The transition between initiation and elongation phases of reverse transcription also requires further characterization. How does polymerization occur during initiation and how does it compare to the elongation phase? Does the RNase H activity of RT serve a cooperative or anti-cooperative role during nucleotide incorporation? Do multiple RTs work together to accomplish these activities? These are some of the questions that could be addressed by solving higher-resolution structures and characterizing intermediate states during the process of reverse transcription.

RNA structure is likely the central regulator of the progression of reverse transcription. How does RNA structure evolve and change during the process of reverse transcription initiation and subsequent elongation? How do these structural changes relate to reverse transcription rates during each step? While short vRNA constructs have been shown to function both as a template and as a regulator of reverse transcription initiation and elongation, the process of RT initiation must be explored in the context of the full genomic RNA. How does higher order structure in the context of dimeric vRNA affect initiation and elongation by RT? What is the role of capsid integrity during reverse transcription? What role(s) do other factors in the virion play in modulating RNA structure? Specifically, how does the abundant NC RNA chaperone affect the structure and dynamics of the vRNA? These open questions highlight many exciting avenues for retroviral research to understand and inhibit this complex virus.

Juxtaposition of HIV-1 RTICs and related systems

The principles involved in HIV-1 reverse transcription can provide insights into cellular processes that share mechanisms with retroviruses. Retrotransposons, which are mobile genetic elements, use a system of transcription, reverse transcription, and integration to move around the human genome. Additionally, a subset of retrotransposons known as Human Endogenous Retroviruses can be enveloped and pass from cell to cell similar to retroviruses like HIV [52,53]. Work to understand the details and the consequences of retrotransposon activity has linked it to various aspects of human health ranging from placental morphogenesis to diseases like cancer and haemophilia [53, 54, 55, 56]. Similarly, telomerase uses an RNA template and reverse transcriptase to protect chromosome ends during genome replication. A recent structure of a telomerase complex shows that its global architecture differs from the HIV-1 RTICs; however, the telomerase complex also utilizes highly structured RNA as a template and regulator to synthesize the DNA repeats of the telomere [57 • ,58]. These similarities could allow knowledge gained from HIV-1 RTIC characterization to further our understanding of telomerase complexes and the role of RNA structure in telomere protection.

Implications for drug design

Many antiretroviral drugs target reverse transcription. Nucleoside reverse transcriptase inhibitors (NRTIs) incorporate directly at the 3′ end of cDNA and block polymerization. In contrast, NNRTIs bind to RT allosterically and cannot be incorporated into the growing cDNA strand [59]. Multiple RT structures bound to NNRTIs and/or NRTIs together in complex with RNA/DNA and DNA/DNA duplexes have revealed the molecular mechanism of RT inhibition during elongation by directly or indirectly perturbing the RT polymerase active site [16,60, 61, 62]. Yet the slow kinetics of reverse transcription initiation can increase its vulnerability to drugs and constitute another inhibition mechanism. There is evidence that nevirapine, an NNRTI, targets initiation [63], but such an inhibition mechanism remains uncharacterized. Further biochemical and structural data of RTIC bound to drugs could direct the development of improved HIV drugs and provide insight into mechanisms of resistance.

All viruses need to replicate their genetic material to be infectious. This requires a polymerase bound to either an RNA or a DNA template. HIV drugs that target RNA and/or DNA polymerization may be used to inhibit viruses with similar replication mechanisms; for example, tenofovir, an HIV NRTI, is used to treat HBV. Remdesivir is a promising antiviral that is currently in phase III clinical trials for SARS-CoV-2. The inhibition mechanism of SARS-CoV-2 RNA-dependent RNA polymerase by Remdesivir is similar to azidothymidine (AZT) inhibition of HIV-1 RT [64 • ], as demonstrated by recent structural work on an RNA-replicase complex. Remdesivir, an adenosine analog, and AZT, a thymidine analog, are both nucleoside analogs that target the polymerase active site, requiring the coordination of two Mg²⁺ ions by aspartic acid residues for productive inhibition. In the face of the COVID-19 pandemic, similarities between the viruses highlight the potential for using HIV-1 antivirals to treat COVID-19 patients, or as a starting point for designing drugs that effectively target the SARS-CoV-2 RdRp active site. To tackle these devastating pathogens, the role of RNA–protein interactions and RNA structure must be understood. The interplay of structural and dynamic methods, grounded in rigorous in vivo and in vitro biochemistry, will unravel the pressing mysteries of viral RNAs.

Conflict of interest statement

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as: