Fragile protein folds: sequence and environmental factors affecting the equilibrium of two interconverting, stably folded protein conformations

Abstract Recent research on fold-switching metamorphic proteins has revealed some notable exceptions to Anfinsen's hypothesis of protein folding. We have previously described how a single point mutation can enable a well-folded protein domain, one of the two PAS (Per-ARNT-Sim) domains of the human ARNT (aryl hydrocarbon receptor nuclear translocator) protein, to interconvert between two conformers related by a slip of an internal β strand. Using this protein as a test case, we advance the concept of a “fragile fold”, a protein fold that can reversibly rearrange into another fold that differs by a substantial number of hydrogen bonds, entailing reorganization of single secondary structure elements to more drastic changes seen in metamorphic proteins. Here we use a battery of biophysical tests to examine several factors affecting the equilibrium between the two conformations of the switching ARNT PAS-B Y456T protein. Of note is that we find that factors which impact the HI loop preceding the shifted I β strand affect both the equilibrium levels of the two conformers and the denatured state which links them in the interconversion process. Finally, we describe small molecules that selectively bind to and stabilize the wild-type conformation of ARNT PAS-B. These studies form a toolkit for studying fragile protein folds and could enable ways to modulate the biological functions of such fragile folds, both in natural and engineered proteins.


Introduction
Anfinsen's hypothesis, which states that a protein's primary sequence encodes a unique fold or conformation, has dominated the study of protein folding for almost 50 years (Anfinsen, 1973). However, it is increasingly clear that a range of exceptions to the "one sequence, one fold" concept are widely found in biology. One counterexample is the intrinsic disorder found in a significant number of functional proteins and protein regions. These intrinsically disordered regions do not adopt a stable three-dimensional structure, instead existing as conformational ensembles of states that may include pre-formed structural nuclei (Dyson, 2016). Other counterexamples are provided by proteins which interconvert among multiple folded states, ranging from a "fragile fold", in which substitution of a few amino acids -even one -results in the domain co-existing in two states Ha and Loh, 2012) to the general concept of a "metamorphic protein", which is one that can reversibly adopt different stable folds in different environmental conditions (Murzin, 2008).
We consider a protein or domain to have a fragile fold if it can populate two stable folds and interconvert reversibly between them, doing so via breaking and reforming a substantial number of hydrogen bonds. This concept differs from that of simple protein conformational switches (Gerstein and Echols, 2004), mainly in the extent of hydrogen bond network remodeling. For example, we previously reported the Y456T mutant of PAS-B (Per-ARNT-Sim) domain of the human ARNT (aryl hydrocarbon receptor nuclear translocator) protein, which undergoes a three-residue β-strand slip requiring 15 of the 26 inter-strand hydrogen bonds in the β sheet to be broken and re-formed while still retaining the same overall topology ). In metamorphic proteins, which can be considered an extreme case of a fragile fold, an even greater number of hydrogen bonds differ between the two folds as seen in lymphotactin (which alternates between a conventional chemokine fold and a dimeric β sandwich; Kuloglu et al., 2002;Tuinstra et al., 2008). These studies suggest that protein folds may hide unforeseen flexibility or fragility which can be trivially accessed by small changes in sequence or environmental conditions, facilitating the evolution of new folds and protein domains (Yadid et al., 2010;Tuinstra et al., 2008;Alexander et al., 2009;Dishman et al., 2021).
To experimentally probe such fragile folds, we here use the abovementioned ARNT PAS-B Y456T point mutant as a model system. ARNT is a eukaryotic bHLH PAS (basic helix-loop-helix -Per-ARNT-Sim) transcription factor that dimerizes with other bHLH PAS proteins (e.g., AhR (aryl hydrocarbon receptor), HIF-α) to bind DNA and regulate gene expression in response to a varied set of stimuli (Labrecque et al., 2013). In addition to the bHLH DNA binding module and C-terminal coactivator domains, these proteins contain two PAS domains (PAS-A and PAS-B), which are essential for heterodimerization (Erbel et al., 2003;Chapman-Smith et al., 2004;Wu et al., 2015). The typical PAS domain fold is 100-120 residues in length and consists of several α helices packed against one side of a five-stranded antiparallel β sheet, often enclosing a ligand-binding cavity (Card et al., 2005;Wu et al., 2015Wu et al., , 2019Bisson et al., 2009) (Fig. 1a). PAS domains provide binding sites for many diverse binding partners, including other PAS domains (Cardoso et al., 2012;Erbel et al., 2003;Huang et al., 2012;Scheuermann et al., 2009;Wu et al., 2015), adjacent helices (Harper et al., 2003;Nash et al., 2011;Rivera-Cancel et al., 2014), and coiled-coil coactivators (CCCs) (Guo et al., 2013(Guo et al., , 2015. Notably, changes in the occupancy or configuration of bound small-molecule ligands can modulate the protein-protein interactions of many PAS domains (Scheuermann et al., 2009), opening the door to natural and artificial control of signaling pathways.
While designing point mutants to disrupt the interactions between the ARNT PAS-B domain and other binding partners, we fortuitously discovered that the Y456T variant existed in a slow conformational equilibrium between two equally populated conformations ). Solution NMR studies revealed that the new conformation primarily differs from the native fold by a +3 slip in register and accompanying inversion of the central Iβ strand (Fig. 1b); hence, we termed the conformations "WT" and "SLIP." Interestingly, the SLIP conformation does not bind HIF-2α PAS-B , consistent with the involvement of the ARNT PAS-B β sheet in this interaction (Scheuermann et al., 2009;Wu et al., 2015). We subsequently discovered 105 Å 3 of interconnected internal cavities in a high-resolution WT crystal structure (Guo et al., 2013), which are likely to have collapsed in the SLIP conformation ( Fig. 1b) (Xu et al., 2021). Additionally, interconversion between the two conformations appears to require global unfolding, as revealed by data from a combination of solution NMR approaches, spanning real-time NMR ) to high-pressure perturbation of the equilibrium (Xu et al., 2021) and as might be expected from the large number of hydrogen bonds disrupted by the β-strand slip (Fig. S1 in the Supplement).
Here we more broadly examine other factors that determine the equilibrium populations of these two folds, with interest in both characterizing these effects and artificially manipulating the interconversion as a possible route to engineering novel switchable proteins. We start by characterizing the temperature dependence of the interconversion process, allowing us to extract thermodynamic parameters of it. We also show that several features of the HI loop region, which precedes the shifting Iβ strand, can also influence the WT / SLIP ratio. In addition, we demonstrate mutations in this region, and the Hβ strand proceeding the HI loop can also affect the denatured state of the protein, resulting in differential refolding to one of the two conformations during interconversion. Lastly, we identify compounds that can preferentially bind one conformer over another, giving us the ability to shift the relative populations between these two conformations. Many of the approaches discussed here exemplify ways of studying and manipulating other fragile folds or metamorphic proteins, both in natural and engineered contexts as ligand-regulated sensors and switches (Ha and Loh, 2017).

Cloning, expression, and purification of ARNT PAS-B mutants
Plasmid DNA encoding the human ARNT PAS-B domain (residues 356-470) was used to introduce various point mutations, using the QuikChange site-directed mutagenesis kit (Stratagene). Following PCR amplification and digestion of  Card et al., 2005), highlighting the F444, F446, and Y456 residues with space-filling representations. Internal cavities unique to the WT conformation (total volume: 105 Å 3 ) identified from a subsequent crystal structure (PDB / 4EQ1; Guo et al., 2013) are shown in grey wireframe. A tobacco etch virus (TEV) protease site was inserted into the HI loop as part of this study; the insertion site in labeled in pink. (b) Schematics of the Iβ-strand residues in ARNT PAS-B wild type (blue sticks) and F444Q/F446A/Y456T (orange sticks; PDB / 2K7S;  structures. The external (solvent) face is above each strand and the internal side is below each strand. The location of the WT-specific internal cavities shown for reference in both WT and F444Q/F446A/Y456T. the parental template with DpnI restriction enzyme, using the manufacturer's instructions, the modified plasmid was transformed into the pHis-parallel bacterial expression vector (Sheffield et al., 1999). The accuracy of each mutation was verified by sequencing. ARNT PAS-B mutant constructs were transformed into Escherichia coli BL21(DE3) cells. For isotopically labeled protein expression, cultures were grown in M9 minimal media containing 1 g L −1 15 NH 4 Cl for uniformly 15 N-labeled samples or 1 g L −1 15 NH 4 Cl and 3 g L −1 13 C glucose for uniformly 13 C-and 15 N-labeled samples (at 37 • C). Once the cell density (A 600 ) measured 0.6-0.8, protein expression was induced with 0.5 mM isopropyl β-D-thiogalactoside. Following 15 h at 20 • C, cell pellets were harvested and resuspended in 20 mL of 50 mM Tris (pH 7.5, tris(hydroxymethyl)aminomethane), 15 mM NaCl, and 20 mM imidazole. Cells were lysed by high-pressure extrusion, centrifuged, and filtered using a 0.45 µm pore-size filter. The supernatant was loaded over a Ni 2+ -NTA affinity column (with NTA representing nitrilotriacetic acid) and eluted using a linear 20-500 mM imidazole gradient. The resulting sample was exchanged into an imidazole-free buffer (50 mM Tris (pH 7.5), 15 mM NaCl) and incubated overnight in the presence of His 6 -TEV protease. Following His 6 -tag cleavage, the remaining protein was purified away from free His 6 -tag and His 6 -TEV with an additional pass over a Ni 2+ -NTA column and concentrated in an Amicon pressure-driven ultrafiltration cell (MilliporeSigma, Burlington, MA) with YM-10 (10 kDa) filters. The resulting protein contains a four-residue vector-derived N-terminal cloning artifact, GAMD, plus residues 356-470 of ARNT PAS-B.

NMR analysis of ARNT PAS-B Y456T equilibrium constant
Solution NMR studies of ARNT PAS-B and its mutants were conducted in 50 mM Tris, 17 mM NaCl, and 5 mM 1,4dithiothreitol (DTT). To determine the effects that buffer and pH have on the stability of each conformation, ARNT PAS-B Y456T was exchanged into either 50 mM Tris (pH range 7-9) or piperazine-N,N -bis(2-ethanesulfonic acid) (PIPES) (pH range 6-7.5), 17 mM NaCl, and 5 mM DTT. 15 N/ 1 H HSQC (heteronuclear single quantum coherence) spectra of 200 µM protein were recorded more than 20 h post-exchange to allow adequate time for any perturbations to the conformational equilibrium to be established. To determine the effects of temperature on the equilibrium constant, 15 N/ 1 H HSQC spectra of 200 µM ARNT PAS-B Y456T in 50 mM Tris, 17 mM NaCl, and 5 mM DTT were recorded after incubation for a sufficient time (2-4 h) to ensure equilibrium was established at temperatures between 278 and 333 K in 5 K increments. All NMR spectra were acquired on Varian Inova (Palo Alto, CA) and Bruker Avance III (Billerica, MA) spectrometers, processed using NMRpipe (Delaglio et al., 1995) and NMRFx (Johnson, 2018;Norris et al., 2016) and analyzed using NMRViewJ (Johnson and Blevins, 1994;Johnson, 2018). We measured the WT / SLIP conformational 66 X. Xu et al.: Effects of ligands and other factors on a fragile PAS fold equilibrium as the relative volume of several well-dispersed peaks originating from each conformation.

NMR analysis of unfolded state of ARNT PAS-B mutants
We populated the unfolded state of the wild-type, Y456T, and F444Q/F446A/Y456T ARNT PAS-B proteins by urea denaturation. HNCO, HN(CA)CO, HNCACB, and CBCA(CO)NH spectra were collected at 25 • C on 500 µM uniformly 13 C-and 15 N-labeled protein in 8 M urea, 50 mM Tris (pH 7.5), 20 mM NaCl, and 5 mM β-mercaptoethanol and were used to assign ∼ 95 % of the HN, N, C , Cα, and Cβ resonances. We then used HN, N, Cα, and Cβ chemical shifts to predict S 2 order parameters using the TALOS+ program (Shen et al., 2009) and the Cα and Cβ chemical shifts to predict residual secondary structure using the SSP program (Marsh et al., 2006). Chemical shift assignments for the urea-denatured states of the proteins have been submitted to the BMRB (Biological Magnetic Resonance Bank) database as noted in the "Data availability" section.

NMR-based evaluation of small-molecule binding to ARNT PAS-B WT and F444Q/F446A/Y456T variant
Small-molecule ligands for ARNT PAS-B wild-type protein were identified in a previously reported NMR-based fragment screen (Guo et al., 2013). In summary, this screen utilized 15 N/ 1 H HSQC spectra to monitor ligand binding as overviewed in Fig. 4a, based on comparisons of spectra of 15 N-labeled ARNT PAS-B wild type recorded in isolation and in the presence of potential small-molecule ligands. We assembled a library of 762 chemical fragments (average molecular weight (MW) of 203 Da, ±73 Da SD), initially screening these in pools of five (250 µM protein, 1 mM ligand), with hits identified as pools which produced substantial changes in 15 N/ 1 H peak locations or intensities. Compounds from such pools were subsequently individually screened with single-point (500 µM) additions to 15 N-labeled ARNT PAS-B, with individual hits titrated over six or more concentrations to establish initial estimates of affinity and binding site location. As noted in Guo et al. (2013), 10 compounds from this screen were considered to be ARNT PAS-B binders; data supporting this are shown in titrations shown in Fig. 4 (KG-548) and Fig. S7. All 10 of these compounds were then counterscreened against the F444Q/F446A/Y456T ARNT PAS-B variant, with 15 N/ 1 H HSQC spectra recorded of 360 µM 15 N-labeled protein and 500 µM ligand.

KG-548 and KG-655 titration analyses
To determine the binding affinity for compound KG-548, we conducted a 15 N/ 1 H HSQC titration series with increasing concentrations of 25,50,100,200,500, and 1000 µM) against a constant 200 µM concentration of ARNT PAS-B Y456T. 15 N/ 1 H HSQC spectra were recorded 24 h after samples were mixed to allow sufficient time to reach equilibrium. The concentration of ligand-bound WT conformation (C WT,B ) was derived from peak intensities, plotted versus the ligand concentration (C L ), and fit to the following equation to derive dissociation constant (K d ) and maximal binding (B max ): We attempted to determine the concentration of ligand-bound protein directly from peak intensities; however, this analysis was complicated by the broadening of WT peaks in the presence of KG-548. In order to overcome this challenge, we used the following protocol to determine the concentration of ligand-bound WT conformation. We measured peak volumes for those peaks that showed no chemical shift changes, which report on both bound and unbound states of the protein. The relative peak volumes of the peaks corresponding to each conformation indicate the relative concentrations of total WT and SLIP conformations but not the ligand-bound and ligand-free concentrations. We calculated the concentration of the SLIP conformation (C +3 ), which does not bind compound appreciably, and the total concentration of WT conformation (C WT,T ) at each concentration of compound by comparing these peak volumes (V WT , V +3 ) to those in the absence of compound: Then, assuming that the SLIP conformation remains in the same equilibrium with the unbound WT conformation as in the absence of compound, we estimated the concentration of ligand-bound WT conformation as Most of the residues used in the titration analysis for KG-548 showed chemical shift changes and peak broadening in the presence of KG-655, rendering them unusable for the method described above. We therefore conducted 13 C/ 1 H HSQC titration experiments instead. We monitored the L391 δ1 methyl peaks which are particularly upfield shifted and well resolved for both WT and SLIP conformations . L391 is also a good probe to extract and compare binding affinities of KG-548 and KG-655 as it is not directly involved in the binding of either compounds (and thus has minimal chemical shift changes observed in the presence of KG-548 or KG-655). We conducted the 13 C/ 1 H HSQC titration series with increasing concentration of KG-655 (0, 500, 1000, 2500, 5000, 10 000 µM) against 250 µM concentration of ARNT PAS-B Y456T and extracted C L and K d as described above, assuming the equilibrium between the unbound WT conformation and the SLIP conformation remains unchanged at all ligand concentrations. For comparison, we additionally conducted a 13 C/ 1 H HSQC titration series for KG-548 (0, 500, 1000, 2000, 3000, 4000 µM) against 250 µM concentration of ARNT PAS-B Y456T. We attempted to match the concentration of KG-548 to the concentrations used for KG-655, but we were unable to exceed a maximum of 4000 µM for KG-548 without substantially increasing DMSO (dimethyl sulfoxide) concentration above 2 % DMSO.

Impact of temperature and ionic strength on conformational equilibrium
To characterize the thermodynamics of the WT / SLIP conformational change, we used 15 N/ 1 H HSQC spectra of ARNT PAS-B Y456T to calculate the populations of WT and SLIP states at temperatures between 278 and 328 K (Fig. S2). Below 300K, the conformational equilibrium is linearly dependent on temperature from 275 to 303 K (Fig. S2a), with a progressive bias towards the WT conformation with increasing temperature, suggesting that the conversion process is endothermic. Using data for 303 K and below, we plotted the natural logarithm of the equilibrium constant versus the inverse of the temperature and used the linear region of the plot to determine H (6.8 kcal mol −1 ), S (23.9 cal mol −1 K −1 ), and G 298 K (−0.31 kcal mol −1 ) of the SLIP to WT conversion (Fig. 2b), confirming that it is indeed endothermic but slightly favored overall due to entropic considerations. Above 300 K, we observed a leveling of the equilibrium at approximately 2 : 1 (WT / SLIP) (Fig. S2b). We attribute this to protein aggregation at higher temperatures, as seen by a decrease in peak intensity across the 15 N/ 1 H HSQC spectrum, likely due the partial unfolding during interconversion between WT and SLIP conformations Xu et al., 2021).
Studies on several proteins that slowly interconvert between two different conformations, such as lymphotactin (Tuinstra et al., 2008) and drk SH3 (Zhang and Forman-Kay, 1995), have shown that solvent conditions such as counterion identity and ionic strength can readily shift such equilibria. Therefore, we determined whether changes in such parameters similarly affect the equilibrium between the two folded states of ARNT PAS-B Y456T. We assessed the impact of changing buffers and pH values (either Tris or PIPES buffers, over total range of pH 6.0-9.0), along with salt concentrations (50-200 mM NaCl) by using 15 N/ 1 H HSQC spectra to determine the relative populations of the two states (Fig. S3). We discovered that these effects were minimal, as no substantial (greater than a 5 % deviation) changes were observed for independent pH and salt titrations. These results are consistent with the relatively similar amino acid types exposed to solvent in the two conformations of the Iβ strand.

Impact of HI loop length and sequence on conformational equilibrium
We previously reported that additional mutations in the Y456T background can alter the WT / SLIP equilibrium (Table 1, left column). At an extreme, we could lock the protein into the SLIP conformation by adding the F444Q and F446A point mutations on the Hβ strand to the Y456T background, enabling structural characterization of the SLIP conformation . Another potential contributor to the WT / SLIP equilibrium is the HI loop (residues 447-454), which connects the Hβ and the Iβ strand (Fig. 1a). This loop is highly flexible in the wild-type ARNT PAS-B domain, showing intermediate exchange broadening of NMR signals at several sites (Card et al., 2005). In the SLIP conformation, the HI loop shortens and packs more tightly against the domain core; as well, the N448-P449 peptide bond within the loop isomerizes from trans (WT) to cis (SLIP). We have previously shown that although the P449 residue has minor impact on the equilibrium WT / SLIP levels, this residue plays a critical role in the kinetics of interconversion . To further explore the role of P449, we investigated several changes to P449 here (Table 1, right column). First, we generated a P449A point mutant to proline to keep the 448-449 peptide bond in a trans configuration and provide more flexibility in the HI loop, both of which are normally associated with the WT conformation. In the 15 N/ 1 H HSQC spectrum of this mutant, multiple peaks were observed for each amide residue (approximately threefold more than in wild type), where none of which overlap with SLIP peaks (Fig. S4). This suggests that the added flexibility by the P449A mutation destabilizes the WT conformation and allows ARNT PAS-B to sample and adopt new stable conformations different from the previously characterized SLIP state. In contrast, mutating P449 to either Ala or Gly in the Y456T background retained the two-state WT / SLIP equilibrium but shifted it from the initial 50 : 50 value to 32 : 68 or 9 : 91, respectively. Interestingly, a P449A mutation in the SLIP-locked triple mutant (F444Q/F446A/Y456T) background repopulated a small fraction of the WT conformation, reverting the WT / SLIP equilibrium to 7 : 93. These results suggest that the identity of the P449 residue impacts the WT / SLIP equilibrium, albeit in a role secondary to mutations in the β-sheet residues. 15 N/ 1 H HSQC spectra of the newly generated ARNT PAS-B mutants P449G/Y456T and F444Q/F446A/P449A/Y456T are shown in Fig. S5.
Having found that mutations that increased flexibility at the HI loop proline could affect the relative populations of the WT and SLIP conformations, we wanted to test whether the length of the HI loop itself could contribute to the equilibrium. By integrating a TEV protease site into the middle of the HI loop (E-N-L-Y-F-Q, inserted between Y450 and S451), we could examine how booth this insertion and the subsequent removal of the covalent linkage would allow the  < 1 > 99 * Mutation P449A resulted in a complex mixture of WT and three additional non-SLIP conformations, populated to 60 %, 14 %, and 12 % (detailed in Fig. S4).
Iβ strand to slip while keeping the protein folded (Fig. 1a, TEV insertion site on the HI loop shown in pink). NMR characterization of Y456T with this loop insertion showed that the equilibrium between WT and the SLIP conformation was shifted to 20 : 80, suggesting that the addition of six residues in the HI loop favors slippage of the Iβ strand and adoption of the SLIP conformation. Notably, after cutting with TEV protease, cleaving between the newly introduced glutamine and S451, the equilibrium between the two conformations remained unchanged (Fig. S5). The sample was also applied to a MonoQ column to separate the two conformations, as we previously demonstrated ); immediately after injection onto the column, a precipitate formed, rendering it impossible to carry out the separation. We interpret the precipitation as arising from dissociation of the Iβ-strand peptide, as it is not observed either for this construct prior to TEV protease treatment nor for the wild-type protein. These results confirm that length in the HI loop influences the conformation of the Iβ strand.

Mutations that promote SLIP conformation also affect the denatured state of ARNT PAS-B
Since the interconversion between WT and SLIP conformations proceeds through a chiefly unfolded transition state Xu et al., 2021), we hypothesized that the structure of the denatured state of the ARNT PAS-B may bias the WT / SLIP equilibrium. To address this possibility, we recorded triple-resonance NMR experiments on ARNT PAS-B wild type, Y456T, and F444Q/F446A/Y456T in 8 M urea (a concentration at which most of the protein should be unfolded, Fig. S6) and used these data to assign the majority of backbone resonances for the three variants. From these data, we predicted backbone amide N-H S 2 order parameters and secondary structure propensities in the unfolded state from measured chemical shifts using TALOS+ and SSP (Shen et al., 2009;Marsh et al., 2006), respectively. The predicted order parameters are consistent with all three proteins being highly flexible ensembles, with the majority of residues (85/119, 71 %) displaying S 2 values < 0.6 and only 11 residues displaying S 2 values > 0.75. Consistent with this denatured state, the SSP algorithm predicted low amounts of residual secondary structure for all three proteins, with most residues populating from 0 % to 25 % extended structure (Fig. 3). The predicted order parameters are very similar for all three variants; the greatest difference is between residues 440 and 450 (Hβ strand and HI loop), where the denatured F444Q/F446A/Y456T mutant (SLIP conformation) displays more order than the wild-type or Y456T proteins. In addition, the triple mutant has decreased extended structure propensity in the same region. Some of these differences may be due to the mutations present in this region; this is more likely for the secondary structure propensity, which also detects a difference between the mutated and wild-type proteins in the region around residue 456 (also mutated). However, the S 2 values mainly differ in the Hβ-strand region and also follow a rank order of (F444Q/F446A/Y456T)>Y456T>wild type for residues 438-442. These findings show that the mutations that favor the SLIP conformation in the folded state also affect the residual structure in the unfolded state.

Compounds can preferentially bind one of the two ARNT PAS-B conformations
PAS domains can bind a variety of natural and artificial small-molecule ligands (Henry and Crosson, 2011), many of which confer regulatory control. Combined with the observation of internal cavities within ARNT PAS-B (Guo et al., 2013), we tested whether small, artificial ligands could preferentially bind to either the WT or SLIP conformations of ARNT PAS-B. We did this by counterscreening previously identified WT-binding ligands against the SLIPlocked F444Q/F446A/Y456T variant. The source of these ligands was a NMR-based screen of a library of 762 chemical fragments (MW of 203 Da on average, ±73 Da SD) which we previously used in similar screens of a variety of targets (Amezcua et al., 2002;Best et al., 2004;Guo et al., 2013;Scheuermann et al., 2009). For ARNT PAS-B, the screen started with initial 15 N/ 1 H HSQC spectra of 15 Nlabeled protein with pools of five fragments (1 mM each), which were ranked in order of the largest ligand-induced chemical shift and peak intensity perturbations compared to the apoprotein (manually exempting pools which appeared to lead to protein denaturation). Pools showing substantial changes were then deconvoluted into individual compounds which were independently added to identify ARNT PAS-B WT binders (Fig. 4a). Eighteen hits were identified and were tested in separate titration experiments: 10 of which with good solubility were considered to be ARNT PAS-B binders (Guo et al., 2013). To test whether these 10 ligands are specific to only the WT conformation of ARNT PAS-B, we collected 15 N/ 1 H HSQC spectra of the F444Q/F446A/Y456T variant (360 µM) mixed with the ligands (500 µM). Interestingly, none of the ligands showed chemical shift perturbations to the variant (example NMR spectra are shown in Fig. 4b; the rest of the screen results are shown in Fig. S7), indicating all these compounds are WT specific. One of the tested compounds, KG-548 (5-(3,5bis(trifluoromethyl)phenyl)tetrazole), exhibited slow exchange behavior when titrated against WT ARNT PAS-B, with over 30 amide sites substantially broadened in the 15 N/ 1 H HSQC spectrum, (Fig. 4b, middle). The affected peaks arose from residues that mostly localized to the β sheet (Fig. 5a). Interestingly, KG-548 was shown to be the strongest disruptor of ARNT PAS-B and CCCs interaction in the previous study (Guo et al., 2013), suggesting that its binding affects the β-sheet surface. More recently, through high-pressure NMR analysis, complemented by site-directed mutagenesis studies, we confirmed KG-548 to be a surface binding ligand, interacting hydrophobically with residues I364 and I458 on the external β-sheet surface of wild type ARNT PAS-B (Gagné et al., 2020), which would explain the preferential binding, as I458 is flipped inward to the core of the protein in the SLIP conformation (Figs. 1b and S1b).

KG-548 and KG-655 drive the conformational equilibrium towards the WT state
Since KG-548 preferentially binds the wild-type ARNT PAS-B, the addition of this compound to Y456T should drive the 50 : 50 (WT / SLIP) equilibrium towards the WT conformation. We titrated increasing concentrations of KG-548 (0-1000 µM) against 200 µM ARNT PAS-B Y456T, observing a shifted equilibrium of 77 : 23 (WT / SLIP) at the highest ligand concentration (Fig. 5b). Due to the broadening of several peaks in the presence of the compound, we analyzed only those peaks that were not affected by ligand binding. Despite the difficulties posed by the broadening of peaks of interest (see Methods, Sect. 2.5), we determined the binding affinity (K d = 684±33.1 µM) and maximum binding (B max = 111±7.1 µM, comparable to ∼ 100 µM WT conformation in 200 µM ARNT PAS-B Y456T sample) of KG-548 for the WT conformation of ARNT PAS-B Y456T (Fig. 5c). Another ligand we characterized in detail was KG-655 (3,5-bis(trifluoromethyl)phenol), a fragment of KG-548, which was also shown to disrupt ARNT PAS-B binding to TACC3 (Guo et al., 2013). We previously showed that this ligand binds to two sites of wild-type ARNT PAS-B, both to the external side of the β-sheet surface and internally to the core of the domain (Gagné et al., 2020). We again performed titration experiments with increasing concentration of KG-655 (0-10 mM) and expectedly saw binding specificity towards the WT conformation, similar to KG-548 (Fig. S8a). Due to extensive chemical shift changes and broadening of many amide peaks in the presence of KG-655, we turned to 13 C/ 1 H-HSQC titration experiments to extract binding affinities (see Methods, Sect. 2.5). We chose to monitor the upfield-shifted L391 δ1 methyl signals, because these peaks were well resolved in 1D and 2D NMR spectra and have been previously used to monitor the relative population changes between the two conformations Xu et al., 2021) (Fig. S8b). As an initial control, we calculated the dissociation constant (K d = 414 ± 7.1 µM) of KG-548 using this approach, comparable to the number reported above (K d = 684 ± 33.1 µM). With the same method, we extracted a K d of KG-655 (1947 ± 152 µM) to ARNT PAS-B Y456T (Fig. S8c). Despite having extremely low binding affinity, the binding of KG-655 undoubtedly shifted the equilibrium towards the WT conformation. Interestingly, the surface binding of KG-655 to the WT conformation of ARNT PAS-B Y456T appeared to be abolished, leaving internal binding as the only binding mode for KG-655 (Fig. S9). We posit this is potentially due to the loss of necessary interactions between the ligand and residue Y456. Taken together, these data confirm our previous findings that compounds can preferentially bind to a specific conformation of ARNT PAS-B and shift the equilibrium in the process. As noted above, we achieved population shifts with both surface-and corebinding ligands (KG-548 and KG-655, respectively).
The two examples above support a four-state model with the following states varying by conformation (WT, SLIP) and ligand binding (apo, bound) (Fig. 5d). In theory, the compound also binds the SLIP conformation, but our NMR data show that the affinity between the two is minor, with no bound state visible even at the highest ligand concentration.
Assuming that an invisible bound state would represent at most 10 % of the protein in the SLIP conformation at the highest ligand concentration, we calculated a lower bound of 9 mM for the dissociation constant of KG-548 using Eq. (1) and even higher for KG-655. Taking KG-548 as an example, the value calculated allows us to estimate the equilibrium constant between WT-bound and SLIP-bound states, as presented in Fig. 5d. With the addition of compound to Y456T, the compound binds the WT conformation, creating a new WT bound state. This process depletes the WT unbound state, thereby driving the SLIP conformation towards WT to re-establish equilibrium.

Discussion
We have previously established the structural plasticity of the PAS-B domain of ARNT, which is highly sensitive to the side chain at position 456, located in the Iβ strand Xu et al., 2021). In the wild-type protein, this position is occupied by a tyrosine, and the side chain is solvent-exposed. Reducing the size of Y456 to a smaller side chain enables ARNT PAS-B to enter an equilibrium between two stable conformations: the original WT state and a new SLIP state with a three-residue slip and inversion of the Iβ strand which places this side chain  (Guo et al., 2013). 15 N/ 1 H HSQC spectra of 250 µM protein with mixtures of five compounds (1 mM each) were acquired and were scored for spectral differences between apoprotein spectra (black) and those with compounds (red). Combinations which produced substantial peak shift or intensity changes (e.g., KG-545 to KG-549) were deconvoluted by acquiring spectra of individual protein/ligand mixtures, which were followed with NMR titration experiments for quantitative characterizations. into the core of the protein. Here we extend our prior demonstrations of the ability of nearby mutations to influence the WT / SLIP equilibrium by evaluating the impact of environmental conditions, the HI loop, and small-molecule binding on this interconversion. Taken together, these features give control over this uncovered flexibility intrinsic to this PAS domain.
It is clear that many PAS domains have significant intrinsic flexibility, particularly in the β-sheet surface and the final β hairpin, consisting of HI loop linking the Hβ and Iβ strands. Supporting this idea, residues in the HI loop show intermediate exchange broadening in 15 N/ 1 H spectra of the wild-type ARNT PAS-B domain (Card et al., 2005), and when this domain is subjected to mechanical unfolding in silico, about 1/3 of the unfolding trajectories include a transient intermediate in which the C-terminal β hairpin is unfolded (Gao et al., 2012). While this flexibility is expected to play a role in ligand entry to or exit from the core of a PAS domain, it is not evident in the static crystal structures of ligand-bound PAS domains. For example, only few residues in HIF-2α PAS-B show a significant (> 0.5 Å) backbone shifts between the apo and ligand-bound states Wu et al., 2015;Scheuermann et al., 2013). While the ARNT PAS-B mutations described here are not found naturally to the best of our knowledge, they appear to have uncovered hidden flexibility in sequences virtually identical to the wild-type protein. Our data here and previously suggest several structural determinants which enable the substantial β-strand rearrangement between the WT and SLIP conformations; similar characterization of the effects on dynamics using CPMG-type or other solution NMR experiments may provide additional mechanistic insights as well. Given the results shown in Fig. 3, we suggest that attention on the unfolded (or nascently refolding) states are of particular interest in how these ultimately determine the relative ratio of different conformations .
Our studies here and elsewhere Xu et al., 2021) on ARNT PAS-B Y456T comprise a toolkit for examining fragile protein folds and metamorphic proteins. Many of the approaches and lessons discussed here could be applicable to other similar biological systems, including some that have already been applied. For a simple example, the metamorphic protein IscU, which forms iron-sulfur clusters in E. coli, interconverts between two states in the process of carrying out its function; this in- terconversion requires cis-trans isomerizations of two proline peptide bonds (Dai et al., 2012). This is reminiscent of the isomerization of the P448-N449 peptide bond that accompanies the switch from the WT to the SLIP conformation in ARNT PAS-B Y456T.
Critical to the ability for controlled switching between states is a small free-energy difference between them. For ARNT PAS-B, we identify this as being −0.3 kcal mol −1 at 298 K -less than one hydrogen bond, illustrating that the loss or gain of only a few interactions would be enough to switch between the folds. The breakdown of enthalpic and entropic contributions ( H: +6.8 kcal mol −1 , S: +23.9 cal mol −1 K −1 ) suggests that there are more fa-vorable interactions in the SLIP state, but entropic considerations favor the WT state. Calculation of thermodynamic parameters of interconversion is relatively straightforward and could provide hints to the environmental triggers that favor one state over another in metamorphic proteins. Fragile folds may interconvert through unfolded intermediates or proceed through a series of partially structured transitions (Tyler et al., 2011;Zhao et al., 2016;Khatua et al., 2020). We previously reported that the interconversion between the WT and SLIP conformations in ARNT PAS-B Y456T goes through a mostly unfolded intermediate that then refolds into two similarly stable folded states Xu et al., 2021). We find that the de-natured state of the F444Q/F444A/Y456T mutant that favors the SLIP conformation has slightly greater order at the C terminal end of the Hβ strand. We posit that the Hβ strand may form a folding core or nucleus in the F444Q/F446A/Y456T mutant that promotes folding into the SLIP conformation; rigidity of the C-terminal end of the Hβ strand may force the Iβ strand to slip toward the C-terminus of the protein.
It is also interesting and counterintuitive that, in this case, removing phenylalanine residues results in greater order in the unfolded state, since folding cores are often made up of such hydrophobic residues (Alexandrescu and Shortle, 1994;Buck et al., 1996;Klein-Seetharaman et al., 2002).
Changes in β-strand register have been noted in wellfolded proteins, albeit rarely (Eigenbrot et al., 2001;Goldberg, 1998;Tuinstra et al., 2008;Wright and Scarsdale, 1995;Volkov et al., 2016). These findings prompted in silico studies to investigate such phenomena, where rearrangements of β-strand register have been observed through implicit and explicit solvent molecular dynamics (MD) simulations (Panteva et al., 2011;Li et al., 2007). However, during MD simulations of β-hairpin folding, it is often difficult to establish native β-strand register (Shao et al., 2013). Chong and co-workers proposed an "aromatic crawling" mechanism in which β-strand register is established via initial transient anchoring into hydrophobic pockets, specifically mediated by phenylalanine residues (Panteva et al., 2011). While the situation in ARNT PAS-B is consistent with this proposed mechanism, it is not entirely the same, as F444, F446, and Y456 form a permanent hydrophobic cluster that clearly stabilizes the WT conformation. We suggest that this may reflect an underlying biophysical phenomenon: the importance of hydrophobic clusters on the "inside" face of the hairpin. In the case of proper folding, the formation of hydrophobic clusters on one face of the sheet drives the correct alignment of the β strands relative to each other and formation of cross-strand hydrogen bonds (Shao, 2015;Shao et al., 2013). This description may be applicable to the situation of ARNT PAS-B Y456T and F444Q/F446A/Y456T, as these two mutants progressively weaken a hydrophobic cluster (in this case found on the solvent-facing side of the β sheet), thus allowing the Iβ strand to slip during folding. In fact, due to the hydrated cavity within ARNT PAS-B, it is an almost "inside-out" protein, with the solvent-exposed side of the β sheet containing 12 polar and 11 nonpolar residues and 8 polar and 12 nonpolar on the other side. The C-terminal β hairpin of the HI loop is the most egregious example, with the Hβ strand containing more aromatic residues on the solvent-exposed side and more polar residues on the other side and both sides of the Iβ strand containing approximately the same numbers of polar and nonpolar residues. This unconventional arrangement of side chains on the ARNT PAS-B domain, likely due to its internal water-filled cavities and function as a protein binding site (Guo et al., 2013), may be the driving force behind its fragile fold. Similar studies of the unfolded states of other fragile protein folds may yield insights as well.
Work from our lab and others has established that PAS domains that contain surface grooves and interior cavities can bind ligands that modulate their interactions with other proteins (Guo et al., 2015;Henry and Crosson, 2011;Scheuermann et al., 2009;Wu et al., 2015;Gagné et al., 2020). Since the C-terminal β hairpin is immediately adjacent to the interior cavity, we wondered if binding to a ligand could distinguish between the WT and SLIP conformations of ARNT PAS-B Y456T. Remarkably, we identified both surface-and core-binding ligands that selectively bound to one conformation over another. Two of these small molecules, KG-548 and KG-655, bind to the WT conformation with mid micromolar (µM) to low millimolar (mM) affinities and do not appreciably bind the SLIP conformation (K d > 9 mM). Intriguingly, we have also shown that KG-548 and KG-655 binding to wild-type ARNT PAS-B can disrupt interactions with coiledcoil coactivators (Guo et al., 2015), suggesting that the conformational flexibility under study here may be relevant for biological interactions involving ARNT.
Finally, we emphasize our findings could complement other methods to manipulate conformational switches of proteins. For example, we have shown that equilibrium of protein conformations can also be pressure dependent, mainly because of their volume and compressibility differences (Xu et al., 2021). Selective binding of ligands to one conformation over the other can affect such parameters, change interconversion thermodynamics and kinetics, and therefore provide insights into the transition process. Other metamorphic proteins such as lymphotactin (Tuinstra et al., 2008) and IscU (Markley et al., 2013) have been shown to bind to specific binding partners in different folds. The N11L mutant of the Arc repressor, another engineered fragile fold, binds DNA and concomitantly drives the equilibrium to the native fold (Cordes et al., 2000). Our approach should be effective for characterizing these ligand interactions with fragile protein folds, as long as these proteins give rise to good NMR signals. Characterizing these interactions could help to reveal novel mechanisms for regulating biological activity by switching between two distinct structures and a means of controlling such switches (Ha and Loh, 2017).

Conclusions
We here characterize the thermodynamic components of a substantial change in protein conformation -a β-strand register shift in a variant of the ARNT PAS-B domain -and demonstrate how it can be manipulated with a mix of environmental and sequence changes. Notably, we can exert control over the equilibrium using small-molecule compounds that preferentially bind one of the two states. Our findings suggest that it is possible to control the switch between the two structures using small molecules, providing a route for application to other proteins exhibiting fragile folds. Data availability. Backbone chemical shift assignments of the urea-denatured WT, Y456T, and F444Q/F446A/Y456T ARNT PAS-B proteins have been deposited at BMRB (Biological Magnetic Resonance Bank) with the following accession codes: 50761 (wild type), 50763 (Y456T), and 50762 (F444Q/F446A/Y456T). All other data are available upon request.