Facilitating the structural characterisation of non-canonical amino acids in biomolecular NMR

Kuschert, Sarah; Stroet, Martin; Chin, Yanni Ka-Yan; Conibear, Anne Claire; Jia, Xinying; Lee, Thomas; Bartling, Christian Reinhard Otto; Strømgaard, Kristian; Güntert, Peter; Rosengren, Karl Johan; Mark, Alan Edward; Mobli, Mehdi

doi:https://doi.org/10.5194/mr-4-57-2023

Articles | Volume 4, issue 1

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article

24 Feb 2023

Final revised paper (published on 24 Feb 2023)
Preprint (discussion started on 29 Nov 2022)

Interactive discussion

Status: closed

RC1:
'Comment on mr-2022-22', Anonymous Referee #1, 14 Dec 2022

This article addresses an important and increasingly common problem, namely how to provide a standardised and highly automated procedure for handling non-canonical amino acids in NMR structure determination. The workflow of the software is clearly described and its performance convincingly demonstrated in the results section. Any remaining shortcuts are also honestly addressed. All in all, this work will greatly facilitate the work of many NMR groups and especially promote the study of ncAAs.

Citation: https://doi.org/10.5194/mr-2022-22-RC1
- AC1: 'Reply on RC1', Mehdi Mobli, 16 Jan 2023
  
  We thank the reviewer for their comments.
  
  Citation: https://doi.org/10.5194/mr-2022-22-AC1
RC2:
'Comment on mr-2022-22', Anonymous Referee #2, 09 Jan 2023

This manuscript describes a largely automatic procedure for incorporating non-canonical amino-acid residues (ncAAs) and other post-translational modifications (PTMs) into NMR-based structure calculations using the programs CYANA and CNS. The procedure builds on a previously published webserver-based facility called the Automated Topology Builder (ATB) that creates optimised geometries, topology files and force-field parameters for independent molecules starting from a set of (non-optimised) 3D co-ordinates. The new aspects described in the present manuscript are concerned with extensions that can be used with the ATB to facilitate building structures in a protein context, i.e. allowing for partial structures to be attached in place of a canonical sidechain or at one of the polypeptide chain termini. A significant part of this process is the development of software for naming atoms within ncAAs or PTMs in a standardised fashion consistent with IUPAC recommendations; this might sound like a bureaucratic nicety, but inconsistencies in atom naming can all too easily create tedious obstacles to the development or sharing of methodology or even results. These are important issues, and I believe designing a generally applicable scheme to make the setting up of such calculations more convenient and standardised is a very worthwhile goal. I therefore welcome this contribution and support its publication, subject to attention being paid to the following points:

1) While using successive letters from the Greek alphabet to indicate the (smallest) number of bonds separating a sidechain atom from the protein mainchain in an ncAA is clearly consistent with the spirit of the IUPAC recommendations, it seems to me that formally speaking this is a step beyond the IUPAC recommendations themselves that, as far as I am aware, refer only to atoms in the 20 canonical amino acids. I don't believe the two references for the IUPAC recommendations given in the manuscript (Huang et al., 1970 and Markley et al., 1998) describe this additional step to include atom naming in ncAAs; are there other references that could be cited that would formally set such a precedent?

2) The Greek alphabet contains only 24 letters, and while this is doubtless sufficient for uniquely naming atoms in the great majority of ncAAs or PTMs following the approach outlined in this paper, it is not sufficient in all cases. For instance, glycans containing linear chains of more than 4 sugars would exceed a 24-atom chain-length limit, and the structure of glycosylphosphatidylinositol (GPI) anchors, which are a not uncommon form of N-terminal PTM, involve more than 40 bonding steps from the polypeptide backbone. Do the IUPAC recommendations have anything to say about atom naming in such cases? Assuming they do not, what do the present authors propose in such cases when the Greek alphabet runs out of letters?

3) In some of the more detailed sections of the manuscript it becomes apparent that some steps in creating files to represent complicated structures do require some manual intervention, e.g. to complete the bonding scheme for some ring structures. I can see that it may well be very much more difficult to write software that correctly handles such complications fully automatically, and I don't believe that the need for manual intervention to complete the implementation in such cases is necessarily much of a problem, but I do feel the issue should be more clearly discussed in the manuscript. The authors could comment briefly on whether they are planning to attempt the automatic handling of such remaining cases, or whether that would be impractical. I think it would also be helpful to add a short but clear statement in a rather more visible location in the paper as to what are the fundamental limitations on fully automatic operation in the present implementation.

4) It is probably inevitable that the implementation of a new approach such as this is rooted in the environment of the particular program using which it was developed, in this case CYANA. However, the transfer of the approach to a different program environment is important if the method is to be widely adopted, as presumably the authors of this contribution hope it will be. It may not be practical to go very far down this road, but it might have been nice to see the method worked through using, for instance, XPLOR-NIH, ARIA or AMBER. Are there steps for which the use of CYANA or associated programs is currently unavoidable?
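As an illustration of the naming idea raised in point 1, the "smallest number of bonds" from the mainchain can be computed with a breadth-first search over the bond graph. The following is only a minimal sketch under stated assumptions: the function names (`bond_depths`, `greek_positions`), the atom names of the toy sidechain, and the choice of the alpha carbon as the root are hypothetical and are not taken from the ATB or CYANA code.

```python
from collections import deque

# The 24 letters of the Greek alphabet, in order (alpha = the root atom itself).
GREEK = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
         "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi",
         "rho", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega"]


def bond_depths(bonds, root):
    """Return the smallest number of bonds from `root` to each reachable atom."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours.get(atom, ()):
            if nxt not in depth:  # first visit in BFS is the shortest path
                depth[nxt] = depth[atom] + 1
                queue.append(nxt)
    return depth


def greek_positions(bonds, root="CA"):
    """Map each atom to a Greek position letter based on its bond depth."""
    labels = {}
    for atom, d in bond_depths(bonds, root).items():
        if d >= len(GREEK):
            raise ValueError(f"{atom} is {d} bonds away; single letters run out")
        labels[atom] = GREEK[d]
    return labels


# Hypothetical sidechain containing a four-membered ring (heavy atoms only).
TOY_BONDS = [("CA", "C1"), ("C1", "C2"), ("C1", "C3"), ("C2", "C4"), ("C3", "C4")]

if __name__ == "__main__":
    for atom, letter in sorted(greek_positions(TOY_BONDS).items()):
        print(atom, letter)
```

In the toy ring, `C4` can be reached along two paths of equal length, and the breadth-first search assigns it the shortest depth. The `ValueError` marks the 24-letter limit that point 2 asks about.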

Citation: https://doi.org/10.5194/mr-2022-22-RC2
- AC2: 'Reply on RC2', Mehdi Mobli, 20 Jan 2023
  
  1) While using successive letters from the Greek alphabet to indicate the (smallest) number of bonds separating a sidechain atom from the protein mainchain in an ncAA is clearly consistent with the spirit of the IUPAC recommendations, it seems to me that formally speaking this is a step beyond the IUPAC recommendations themselves that, as far as I am aware, refer only to atoms in the 20 canonical amino acids. I don't believe the two references for the IUPAC recommendations given in the manuscript (Huang et al., 1970 and Markley et al., 1998) describe this additional step to include atom naming in ncAAs; are there other references that could be cited that would formally set such a precedent?
  The reviewer is correct: IUPAC has not made a specific recommendation for such cases. We invited IUPAC to review our paper and provide comments on our approach. They responded by confirming that no convention exists and suggested additional references to IUPAC documents, which have been included.
  2) The Greek alphabet contains only 24 letters, and while this is doubtless sufficient for uniquely naming atoms in the great majority of ncAAs or PTMs following the approach outlined in this paper, it is not sufficient in all cases. For instance, glycans containing linear chains of more than 4 sugars would exceed a 24-atom chain-length limit, and the structure of glycosylphosphatidylinositol (GPI) anchors, which are a not uncommon form of N-terminal PTM, involve more than 40 bonding steps from the polypeptide backbone. Do the IUPAC recommendations have anything to say about atom naming in such cases? Assuming they do not, what do the present authors propose in such cases when the Greek alphabet runs out of letters?
  We have expanded the standard nomenclature from 24 to 600 positions by using two-letter symbols from position 25 onwards (AA, AB, … AW | BA, BB, … BW | … | WA, WB, … WW). An explanatory sentence has been added to the manuscript. We have also highlighted the problem to IUPAC. Note that our code on GitHub is intended to be dynamic and will be amended to match any official recommendation by IUPAC.
  3) In some of the more detailed sections of the manuscript it becomes apparent that some steps in creating files to represent complicated structures do require some manual intervention, e.g. to complete the bonding scheme for some ring structures. I can see that it may well be very much more difficult to write software that correctly handles such complications fully automatically, and I don't believe that the need for manual intervention to complete the implementation in such cases is necessarily much of a problem, but I do feel the issue should be more clearly discussed in the manuscript. The authors could comment briefly on whether they are planning to attempt the automatic handling of such remaining cases, or whether that would be impractical. I think it would also be helpful to add a short but clear statement in a rather more visible location in the paper as to what are the fundamental limitations on fully automatic operation in the present implementation.
  A paragraph has been added to the discussion regarding both the existing limitations and future directions.
  “The manual modifications that were noted in specific cases are largely due to limitations and compatibility issues of the tools used in the pipeline. The treatment of ring closures by CYLIB has been noted previously. The need for a torsion angle between connected aromatic rings is a consequence of existing rules within CYLIB. The limitations of the import function in CcpNmr are the subject of current development by the authors of that software. The manual steps required for compatibility with CNS are largely due to the use of atom types to define both the Lennard-Jones parameters and the bonded terms. Name clashes in the atom type definitions that arise from combining multiple ATB-generated building blocks within a single system must be addressed to ensure the intended parameters are used. Further refinement of the ATB outputs to improve compatibility with different packages (e.g., XPLOR-NIH, AMBER, ARIA) will be the subject of future work.”
  4) It is probably inevitable that the implementation of a new approach such as this is rooted in the environment of the particular program using which it was developed, in this case CYANA. However, the transfer of the approach to a different program environment is important if the method is to be widely adopted, as presumably the authors of this contribution hope it will be. It may not be practical to go very far down this road, but it might have been nice to see the method worked through using, for instance, XPLOR-NIH, ARIA or AMBER. Are there steps for which the use of CYANA or associated programs is currently unavoidable?
  In section 2.6 we outline how we use CNS to perform water refinement. The topology and parameter files required by CNS are in principle compatible with XPLOR-NIH, ARIA and AMBER (i.e. these can be used for Cartesian-based structure calculations without the need to use CYANA). The additional paragraph in the discussion now also emphasises this.
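The arithmetic behind the extended range in the reply to point 2 (24 single symbols plus 24 × 24 = 576 two-symbol combinations, giving 600 positions) can be sketched as follows. This is a hedged illustration only: the function name `position_label` is hypothetical, and Greek characters are used directly because the mapping to the one-letter Latin codes (A … W) used in the authors' GitHub code is not given here.

```python
# The 24 Greek letters in alphabetical order; alpha plays the role of "A"
# and omega the role of "W" in the AA ... WW pattern quoted in the reply.
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"


def position_label(n: int) -> str:
    """Return the label for position n (1-based) in the extended scheme.

    Positions 1-24 use a single symbol; positions 25-600 use two symbols,
    cycling the second symbol fastest (αα, αβ, ..., αω, βα, ...).
    """
    base = len(GREEK)                 # 24
    limit = base + base * base        # 24 + 576 = 600
    if not 1 <= n <= limit:
        raise ValueError(f"position must be between 1 and {limit}, got {n}")
    if n <= base:
        return GREEK[n - 1]
    first, second = divmod(n - base - 1, base)
    return GREEK[first] + GREEK[second]


if __name__ == "__main__":
    for n in (1, 24, 25, 26, 48, 49, 600):
        print(n, position_label(n))
```

Cycling the second symbol fastest reproduces the ordering "AA, AB, … AW | BA, BB, …" given in the reply, and every position from 1 to 600 receives a distinct label.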
  
  Citation: https://doi.org/10.5194/mr-2022-22-AC2
RC3:
'Comment on mr-2022-22', Bruno Kieffer, 20 Jan 2023

The manuscript of Sarah Kuschert et al. describes an extension of a program (ATB) that facilitates the modelling of non-canonical amino acids using commonly used structure determination programs such as CYANA or CNS. This contribution is particularly useful for the community, since the use of modified peptides in the pharmacopoeia is receiving growing interest. The manuscript presents applications to various compounds, including stapled peptides, which is of particular interest. One important aspect of this work is the description of an automated atom naming procedure that complies with the IUPAC standards. Such a tool represents an important and valuable technical contribution for researchers working in structural biology. However, in its present form, the manuscript focuses mainly on applications with the CYANA program, while the authors state (and I tend to believe it) that the approach is general and applicable to CNS or XPLOR-NIH. The manuscript could be greatly improved if the authors provided some examples of this ecumenism with structure calculations performed by CNS or XPLOR-NIH. Besides, I have several specific points that should be addressed to improve the overall clarity:

- Figure 2 shows the template for the description of ncAAs. What happens if the nitrogen atom of the peptide bond is not bound to a hydrogen but to a carbon, such as in methylated AA or di-amino butyric acid found in some bacterial siderophores, or a modified proline?

- The statement in the first result paragraph is rather odd:

"In general the recalculated structures are very similar to those previously calculated ..."

I disagree with the statement that it is beyond the scope of the work to compare in detail the results of both procedures. The demonstration that the automated approach delivers the same results as the manual one should be provided in a quantitative way, and the origin of possible "subtle" differences should be carefully analysed and addressed. The results of structure calculations should comply with accepted standards, including tables of structural statistics.

- As already mentioned, comparative structural calculations should also be provided for CNS or XPLOR-NIH. It would be very helpful to have an example of a topology entry for a modified amino acid in one of the routinely used force fields of CNS.

- In section 3.3, the authors present a practical application to a stapled peptide. The details provided should rather be moved to the Methods section. As for the other examples, a table summarising the structural statistics should be provided. It would also be interesting to detail how the cis-trans isomerism of the double bond is defined from the input structure.

- It would be very interesting to provide an example where fluorinated amino-acids are incorporated in a peptide or a protein.

Citation: https://doi.org/10.5194/mr-2022-22-RC3
- AC3: 'Reply on RC3', Mehdi Mobli, 01 Feb 2023
  
  See attached file.
  
  Citation: https://doi.org/10.5194/mr-2022-22-AC3

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Mehdi Mobli on behalf of the Authors (02 Feb 2023)

ED: Publish as is (07 Feb 2023) by Nicola Salvi

AR by Mehdi Mobli on behalf of the Authors (07 Feb 2023)

Short summary

The 20 genetically encoded amino acids provide the basis for most proteins and peptides that make up the machinery of life. This limited repertoire is vastly expanded by the introduction of non-canonical amino acids (ncAAs). Studying the structure of proteins containing ncAAs requires new computational representations that are compatible with existing modelling software. We have developed an online tool for this purpose to aid future structural studies of this class of complex biopolymers.