The ability to make robust inferences about the dynamics of biological
macromolecules using NMR spectroscopy depends heavily on the application
of appropriate theoretical models for nuclear spin relaxation. Data
analysis for NMR laboratory-frame relaxation experiments typically
involves selecting one of several model-free spectral density functions
using a bias-corrected fitness test. Here, advances in statistical model selection theory, termed bootstrap aggregation or bagging, are applied
to

Since the original publications in the early 1980s, the model-free
formalism of

Several authors have addressed model selection by employing the
principle of parsimony or Occam's razor

To illustrate the issue more concretely, a typical data analysis
protocol uses a nonlinear weighted least-squares algorithm to fit
experimental spin relaxation data with a set of model-free spectral
density functions

The present paper addresses model selection error by using the approach
of bootstrap aggregation or bagging. This concept originated from a
desire to improve the performance of machine learning algorithms:
Breiman showed that the accuracy and stability of a predictor improve
when its values are averaged over bootstrap replicates of the
original training set

Bootstrap aggregation improves parameter stability; consequently, the resulting variations in model-free parameter values, for example between atomic sites or functional states in a given macromolecule, are more likely to be biologically or chemically meaningful. Although applicable to most model selection situations, bootstrap aggregation exhibits the most pronounced benefits when the data justify two distinct models with similar degrees of certainty.
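The averaging over bootstrap replicates described above can be sketched with a toy problem that is not taken from the present data: a constant and a linear model compete for a noisy data set with a weak trend, so that both models are selected with appreciable frequency. The function names, the synthetic data, and the use of a small-sample-corrected Akaike criterion as the fitness test are assumptions made purely for illustration.

```python
import numpy as np

def aicc(rss, n, k):
    # Least-squares AIC with small-sample correction (assumed fitness test).
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def fit_and_select(x, y):
    """Fit constant and linear models; return (selected degree, slope)."""
    n = len(y)
    results = []
    for degree in (0, 1):
        coeffs = np.polyfit(x, y, degree)
        rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
        # The slope is fixed at zero when the constant model is used.
        slope = float(coeffs[0]) if degree == 1 else 0.0
        results.append((aicc(rss, n, degree + 1), degree, slope))
    _, degree, slope = min(results)
    return degree, slope

def bagged_slope(x, y, n_boot=1000, seed=0):
    """Average the slope over bootstrap replicates, selecting a model each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    picks, slopes = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draws with replacement
        degree, slope = fit_and_select(x[idx], y[idx])
        picks.append(degree)
        slopes.append(slope)
    fractions = np.bincount(picks, minlength=2) / n_boot
    return float(np.mean(slopes)), fractions

# Synthetic data with a weak trend, so neither model is clearly preferred.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 0.3 * x + rng.normal(0.0, 0.2, size=x.size)

single_model, single_slope = fit_and_select(x, y)
smoothed, fractions = bagged_slope(x, y)
print("single selection:", single_model, single_slope)
print("bagged slope:", smoothed, "model fractions:", fractions)
```

In this sketch the single-selection slope is either zero or the fitted value, depending on which model wins, whereas the bagged slope varies continuously with the fraction of replicates that favor each model.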

Bootstrap aggregation for model-free analysis of NMR spin relaxation rate constants is illustrated by application to backbone amide

In the following, the notation used by Efron is rephrased in terms appropriate for NMR spin relaxation data

The extended model-free spectral density function used to fit

Model 1:

Model 2:

Model 3:

Model 4:

Model 5:
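For orientation, the widely used extended model-free spectral density can be written as a short function. This is a sketch under stated assumptions: the 2/5 normalization, the parameter names, and the convention that an internal correlation time of zero removes the corresponding term are choices made here, and the parameter restrictions that define Models 1 to 5 in this paper are not reproduced because they are not recoverable from the text above.

```python
import numpy as np

def extended_model_free_J(omega, tau_m, S2f=1.0, S2s=1.0, tau_f=0.0, tau_s=0.0):
    """Extended model-free spectral density in a commonly used form.

    omega : angular frequency or array of frequencies (rad/s)
    tau_m : overall rotational correlation time (s), must be > 0
    S2f, S2s : squared order parameters for fast and slow internal motions
    tau_f, tau_s : internal correlation times (s); zero removes the term
    """
    omega = np.asarray(omega, dtype=float)

    def lorentzian(tau):
        return tau / (1.0 + (omega * tau) ** 2)

    def effective(tau):
        # 1/tau' = 1/tau_m + 1/tau
        return tau * tau_m / (tau + tau_m)

    S2 = S2f * S2s  # generalized order parameter
    return 0.4 * (
        S2 * lorentzian(tau_m)
        + (1.0 - S2f) * lorentzian(effective(tau_f))
        + (S2f - S2) * lorentzian(effective(tau_s))
    )

# Example: rigid limit versus a site with fast internal motion.
tau_m = 1.0e-8
print(extended_model_free_J(0.0, tau_m))
print(extended_model_free_J(0.0, tau_m, S2f=0.8, tau_f=5.0e-11))
```

Simpler models follow from this form by holding parameters at their limiting values, for example S2s = 1 and tau_f = 0 for a single order parameter.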

In general, a non-parametric bootstrap sample is generated by draws with
replacement from the original data

A conventional non-parametric bootstrap determination of the standard
deviations of the parameters
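A minimal sketch of the conventional procedure, in which the model is held fixed and only the parameters are refitted for each bootstrap sample, is given below. The helper name `bootstrap_sd`, the straight-line model, and the synthetic data are hypothetical stand-ins for the model-free fit and the spectral density values.

```python
import numpy as np

def bootstrap_sd(x, y, fit, n_boot=500, seed=0):
    """Non-parametric bootstrap mean and standard deviation of fitted parameters.

    fit : callable (x, y) -> 1-D array of parameter estimates for ONE fixed model
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n draws with replacement
        estimates.append(fit(x[idx], y[idx]))
    estimates = np.asarray(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)

# Toy data: straight line plus noise; the model is fixed to a linear fit.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 12)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.1, size=x.size)

mean_params, sd_params = bootstrap_sd(x, y, lambda a, b: np.polyfit(a, b, 1))
print("slope, intercept:", mean_params)
print("bootstrap standard deviations:", sd_params)
```

Because the same model is refitted to every sample, the resulting standard deviations reflect parameter uncertainty only and carry no information about uncertainty in the choice of model.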

In contrast to the conventional procedure, bootstrap aggregation determines both the optimal fitted model and associated model parameters for each bootstrap sample. Thus, the optimal model

To make the above formalism concrete, suppose that for a given set of
spectral density values, model selection and parameter optimization for

A smoothed standard deviation for

In bootstrap aggregation, the reported results consist of the smoothed
estimators
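The smoothed estimator and its standard deviation can be sketched following the general form given by Efron for the ordinary non-parametric bootstrap: the smoothed estimate is the mean over bootstrap samples, and the smoothed standard deviation is built from the covariance between each datum's resampling count and the bootstrap estimates. The function name and the toy estimator (a sample mean) are assumptions for illustration, and any modification needed for the restricted sampling scheme used in this paper is not addressed.

```python
import numpy as np

def smoothed_estimate_and_sd(counts, t_star):
    """Smoothed (bagged) estimate and smoothed standard deviation.

    counts : (B, n) array; counts[i, j] is the number of times datum j
             appears in bootstrap sample i
    t_star : (B,) array of estimates, one per bootstrap sample
    """
    counts = np.asarray(counts, dtype=float)
    t_star = np.asarray(t_star, dtype=float)
    smoothed = t_star.mean()
    # Covariance between the count of each datum and the bootstrap estimates.
    centered_counts = counts - counts.mean(axis=0)
    cov = (centered_counts * (t_star - smoothed)[:, None]).mean(axis=0)
    return float(smoothed), float(np.sqrt(np.sum(cov ** 2)))

# Toy check with the sample mean as the estimator.
rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, size=10)
n, B = data.size, 5000
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=B)  # shape (B, n)
t_star = counts @ data / n  # mean of each bootstrap sample

smoothed, sd_smooth = smoothed_estimate_and_sd(counts, t_star)
analytic = np.sqrt(np.sum((data - data.mean()) ** 2)) / n
print(smoothed, sd_smooth, analytic)
```

For the sample mean the smoothed standard deviation should approach the analytic value printed alongside it as the number of bootstrap samples grows, which provides a simple sanity check before the function is applied to estimates that involve model selection.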

Backbone amide

As noted above, the

To avoid such highly unrepresentative possibilities, bootstrap samples were generated by enumerating the

Bootstrap selections.
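Enumeration of the bootstrap selections can be sketched as follows. The exclusion rule used here (no static field drawn more than twice for a given type of spectral density value) and the independent resampling of three types of spectral density value are assumptions, chosen because together they reproduce the total of 6859 bootstrap samples quoted later in the text for four static fields; the criterion actually applied is not recoverable from this section.

```python
from collections import Counter
from itertools import combinations_with_replacement

def field_selections(n_fields=4, max_repeats=2):
    """Unordered draws of n_fields fields with replacement, excluding
    selections in which any one field appears more than max_repeats times."""
    keep = []
    for combo in combinations_with_replacement(range(n_fields), n_fields):
        if max(Counter(combo).values()) <= max_repeats:
            keep.append(combo)
    return keep

all_combos = list(combinations_with_replacement(range(4), 4))
selections = field_selections(n_fields=4, max_repeats=2)

# Assumed: three types of spectral density value, resampled independently.
n_samples = len(selections) ** 3

print(len(all_combos))   # 35 unrestricted selections
print(len(selections))   # 19 selections after the assumed exclusion
print(n_samples)         # 6859
```

Enumerating every allowed selection, rather than drawing at random, means that each selection contributes exactly once to the smoothed estimates.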

The data were analyzed using three procedures. First, a conventional
analysis, Eq. (

Flow chart for bootstrap aggregation for the model-free formalism. Indices

Values of a local

The results of the conventional analysis using

Model-free parameters from conventional model selection using

Model-free parameters from conventional model selection using

Smoothed model-free parameter estimates and uncertainties determined by bootstrap aggregation. Values of

Bootstrap simulations in which a single optimal model is utilized provide an alternative to Monte Carlo simulations for estimation of (unsmoothed) parameter uncertainties. The uncertainties in

Comparison of model-free parameter uncertainties.

Model selection for selected residues.

For each residue, the top line lists the

The performance of the conventional analysis, in which a single optimal
model is chosen, and of bootstrap aggregation, in which parameter values
are smoothed over all models, is illustrated for residues
Arg 11, Arg 26, and Asp 32. Table

Model-free parameters for selected residues.

For each residue, parameter values for Models 1–5 are calculated from the fit of the original data to the relevant spectral density function, with errors determined by Monte Carlo simulation. The model selected by

To further illustrate bootstrap aggregation for Arg 11, Arg 26, and Asp 32, Figs.

Distribution of model-free parameters from bootstrap aggregation for residue Arg 11. Color coding is

Distribution of model-free parameters from bootstrap aggregation for residue Arg 26. Color coding is

Distribution of model-free parameters from bootstrap aggregation for residue Asp 32. Color coding is

Comparison of individual fits for Arg 11 of

Comparison of individual fits for Arg 26 of

Comparison of individual fits for Asp 32 of

The difficulties posed by conventional model selection strategies, in
which a single optimal model is chosen using

The results shown for residue Arg 11 in Tables

The results shown for residue Arg 26 in Tables

The results shown for residue Asp 32 in Tables

The present application of bootstrap aggregation used spin relaxation
data recorded at four static magnetic fields. A total of 6859 bootstrap
samples were used to calculate smoothed parameter estimates. Data
recorded at three static magnetic fields provide nine spectral density
values but allow only

Model selection error is a classical problem in statistics and has been
recognized as a concern in the model-free analysis of NMR spin
relaxation data since the work of

Aggregation improves parameter stability by averaging over all models
represented in the bootstrap sample. As applied to

A Jupyter notebook (Python 3.6) is provided for performing all data
analyses reported in the publication. The NMR data analyzed in the
publication are available at Mendeley Data
(

The supplement related to this article is available online at:

AGP conceived the project. Calculations and writing of the paper were performed by TC and AGP.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Geoffrey Bodenhausen Festschrift”. It is not associated with a conference.

Arthur G. Palmer III acknowledges support from the National Institutes of Health. Some of the work presented here was conducted at the Center on Macromolecular Dynamics by NMR Spectroscopy located at the New York Structural Biology Center, supported by the NIH National Institute of General Medical Sciences. Arthur G. Palmer III is a member of the New York Structural Biology Center. This paper is dedicated to Prof. Geoffrey Bodenhausen on the occasion of his 70th birthday.

This research has been supported by the National Institutes of Health (grant nos. R35GM130398 and P41GM118302).

This paper was edited by Malcolm Levitt and reviewed by three anonymous referees.