đŸ•·ïž Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 129 (from laksa147)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

🚫
NOT INDEXABLE
✅
CRAWLED
6 months ago
🤖
ROBOTS ALLOWED

Page Info Filters

Filter | Status | Condition | Details
HTTP status | PASS | download_http_code = 200 | HTTP 200
Age cutoff | FAIL | download_stamp > now() - 6 MONTH | 6.7 months ago
History drop | PASS | isNull(history_drop_reason) | No drop reason
Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0
Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set

Page Details

Property | Value
URL | https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/
Last Crawled | 2025-09-26 04:24:49 (6 months ago)
First Indexed | not set
HTTP Status Code | 200
Meta Title | James–Stein Estimator Improves Accuracy and Sample Efficiency in Human Kinematic and Metabolic Data - PMC
Meta Description | Human biomechanical data are often accompanied with measurement noise and behavioral variability. Errors due to such noise and variability are usually exaggerated by fewer trials or shorter trial durations and could be reduced using more trials or ...
Meta Canonical | null
Boilerpipe Text
Abstract Human biomechanical data are often accompanied with measurement noise and behavioral variability. Errors due to such noise and variability are usually exaggerated by fewer trials or shorter trial durations and could be reduced using more trials or longer trial durations. Speeding up such data collection by lowering number of trials or trial duration, while improving the accuracy of statistical estimates, would be of particular interest in wearable robotics applications and when the human population studied is vulnerable (e.g., the elderly). Here, we propose the use of the James–Stein estimator (JSE) to improve statistical estimates with a given amount of data or reduce the amount of data needed for a given accuracy. The JSE is a shrinkage estimator that produces a uniform reduction in the summed squared errors when compared with the more familiar maximum likelihood estimator (MLE), simple averages, or other least squares regressions. When data from multiple human participants are available, an individual participant’s JSE can improve upon MLE by incorporating information from all participants, improving overall estimation accuracy on average. Here, we apply the JSE to multiple time series of kinematic and metabolic data from the following parameter estimation problems: foot placement control during level walking, energy expenditure during circle walking, and energy expenditure during resting. We show that the resulting estimates improve accuracy—that is, the James–Stein estimates have lower summed squared error from the ‘true’ value compared with more conventional estimates. 
Keywords: James–Stein estimator, Maximum likelihood estimator, Biomechanics, Walking, Locomotion, Parameter estimation, Accuracy, Sample size, Kinematics, Metabolic energy rate, Trial duration, Sample efficiency Introduction Measured biomechanical data are often accompanied with sources of error, such as incomplete or missing data, measurement noise, and high human variability [ 1 – 5 ]. The effect of measurement noise and movement variability can be sometimes mitigated using signal processing tools, such as band-pass filters or even simple averaging [ 2 , 6 ]. Such defense against noise and variability is aided by having larger sample sizes; conversely, insufficient sample sizes can result in low statistical power and poor replicability [ 7 ]. Here, we present the use of an underutilized statistical approach, the James–Stein estimator, heretofore not used for biomechanical data—to enhance accuracy of estimated biomechanical variables especially when the sample size is limited [ 8 – 11 ]. Our focus here is on time series data, where an analog of sample size is the duration for which the time series data are collected: usually, longer duration trials provide higher accuracy in parameter estimation [ 4 , 12 – 14 ]. For example, it is conventional to average over two to three minutes of steady-state metabolic data to obtain an acceptable mean metabolic energy rate (i.e., metabolic cost) value during walking or other exercise [ 15 – 20 ]. Similarly, authors have characterized the minimum number of walking steps required to estimate step kinematic variability during treadmill walking [ 21 ], how sample size and number of steps affects running biomechanical estimates [ 13 ], and how quantities derived from step-to-step locomotor variability related to stability and control increase in accuracy with the number of steps [ 4 , 5 , 12 ]. 
However, the number of steps or the duration of walking during an experiment may be limited due to functional impairment in populations, such as the elderly, amputees with prosthesis or exoskeletons, or individuals with musculoskeletal disorders. In such cases, methods to reduce trial duration while retaining estimation accuracy are of particular interest. Similarly, reducing the trial duration may be useful in human in the loop optimization [ 22 , 23 ] of wearable devices, such as robotic exoskeletons and prostheses, where a large number of trials may need to be performed. Here, we show how the James–Stein estimator allows such reduction of trial duration for a given accuracy of the statistical estimate. The simplest way to obtain a statistical estimate of a quantity from many samples or long trial durations is via averaging. Other estimation methods include linear and nonlinear regression. Averaging, linear regression, or nonlinear regression are special cases of a broader class of parameter estimation methods called ‘maximum likelihood estimation’ (MLE): given a model of the noise, maximum likelihood estimation computes the parameter values that makes the observed data have the maximum likelihood. MLE methods are popular due to their satisfying the properties of asymptotic normality, consistency, and efficiency [ 24 ]. Despite the popularity of MLE (especially its aforementioned special cases), Stein [ 8 , 9 ] showed that in some situations, there exists a better estimator than the maximum likelihood estimator with lower summed squared error, so MLE is considered ‘inadmissible.’ The James–Stein estimator [ 8 – 11 ] (JSE) is one such better estimator. JSE was popularized in the 1970s by a baseball example [ 10 , 25 ]. Researchers first computed the batting averages for each player over the first 45 at bats (Fig.  1 a), that is, using just the initial few games of the baseball season. 
These averages (‘MLEs’) provide an inaccurate estimate of the true batting average over an entire season, which typically has over 450 at bats (median). Then, the researchers applied the James–Stein estimator correction for each player, based on all the players’ initial season averages, resulting in new JSE-based estimates. Remarkably, these JSEs were better estimates of the players’ full season averages—even though the JSEs are still based on just the initial part of the season and do not use the full season data. Specifically, the JSEs had lower summed squared error compared with the MLEs when averaged across all participants. Our goal is to examine whether such a result would be true in a variety of biomechanical time series data, improving the accuracy of shorter duration trials by combining data from multiple participants via JSE (Fig.  1 b). Fig. 1. The potential of the James–Stein Estimator. a Famous baseball example illustrating the potential of using JSE to reduce error, compared with simple averages or MLE [ 10 , 25 ]. b Using James–Stein estimator in biomechanics. Statistical estimates of parameters can be obtained for individual participants using averaging, regression, or maximum likelihood estimation (MLE). For limited sample sizes or trial durations per participant, MLEs have some error compared with the true value. The JSE uses information from all the MLEs to produce a new parameter estimate for each participant, which on average will have lower error than those from MLEs. James–Stein estimation sits within a broader class of meta-analysis-related methods, usually used to obtain a common mean value from multiple studies, but sometimes also used to improve the estimates of individual means when appropriate: related methods include Bayesian shrinkage methods, empirical Bayes estimators, isotonic regression, pretest estimators, and other regularized estimators using ridge or lasso regularization [ 26 – 29 ]. 
Such estimators have recently been shown to be useful in a variety of biomedical contexts involving diabetes [27], cancer [27, 30–32], Alzheimer’s disease [27], COVID-19 [30], Creutzfeldt–Jakob disease [28], spectroscopy [33], genomics [34, 35], blood pressure [36], dermal issues [37], body fat [38], and neural recordings [39–41]. Here, we focus on James–Stein estimation for its computational simplicity and target biomechanical estimation problems not previously investigated with such techniques. In this manuscript, we apply JSE to the following human time series datasets and related estimation problems: (1) estimating foot placement control during walking derived from pelvis and foot kinematic data [5]; (2) estimating the steady-state metabolic energy rate ($\dot{E}_{\text{walk}}$) during walking in a circle derived from metabolic time series [18]; and (3) estimating the mean resting metabolic energy rate ($\dot{E}_{\text{rest}}$) during quiet sitting [17, 18, 42, 43]. We hypothesize that we can achieve significant improvement in estimation accuracy when utilizing JSE compared with MLE (Fig. 1) and find that JSE does accomplish such error reduction.

Methods

We first describe how the JSE correction is done if we have an initial estimate and how we compare the results with the ‘true’ value of the estimate. We then describe the experiments and datasets that we consider for JSE, the biomechanical quantities estimated with each dataset, and how these quantities are estimated. All protocols were approved by the Ohio State University IRB and participants took part in the experiments with informed consent.

Computing JSE for Better Person-Specific Estimates

Say we have $k$ individual participants and we have computed the maximum likelihood estimates (MLEs) $y_i$ ($i = 1, \ldots, k$) of some quantity for each of these participants. These MLEs could be of any quantity of interest and could be obtained by linear regression, simple averaging, or some other nonlinear fitting process, as described further in Sect. “Biomechanical Datasets Considered and Quantities Estimated”. The corresponding JSE $z_i$ for each participant is determined by shrinking the individual MLEs $y_i$ toward the global MLE average $\bar{y} = \sum_i y_i / k$ by a ‘shrinkage factor’ $c$ using the following remarkably simple equation [8, 10]:

$$z_i = \bar{y} + c\,(y_i - \bar{y}) \qquad (1)$$

The shrinkage factor $c$ in James–Stein estimation is evaluated from Eq. (2), in which $\overline{SE}^2$ is the variance of the individual MLEs and thus an estimate of the errors in the MLE [6]. If the individual MLEs have unequal variances, the mean of these variances is used for $\overline{SE}^2$ as a simple heuristic. A shrinkage factor $c$ of 1 means that the JSE for an individual is the same as its individual MLE, and a $c$ value of 0 means that the JSE shrinks the values by 100% toward the global MLE average, so that all the JSEs are equal to the global MLE average $\bar{y}$ [8, 10]. To assess the performance of the JSE in reducing estimation error, we computed the summed squared error of the MLEs $y_i$ and JSEs $z_i$ from the ‘true values’ $x_i$ of the quantity as follows:

$$\mathrm{SSE}_{\mathrm{MLE}} = \sum_{i=1}^{k} (y_i - x_i)^2, \qquad \mathrm{SSE}_{\mathrm{JSE}} = \sum_{i=1}^{k} (z_i - x_i)^2,$$

and compared these quantities. Of course, the ‘true value’ of any quantity is unknowable except with infinite data. So, in the following, given long-enough trial durations for each participant, we use the parameter estimate from the full trial duration as the ‘true’ value, as a proxy for the real true value, analogous to the baseball example (Fig. 1a). Also analogous to the baseball example, we use the estimates for some shorter time duration as the maximum likelihood estimate (MLE), to distinguish it from the ‘true’ value.
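The shrinkage step and the error comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The extracted text does not show the formula for the shrinkage factor, so the sketch assumes the standard Efron–Morris form for shrinking toward the grand mean, $c = 1 - (k - 3)\,\overline{SE}^2 / \sum_i (y_i - \bar{y})^2$. The function names `james_stein` and `summed_squared_error`, the `positive_part` option, and all numbers in the usage section are hypothetical.

```python
def james_stein(mles, se2, positive_part=True):
    """Shrink per-participant MLEs toward their grand mean.

    mles: list of per-participant estimates y_i
    se2:  (mean) error variance of the MLEs, SE-bar squared
    Returns (z, c): the shrunken estimates and the shrinkage factor.
    """
    k = len(mles)
    if k < 4:
        # With the assumed (k - 3) factor, fewer than 4 estimates give no shrinkage.
        raise ValueError("shrinking toward the grand mean needs at least 4 estimates")
    y_bar = sum(mles) / k
    spread = sum((y - y_bar) ** 2 for y in mles)
    if spread == 0.0:
        c = 1.0  # all MLEs identical: nothing to shrink
    else:
        # ASSUMPTION: standard Efron-Morris shrinkage factor, not taken from the source.
        c = 1.0 - (k - 3) * se2 / spread
    if positive_part:
        c = max(c, 0.0)  # positive-part JSE: never shrink past the grand mean
    z = [y_bar + c * (y - y_bar) for y in mles]  # z_i = y_bar + c (y_i - y_bar)
    return z, c


def summed_squared_error(estimates, truths):
    """Summed squared error of estimates relative to the 'true' values."""
    return sum((e - x) ** 2 for e, x in zip(estimates, truths))


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    mles = [1.0, 2.0, 3.0, 4.0, 5.0]
    truths = [1.5, 2.2, 3.0, 3.6, 4.4]
    z, c = james_stein(mles, se2=1.0)
    print("shrinkage factor:", c)
    print("SSE of MLEs:", summed_squared_error(mles, truths))
    print("SSE of JSEs:", summed_squared_error(z, truths))
```

Because the JSEs are an affine function of the MLEs about their mean, the mean of the shrunken estimates equals the mean of the MLEs for any value of `c`.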
Because the MLEs use shorter than available data, they are necessarily imperfect and the goal is to see if JSEs systematically improve their accuracy and move them toward the ‘true’ value. We will drop the quotes from ‘true’ in the following sections. The computational cost of obtaining the JSE is low: once the individual MLEs $y_i$, the shrinkage factor $c$, and the MLE mean $\bar{y}$ are known, Eq. (1) implies just three elementary arithmetic operations per JSE, so a total of $3k$ arithmetic operations; of course, computing the mean MLE and the shrinkage factor also requires $O(k)$ operations. So, overall, the JSE requires $O(k)$ arithmetic operations, that is, the computational complexity grows linearly in the number of participants $k$.

Biomechanical Datasets Considered and Quantities Estimated

Foot Placement Dynamics in Level-Ground Treadmill Walking

The experiments and the quantities estimated follow those in [5]. Eight participants ($N = 8$; 4 males and 4 females, with height 1.70 ± 0.09 m, age 31.88 ± 10.3 years, and mass 67.15 ± 19.2 kg (mean ± SD)) walked on a treadmill at a belt speed of 1.3 m/s and zero incline for 2 min (Fig. 2a, left). Three-dimensional marker-based motion capture (Vicon T20, 100 Hz) was used to track the position of a sacral marker on the pelvis, representing the body’s center of mass, and two heel markers, one representing each foot (Fig. 2a, right).

Fig. 2. Biomechanical datasets used and quantities estimated. a Treadmill walking experiments and point-mass walking model for foot placement. Participant walks on a constant-speed, level-ground treadmill with motion capture markers placed on pelvis, thighs, shanks, and feet (left). Point-mass walking model with the human body represented by the position of the pelvis as the center of mass and two massless legs (right). Foot placement deviations of the next stance foot in response to deviations from the mean pelvis trajectory are also shown.
b Metabolic rate of turning: walking in circles. Participant walking on a circle of radius 2 m drawn on the floor at a speed of 1.26 m/s while metabolic rate is measured using a face mask. c Metabolic rate of resting. Participant sitting in place while metabolic rate is measured using a face mask.

Humans step in the direction of the perturbation to the center of mass [4, 5, 44, 45]. As in [4, 5, 44, 45], we estimate the change in foot placement following a deviation in the pelvis position (Fig. 2a, right). The motion data were analyzed by fitting linear models between deviations in pelvis states at midstance as input and the next foot position as output. The pelvis state at midstance $P$ comprises the sideways position, sideways velocity, and forward velocity, that is, $P = (X_{\text{pelvis}}, \dot{X}_{\text{pelvis}}, \dot{Y}_{\text{pelvis}})$. The next stance foot position is $S = (X_{\text{foot}}, Y_{\text{foot}})$. We used least squares regression to estimate the $2 \times 3$ Jacobian matrix ($J$) relating the next stance foot position to the midstance pelvis state as in the following linear model:

$$\Delta S = J\,\Delta P, \qquad \Delta P = P - P^{*}, \quad \Delta S = S - S^{*},$$

where $P^{*}$ and $S^{*}$ are trial-wise mean values of $P$ and $S$, and $\Delta P$ and $\Delta S$ are deviations from these trial-wise means. The elements of the matrix $J$ are sensitivities or partial derivatives of the individual output variables relative to the individual input variables: for instance, $J(1, 1)$ is the sensitivity of sideways foot placement ($X_{\text{foot}}$) to deviations in sideways pelvis marker velocity ($\dot{X}_{\text{pelvis}}$) at midstance and can be denoted as $\partial X_{\text{foot}} / \partial \dot{X}_{\text{pelvis}}$. See [4, 5, 44, 45] for more details. For illustrative purposes, we applied the JSE only to one element of the $J$ matrix, namely the sensitivity $\partial X_{\text{foot}} / \partial \dot{X}_{\text{pelvis}}$. Each trial had a minimum of 100 steps. So for each participant, we computed the true value of the partial derivative $\partial X_{\text{foot}} / \partial \dot{X}_{\text{pelvis}}$ using the 100 steps, the maximum available data.
We also computed this quantity for a series of smaller trial durations ranging from 15 to 90 steps using the same procedure. The error variance $\sigma_i^2$ of these estimates is obtained directly from the least squares regression for each participant $i$ and each trial duration.

Metabolic Rate During Walking in a Circle

The experimental data for this section are drawn from [18], with $N = 11$ participants: 6 males and 5 females, with height 1.74 ± 0.10 m, age 33.55 ± 22.2 years, and mass 74.01 ± 12.5 kg (mean ± SD). Participants walked along marked circles on the ground at four different circle radii ($R$ = 1, 2, 3, 4 m) and four different constant tangential speeds ranging from 1.58 to 1.8 m/s for a total of 16 trials (Fig. 2b). Participants walked for six to seven minutes to achieve a sufficient metabolic steady state. Respiratory oxygen and carbon dioxide fluxes were measured using indirect calorimetry (Oxycon Mobile). For illustrative purposes, we only considered the trials with radius 2 m and approximate speed of 1.26 m/s. We computed the metabolic rate per unit mass, $\dot{E}$, using the Brockway equation [15], with $\dot{E}$ in W/kg and the oxygen and carbon dioxide volume rates $\dot{V}$ in mL s$^{-1}$ kg$^{-1}$. To estimate the steady state from any fraction of the total trial duration (e.g., the first 2 min), we fit an exponential $\dot{E} = a_0 + a_1 e^{-t/\tau}$, where $\tau$ is the time constant. Each participant’s trial had a minimum of 54 data points. So, for each participant, we take the true value to be the steady-state metabolic rate $\dot{E}$ computed from the 54 data points using the exponential fit. Similarly, we compute the steady-state metabolic rate $\dot{E}$ using the exponential fit for shorter duration fractions of the trials, ranging from 15 data points (about 25 s) to 51 points (about 2 min). For such metabolic data, the number of data points is correlated with the trial duration, but the proportionality is not perfect because of breathing rate variability (see Appendix A and Fig. 6).
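As an illustration of the exponential steady-state fit $\dot{E} = a_0 + a_1 e^{-t/\tau}$ described above, the sketch below recovers $a_0$ (the steady-state value), $a_1$, and $\tau$ from a metabolic time series. The text does not say which fitting routine was used, so this is a swapped-in technique: a grid search over candidate time constants, with $a_0$ and $a_1$ solved in closed form by ordinary least squares for each candidate. The function name `fit_exponential`, the candidate grid, and the synthetic data are all hypothetical.

```python
import math


def fit_exponential(t, e, taus):
    """Fit e(t) = a0 + a1 * exp(-t / tau) by grid search over tau.

    t, e: sample times (s) and metabolic rates (W/kg)
    taus: candidate time constants (s); ASSUMPTION: the grid is user-chosen
    Returns (a0, a1, tau) for the candidate with the lowest squared error.
    """
    n = len(t)
    e_bar = sum(e) / n
    best = None
    for tau in taus:
        # For a fixed tau the model is linear in (a0, a1): regress e on u.
        u = [math.exp(-ti / tau) for ti in t]
        u_bar = sum(u) / n
        suu = sum((ui - u_bar) ** 2 for ui in u)
        if suu == 0.0:
            continue  # degenerate regressor, skip this candidate
        a1 = sum((ui - u_bar) * (ei - e_bar) for ui, ei in zip(u, e)) / suu
        a0 = e_bar - a1 * u_bar
        sse = sum((a0 + a1 * ui - ei) ** 2 for ui, ei in zip(u, e))
        if best is None or sse < best[0]:
            best = (sse, a0, a1, tau)
    if best is None:
        raise ValueError("no usable time constant in the candidate grid")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic, noise-free data for illustration only.
    t = [5.0 * i for i in range(61)]
    e = [4.0 - 2.0 * math.exp(-ti / 40.0) for ti in t]
    a0, a1, tau = fit_exponential(t, e, [float(v) for v in range(10, 101, 5)])
    print("steady state:", a0, "amplitude:", a1, "time constant:", tau)
```

The grid search trades resolution in $\tau$ for robustness: it cannot diverge the way an iterative nonlinear solver can on short, noisy trials, but the recovered $\tau$ is only as fine as the grid.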
The error variance $\sigma_i^2$ of the individual steady-state estimates is computed using bootstrap statistics.

Fig. 6. Trial durations of all participants’ circle-walking metabolic trials in seconds. The mean values are indicated by bolded red circles. Variations in trial duration among participants result from differences in their breathing rates.

Metabolic Rate During Resting

Analogous to the walking metabolic energy expenditure analysis in Sect. “Metabolic Rate During Walking in a Circle”, we compute $\dot{E}$ for quiet sitting experiments. We collected these energy expenditure datasets from multiple previously conducted experiments [17, 18, 42, 43], giving data for $N = 27$ participants: 23 males and 4 females, with height 1.79 ± 0.05 m, age 24.41 ± 4.5 years, and mass 75.42 ± 9.9 kg (mean ± SD). In these studies, as a baseline for the metabolic rate of movement, participants were asked to sit quietly for 6–7 min while oxygen (O$_2$) and carbon dioxide (CO$_2$) volume rates were measured to estimate resting metabolic rates via a simple average over the duration. Each participant’s resting trial had at least 40 data points, so we take the ‘true value’ to be the mean metabolic rate $\dot{E}$ over all 40 data points. We also computed the mean for shorter fractions of the trial duration, ranging from 10 data points to 35 data points; these estimates are labeled ‘MLEs.’ The error variance $\sigma^2$ for the individual estimates is computed by first computing the deviations of the data from the mean for each participant, pooling these deviations over all participants, and determining the standard error $\sigma$ from these deviations via bootstrap.

Results

The JSE can move data toward the truth

Fig. 3 shows how JSE improves the MLEs obtained from a particular short duration trial for each participant, moving the individual MLEs toward the mean of all the MLEs.
This results in the spread of the JSEs being lower than the spread of the MLEs, and closer on average to the ‘true’ values obtained from the longer duration trials in terms of summed squared errors.

Fig. 3. Visualizing how JSE shrinks the MLE. a Walking kinematics: model comparison between the JSE and the MLE relative to the TRUTH for the coefficient $\delta X_{\text{Foot}} / \delta X_{\text{Hip}}$ using 25 steps of MLE data from 100 steps of TRUTH data. The number of participants, $k$, is 8 and they correspond to the scatter points. b Walking metabolics: model comparison between the JSE and MLE relative to the TRUTH for $\dot{E}$ using 21 data points of MLE data from 54 data points of TRUTH data. The number of participants, $k$, is 11 and they correspond to the scatter points. c Sitting metabolics: model comparison between the JSE and MLE relative to the TRUTH for $\dot{E}$ using 20 data points of MLE data from 40 data points of TRUTH data. The number of participants, $k$, is 27 and they correspond to the scatter points. Each data point in the scatter plot represents a participant’s individual estimate of a kinematic or a metabolic parameter using the full trial or a subset of it.

For foot placement gain $\partial X_{\text{Foot}} / \partial X_{\text{Hip}}$, when the MLEs were obtained from only 25 steps, the JSEs shrunk the MLEs by about 17% toward their mean ($c = 0.829$, Fig. 3a, d). This resulted in a summed squared error for the JSEs about 46.9% lower than that of the MLEs ($p = 0.0475$, paired t test). For circle-walking metabolic rate, when the MLEs were obtained using only 21 data points, the JSEs shrunk the MLEs toward their mean by about 7% ($c = 0.927$, Fig. 3b, e). This resulted in a summed squared error for the JSEs about 26.3% lower than that for the MLEs ($p = 0.0272$, paired t test). Finally, for resting metabolic rate, when the MLEs were obtained using only 20 data points, the JSEs shrunk the MLEs by about 5% toward their mean ($c = 0.953$).
This resulted in a summed squared error for the JSEs about 9.7% lower than that for the MLEs ( p = 0.0149 , paired t test). While the SSE with respect to the true values is reduced by the JSEs, we generally do not expect the means of the true value, MLEs, and JSEs to be significantly different. That the means of MLE and the true values are close together and unbiased is why JSE is defined as moving the MLEs toward their grand mean. Performing paired t tests, we find that means of true values and MLEs are not significantly different for foot placement control ( p = 0.508 ) and not significantly different for walking metabolic rates ( p = 0.573 ). However, true and MLE values were significantly different for the resting metabolic rates ( p = 0.009 ), suggesting that the MLE sample is slightly biased from the true sample, potentially explaining the lower error reduction by JSE for the resting metabolic rates. Of course, by definition, the JSEs and MLEs have the same mean, and t tests confirm that they are not significantly different ( p = 1 ). JSE reduces error more for shorter duration trials For the range of short duration trials we considered, we found that the JSE increased accuracy compared with MLE in all three estimation problems (Fig.  4 ). Specifically, when compared with MLE, the summed squared error (SSE) for the JSE was lower by about 13–56% for foot placement control, 4–85% for circle-walking metabolic rate, and 5–14% for resting metabolic rate data, with the higher reductions being for the shorter duration trials (Fig.  4 a-f). Both the absolute reduction in error (Fig.  4 a–c) and the relative reduction in error (Fig.  4 d–f) generally decrease as the MLE sample size approaches the ‘true’ sample size. Correspondingly, the shrinkage factor c increases with increasing MLE sample size for all estimation problems, approaching c = 1.0, indicating that shrinkage decreases with larger MLE sample sizes. 
The most shrinkage (low $c$ values) occurs at small sample sizes, with the foot placement kinematics and walking metabolics problems experiencing relatively higher shrinkage than the sitting metabolics estimation problem.

Fig. 4. The James–Stein estimator (JSE) led to a reduction in summed squared errors (SSE) compared with maximum likelihood estimation (MLE) for foot placement kinematics, walking energy rate, and resting energy rate. a–c Summed squared error (SSE) values for the MLE and the JSE datasets for each parameter estimation problem. The best-fit curve for the error from JSE, shown in pink, is overlaid on the scatter points. We fit the SSE from JSE to an exponential decay model of the form $y = a_1 e^{-b_1 x}$, where the parameters $a_1$ and $b_1$ are estimated using linear regression on the log-transformed SSE from JSE data, and $x$ is the number of steps or data points. d–f Percent reduction in summed squared error for each parameter estimation problem. i–k Shrinkage factor $c$ for each parameter estimation problem.

When the MLE sample size is very close to the true sample size, the JSE sometimes ‘apparently’ performs worse relative to MLE, as illustrated in the case of walking metabolics, where we get a negative percent reduction in error (Fig. 4e). This artifact can be explained by observing that, because the true sample size is finite, what we call the true estimate already contains some random error. Thus, when the MLE sample size gets too close to the true sample size, the error estimates for MLE and JSE become unreliable because of comparable error in the ‘true’ value. Of course, in this limit, we might as well use the full trial duration estimate, as the improvement due to JSE is quite small anyway and the trial durations for the MLE and the ‘true’ estimate are not that different. Alternatively, we could apply JSE to the full duration estimate to obtain a further slight improvement.
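The curve fit mentioned in the Fig. 4 caption, an exponential decay $y = a_1 e^{-b_1 x}$ estimated by linear regression on log-transformed SSE values, can be sketched as follows. Taking logarithms gives $\ln y = \ln a_1 - b_1 x$, so an ordinary straight-line fit of $\ln y$ against $x$ yields both parameters. The function name `fit_exp_decay` and the sample numbers are hypothetical, and the sketch assumes all SSE values are strictly positive.

```python
import math


def fit_exp_decay(x, y):
    """Fit y = a1 * exp(-b1 * x) by least squares on log(y).

    x: sample sizes (steps or data points)
    y: strictly positive error values (e.g. SSE of the JSE)
    Returns (a1, b1).
    """
    log_y = [math.log(v) for v in y]  # requires y > 0
    n = len(x)
    x_bar = sum(x) / n
    l_bar = sum(log_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (li - l_bar) for xi, li in zip(x, log_y)) / sxx
    intercept = l_bar - slope * x_bar
    return math.exp(intercept), -slope  # a1 = e^intercept, b1 = -slope


if __name__ == "__main__":
    # Made-up SSE values that decay with sample size, for illustration only.
    steps = [15, 30, 45, 60, 75, 90]
    sse = [2.0 * math.exp(-0.03 * s) for s in steps]
    a1, b1 = fit_exp_decay(steps, sse)
    print("a1:", a1, "b1:", b1)
```

Note that regressing on the log scale minimizes relative rather than absolute error, so small SSE values at large sample sizes carry as much weight as large SSE values at small sample sizes.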
At the other extreme, when the sample size is too small (smaller than what we have considered in Fig. 4), the MLE has high error and the JSE may be even more unreliable due to poor estimation of the within-participant error $\sigma_i$ in the JSE formula [Eq. (1)]. So, it is not advisable to use such small sample sizes.

Discussion

We introduced the James–Stein estimator (JSE) as an approach to enhance the accuracy of parameter estimation in the context of biomechanical data, particularly when dealing with limited sample sizes. We applied the JSE to three different time series datasets: foot placement control during walking, steady-state metabolic energy rate during walking, and mean resting metabolic energy rate during sitting. We observed that the JSE effectively shrinks individual maximum likelihood estimates (MLEs) toward the global average, bringing them closer to the true values as a whole. This shrinkage improved the estimation accuracy, reducing the summed squared error (SSE). Overall, the JSE demonstrated its potential to enhance parameter estimation in biomechanical studies, especially when working with smaller sample sizes and larger variability. While the JSE reduced SSE in all estimation problems, it is more effective when the datasets are such that the individual estimates have lower accuracy compared with between-participant variance. These properties may arise from inter-participant variability, measurement noise, or sample size limitation. The JSE was successful at substantially reducing SSE in the foot placement kinematics and walking metabolics estimation problems, but only a smaller reduction in MSE was obtained for the resting metabolic rate dataset, where the within-individual variance was relatively lower.
The theorems guaranteeing JSE outperforming MLE by minimizing SSE were originally derived under the following assumptions: the dimension of the quantities being estimated is greater than or equal to three, the variables are independent from each other, and the underlying distribution is normal [ 8 , 10 , 25 ]. However, the normality assumption is not critical to JSE and violating it only reduces the accuracy improvements obtained [ 10 ]. The additional error is negligible if the number of quantities estimated, k , is greater than 15 and minimal if k is as low as 9 [ 10 ]. In the context of biomechanical data, if we estimate one quantity per human participant from their respective time series data, the number of quantities estimated, k equals the number of individuals. So we expect greater insensitivity to assumptions when there are more individuals. The shrinkage factor c as computed from Eq. ( 2 ) may not always be positive. If c is negative, the James–Stein estimator moves the estimate ‘beyond’ the global average and the performance of JSE is compromised. In these cases, the positive part JSE z + should be implemented [ 46 ]: that is, when the shrinkage factor c is negative, we replace it by zero, avoiding error increase due to JSE. The positive part JSE dominates the regular JSE, that is, better than the regular JSE in some cases, while being identical to it in other cases [ 46 ]. James–Stein estimation can be related to and contrasted with Bayesian inference [ 47 ]: in classical Bayesian inference, there is a fixed ‘prior’ distribution and a new data point is corrected toward the mean of this prior distribution (Fig.  5 a); in James–Stein estimation, the JSEs are obtained by reducing the larger MLE values and increasing the smaller MLE values, bringing them closer to the grand MLE average (Fig.  5 b)—and this can be interpreted as moving the MLEs toward an empirically based prior obtained from the individual MLEs (Fig.  5 c). 
If reliable prior distributions are known for the parameters of interest, it may be better to use a classical Bayesian approach in lieu of JSE, whereas JSE has the advantage of computational simplicity and not needing a prior. Given JSE’s interpretation as an empirical Bayes estimator with simple assumptions [ 25 , 47 ], an intermediate approach between JSE and a classical Bayesian approach could be an empirical Bayes approach that more precisely estimates an empirical prior before the posterior is computed [ 27 ]. Alternatively, when other structural information is available about the parameters, such information can be used to improve the statistical estimates: for instance, researchers have used isotonic regression when it is known that the parameters to be estimated are ordered with a known order [ 30 ]; similarly, if it is known that most of the parameters to be estimated have zero values and the non-zero values are sparse, one could use a pretest estimation procedure [ 30 ]. Our work involved estimating parameters that did not have such known properties and reliable prior distributions of these parameters are not available. Fig. 5. James–Stein estimation and Bayesian inference. a Classical Bayesian inference uses a fixed ‘prior’ and a new data point with its own error distribution (‘likelihood’) to get a better estimate, namely the ‘posterior.’ b Given MLEs for a few participants, with their corresponding error distributions, James–Stein estimation simply shrinks the MLEs toward the grand mean of all the MLEs. c One can interpret the JSEs as being related to an empirical Bayes correction, in which a prior is implicitly obtained from all the MLEs. By lowering the summed squared error for a given trial duration (Fig.  4 ), JSE can potentially allow using shorter trial duration for a given accuracy. If a particular typical error value is sought, one can use a plot like Fig.  
4 to read off the corresponding trial duration that would provide this desired accuracy level when using JSE. By improving estimation accuracy and efficiency on average, JSE has the potential to provide better biomechanical personalization. While JSE decreases overall mean-squared error on average, some participants may experience worsened outcomes, reflecting that the JSE does not provide individually better estimates for every parameter [ 48 ], just overall summed squared error across all parameters estimated. We showed statistically significant improvement in the accuracy for all three datasets we examined. However, such improvements are only for some range of sample sizes and clearly not for all individuals. Thus, using JSE in lieu of MLE requires a value judgment to improve the overall error across participants, while potentially making estimates of some participants worse—especially outlier individuals. This tradeoff is analogous to similar value judgment in other medical decision-making [ 49 , 50 ]. For example, different patients may have varying responses for a fixed drug dose, which could be established through clinical trials, potentially suggesting individualized medicine approaches. However, a truly individualized prescription may not always be practical. So the next best solution is to account for inter-participant variability in drug response via demographics, anatomic variations, etc., thus implicitly improving typical individual outcome by studying population-level variability—potentially at the cost of affecting outlier individuals adversely. To implement JSE in a biomedical or wearable device design context, one would process patients or participants in groups—or potentially use the history of patient or participant data, or perhaps their summary statistics, to improve the wearable device or parameter estimate for the most recent patient. 
The original JSE formula [25, 28] was derived by considering that the individual means (MLEs) were drawn from a Gaussian distribution and are independent. So, while high variance in these means (representing one kind of data heterogeneity) is naturally accommodated by JSE, larger heterogeneity in the dataset (for instance, multiple distinct participant populations, each drawn from a different Gaussian with a distinct mean) can degrade the performance of JSEs. In such cases, it may be meaningful to cluster the data by other relevant participant characteristics before applying JSEs to the individual clusters.

One might ask the following question: ‘if the JSE is indeed better than the MLE in terms of accuracy, why has it not replaced the MLE yet?’ [51]. Efron [51] partially answers this question by stating that the statistical community often adheres to conventional methods such as the MLE to protect individual inferences from the influence of group-based approaches. In addition, researchers often prefer the familiarity, simplicity, and widespread applicability of the MLE compared with other estimators. Finally, the MLE is often preferred because, for estimators such as the simple average, it has zero bias, that is, it does not tend to underestimate or overestimate a population parameter. JSE increases bias to reduce variance and mean-squared error [25, 52–54] (Appendix B and Fig. 7). So a user may need to decide how much bias is acceptable when implementing the JSE.

Fig. 7. Bias, variance, and Mean-Squared Error (MSE) as a function of the shrinkage factor c, for a representative MLE sample size of 10 steps from the foot placement walking dataset. The optimal computed shrinkage value c for this is about 0.648, shown as a black vertical line.

In summary, the James–Stein estimator could be a valuable tool for improving parameter estimation in biomechanical studies, particularly when sample sizes are small or variability among participants is high.
Future work could consider biomechanical data from populations with limited mobility (i.e., data that are short and variable) to improve accuracy for such datasets. Additionally, introducing JSE to other types of datasets, such as high-dimensional biomechanical data from wearable sensors, could also open up new possibilities for enhancing prediction accuracy in personalized healthcare applications.

Acknowledgements

The work was supported in part by NIH grant R01GM135923-01 and NSF SCH Grant 2014506.

Appendix A: Trial Durations for Metabolic Data

For a given number of data points, trial duration is only approximately consistent across participants. This is because the Oxycon system collects data breath by breath and does not have a fixed sampling rate; for example, a 2-min trial may consist of 35 to 60 data points (Fig. 6).

Appendix B: James–Stein Estimator Optimizes Bias-Variance Trade-Off

The James–Stein estimator can be understood as minimizing the mean-squared error given all the data, by optimally trading off between bias (systematic error) and variance (random error). We can estimate the bias and variance ($\sigma^2$), with the Mean-Squared Error (MSE) being the sum of the squared bias and the variance [25, 53, 54]: $\mathrm{MSE} = \mathrm{Bias}^2 + \sigma^2$. The JSE decreases the overall MSE by reducing the variance at the cost of increasing the bias; see Fig. 7 for an illustration using the foot placement walking dataset [8, 25].

Data Availability

The data for this manuscript are available at http://datadryad.org/stash/share/fVGgnWsMzPAxO1gPrGPuOXEAm0msSDd79KxjmF7qVAo for review purposes and will also be available publicly on the Dryad database at DOI: http://dx.doi.org/10.5061/dryad.3j9kd51v9 .

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Winter, D. A., H. G. Sidwall, and D. A. Hobson.
Measurement and reduction of noise in kinematics of locomotion. Journal of biomechanics. 7(2):157–159, 1974.
2. Woltring, H. J. On optimal smoothing and derivative estimation from noisy displacement data in biomechanics. Human Movement Science. 4(3):229–245, 1985.
3. Sternad, D., and Abe, M.O. Variability, noise, and sensitivity to error in learning a motor task. Motor control: Theories, experiments, and applications, 267–294, 2011.
4. Seethapathi, N., and M. Srinivasan. Step-to-step variations in human running reveal how humans run without falling. Elife. 8:38371, 2019.
5. Wang, Y., and M. Srinivasan. Stepping in the direction of the fall: the next foot placement can be predicted from current upper body state in steady-state walking. Biology letters. 10(9):20140405, 2014.
6. Giakas, G., and V. Baltzopoulos. A comparison of automatic filtering techniques applied to biomechanical walking data. Journal of Biomechanics. 30(8):847–850, 1997.
7. Knudson, D. Confidence crisis of results in biomechanics research. Sports biomechanics. 16(4):425–433, 2017.
8. Stein, C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, vol. 3, pp. 197–207. University of California Press, 1956.
9. James, W. and Stein, C. Estimation with quadratic loss. In: Breakthroughs in Statistics: Foundations and Basic Theory, pp. 443–460. Springer, New York, NY, 1992.
10. Efron, B., and C. Morris. Stein’s paradox in statistics. Scientific American. 236(5):119–127, 1977.
11. Friedman, D. J., and D. C. Montgomery.
Evaluation of the predictive performance of biased regression estimators. Journal of Forecasting. 4(2):153–163, 1985.
12. Wang, Y. System Identification Around Periodic Orbits with Application to Steady State Human Walking. Columbus, Ohio: The Ohio State University, 2013.
13. Oliveira, A. S., and C. I. Pirscoveanu. Implications of sample size and acquired number of steps to investigate running biomechanics. Scientific Reports. 11(1):3083, 2021.
14. Forrester, S. E. Selecting the number of trials in experimental biomechanics studies. International Biomechanics. 2(1):62–72, 2015.
15. Brockway, J. Derivation of formulae used to calculate energy expenditure in man. Human nutrition. Clinical nutrition. 41(6):463–471, 1987.
16. Adeyeri, B., S. A. Thomas, and C. J. Arellano. A simple method reveals minimum time required to quantify steady-rate metabolism and net cost of transport for human walking. Journal of Experimental Biology. 225(15):244471, 2022.
17. Seethapathi, N., and M. Srinivasan. The metabolic cost of changing walking speeds is significant, implies lower optimal speeds for shorter distances, and increases daily energy estimates. Biology letters. 11(9):20150486, 2015.
18. Brown, G. L., N. Seethapathi, and M. Srinivasan. A unified energy-optimality criterion predicts human navigation paths and speeds. Proceedings of the National Academy of Sciences. 118(29):2020327118, 2021.
19. Gottschall, J. S., and R. Kram. Energy cost and muscular activity required for propulsion during walking. Journal of Applied Physiology. 94(5):1766–1772, 2003.
20. Selinger, J. C., and J. M. Donelan.
Estimating instantaneous energetic cost during non-steady-state gait. Journal of Applied Physiology. 117(11):1406–1415, 2014.
21. Owings, T. M., and M. D. Grabiner. Measuring step kinematic variability on an instrumented treadmill: how many steps are enough? Journal of biomechanics. 36(8):1215–1218, 2003.
22. Felt, W., J. C. Selinger, J. M. Donelan, and C. D. Remy. “body-in-the-loop”: Optimizing device parameters using measures of instantaneous energetic cost. PloS one. 10(8):0135342, 2015.
23. Zhang, J., P. Fiers, K. A. Witte, R. W. Jackson, K. L. Poggensee, C. G. Atkeson, and S. H. Collins. Human-in-the-loop optimization of exoskeleton assistance during walking. Science. 356(6344):1280–1284, 2017.
24. Myung, I. J. Tutorial on maximum likelihood estimation. Journal of mathematical Psychology. 47(1):90–100, 2003.
25. Efron, B., and C. Morris. Data analysis using stein’s estimator and its generalizations. Journal of the American Statistical Association. 70(350):311–319, 1975.
26. Van Houwelingen, J. Shrinkage and penalized likelihood as methods to improve predictive accuracy. Statistica Neerlandica. 55(1):17–34, 2001.
27. Bhattacharyya, A., S. Pal, R. Mitra, and S. Rai. Applications of bayesian shrinkage prior models in clinical research with categorical responses. BMC Medical Research Methodology. 22(1):126, 2022.
28. Röver, C., and T. Friede. Dynamically borrowing strength from another study through shrinkage estimation. Statistical Methods in Medical Research. 29(1):293–308, 2020.
29. Taketomi, N., Michimae, H., Chang, Y.-T., and Emura, T. meta.shrinkage: An r package for meta-analyses for simultaneously estimating individual means. Algorithms. 15(1):26, 2022.
30. Taketomi, N., Y. Konno, Y.-T. Chang, and T. Emura. A meta-analysis for simultaneously estimating individual means with shrinkage, isotonic regression and pretests. Axioms. 10(4):267, 2021.
31. Hill, S. M., R. M. Neve, N. Bayani, W.-L. Kuo, S. Ziyad, P. T. Spellman, J. W. Gray, and S. Mukherjee. Integrating biological knowledge into variable selection: an empirical bayes approach with an application in cancer biology. BMC bioinformatics. 13:1–15, 2012.
32. Bernardinelli, L., and C. Montomoli. Empirical bayes versus fully bayesian analysis of geographical variation in disease risk. Statistics in medicine. 11(8):983–1007, 1992.
33. Chu, H. O., E. Buchan, D. Smith, and P. G. Oppenheimer. Development and application of an optimised bayesian shrinkage prior for spectroscopic biomedical diagnostics. Computer Methods and Programs in Biomedicine. 245:108014, 2024.
34. Valenta, Z., and J. Kalina. Exploiting stein’s paradox in analysing sparse data from genome-wide association studies. Biocybernetics and Biomedical Engineering. 35(1):64–67, 2015.
35. Wang, H., C.-C. Chiu, Y.-C. Wu, and W.-S. Wu. Shrinkage regression-based methods for microarray missing value imputation. BMC systems biology. 7:1–12, 2013.
36. Riley, R. D., K. I. Snell, G. P. Martin, R. Whittle, L. Archer, M. Sperrin, and G. S. Collins. Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small. Journal of Clinical Epidemiology. 132:88–96, 2021.
37. Gupta, S., Jangid, N. and Sethi, A.
Fedstein: Enhancing multi-domain federated learning through james-stein estimator. arXiv preprint arXiv:2410.03499, 2024.
38. Amin, M., H. Ashraf, H. S. Bakouch, and N. Qarmalah. James stein estimator for the beta regression model with application to heat-treating test and body fat datasets. Axioms. 12(6):526, 2023.
39. Angjelichinoski, M., M. Soltani, J. Choi, B. Pesaran, and V. Tarokh. Deep pinsker and james-stein neural networks for decoding motor intentions from limited data. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 29:1058–1067, 2021.
40. Momot, A., Momot, M. and Leski, J. Bayesian and empirical bayesian approach to weighted averaging of ecg signal. Bulletin of the Polish Academy of Sciences Technical Sciences, 341–350, 2007.
41. Momot, A., Momot, M., and Leski, J. Empirical bayesian averaging of biomedical signals. In: Proc. XI International Conference MIT 2006, pp. 176–181, 2006.
42. Handford, M. L., and M. Srinivasan. Sideways walking: preferred is slow, slow is optimal, and optimal is expensive. Biology letters. 10(1):20131006, 2014.
43. Muralidhar, S.S., Marin, N., Melick, C., Alwan, A., Wang, Z., Baldwin, R., Walcott, S., and Srinivasan, M. Metabolic cost for isometric force scales nonlinearly and predicts how humans distribute forces across limbs. bioRxiv, 2023.
44. Joshi, V., and M. Srinivasan. A controller for walking derived from how humans recover from perturbations. Journal of The Royal Society Interface. 16(157):20190027, 2019.
45. Perry, J. A., and M. Srinivasan. Walking with wider steps changes foot placement control, increases kinematic variability and does not improve linear stability. Royal Society open science. 4(9):160627, 2017.
46. Gao, J., and D. B. Hitchcock.
James-stein shrinkage to improve k-means cluster analysis. Computational Statistics & Data Analysis. 54(9):2113–2127, 2010.
47. Efron, B. Empirical bayes: Concepts and methods. In: Handbook of Bayesian, Fiducial, and Frequentist Inference. Boca Raton, FL: Chapman and Hall/CRC, 2024, pp. 8–34.
48. Van Calster, B., M. Smeden, B. De Cock, and E. W. Steyerberg. Regression shrinkage methods for clinical prediction models do not guarantee improved performance: Simulation study. Statistical methods in medical research. 29(11):3166–3178, 2020.
49. Hofmann, B. On value-judgements and ethics in health technology assessment. Poiesis & Praxis. 3:277–295, 2005.
50. Mertz, M., Prince, I., and Pietschmann, I. Values, decision-making and empirical bioethics: a conceptual model for empirically identifying and analyzing value judgements. Theoretical Medicine and Bioethics, 1–21, 2023.
51. Efron, B. Large-scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Vol. 1, New York, NY: Cambridge University Press, 2012.
52. Muandet, K., Fukumizu, K., Sriperumbudur, B., Gretton, A., and Schölkopf, B. Kernel mean estimation and stein effect. In: International Conference on Machine Learning, pp. 10–18, 2014. PMLR.
53. Murphy, K. P. Machine Learning: a Probabilistic Perspective. Cambridge, MA: MIT press, 2012.
54. Chau, J. Demystifying Stein’s Paradox: A quick insight in shrinkage estimation. https://jchau.org/2021/01/29/demystifying-stein-s-paradox/ [Accessed: Sept 10, 2024] 2021.
Markdown
Ann Biomed Eng. 2025 Apr 16;53(7):1604–1614.
doi: [10\.1007/s10439-025-03718-x](https://doi.org/10.1007/s10439-025-03718-x)

# James–Stein Estimator Improves Accuracy and Sample Efficiency in Human Kinematic and Metabolic Data

[Aya Alwan](https://pubmed.ncbi.nlm.nih.gov/?term=%22Alwan%20A%22%5BAuthor%5D) 1,✉, [Manoj Srinivasan](https://pubmed.ncbi.nlm.nih.gov/?term=%22Srinivasan%20M%22%5BAuthor%5D) 1

1 Mechanical and Aerospace Engineering, The Ohio State University, 201, W. 19th Ave., Columbus, OH 43210 USA

Associate Editor Joel Stitzel oversaw the review of this article.

✉ Corresponding author.

Received 2024 Oct 15; Accepted 2025 Mar 19; Issue date 2025.

© The Author(s) 2025

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit [http://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/). [PMC Copyright notice](https://pmc.ncbi.nlm.nih.gov/about/copyright/) PMCID: PMC12185630 PMID: [40238045](https://pubmed.ncbi.nlm.nih.gov/40238045/) **Previous version available:** This article is based on a previously available preprint posted on bioRxiv on April 11, 2025: "[James-Stein estimator improves accuracy and sample efficiency in human kinematic and metabolic data](https://pmc.ncbi.nlm.nih.gov/articles/PMC11507741/)". ## Abstract Human biomechanical data are often accompanied with measurement noise and behavioral variability. Errors due to such noise and variability are usually exaggerated by fewer trials or shorter trial durations and could be reduced using more trials or longer trial durations. Speeding up such data collection by lowering number of trials or trial duration, while improving the accuracy of statistical estimates, would be of particular interest in wearable robotics applications and when the human population studied is vulnerable (e.g., the elderly). Here, we propose the use of the James–Stein estimator (JSE) to improve statistical estimates with a given amount of data or reduce the amount of data needed for a given accuracy. The JSE is a shrinkage estimator that produces a uniform reduction in the summed squared errors when compared with the more familiar maximum likelihood estimator (MLE), simple averages, or other least squares regressions. 
When data from multiple human participants are available, an individual participant’s JSE can improve upon MLE by incorporating information from all participants, improving overall estimation accuracy on average. Here, we apply the JSE to multiple time series of kinematic and metabolic data from the following parameter estimation problems: foot placement control during level walking, energy expenditure during circle walking, and energy expenditure during resting. We show that the resulting estimates improve accuracy—that is, the James–Stein estimates have lower summed squared error from the ‘true’ value compared with more conventional estimates. **Keywords:** James–Stein estimator, Maximum likelihood estimator, Biomechanics, Walking, Locomotion, Parameter estimation, Accuracy, Sample size, Kinematics, Metabolic energy rate, Trial duration, Sample efficiency ## Introduction Measured biomechanical data are often accompanied with sources of error, such as incomplete or missing data, measurement noise, and high human variability \[[1](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR1)–[5](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR5)\]. The effect of measurement noise and movement variability can be sometimes mitigated using signal processing tools, such as band-pass filters or even simple averaging \[[2](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR2), [6](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR6)\]. Such defense against noise and variability is aided by having larger sample sizes; conversely, insufficient sample sizes can result in low statistical power and poor replicability \[[7](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR7)\]. 
Here, we present the use of an underutilized statistical approach, the James–Stein estimator, which has not previously been used for biomechanical data, to enhance the accuracy of estimated biomechanical variables, especially when the sample size is limited \[[8](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR8)–[11](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR11)\]. Our focus here is on time series data, where an analog of sample size is the duration for which the time series data are collected: usually, longer duration trials provide higher accuracy in parameter estimation \[[4](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR4), [12](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR12)–[14](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR14)\]. For example, it is conventional to average over two to three minutes of steady-state metabolic data to obtain an acceptable mean metabolic energy rate (i.e., metabolic cost) value during walking or other exercise \[[15](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR15)–[20](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR20)\]. Similarly, authors have characterized the minimum number of walking steps required to estimate step kinematic variability during treadmill walking \[[21](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR21)\], how sample size and number of steps affect running biomechanical estimates \[[13](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR13)\], and how quantities derived from step-to-step locomotor variability related to stability and control increase in accuracy with the number of steps \[[4](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR4), [5](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR5), [12](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR12)\].
However, the number of steps or the duration of walking during an experiment may be limited due to functional impairment in populations such as the elderly, amputees with prostheses or exoskeletons, or individuals with musculoskeletal disorders. In such cases, methods to reduce trial duration while retaining estimation accuracy are of particular interest. Similarly, reducing the trial duration may be useful in human-in-the-loop optimization \[[22](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR22), [23](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR23)\] of wearable devices, such as robotic exoskeletons and prostheses, where a large number of trials may need to be performed. Here, we show how the James–Stein estimator allows such reduction of trial duration for a given accuracy of the statistical estimate. The simplest way to obtain a statistical estimate of a quantity from many samples or long trial durations is via averaging. Other estimation methods include linear and nonlinear regression. Averaging, linear regression, and nonlinear regression are special cases of a broader class of parameter estimation methods called ‘maximum likelihood estimation’ (MLE): given a model of the noise, maximum likelihood estimation computes the parameter values that make the observed data have the maximum likelihood. MLE methods are popular because they satisfy the properties of asymptotic normality, consistency, and efficiency \[[24](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR24)\].
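As a minimal sketch of the statement that averaging is a special case of MLE, the following Python snippet checks numerically that, under an assumed Gaussian noise model with known standard deviation, the mean that maximizes the likelihood coincides with the simple average. The data values, the noise standard deviation, the search grid, and the function name `gaussian_log_likelihood` are hypothetical choices made for illustration.

```python
import math
from statistics import mean


def gaussian_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of i.i.d. Gaussian data with mean mu and known sigma."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))


# Hypothetical repeated measurements of one quantity for one participant.
data = [2.1, 1.9, 2.4, 2.0, 1.6]

# Candidate means from 1.000 to 3.000 in steps of 0.001.
grid = [i / 1000 for i in range(1000, 3001)]

# Brute-force MLE: the candidate mean with the largest log-likelihood.
best_mu = max(grid, key=lambda mu: gaussian_log_likelihood(mu, data))

print("grid-search MLE:", best_mu)
print("simple average: ", mean(data))
```

The grid search is used here only to make the maximization explicit; in practice the average would be computed directly.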
Despite the popularity of MLE (especially its aforementioned special cases), Stein \[[8](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR8), [9](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR9)\] showed that in some situations, there exists a better estimator than the maximum likelihood estimator with lower summed squared error, so MLE is considered ‘inadmissible.’ The James–Stein estimator \[[8](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR8)–[11](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR11)\] (JSE) is one such better estimator. JSE was popularized in the 1970s by a baseball example \[[10](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR10), [25](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR25)\]. Researchers first computed the batting averages for each player over the first 45 at bats (Fig. [1](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#Fig1)a), that is, using just the initial few games of the baseball season. These averages (‘MLEs’) provide an inaccurate estimate of the true batting average over an entire season, which typically has over 450 at bats (median). Then, the researchers applied the James–Stein estimator correction for each player, based on all the players’ initial season averages, resulting in new JSE-based estimates. Remarkably, these JSEs were better estimates of the players’ full season averages, even though the JSEs are still based on just the initial part of the season and do not use the full season data. Specifically, the JSEs had lower summed squared error compared with the MLEs when averaged across all players. Our goal is to examine whether such a result would be true in a variety of biomechanical time series data, improving the accuracy of shorter duration trials by combining data from multiple participants via JSE (Fig. [1](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#Fig1)b).

### Fig. 1.

![Fig. 1](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3a0/12185630/7401ff307235/10439_2025_3718_Fig1_HTML.jpg)

The potential of the James–Stein Estimator. **a** Famous baseball example illustrating the potential of using JSE to reduce error, compared with simple averages or MLE \[[10](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR10), [25](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR25)\]. **b** Using James–Stein estimator in biomechanics. Statistical estimates of parameters can be obtained for individual participants using averaging, regression, or maximum likelihood estimation (MLE). For limited sample sizes or trial durations per participant, MLEs have some error compared with the true value. The JSE uses information from all the MLEs to produce a new parameter estimate for each participant, which on average will have lower error than those from MLEs.

James–Stein estimation sits within a broader class of meta-analysis-related methods, usually used to obtain a common mean value from multiple studies, but sometimes also used to improve the estimates of individual means when appropriate: related methods include Bayesian shrinkage methods, empirical Bayes estimators, isotonic regression, pretest estimators, and other regularized estimators using ridge or lasso regularization \[[26](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR26)–[29](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR29)\].
Such estimators have recently been shown to be useful in a variety of biomedical contexts involving diabetes \[[27](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR27)\], cancer \[[27](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR27), [30](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR30)–[32](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR32)\], Alzheimer’s disease \[[27](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR27)\], COVID-19 \[[30](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR30)\], Creutzfeldt–Jakob disease \[[28](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR28)\], spectroscopy \[[33](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR33)\], genomics \[[34](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR34), [35](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR35)\], blood pressure \[[36](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR36)\], dermal issues \[[37](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR37)\], body fat \[[38](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR38)\], and neural recordings \[[39](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR39)–[41](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR41)\]. Here, we focus on James–Stein estimation for its computational simplicity and target biomechanical estimation problems not previously investigated with such techniques.
In this manuscript, we apply JSE to the following human time series datasets and related estimation problems: (1) estimating foot placement control during walking derived from pelvis and foot kinematic data \[[5](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR5)\]; (2) estimating the steady-state metabolic energy rate ($\dot{E}_{\mathrm{walk}}$) during walking in a circle derived from metabolic time series \[[18](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR18)\]; and (3) estimating the mean resting metabolic energy rate ($\dot{E}_{\mathrm{rest}}$) during quiet sitting \[[17](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR17), [18](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR18), [42](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR42), [43](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR43)\]. We hypothesize that we can achieve significant improvement in estimation accuracy when utilizing JSE compared with MLE (Fig. [1](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#Fig1)) and find that JSE does accomplish such error reduction.

## Methods

We first describe how the JSE correction is done if we have an initial estimate and how we compare the results with the ‘true’ value of the estimate. We then describe the experiments and datasets that we consider for JSE, the biomechanical quantities estimated with each dataset, and how these quantities are estimated. All protocols were approved by the Ohio State University IRB and participants took part in the experiments with informed consent.

### Computing JSE for Better Person-Specific Estimates

Say we have $k$ individual participants and we have computed the maximum likelihood estimates (MLEs) $y_i$ ($i = 1, \ldots, k$) of some quantity for each of these participants. These MLEs could be of any quantity of interest and could be obtained by linear regression, simple averaging, or some other nonlinear fitting process, as described more in Sect. “[Biomechanical Datasets Considered and Quantities Estimated](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#Sec4)”. The corresponding JSE $z_i$ for each participant is determined by shrinking the individual MLEs $y_i$ toward the global MLE average $\bar{y} = \sum_i y_i / k$ by a ‘shrinkage factor’ $c$ using the following remarkably simple equation \[[8](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR8), [10](https://pmc.ncbi.nlm.nih.gov/articles/PMC12185630/#CR10)\]:

$$z_i = \bar{y} + c \, (y_i - \bar{y}).$$
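The shrinkage equation above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the expression for the shrinkage factor is not shown in this excerpt, so the code assumes the standard form from Efron and Morris, c = 1 - (k - 3)σÂČ / ÎŁ(y_i - ȳ)ÂČ, clipped at zero (the 'positive-part' variant), with σÂČ the variance of each MLE about its true value, assumed known and equal across participants. The function name `james_stein` and the example numbers are hypothetical.

```python
from statistics import mean


def james_stein(mles, noise_var):
    """Shrink each MLE toward the grand mean: z_i = y_bar + c * (y_i - y_bar).

    mles      -- one maximum likelihood estimate y_i per participant
    noise_var -- assumed variance (sigma^2) of each MLE about its true value
    Returns the list of James-Stein estimates and the shrinkage factor c.
    """
    k = len(mles)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 estimates")
    y_bar = mean(mles)
    spread = sum((y - y_bar) ** 2 for y in mles)
    if spread == 0.0:
        # All MLEs are identical, so there is nothing to shrink.
        return [y_bar] * k, 0.0
    # Assumed standard shrinkage factor, clipped at zero so that estimates
    # are never pushed past the grand mean.
    c = max(0.0, 1.0 - (k - 3) * noise_var / spread)
    return [y_bar + c * (y - y_bar) for y in mles], c


# Hypothetical MLEs for five participants, each with unit noise variance.
estimates, c = james_stein([1.0, 2.0, 3.0, 4.0, 5.0], noise_var=1.0)
print("shrinkage factor c:", c)
print("James-Stein estimates:", estimates)
```

For these example numbers, ȳ = 3 and ÎŁ(y_i - ȳ)ÂČ = 10, so c = 1 - 2/10 = 0.8 and each estimate moves 20% of the way toward the grand mean. In practice σÂČ would have to be estimated from the data, which this sketch does not attempt.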
Readable Markdownnull
Shard129 (laksa)
Root Hash7295144728021232729
Unparsed URLgov,nih!nlm,ncbi,pmc,/articles/PMC12185630/ s443