In this paper, we introduce a deep learning-based framework for robust and consistent VOI extraction, segmentation, and analysis of the different trabecular bone compartments from micro-CT scans (5 µm) of the epiphyseal-metaphyseal region in mouse tibiae. We trained a deep learning-based classification model to classify cross-sectional 2D slices from micro-CT images of mouse tibiae into four key compartments: epiphyseal bone, growth plate, primary spongiosa, and secondary spongiosa. Additionally, we proposed a regional probability distribution approach to detect the three transitional landmarks between these compartments: the interface between the epiphyseal bone and the growth plate, the interface between the growth plate and the primary spongiosa, and the interface between the primary and secondary spongiosa. This enables consistent extraction of standardized VOIs of the trabecular bone compartments across different experimental groups. The method is widely applicable, supporting studies on epiphyseal bone for cartilage and osteoarthritis research, as well as analyses of the secondary spongiosa alone or combined with the primary spongiosa in the metaphysis for osteoporosis and bone remodeling research. We validated the classification model using three micro-CT mouse tibia datasets (refs. 22, 23, 24), totaling 40 bone scans. These datasets encompassed pharmacological interventions, mechanical loading variations, and aged mice with minimal trabecular bone. Manual annotations from three independent human raters were used as ground truth, and the model achieved a mean F1-score of 0.96 for the epiphyseal bone, 0.95 for the growth plate, 0.92 for the primary spongiosa, and 0.99 for the secondary spongiosa across all datasets. The axial positions of the transitional landmarks were statistically equivalent to those identified by at least two manual annotators, within a 0.05 mm margin (p < 0.05).
To assess generalizability, we evaluated the model on an external dataset (ref. 12) comprising two groups: risedronate treatment alone and risedronate combined with ML. The model achieved a mean F1-score of 0.99 for the epiphyseal bone, 0.97 for the growth plate, 0.92 for the primary spongiosa, and 1.0 for the secondary spongiosa, and the axial positions of the predicted transitional landmarks were statistically equivalent to manual annotation within a 0.05 mm margin (p < 0.05). Subsequently, deep learning-based segmentation was performed within standardized VOIs for each group, segmenting the epiphyseal trabecular bone, primary spongiosa, and secondary spongiosa. These segmentations were then used to conduct a comprehensive cross-sectional and 3D morphological and statistical analysis of the trabecular bone compartments, facilitating consistent comparisons across experimental groups and enabling direct comparisons within and between trabecular compartments. We further investigated the effects of inconsistent VOI definitions, particularly in the secondary spongiosa, by comparing our anatomically defined VOIs with the conventional fixed-offset approach downstream of the growth plate. Our results highlight the limitations of this traditional method, demonstrating its potential to undermine consistency and to lead to misleading statistical interpretations in trabecular bone analysis. The proposed method enables automated, robust, and consistent analysis of trabecular compartments in the epiphyseal-metaphyseal region of the tibia in murine models, accelerating preclinical skeletal research and the assessment of drug treatment effectiveness.
We performed 5-fold cross-validation during the evaluation of the classification model's performance against the manual annotation using test sets from three datasets: Dataset 1, Dataset 2, and Dataset 3, as reported in Table 1. The model demonstrated strong overall performance, accurately classifying cross-sectional 2D slices of micro-CT scans from the epiphyseal-metaphyseal region of the mouse tibia. Across all datasets, the model achieved a mean F1-score of 0.96 for the epiphyseal bone, 0.95 for the growth plate, 0.92 for the primary spongiosa, and 0.99 for the secondary spongiosa. In Dataset 1, the model performed robustly across all classes, with F1-scores of 0.96 for the epiphyseal bone, 0.95 for the growth plate, 0.93 for the primary spongiosa, and 0.99 for the secondary spongiosa. Performance remained consistently high across all PTH treatment groups (PTH0 to PTH80). In Dataset 2, the model maintained high classification performance, achieving F1-scores of 0.96 for the epiphyseal bone, 0.95 for the growth plate, 0.91 for the primary spongiosa, and 0.99 for the secondary spongiosa. Model predictions remained high across ML conditions with varying peak dynamic load magnitudes under SN (0N, 6 N, and 12 N). In Dataset 3, which included aged mice with significantly reduced bone volume fraction, the model achieved F1-scores of 0.96 for the epiphyseal bone, 0.95 for the growth plate, 0.90 for the primary spongiosa, and 0.99 for the secondary spongiosa. Notably, the model consistently achieved the highest performance in the epiphyseal bone and secondary spongiosa compartments across all datasets, which are key compartments for trabecular bone analysis. The secondary spongiosa reached a mean F1-score of 0.99, while the epiphyseal bone maintained a mean F1-score of 0.96, highlighting the reliability of the classification. 
Although classification performance for the primary spongiosa was modestly lower in Dataset 2 and Dataset 3 (F1-scores ranging from 0.90 to 0.91), this is likely due to the limited presence of the primary spongiosa in those datasets. In such cases, the focus on secondary spongiosa alone remains appropriate for metaphyseal bone analysis, consistent with the standard study protocols. This was not the case in groups with abundant primary spongiosa, where classification performance remained strong across different PTH dose groups in Dataset 1, underscoring its robustness across diverse trabecular architectures.
To further analyze the model's performance, confusion matrices were examined across different datasets and groups (Fig. 1). The model exhibited classification accuracies approaching 100% for epiphyseal bone and secondary spongiosa. Across all groups and datasets, the secondary spongiosa consistently achieved very high accuracy, with only a few slices misclassified as the adjacent anatomical region (i.e., primary spongiosa), demonstrating the effectiveness of the proposed method. Similarly, the epiphyseal bone exhibited excellent classification accuracy, with only minor misclassifications into the adjacent growth plate region. Classification accuracy also remained high for the growth plate and primary spongiosa; occasional misclassifications occurred at the interfaces between these regions, but the overall results were excellent, as reported in Table 1. For instance, in the PTH0 group (i.e., vehicle-treated), the model correctly classified 94% of the growth plate instances. The remaining instances were misclassified as epiphyseal bone (4%) and primary spongiosa (2%). Such misclassifications are expected given the inherent limitations of cross-sectional analysis in the proximal tibia. Because certain slices contain a mixture of adjacent regions, the model encounters uncertainty when assigning class labels, leading to minor errors at anatomical boundaries. Transitional zones naturally exist between the anatomically defined compartments, resulting in overlapping features across adjacent regions. This structural continuity introduces minor ambiguity not only in automated classification but also in manual annotation.
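The relation between a confusion matrix such as those in Fig. 1 and the per-class F1-scores in Table 1 can be made concrete with a small sketch. The function name and the slice counts below are illustrative assumptions; only the growth plate row mirrors the PTH0 proportions quoted above (94% correct, 4% epiphyseal bone, 2% primary spongiosa), and the other rows are invented for the example.

```python
CLASSES = ["epiphyseal bone", "growth plate",
           "primary spongiosa", "secondary spongiosa"]


def per_class_f1(confusion):
    """Return one F1-score per class.

    confusion[i][j] is the number of slices whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# Hypothetical slice counts (rows: true class, columns: predicted class).
example = [
    [96, 4, 0, 0],    # epiphyseal bone
    [4, 94, 2, 0],    # growth plate (PTH0 proportions from the text)
    [0, 3, 90, 7],    # primary spongiosa
    [0, 0, 1, 199],   # secondary spongiosa
]
f1_scores = per_class_f1(example)
for name, score in zip(CLASSES, f1_scores):
    print(f"{name}: F1 = {score:.3f}")
```

Because errors concentrate in cells adjacent to the diagonal, each boundary slice that is mislabeled lowers the F1-score of both neighboring compartments, which is consistent with the slightly lower scores for the growth plate and primary spongiosa.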
To demonstrate the model's ability to generalize to external data under different experimental setups, we retrained the model on all three datasets to obtain the final model, and tested it on the external dataset. This dataset included two distinct mice groups: one treated with risedronate (15 µg/kg/day) alone, and the other with risedronate combined with ML. As shown in Table 1, the model maintained strong classification performance across all compartments, achieving a mean F1-score of 0.99 for the epiphyseal bone, 0.97 for the growth plate, 0.92 for the primary spongiosa, and 1.0 for the secondary spongiosa. The confusion matrices in Fig. 1 illustrate near-perfect classification in both groups, particularly for the epiphyseal bone and secondary spongiosa. Despite variations in bone morphology and experimental conditions, the model extracted anatomical landmarks consistently, demonstrating its robustness in trabecular bone analysis. These results confirm that the model effectively generalized to unseen experimental conditions while maintaining high classification accuracy and reliability across all the compartments in the epiphyseal-metaphyseal region of the proximal tibia in the mouse.
We further assessed the performance of the proposed method in identifying the transitional landmarks between the different compartments along the tibia's longitudinal axis (Z=0 denotes the proximal end of the tibia): the interface between the epiphyseal bone and the growth plate, the interface between the growth plate and the primary spongiosa, and the interface between the primary and secondary spongiosa. We compared the predicted values with the manual annotations provided by three expert annotators, all researchers with expertise in bone segmentation and anatomical landmark identification (see Fig. 2). The results indicate that the model predicts these landmarks consistently across all groups and datasets, closely matching the expert annotations. The model shows similar or lower variability in landmark measurements compared to the experts in nearly all groups, with mean inter-operator intra-class correlation coefficients (ICC) of 0.98, 0.94, and 0.99 for the three landmarks, respectively, across all groups for Dataset 1. The deviation of these measurements remains stable across different groups and is, in most cases, equal to or lower than that observed among expert annotations. For each dataset and across all three landmarks, statistical equivalence within a 0.05 mm margin (p < 0.05) was established using two one-sided t-tests (TOST). Equivalence was demonstrated both among the annotators and between the proposed model and at least two of the annotators.
The model maintains excellent performance even at high drug doses. In the PTH80-treated group, where a substantial amount of primary spongiosa was deposited downstream of the growth plate, the model's landmark measurement (1.48 ± 0.025 mm) remained consistent with the measurements from all three annotators: Annotator 1 (1.47 ± 0.03 mm), Annotator 2 (1.50 ± 0.05 mm), and Annotator 3 (1.47 ± 0.03 mm). At the other end of the spectrum, in the 19-month-old mice from Dataset 3, which have significantly reduced bone mass, the model's landmark measurement (0.97 ± 0.05 mm) closely matched the expert annotations, all ranging between 0.97 mm and 0.99 mm ± 0.05 mm. The same holds for the other landmarks, for which the proposed approach provides a consistent and more robust extraction of these compartments. The model achieves mean values and standard deviations that closely align with those of the expert annotators (see Fig. 2) across diverse conditions, including different doses of PTH (0, 20, 40, and 80 µg/kg/day), in-vivo external ML with different peak dynamic load magnitudes under SN (0 N, 6 N, and 12 N), and significantly aged mice with substantially reduced bone density. These results demonstrate the robustness of the proposed method, confirming its reliability in practical applications and its ability to generalize effectively to unseen datasets under varying physiological and experimental conditions. Moreover, the model eliminates intra-operator variability, a limitation commonly observed among human annotators. For comparison, the mean intra-operator ICC values for the manual annotators were 0.99, 0.93, and 0.98, respectively, indicating slight inconsistencies even within individual annotators.
We also tested the extraction of the transitional landmarks on the two groups of the external dataset: risedronate only and risedronate combined with mechanical loading. The three predicted landmarks were consistently extracted across different bones for both experimental groups, as shown in Fig. 3. For all three landmarks, statistical equivalence was established within a 0.05 mm margin (p < 0.05) between the manual annotator and the proposed model. Visual assessment confirmed that the extracted cross-sections corresponded accurately to the expected anatomical regions, despite differences in orientation, drug treatments, and microarchitectural morphology (see Fig. 4). Specifically, taking the three landmarks in the order defined above, the first was positioned at the transition where a small, non-calcified growth plate cartilage (excluding bridges) begins to disappear, the second aligned with the region containing a distinct stripe of non-calcified growth plate cartilage, and the third consistently intersected a stripe of primary spongiosa traversing the metaphyseal medulla. These results further validate the model's reliability in identifying key anatomical transitions across diverse datasets and experimental conditions.
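The text does not spell out how a landmark position is derived from the slice-wise classifier output, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes per-slice class probabilities along the longitudinal axis, averages them over a small window (a "regional" estimate), and takes the first slice at which the distal compartment becomes at least as probable as the proximal one. The function names, window size, and probability values are all hypothetical; only the 5 µm slice spacing comes from the text.

```python
VOXEL_MM = 0.005  # 5 µm slice spacing, as stated for the scans


def moving_average(values, window):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def transition_index(p_proximal, p_distal, window=5):
    """First slice where the smoothed distal-class probability is at
    least the smoothed proximal-class probability (None if never)."""
    prox = moving_average(p_proximal, window)
    dist = moving_average(p_distal, window)
    for i, (a, b) in enumerate(zip(prox, dist)):
        if b >= a:
            return i
    return None


# Hypothetical probabilities for 20 consecutive slices.
p_growth_plate = [0.9] * 10 + [0.1] * 10
p_primary = [0.1] * 10 + [0.9] * 10
# One isolated, ambiguous slice deep inside the growth plate.
p_growth_plate[3], p_primary[3] = 0.1, 0.9

idx = transition_index(p_growth_plate, p_primary)
z_mm = idx * VOXEL_MM  # position relative to the first slice of the list
print(idx, z_mm)
```

In this toy input, a slice-by-slice comparison without smoothing would place the transition at the isolated slice 3, whereas the windowed estimate ignores it and places the transition at the actual change in the probability profile.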
We performed two types of morphological and statistical analyses on the extracted and segmented trabecular compartments of Dataset 1: (i) a 3D analysis based on the entire compartment and (ii) a 2D slice-by-slice cross-sectional analysis.
For the 3D analysis, we reported the morphological parameters bone volume fraction (BV/TV), trabecular thickness (Tb.Th), and trabecular separation (Tb.Sp) in Table 2. The results indicate distinct dose-dependent adaptations across compartments. In the mixed primary-secondary spongiosa (VOI: 1 mm distal from ), treatment had a statistically significant effect on BV/TV (). BV/TV increased in a dose-dependent manner across PTH0 (12.98 ± 1.60%), PTH20 (20.93 ± 0.38%), and PTH40 (24.15 ± 1.71%), with a much larger increase at PTH80 (39.30 ± 2.08%). Post hoc analysis showed statistically significant differences between all dose groups, indicating a clear and progressive increase in bone volume fraction within the mixed compartment. Treatment also had a statistically significant effect on Tb.Th (). Tb.Th exhibited a slight, non-significant decrease across PTH0, PTH20, and PTH40, ranging from 49.03 ± 1.92 µm to 45.91 ± 1.43 µm. A further reduction was observed at PTH80 (42.33 ± 1.39 µm), which differed significantly from all other groups, suggesting that trabecular thinning significantly increased at higher PTH doses within this compartment. Tb.Sp was also significantly affected by treatment (). A progressive decrease was observed across doses: from 263.5 ± 33.3 µm at PTH0 to 233.3 ± 14.2 µm at PTH20, 214.2 ± 12.6 µm at PTH40, and reaching 140.8 ± 32.7 µm at PTH80. Post hoc analysis revealed significant differences between PTH0 and both PTH40 and PTH80, as well as between PTH20 and PTH80 and between PTH40 and PTH80, indicating that trabecular separation significantly decreased at higher PTH doses within this compartment.
In the secondary spongiosa (VOI: 1 mm distal from ), treatment had a statistically significant effect on BV/TV (). BV/TV remained relatively stable across PTH0, PTH20, and PTH40, ranging from 8.49 ± 0.81% to 9.07 ± 0.83%, with no statistically significant differences between these groups. However, a marked increase was observed at PTH80 (16.57 ± 3.60%, ), which differed significantly from all other groups, suggesting that a high PTH dose is required to elicit a robust anabolic response in the defined VOI of the secondary spongiosa. Treatment also had a statistically significant effect on Tb.Th (). Tb.Th exhibited a slight, non-significant decrease across PTH0, PTH20, and PTH40 (from 49.58 ± 1.76 µm to 46.08 ± 2.05 µm). A statistically significant reduction was detected at PTH80 compared to the vehicle group PTH0 (44.40 ± 1.78 µm, ). In contrast, treatment had no statistically significant main effect on Tb.Sp (). Although Tb.Sp increased slightly from PTH0 to PTH40 (299.9 ± 35.9 µm to 307.6 ± 16.8 µm) before decreasing at PTH80 (280.0 ± 35.2 µm), these variations were not statistically meaningful and did not indicate a treatment effect in the secondary spongiosa.
In the epiphyseal bone (VOI: 0.25 mm proximal from ), treatment had a statistically significant effect on BV/TV (). BV/TV increased progressively with dose: 32.76 ± 2.28% at PTH0, 39.28 ± 1.69% at PTH20, 39.31 ± 2.51% at PTH40, and 43.45 ± 2.29% at PTH80. Post hoc analysis showed statistically significant differences between PTH0 and all other groups, and between PTH20 and PTH80. Treatment also had a statistically significant effect on Tb.Th (). Tb.Th increased from PTH0 to PTH40 (71.78 ± 1.52 µm to 76.00 ± 2.47 µm), followed by a sudden reduction at PTH80 (69.96 ± 4.75 µm). Post hoc analysis revealed significant differences between PTH40 and PTH80 only, suggesting limited structural adaptation of trabecular thickness in the entire compartment. Tb.Sp was also significantly affected by treatment (). A gradual decrease was observed across doses: from 246.1 ± 25.6 µm at PTH0 to 227.2 ± 10.5 µm at PTH20, 216.9 ± 24.3 µm at PTH40, and 187.8 ± 27.3 µm at PTH80. Post hoc analysis revealed significant differences between PTH0 and PTH80 only, suggesting a significant compaction of the trabecular network in the epiphyseal compartment at the highest dose.
For the 2D analysis, we measured bone area per total area (B.Ar/T.Ar), Tb.Th, and Tb.Sp across all cross-sections within each trabecular compartment, as reported in Fig. 5. In the mixed primary-secondary spongiosa (Fig. 5a), morphometric parameters were evaluated along a 1 mm region distal to the growth plate-primary spongiosa interface. B.Ar/T.Ar exhibited a prominent increase immediately downstream of the growth plate, reaching a peak that both increased in magnitude and shifted distally with increasing PTH dose. This was followed by an inflection point and subsequent decline in all groups. The width and density of the primary spongiosa were dose-dependent, with higher PTH doses extending the high B.Ar/T.Ar region further distally. A significant elevation in the proximal portion of the profile was observed from PTH20 onward compared to the vehicle group. PTH20 and PTH40 displayed similar B.Ar/T.Ar trends along the longitudinal axis, while remaining statistically different from one another, and converged toward the vehicle profile after approximately 0.5 mm. In contrast, PTH80 maintained a significantly elevated B.Ar/T.Ar across the entire VOI compared to all other groups (red bars). For Tb.Th, values were lowest in the fine-textured primary spongiosa near the interface, increasing to a local maximum around 0.4-0.6 mm before gradually declining distally. All PTH-treated groups exhibited significantly lower Tb.Th near the interface compared to the vehicle, reflecting thinner trabeculae in the expanded primary spongiosa. PTH20 and PTH40 followed nearly identical profiles, with minor and localized significant differences relative to PTH0, which diminished with increasing distance from the interface. In contrast, PTH80 induced a pronounced and statistically significant reduction in Tb.Th throughout the VOI relative to all other groups, with the divergence becoming apparent from approximately 0.25 mm distal to the interface. For Tb.Sp, a clear dose-dependent decrease was observed.
The most substantial reductions occurred proximally, where all PTH doses differed significantly from the vehicle group. The Tb.Sp profiles for PTH20 and PTH40 showed significant local differences proximally that gradually diminished beyond approximately 0.5 mm from the growth plate-primary spongiosa interface. In contrast, PTH80 exhibited a pronounced and sustained reduction in Tb.Sp along the entire longitudinal axis, with highly significant differences from all other experimental groups.
In the secondary spongiosa (Fig. 5b), morphometric parameters were evaluated along a 1 mm region distal to the primary-secondary spongiosa interface. B.Ar/T.Ar remained largely unchanged across PTH0, PTH20, and PTH40, with minimal local statistical differences. In contrast, PTH80 exhibited a pronounced and consistent increase in B.Ar/T.Ar across the entire VOI compared to all other groups, with strong statistical significance (red bars). For Tb.Th, PTH20 and PTH40 followed similar profiles along the longitudinal axis, exhibiting a statistically significant increase near the interface compared to the vehicle group PTH0, followed by a progressively significant decrease with depth. No significant differences were observed between PTH20 and PTH40 throughout the VOI. In contrast, PTH80 showed a significant reduction in Tb.Th across the entire VOI compared to PTH0. Tb.Th in PTH80 was also significantly lower than in PTH20 and PTH40 in the proximal region (red bars), but this difference diminished distally beyond approximately 0.5 mm from the interface, where no statistical significance was detected (blue bars). For Tb.Sp, both PTH20 and PTH40 showed a slight increase compared to PTH0 proximally, which was statistically significant up to approximately 0.3 mm from the interface. Beyond this point, Tb.Sp converged with the PTH0 profile, showing no statistically significant difference distally (blue bars). No significant differences were observed between PTH20 and PTH40 along the longitudinal axis. Compared to PTH0, PTH80 exhibited only a marginal decrease in Tb.Sp, with no statistical significance along the longitudinal axis. However, Tb.Sp in PTH80 was significantly lower than in PTH20 and PTH40 in the proximal region (red bars), with profiles converging distally beyond approximately 0.75 mm from the interface (blue bars).
In the epiphyseal trabecular bone (Fig. 5c), morphometric parameters were evaluated along a 0.25 mm region proximal to the epiphyseal bone-growth plate interface. B.Ar/T.Ar exhibited a dose-dependent increase along the longitudinal axis, with statistically significant increases in all PTH-treated groups compared to the vehicle (red bars). No significant differences were detected between PTH20 and PTH40 across the axis. In contrast, PTH80 induced a pronounced and sustained increase in B.Ar/T.Ar throughout the VOI, with statistically significant differences from all other PTH doses along the entire longitudinal axis. For Tb.Th, PTH20 induced a modest, non-significant increase near the interface, which became more pronounced proximally, reaching statistical significance near the tibial plateau. PTH40 exhibited a similar profile to PTH20, with slightly higher values proximally. Only localized, minor differences were observed between PTH20 and PTH40 along the longitudinal axis. In contrast, PTH80 showed slightly lower Tb.Th than the vehicle group throughout the VOI, with almost no local statistically significant differences relative to PTH0. Notably, the highest PTH dose appeared to exert an opposite effect on Tb.Th compared to the lower doses (PTH20 and PTH40), as evidenced by localized statistical significance in the PTH20-PTH80 comparison and near-significant differences between PTH40 and PTH80 along almost the entire VOI.
For Tb.Sp, a clear dose-dependent reduction was observed. The most pronounced differences occurred proximally, where all PTH groups significantly differed from the vehicle, although these differences diminished closer to the tibial plateau. PTH40 showed a slightly lower Tb.Sp than PTH20, but no statistically significant differences were detected between these two groups along the axis. In contrast, PTH80 induced a pronounced and sustained decrease in Tb.Sp throughout the entire VOI, with highly significant differences from all other groups along the entire longitudinal axis, suggesting an increase in trabecular number.
Our 2D slice-by-slice cross-sectional and 3D morphological analyses of the trabecular compartments in the mouse tibia enabled robust and efficient comparisons across experimental groups, revealing dose-dependent responses in each compartment and distinct compartment-specific responses.
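As a minimal sketch of the slice-wise quantity behind these profiles, B.Ar/T.Ar for one cross-section is the number of bone pixels divided by the number of pixels in the analyzed region, and a profile pairs that value with the distance from the reference landmark. The function names, the nested-list mask format, and the tiny 2 × 2 masks are assumptions made for illustration; Tb.Th and Tb.Sp require distance-transform methods that are not shown here.

```python
def bone_area_fraction(bone_mask, roi_mask):
    """B.Ar/T.Ar in percent for one slice.

    Both masks are 2D lists of 0/1: roi_mask marks the region that is
    analyzed (T.Ar) and bone_mask marks segmented bone (B.Ar).
    """
    bone = total = 0
    for bone_row, roi_row in zip(bone_mask, roi_mask):
        for b, r in zip(bone_row, roi_row):
            if r:
                total += 1
                if b:
                    bone += 1
    return 100.0 * bone / total if total else float("nan")


def profile(bone_stack, roi_stack, start_slice, voxel_mm=0.005):
    """List of (distance from the reference slice in mm, B.Ar/T.Ar in %)
    for every slice from start_slice to the end of the stack."""
    return [
        ((z - start_slice) * voxel_mm,
         bone_area_fraction(bone_stack[z], roi_stack[z]))
        for z in range(start_slice, len(bone_stack))
    ]


# Hypothetical three-slice stack of 2 x 2 masks.
roi_stack = [[[1, 1], [1, 1]]] * 3
bone_stack = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 1], [1, 0]],
]
example_profile = profile(bone_stack, roi_stack, start_slice=1)
print(example_profile)
```

Expressing the horizontal axis as distance from a landmark rather than as an absolute slice index is what allows profiles from different animals to be overlaid and compared slice by slice.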
In this section, we examined the impact and limitations of defining the reference level for secondary spongiosa analysis using a fixed offset from the transitional interface between the growth plate and the primary spongiosa, as traditionally adopted in the literature. We compared this approach with our proposed method, which extracts the transitional interface between the primary and secondary spongiosa and uses it as the reference level for VOI definition.
Specifically, as shown in Fig. 6, we analyzed B.Ar/T.Ar in metaphyseal trabecular bone across four different VOIs, each extending 1 mm distally from its respective reference level: (a) the mixed primary-secondary spongiosa, located directly downstream of the growth plate-primary spongiosa interface; (b) the secondary spongiosa, located downstream of the primary-secondary spongiosa interface; (c) the trabecular bone, located 0.125 mm downstream of the growth plate-primary spongiosa interface; and (d) the trabecular bone, located 0.25 mm downstream of the growth plate-primary spongiosa interface.
As illustrated in Fig. 6c,d, analyzing the "secondary spongiosa" from a fixed offset distal to the growth plate produced markedly different morphological and statistical results compared to analyzing the secondary spongiosa starting from the primary-secondary spongiosa interface extracted with our proposed method (Fig. 6b). These discrepancies are attributable to the variability in the location of the transitional interface between the primary and secondary spongiosa, which differs across experimental groups and treatment conditions, as demonstrated in earlier sections (Figs. 2, 3, 5, 6).
Using a fixed offset to define the reference level can lead to the inclusion of morphologically different trabecular regions. Depending on the treatment group and the chosen offset, the analyzed VOI may begin within: (i) the primary spongiosa (e.g., PTH20, PTH40, and PTH80 at 0.125 mm from the growth plate-primary spongiosa interface, and PTH80 at 0.25 mm); (ii) a mixed primary-secondary region (e.g., PTH20 and PTH40 at 0.25 mm); or (iii) the secondary spongiosa alone (e.g., PTH0 at 0.25 mm) (see Fig. 6a). This inconsistency introduces substantial variability in the inferred morphometric parameters and statistical interpretations.
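The effect of the reference level on VOI content can be illustrated with simple slice arithmetic. The sketch below assumes the 5 µm slice spacing stated for the scans and uses hypothetical landmark positions for two animals; the function names and all positions in millimeters are invented for illustration and are not measurements from the study.

```python
VOXEL_MM = 0.005  # 5 µm slice spacing, as stated for the scans


def voi_slices(reference_mm, offset_mm=0.0, length_mm=1.0,
               voxel_mm=VOXEL_MM):
    """Half-open slice range [start, stop) of a VOI that begins
    offset_mm distal to the reference level and spans length_mm."""
    start = round((reference_mm + offset_mm) / voxel_mm)
    stop = start + round(length_mm / voxel_mm)
    return start, stop


def primary_fraction(voi, primary_secondary_mm, voxel_mm=VOXEL_MM):
    """Fraction of VOI slices lying proximal to the primary-secondary
    spongiosa interface, i.e. still within the primary spongiosa."""
    start, stop = voi
    boundary = round(primary_secondary_mm / voxel_mm)
    n_primary = max(0, min(stop, boundary) - start)
    return n_primary / (stop - start)


# Hypothetical animals: same growth plate-primary spongiosa interface
# (1.0 mm), different primary-secondary spongiosa interfaces.
growth_plate_primary_mm = 1.0
thin_primary_mm = 1.1    # narrow band of primary spongiosa
thick_primary_mm = 1.5   # wide band of primary spongiosa

fixed = voi_slices(growth_plate_primary_mm, offset_mm=0.25)
anatomical = voi_slices(thick_primary_mm)

print(fixed, primary_fraction(fixed, thin_primary_mm))
print(fixed, primary_fraction(fixed, thick_primary_mm))
print(anatomical, primary_fraction(anatomical, thick_primary_mm))
```

With these invented positions, the same 0.25 mm offset yields a VOI that lies entirely in the secondary spongiosa for one animal and contains primary spongiosa for the other, whereas a VOI anchored at the primary-secondary interface contains none by construction.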
A comparison of B.Ar/T.Ar profiles across the different VOIs (Fig. 6b-d) further illustrates this point. When using fixed offsets (Fig. 6c,d), B.Ar/T.Ar exhibits a proximal dose-dependent increase followed by a distal decline. However, this pattern reflects the inclusion of primary or mixed trabecular regions rather than the mature secondary spongiosa. In contrast, analysis starting from the primary-secondary spongiosa interface (Fig. 6b) shows relatively stable B.Ar/T.Ar values across PTH0, PTH20, and PTH40, with little to no significant differences in the proximal region, underscoring the importance of anatomically consistent VOI definition.
This example, derived from a dataset exhibiting pronounced anatomical variability of the transitional interface between the primary and secondary spongiosa, illustrates the methodological limitations of choosing a fixed offset downstream of the growth plate to analyze the secondary spongiosa, and its potential to undermine consistency in the analysis and to produce misleading statistical interpretations. The differences in statistical results are not necessarily incorrect but rather reflect the morphological and statistical outcomes for a fixed region that may include different anatomical regions. It is also important to note that the primary spongiosa is characterized by a fine-textured, thinner, and denser structure compared to the secondary spongiosa, and performing statistical tests comparing these two distinct trabecular structures may lack meaningful interpretative value.
Although the transitional interfaces between trabecular compartments are not inherently real, but rather emergent in the context of bone growth, their identification remains necessary for morphometric analysis. The metaphyseal spongiosa represents a dynamic continuum across both spatial and temporal dimensions, with the primary spongiosa positioned at one end of this continuum. Clear region definitions enable meaningful comparisons between animals and treatment groups, ensuring that differences in bone morphology are not confounded by misaligned developmental stages. Consequently, categorizing these regions provides the specificity required for reproducible and consistent morphometric studies.