Motivation
Climbing performance depends on subtle differences in the forces a climber exerts and in the motion of the body. Traditional methods for human force and motion estimation typically use dedicated sensors and provide precise measurements, but they require controlled environments and expensive setups. Instead, we propose to estimate body motion, forces, and torques directly from a single camera using no specialized equipment. This problem, however, is challenging as forces are not directly observable and must be inferred from the motion. Our approach combines vision-based human pose estimation with physics-based reasoning and derives accurate forces and torques from climbing videos.
Approach
We build on recent advances in computer vision and first estimate 2D human pose (Khirodkar et al. 2024) for each frame of the video. These estimates are lifted to 3D to compute joint motions and forces. Contact interactions with the climbing surface are then modeled under the assumptions of rigid, non-sliding support and a known body mass distribution. Finally, similar to Li et al. 2022, we rely on inverse dynamics to refine all our estimates. By enforcing consistency between motion, forces, and torques, we improve the accuracy of both (i) the force estimates and (ii) the trajectories of human body joints.
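The physical principle that links motion to contact forces can be illustrated with a minimal sketch. Under Newton's second law, the net contact force acting on the climber equals body mass times center-of-mass acceleration minus gravity. The function below is a simplified, hypothetical illustration of this idea (not the paper's actual pipeline, which couples full-body inverse dynamics with per-contact force estimation): it differentiates a center-of-mass trajectory twice and recovers the net contact force.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # gravitational acceleration, m/s^2

def net_contact_force(com_positions, dt, mass):
    """Estimate the net contact force on a body from its center-of-mass
    trajectory via Newton's second law: F_contact = m * (a - g).

    com_positions: (T, 3) array of center-of-mass positions over time (m)
    dt: time step between frames (s)
    mass: total body mass (kg)
    Returns a (T, 3) array of net contact forces (N).
    """
    # Second finite-difference derivative gives COM acceleration per frame.
    acc = np.gradient(np.gradient(com_positions, dt, axis=0), dt, axis=0)
    return mass * (acc - GRAVITY)

# Illustrative check: a 70 kg climber hanging statically (COM at rest)
# must receive a contact force that exactly cancels gravity.
com = np.tile(np.array([0.0, 0.0, 1.5]), (50, 1))  # static COM, 50 frames
forces = net_contact_force(com, dt=1 / 30, mass=70.0)
# Static case: every frame yields approximately [0, 0, 70 * 9.81] N
```

In the full problem, this net force must additionally be distributed across the individual hand and foot contacts and made consistent with joint torques, which is where the inverse-dynamics refinement comes in.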
Results
We evaluate our method on videos of people climbing a campus board. The campus board is equipped with four force sensors and provides ground-truth force measurements at the contacts. Comparing the automatically estimated forces with the ground truth shows that our method is accurate across different people and climbing styles. Qualitative and quantitative results of our method, together with further details on the experimental setup, are available at https://rihat99.github.io/climb_force.
Conclusion
This study presents an easy-to-deploy vision-based method to estimate climber motion and contact forces without specialized hardware. Results indicate that our approach effectively reconstructs force trends and improves 3D human motion estimation. We believe our work will support more efficient climber training and will help to develop Video Assistant Referee (VAR) systems for climbing competitions. Our future efforts will focus on refining contact models and improving force estimation in both indoor and outdoor environments.
References
Khirodkar, R., Bagautdinov, T., Martinez, J., Zhaoen, S., James, A., Selednik, P., Anderson, S., & Saito, S. (2024). Sapiens: Foundation for human vision models. Lecture Notes in Computer Science, 206–228. https://doi.org/10.1007/978-3-031-73235-5_12
Li, Z., Sedlar, J., Carpentier, J., Laptev, I., Mansard, N., & Sivic, J. (2022). Estimating 3D motion and forces of human–object interactions from internet videos. International Journal of Computer Vision, 130(2), 363–383. https://doi.org/10.1007/s11263-021-01540-1