Mr. Pose: An Easy Guide for Pose Estimation with Google’s MediaPipe

Loges Siva
9 min read · Oct 31, 2022


Learn to estimate human poses for exercises using MediaPipe by Google

Image of a human pose generated by DALL·E

Hi, let’s go over a short and informative read on pose estimation for exercises using the MediaPipe library by Google. This post is for developers interested in Deep Learning and Computer Vision, or anyone who wants to integrate pose estimation into their projects. It will not go into the math, architecture, or research behind the MediaPipe library; if you’re interested in those details, the links are in the References section.

Introduction 📜

In this article, we will go through a pose estimation solution called “Mr. Pose” that leverages the MediaPipe library and simple math-based algorithms to predict and measure exercises.

Mr. Pose is a visual analytics application that helps track the accuracy of an exercise, count repetitions, and predict the exercise being performed. MediaPipe provides a landmark model, “BlazePose”, that predicts the locations of 33 landmarks on the human body. Mr. Pose combines the locations and movement of these landmarks across video frames with exercise estimation algorithms to predict and track the exercise.

Since Mr. Pose relies entirely on these exercise algorithms, without any trained ML model for exercise prediction, it provides high throughput and memory efficiency. This ensures a high frame rate even on low-powered, low-memory hardware.

The Mr. Pose application supports prediction and measurement of four exercises,

  • Pushup
  • Plank
  • Squat
  • Jumping Jack

Google’s MediaPipe

MediaPipe is a cross-platform, open-source library developed by Google that provides cutting-edge ML solutions. It is designed to work across platforms like iOS, Linux, Windows, and Android, on hardware ranging from powerful workstations to low-powered devices like the Raspberry Pi. Some of the solutions provided by the library are Face Detection, Iris Detection, Human Pose Estimation, and Instant Motion Tracking. Refer to the MediaPipe page for the solutions supported in different languages and on different platforms.

A simple Python code snippet to initialize the Pose model and get results,

import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5)
results = pose.process(image)  # image must be an RGB array (convert OpenCV's BGR frames first)
33 pose landmarks. Source: MediaPipe Pose
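
To put the snippet in context, here is a minimal sketch that runs the model over a video and draws the detected landmarks on each frame. It assumes OpenCV is installed; “exercise.mp4” is just a placeholder file name,

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture("exercise.mp4")  # placeholder video path
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV reads frames as BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # Draw the 33 landmarks and their connections on the original frame
            mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        cv2.imshow("Mr. Pose", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()

Each landmark in results.pose_landmarks has x and y coordinates normalized to [0, 1]; scaling them by the frame width and height gives pixel positions suitable for the kinds of angle and distance checks described below.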

Algorithms for Pose Prediction 📘

Before we get into the algorithms for exercise prediction and measurement, let’s go over some basic functions essential for the algorithms.

(1) Angle between two lines with one common point
Consider two lines A and B. Let line A be defined by points point1 and point2, and line B by points point2 and point3. Then the angle between lines A and B at the common point point2 is calculated as follows,

Angle between two lines with one common point

(2) Angle between a line and horizontal
Consider a line between points point1 and point2. Then the angle between the line and the horizontal (X-axis) is calculated as follows,

Angle between a line and horizontal

(3) Euclidean distance between two points
Consider two points point1 and point2. Then the Euclidean distance between the points is calculated as follows,

Euclidean distance between two points

(4) Point position from a line
Consider a point and a line. The position of the point (i.e. right or left) relative to the line is calculated as follows,

Position of a point from a line
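
The formulas above appeared as images in the original post. A minimal Python sketch of these four helpers (function names are illustrative, not necessarily the ones used in the Mr. Pose repository) could look like this,

import math

def angle_between_lines(point1, point2, point3):
    # Angle at the common point point2, between line point1-point2 and line point2-point3
    a1 = math.atan2(point1[1] - point2[1], point1[0] - point2[0])
    a2 = math.atan2(point3[1] - point2[1], point3[0] - point2[0])
    angle = abs(math.degrees(a1 - a2))
    return 360 - angle if angle > 180 else angle  # keep the result in [0°, 180°]

def angle_with_horizontal(point1, point2):
    # Angle between line point1-point2 and the X-axis, in degrees
    return abs(math.degrees(math.atan2(point2[1] - point1[1], point2[0] - point1[0])))

def euclidean_distance(point1, point2):
    # Straight-line distance between the two points
    return math.hypot(point2[0] - point1[0], point2[1] - point1[1])

def point_position(point, line_point1, line_point2):
    # The sign of the cross product tells which side of the line the point lies on;
    # the right/left labels depend on the image coordinate convention (Y grows downward)
    cross = ((line_point2[0] - line_point1[0]) * (point[1] - line_point1[1])
             - (line_point2[1] - line_point1[1]) * (point[0] - line_point1[0]))
    return "right" if cross > 0 else "left"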

Pushup

To predict whether the exercise performed in the video is a pushup, we focus on the Shoulder, Elbow, Wrist, Hip, and Ankle landmarks (key points) of the human body.

For a pushup, the following conditions are expected to be met,

  1. The angle between the Shoulder-Hip line and the Hip-Ankle line is expected to be close to horizontal, that is, close to 0° or 180° depending on the position (left or right) of the person performing the exercise. This indicates that the person is positioned parallel to the ground.
  2. The Shoulder-Ankle or Hip-Ankle line is expected to be parallel to the horizontal, or at least close to it.
  3. The angle between the Shoulder-Elbow line and the Elbow-Wrist line is tracked across frames.
  4. The angle of the Elbow-Wrist line from horizontal is tracked across frames.
  5. Conditions 3 and 4 are tracked over 24 continuous frames of the video using a counter. This helps avoid false positives in the measurement.
  6. After the counter has been incremented for 24 continuous frames, if the mean of the angle from condition 4 over the 24 frames is far from 90°, and the mean difference between the condition 3 angles at the 1st, 12th, and 24th frames is greater than 5 (a small constant value), then the prediction is Pushup.

Plank

To predict whether the exercise performed in the video is a plank, we focus on the same landmarks and conditions as the pushup algorithm. The key difference is condition 6.

After the counter has been incremented for 24 continuous frames, if the mean of the angle from condition 4 over the 24 frames is close to 90°, and the mean difference between the condition 3 angles at the 1st, 12th, and 24th frames is less than 5 (a small constant value), then the prediction is Plank.
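
Since pushup and plank share conditions 1 through 5 and differ only in condition 6, a single sketch can cover both decisions. The exact margin for “far from / close to 90°” and the way the frame differences are averaged are illustrative assumptions here, not values taken from the repository,

def classify_pushup_or_plank(elbow_angles, forearm_angles, diff_threshold=5, margin=30):
    # elbow_angles: angle between the Shoulder-Elbow and Elbow-Wrist lines per frame (condition 3)
    # forearm_angles: angle of the Elbow-Wrist line from horizontal per frame (condition 4)
    assert len(elbow_angles) == 24 and len(forearm_angles) == 24
    mean_forearm_angle = sum(forearm_angles) / 24
    # Mean change of the elbow angle between the 1st, 12th and 24th frames (condition 6)
    mean_diff = (abs(elbow_angles[11] - elbow_angles[0]) +
                 abs(elbow_angles[23] - elbow_angles[11])) / 2
    far_from_90 = abs(mean_forearm_angle - 90) > margin
    if far_from_90 and mean_diff > diff_threshold:
        return "pushup"  # arms are moving between frames
    if not far_from_90 and mean_diff <= diff_threshold:
        return "plank"   # arms are static across the 24 frames
    return None          # neither condition 6 variant is satisfied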

Squat

To predict whether the exercise performed in the video is a squat, we focus on the Head, Hand, Foot, Hip, and Knee landmarks (key points) of the human body. The Head point can be any one of the available points: Nose, Left ear, Right ear, Left eye, or Right eye. The Hand point can be any one of: Left wrist, Right wrist, Left pinky, Right pinky, Left index, or Right index. The Foot point can be any one of: Left foot index, Right foot index, Left heel, Right heel, Left ankle, or Right ankle.

For a squat, the following conditions are expected to be met,

  1. The Y-coordinate (i.e. height) of the head point is tracked across frames.
  2. The angle of the Shoulder-Ankle or Hip-Ankle line from horizontal is expected to be close to 90°.
  3. The angle of the Knee-Ankle line from horizontal is expected to be close to 90°.
  4. The angle of the Hip-Knee line from horizontal is expected to be close to 90°.
  5. The mean height of the head point from condition 1 over 24 frames is calculated. The mean value is normalized to the person's height in the video, using the head point and foot point as minval and maxval respectively.
  6. If the normalized height from the previous condition is less than 0 after 24 continuous frames, it indicates that the person is moving downward in the video.
  7. The conditions for all previous exercises (Pushup and Plank) are expected not to be met. Then the prediction is Squat.
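
A minimal sketch of the normalization and downward-motion check from conditions 5 and 6 (function and variable names are illustrative),

def normalize(value, min_val, max_val):
    # Standard min-max normalization
    return (value - min_val) / (max_val - min_val)

def is_downward_motion(head_y_history, head_point_y, foot_point_y):
    # head_y_history: Y-coordinate of the head point over 24 consecutive frames (condition 1)
    # head_point_y, foot_point_y: reference head and foot Y-coordinates, used as
    # minval and maxval for the normalization (condition 5)
    mean_head_y = sum(head_y_history) / len(head_y_history)
    normalized_height = normalize(mean_head_y, head_point_y, foot_point_y)
    return normalized_height < 0  # per condition 6, below 0 indicates a downward motion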

Jumping Jack

To predict whether the exercise performed in the video is a jumping jack, we focus on the Head point and Hand point, similar to the Squat exercise.

For a jumping jack, the following conditions are expected to be met,

  1. The difference between the Y-coordinates (i.e. heights) of the head point and the hand point is expected to be greater than 0. This indicates that the hand is above the head.
  2. The angle of the Shoulder-Ankle or Hip-Ankle line from horizontal is expected to be close to 90°. This indicates that the person is standing.
  3. The conditions for all previous exercises (Pushup, Plank, and Squat) are expected not to be met. Then the prediction is Jumping Jack.
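
A compact sketch of these two checks (the 15° margin for “close to 90°” is an illustrative value, not taken from the repository),

def is_jumping_jack_pose(head_y, hand_y, torso_angle):
    # Condition 1: the hand is above the head, i.e. the Y-coordinate difference is positive
    hands_above_head = (head_y - hand_y) > 0
    # Condition 2: the Shoulder-Ankle (or Hip-Ankle) line is close to vertical, i.e. standing
    standing = abs(torso_angle - 90) < 15
    return hands_above_head and standing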

Algorithms for Pose Measurement 📕

Pushup

To measure the pushup exercise performed in the video, we focus on Head, Ankle, and Wrist points.

For one pushup repetition, the following conditions are expected to be met,

  1. Get the position of the Head point relative to the vertical line through the middle of the frame (passing through [frame width/2, frame height/2]).
  2. Calculate the angle between the Head-Ankle line and the Ankle-Wrist line. If the Head point is to the right of the vertical line, the angle is expected to be close to 0°; if it is to the left, the angle is expected to be close to 180°.
  3. Calculate the Y-coordinate distance between the Head point and the Ankle point. If the distance is less than 250 (a constant value) and the previous condition is met, set the pushup flag to True.
  4. If the pushup flag is True and the distance calculated in the previous condition is greater than 300 (a constant value), increment the pushup counter and set the pushup flag to False.
  5. If the previous condition is met, it indicates that one full pushup is complete. Repeat the same process for the entire video to count all repetitions.
Pushups measurement result video
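
Putting the flag logic together, here is a minimal sketch of the repetition counter. The angle check from condition 2 is omitted for brevity, and the function and variable names are illustrative,

def count_pushups(head_ankle_distances, down_threshold=250, up_threshold=300):
    # head_ankle_distances: per-frame Y-coordinate distance between the Head and Ankle points
    # down_threshold / up_threshold: the constant values from conditions 3 and 4
    count, pushup_flag = 0, False
    for distance in head_ankle_distances:
        if distance < down_threshold:
            pushup_flag = True   # body lowered: down phase detected
        elif pushup_flag and distance > up_threshold:
            count += 1           # body raised again: one full repetition
            pushup_flag = False
    return count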

Plank

To measure the plank exercise performed in the video, we focus on Shoulder, Elbow, Wrist, Hip, and Ankle points.

For plank, the following conditions are expected to be met,

  1. The angle between the Shoulder-Hip line and the Hip-Ankle line is expected to be close to horizontal, that is, close to 0° or 180° depending on the position (left or right) of the person performing the exercise.
  2. The Shoulder-Ankle or Hip-Ankle line is expected to be parallel to the horizontal, or at least close to it.
  3. The angle between the Shoulder-Elbow line and the Elbow-Wrist line is tracked across frames.
  4. The angle of the Elbow-Wrist line from horizontal is tracked across frames.
  5. Conditions 3 and 4 are tracked over 24 continuous frames of the video using a counter. This helps avoid false positives in the measurement.
  6. After the counter has been incremented for 24 continuous frames, if the mean of the angle from condition 4 over the 24 frames is far from 90° and the mean difference between the condition 3 angles at the 1st, 12th, and 24th frames is greater than 5 (a small constant value), the plank timer (in HH:MM:SS format) is started.
  7. If the previous condition fails, the plank timer is paused until the condition is met again in subsequent frames.
Plank measurement result video
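
Here is a sketch of the pause/resume timer logic, assuming wall-clock time is used; the actual repository may instead derive the elapsed time from the frame count and FPS, and the names here are illustrative,

import time

class PlankTimer:
    def __init__(self):
        self.elapsed = 0.0
        self.started_at = None

    def update(self, in_plank_position):
        # in_plank_position: result of the 24-frame check from condition 6
        now = time.time()
        if in_plank_position and self.started_at is None:
            self.started_at = now                  # condition met: start/resume the timer
        elif not in_plank_position and self.started_at is not None:
            self.elapsed += now - self.started_at  # condition failed: pause the timer
            self.started_at = None
        total = self.elapsed + (now - self.started_at if self.started_at else 0.0)
        return time.strftime("%H:%M:%S", time.gmtime(total))  # HH:MM:SS display string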

Squat

To measure the squat exercise performed in the video, we focus on Head and Ankle points.

For one squat repetition, the following conditions are expected to be met,

  1. Calculate the Y-coordinate distance between the Head point and the Ankle point. Normalize the distance with 0 and the frame height as minval and maxval respectively.
  2. If the normalized distance is less than 0.5, set the squat flag to True.
  3. If the squat flag is True and the normalized distance is greater than 0.5, increment the squat counter and set the squat flag to False.
  4. If the previous condition is met, it indicates that one full squat is complete. Repeat the same process for the entire video to count all repetitions.
Squats measurement result video
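
A minimal sketch of this counter, assuming the per-frame Head-Ankle Y-distances are already available (names are illustrative),

def count_squats(head_ankle_distances, frame_height):
    # Normalizing with 0 and frame height as minval/maxval reduces to dividing by the height
    count, squat_flag = 0, False
    for distance in head_ankle_distances:
        normalized = distance / frame_height
        if normalized < 0.5:
            squat_flag = True    # the person has dropped into the squat
        elif squat_flag and normalized > 0.5:
            count += 1           # the person has stood back up: one repetition
            squat_flag = False
    return count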

Jumping Jack

To measure the jumping jack exercise performed in the video, we focus on Head, Hand, Shoulder, Ankle, and Hip points.

For one jumping jack repetition, the following conditions are expected to be met,

  1. Calculate the Y-coordinate distance between the Head point and the Hand point. Normalize the distance with 0 and the frame height as minval and maxval respectively.
  2. Calculate the angle of the Shoulder-Ankle or Hip-Ankle line from horizontal.
  3. If the normalized distance is greater than 0 and the angle is close to 90°, set the jumping jack flag to True.
  4. If the jumping jack flag is True and the normalized distance is less than 0, increment the jumping jack counter and set the flag to False.
  5. If the previous condition is met, it indicates that one full jumping jack is complete. Repeat the same process for the entire video to count all repetitions.
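
A minimal sketch of this counter; as before, the 15° margin for “close to 90°” is an illustrative assumption,

def count_jumping_jacks(head_hand_diffs, torso_angles, frame_height):
    # head_hand_diffs: per-frame Y-coordinate difference between the Head and Hand points
    # torso_angles: per-frame angle of the Shoulder-Ankle (or Hip-Ankle) line from horizontal
    count, jack_flag = 0, False
    for diff, angle in zip(head_hand_diffs, torso_angles):
        normalized = diff / frame_height   # condition 1: normalize with 0 and frame height
        standing = abs(angle - 90) < 15
        if normalized > 0 and standing:
            jack_flag = True               # hands raised above the head while standing
        elif jack_flag and normalized < 0:
            count += 1                     # hands back below the head: one repetition
            jack_flag = False
    return count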

General Requirements 🧙‍♂️

Follow the general requirements section of the README in the repository, and either place the camera for live video or record a video of a person performing the exercise.

Code Requirements 🧙‍♀️

Follow the code requirements section of the README in the repository and set up the environment. This is an essential prerequisite for the next section of this article.

How to Run 🏃‍♂️

To run the Mr. Pose application, clone the repository, install the requirements, and run the following command,

python mrpose.py --video <path to video file> --exercise <exercise to be measured>

Optional Arguments:
--video : Path to the video source file.
If this argument is not provided, Mr. Pose will launch the webcam for live video. Currently, live webcam video works only for exercise prediction.
--exercise : One of pushup, plank, squat, jumpingjack.
If this argument is not provided, Mr. Pose will predict the exercise performed in the video. If it is provided, Mr. Pose will measure the specified exercise.
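
For example, to count pushup repetitions in a local video file (the file name here is just a placeholder),

python mrpose.py --video pushups.mp4 --exercise pushup

And to only predict which exercise is being performed in the same video,

python mrpose.py --video pushups.mp4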

Conclusion 📜

The Mr. Pose application performs exercise prediction and measurement based on key-point positions, angles between lines, and distances between points, using simple mathematics and algorithms. It can be extended to many other exercises or applications (like fall detection, walking, or running) using similar concepts.

Thanks to the Google developers for open-sourcing the MediaPipe library, and to the creators of the BlazePose model, which the Mr. Pose application uses directly through the library.

I hope this article was informative and detailed in explaining a use case of the MediaPipe library through an application. For any feedback or queries, please post them in the comments.

Happy Learning! 😊

References 🔖

[1] Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, Matthias Grundmann, “MediaPipe: A Framework for Building Perception Pipelines”, arXiv:1906.08172, 2019

[2] Logeswaran Sivakumar, “Mr.Pose”, https://github.com/Logeswaran123/MrPose, 2022

[3] MediaPipe Pose, Google
