Antonello Calamea

Just Used Machine Learning in My Workout!

I’m a big fan of the bodyweight approach and of working out in general, but I don’t much like going to the gym.


Besides, in this time of forced lockdown due to Coronavirus, it could be useful to try a different approach to fitness and training.


So I asked myself: is there a way to use Machine Learning in this area? Can I join two passions together to make something useful?


One of the main problems is having a way to validate the correctness of an exercise, so I ran some experiments, tried an approach and found that…


Ok, don’t want to spoil anything, just continue reading to find out!


Frame the problem

As always, let’s start with framing the problem. What we want to achieve is having a way to assess the correctness of an exercise, using a video as input.


The optimum would be live streaming, but let’s keep it simple and use a file for now since, after all, we want to validate the approach before building something on top of it.


So, having a video of a (hopefully) proper execution should be the first step, as it can be used as a baseline to compare other ones.


Here’s the first video, shot in my underground fatigue room :)


The baseline ok execution


My first thought was to use CNNs to build a classifier but, besides the number of examples needed, I’m not sure raw pixel sequences would be useful to train a model on what is right and what is wrong in an exercise execution.


So, I did some research to find out whether it’s possible to extract different features from a video, and found a great library, OpenPose, a “Real-time multi-person keypoint detection library for body, face, hands, and foot estimation”.


Seeing the demo videos, I understood it could be very useful, so I tried applying it to my problem and got this…


Using OpenPose


(I’ll cover later, in the Appendix, all the necessary setup steps.)

As you can see in the video, the library works very well at tracking the different body parts (I used the COCO configuration with 18 keypoints).

The cool thing is that it can also output a JSON file with all the positions, frame by frame, so it’s possible to obtain an alternative, numeric representation of an exercise. With some helper functions and Plotly, this is how the exercise looks considering the y-axis movements (I’m skipping the x-axis as it’s less useful given the camera position).
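To give an idea of what those per-frame JSON files contain, here is a minimal sketch of the structure the parsing code in the Appendix expects — a list of detected people, each with a flat `[x, y, confidence, ...]` keypoint array. The numbers below are made up:

```python
# a made-up single-frame OpenPose output (COCO format: 18 keypoints x 3 values)
frame = {
    "people": [
        {"pose_keypoints": [320.5, 180.2, 0.92,   # first keypoint: x, y, confidence
                            318.0, 240.7, 0.88]}  # second keypoint: x, y, confidence (and so on)
    ]
}

keypoints = frame["people"][0]["pose_keypoints"]
# keep only x and y, dropping the confidence of every triplet
xy = [v for i, v in enumerate(keypoints) if i % 3 != 2]
print(xy)  # [320.5, 180.2, 318.0, 240.7]
```

Stacking the y values of each keypoint across frames is what produces the time series plotted below.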

Let’s call it “ok1”


Breakdown analysis of good exercise ok1


Nice. The next step is to find a way to compare two different executions and spot whether there are significant differences.

Let’s first make a visual comparison based on these metrics, calling this execution “fail1”


fail1


Let’s compare the graphs of the movements


Comparison between ok1 and fail1


There are evident differences.

Let’s try with another failed performance (“fail2”)


fail2


and let’s compare it with the baseline proper execution, ok1


Comparison between ok1 and fail2


Let’s now try comparing two good performances (let’s call the second one “ok2”)


ok2



Comparison between ok1 and ok2


The curves look very similar, so the approach seems empirically sound.


Now the question is: is there a way to evaluate the similarity between these univariate time-series curves, considering they could have different timescales too?


It turns out there is something called Dynamic Time Warping that can be used “for measuring similarity between two temporal sequences”. More here


Is there an implementation in Python? Of course there is, in tslearn.metrics
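To make the idea concrete, here is a minimal, unoptimized sketch of the DTW dynamic program (tslearn’s optimized implementation is the one actually used in this article; the toy series below are made up):

```python
import numpy as np

def dtw_distance(s1, s2):
    """Plain dynamic-programming DTW with squared-distance cost."""
    n, m = len(s1), len(s2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            # each cell extends the cheapest of the three possible alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

peak = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
higher = [0.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0]

print(dtw_distance(peak, peak))       # 0.0: identical series
print(dtw_distance(peak, stretched))  # 0.0: same shape on a slower timescale
print(dtw_distance(peak, higher))     # 1.0: same timing, different peak height
```

Note how the stretched series scores 0 despite having twice as many frames: this is exactly the different-timescale robustness we need when comparing executions of different speeds.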


So let’s crunch some numbers


First, let’s compare “ok1” with itself

dtw_value for feature nose_y is 0.0 
dtw_value for feature right_shoulder_y is 0.0 
dtw_value for feature right_elbow_y is 0.0 
dtw_value for feature right_wrist_y is 0.0 
dtw_value for feature left_shoulder_y is 0.0 
dtw_value for feature left_elbow_y is 0.0 
dtw_value for feature left_wrist_y is 0.0 
dtw_value for feature right_hip_y is 0.0 
dtw_value for feature right_knee_y is 0.0 
dtw_value for feature right_ankle_y is 0.0 
dtw_value for feature left_hip_y is 0.0 
dtw_value for feature left_knee_y is 0.0 
dtw_value for feature left_ankle_y is 0.0 
dtw_value for feature right_eye_y is 0.0 
dtw_value for feature left_eye_y is 0.0 
dtw_value for feature right_ear_y is 0.0 
dtw_value for feature left_ear_y is 0.0 
dtw_value for feature background_y is 0.0

So 0 is maximum similarity, and a lower score means more similar executions.


Let’s now measure ok1 against fail1

dtw_value for feature nose_y is 188.00378744123748
dtw_value for feature right_shoulder_y is 155.97642562435527
dtw_value for feature right_elbow_y is 156.39925059973916
dtw_value for feature right_wrist_y is 17.982641407757672
dtw_value for feature left_shoulder_y is 13.5329438534267
dtw_value for feature left_elbow_y is 158.0005797757085
dtw_value for feature left_wrist_y is 27.544745106825722
dtw_value for feature right_hip_y is 12.151614599714703
dtw_value for feature right_knee_y is 191.94638493339747
dtw_value for feature right_ankle_y is 223.23781654997444
dtw_value for feature left_hip_y is 263.0165952996121
dtw_value for feature left_knee_y is 195.8379463587177
dtw_value for feature left_ankle_y is 227.95958454954243
dtw_value for feature right_eye_y is 288.64055642788685
dtw_value for feature left_eye_y is 192.9321060365538
dtw_value for feature right_ear_y is 192.15753964939807
dtw_value for feature left_ear_y is 190.20149442225735
dtw_value for feature background_y is 189.09276308989186

I found it useful to adopt an overall value to condense the information, such as the median

dtw_median : 189.6471287560746

Comparison between ok1 and fail2

dtw_value for feature nose_y is 65.28319682858675
dtw_value for feature right_shoulder_y is 38.87442004120449
dtw_value for feature right_elbow_y is 37.75683113715981
dtw_value for feature right_wrist_y is 18.907807197028447
dtw_value for feature left_shoulder_y is 19.50736795264806
dtw_value for feature left_elbow_y is 45.031636992674414
dtw_value for feature left_wrist_y is 36.101698713495466
dtw_value for feature right_hip_y is 13.248353503737741
dtw_value for feature right_knee_y is 39.45295418596681
dtw_value for feature right_ankle_y is 49.27277845829276
dtw_value for feature left_hip_y is 65.78598402395453
dtw_value for feature left_knee_y is 38.59586190254078
dtw_value for feature left_ankle_y is 44.54850474482842
dtw_value for feature right_eye_y is 64.17832564035923
dtw_value for feature left_eye_y is 50.02819053653649
dtw_value for feature right_ear_y is 50.233695101993064
dtw_value for feature left_ear_y is 45.21480605000976
dtw_value for feature background_y is 42.15576012017812
dtw_median : 43.35213243250327

Comparison between ok1 and ok2

dtw_value for feature nose_y is 16.023831603583467
dtw_value for feature right_shoulder_y is 11.24889546622242
dtw_value for feature right_elbow_y is 11.94796246520719
dtw_value for feature right_wrist_y is 20.509653605070962
dtw_value for feature left_shoulder_y is 19.65007578484111
dtw_value for feature left_elbow_y is 14.486468134089847
dtw_value for feature left_wrist_y is 7.208783392501132
dtw_value for feature right_hip_y is 14.17544715061928
dtw_value for feature right_knee_y is 25.759515076957445
dtw_value for feature right_ankle_y is 43.123581089700735
dtw_value for feature left_hip_y is 83.91171946754521
dtw_value for feature left_knee_y is 23.860467116131673
dtw_value for feature left_ankle_y is 44.80603683656928
dtw_value for feature right_eye_y is 91.27560108813313
dtw_value for feature left_eye_y is 31.263050533657154
dtw_value for feature right_ear_y is 25.735729785455852
dtw_value for feature left_ear_y is 12.39151408383979
dtw_value for feature background_y is 11.887661376402017
dtw_median : 20.079864694956036

So it seems this value can be used as an indicator for comparing the correctness of two executions, based on a threshold to be found.


As an empirical counter-check, let’s try other examples starting from this value


ok1 and check1 -> median 82.22671018607622

ok2 and check2 -> median 196.313312415643

ok and check3 -> median 25.03920782168309


It seems that a median lower than 30 could be a starting threshold.
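The resulting decision rule is simple enough to sketch. The threshold of 30 comes from the handful of comparisons above, so it’s an empirical starting point, not a validated value, and the helper name is mine:

```python
DTW_MEDIAN_THRESHOLD = 30.0  # empirical starting point from the comparisons above

def is_execution_ok(dtw_median):
    """Pass if the median DTW distance from the baseline is below the threshold."""
    return dtw_median < DTW_MEDIAN_THRESHOLD

print(is_execution_ok(20.079864694956036))  # ok1 vs ok2 -> True
print(is_execution_ok(189.6471287560746))   # ok1 vs fail1 -> False
print(is_execution_ok(25.03920782168309))   # check3 -> True
```

With more labeled examples, the threshold could be fitted instead of hand-picked.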


Let’s see them on video


No jumps allowed!



Incomplete



Ok!


Conclusion

This is just the beginning of this experiment: assuming this is the right approach, there are a lot of open points, such as:


  • What about different people with different heights? Do they each need a personal baseline, or can it be generalized?

  • What about a different camera position?

  • How can the threshold be inferred?

  • How to give more detailed suggestions about what was wrong in the execution?

  • How to process the relevant part of an exercise during a continuous video stream?

  • Can exercises with tools such as dumbbells be tracked? (hint: yes but with specific object detection libraries too)

I have some ideas to check, and I’ll do it in the future, because the possibilities are fantastic.


Imagine a workstation with a camera that:

  • recognizes you when you enter, using face identification

  • loads your “WOD” (workout of the day)

  • checks the correctness of the exercises, giving hints

  • signals a bad execution to a trainer who’s present, or maybe attending a remote session with dozens of people, allowing him/her to take corrective action.

Even the training itself could be customized on the fly, based on previous sessions and the person’s overall condition.


As always, I’m amazed by what it’s possible to achieve and imagine with these technologies, and it’s great fun to use them.


In the meantime, happy workout and stay safe.


Appendix

Docker+OpenPose

Instead of installing OpenPose directly with all the necessary dependencies, I opted for a Docker approach. You can find the image here: https://hub.docker.com/r/garyfeng/docker-openpose/


Keep in mind that a container is probably not the right solution for a real-time approach, as there is a lot of lag, but I haven’t tried other solutions so I can’t say for sure.


Before running it, you need to enable containers to use the GPU, otherwise OpenPose will not start. Here are all the instructions to do it (with NVIDIA GPUs): https://github.com/NVIDIA/nvidia-docker


You’ll see in the command the “--privileged” and -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix parts, which are used to access the camera and the display inside the container, if you need them.


Before launching the docker command, be sure to execute:

xhost +

so the container can connect.


Then, just launch

docker run --privileged --gpus all -v <host path to share>:/data  -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -it garyfeng/docker-openpose:latest

After a while, you’ll get a bash shell inside the container.


If you check the OpenPose documentation, there are a lot of parameters, but let’s see a couple of examples.

build/examples/openpose/openpose.bin --face

It should turn on the camera and start detecting the keypoints of your face.


The command I used to create the data used before:

build/examples/openpose/openpose.bin --video /data/<input file>  --write_video /data/<output file> --no_display --write_keypoint_json /data/<folder with json output files>

Notice the “data” folder, which was mounted while launching the container. If you change it, be sure to adapt the command accordingly.


Python code

Let’s now see some Python code to deal with the data used in the article.

import os

import numpy as np
import pandas as pd

def read_pose_values(path, file_name):
    try:
        path, dirs, files = next(os.walk(path))
        samples = []
        for i in range(len(files)):
            # OpenPose zero-pads the frame number to 12 digits
            json_path = '{}/{}_{}_keypoints.json'.format(path, file_name, str(i).zfill(12))
            samples.append(pd.read_json(path_or_buf=json_path, typ='series'))
        return pd.DataFrame(samples).reset_index(drop=True)
    except Exception as e:
        print(e)

This returns a DataFrame with one row per JSON file found in an OpenPose output path (the frame number is zero-padded with zfill, so it works with 1000+ files too).

'''
Nose – 0, Neck – 1, Right Shoulder – 2, Right Elbow – 3, Right Wrist – 4,
Left Shoulder – 5, Left Elbow – 6, Left Wrist – 7, Right Hip – 8,
Right Knee – 9, Right Ankle – 10, Left Hip – 11, Left Knee – 12,
Left Ankle – 13, Right Eye – 14, Left Eye – 15, Right Ear – 16,
Left Ear – 17, Background – 18
'''
def transform_and_transpose(pose_data, label):
    frames = []
    for i in range(pose_data.shape[0] - 1):
        if len(pose_data.people[i]) > 0:
            frames.append(pd.DataFrame(pose_data.people[i][0]['pose_keypoints']).T)
    output = pd.concat(frames, ignore_index=True)

    # drop the confidence value of each (x, y, confidence) triplet
    for y in range(2, output.shape[1], 3):
        output.drop(columns=[y], inplace=True)

    # rename columns
    output.columns = ['nose_x', 'nose_y', 'right_shoulder_x', 'right_shoulder_y', 'right_elbow_x', 'right_elbow_y',
                      'right_wrist_x', 'right_wrist_y', 'left_shoulder_x', 'left_shoulder_y', 'left_elbow_x', 'left_elbow_y',
                      'left_wrist_x', 'left_wrist_y', 'right_hip_x', 'right_hip_y', 'right_knee_x', 'right_knee_y',
                      'right_ankle_x', 'right_ankle_y', 'left_hip_x', 'left_hip_y', 'left_knee_x', 'left_knee_y',
                      'left_ankle_x', 'left_ankle_y', 'right_eye_x', 'right_eye_y', 'left_eye_x', 'left_eye_y',
                      'right_ear_x', 'right_ear_y', 'left_ear_x', 'left_ear_y', 'background_x', 'background_y']

    # interpolate 0 values (missed detections)
    output.replace(0, np.nan, inplace=True)
    output.interpolate(method='linear', limit_direction='forward', inplace=True)
    return output

Here we rename the columns based on the COCO setup and do a basic interpolation when there are 0 values (for example, when the nose is behind the pull-up bar).
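A minimal illustration of that interpolation step on a toy column (the values are made up; 0 stands for a missed detection):

```python
import numpy as np
import pandas as pd

# a keypoint column where detection failed (0) on two frames
df = pd.DataFrame({"nose_y": [100.0, 0.0, 104.0, 0.0, 108.0]})
df.replace(0, np.nan, inplace=True)
df.interpolate(method="linear", limit_direction="forward", inplace=True)
print(df["nose_y"].tolist())  # [100.0, 102.0, 104.0, 106.0, 108.0]
```

The missing frames get filled with values on the straight line between their neighbors, which is good enough for slow, continuous movements like these.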

def model_exercise(json,name,label):
    df_raw = read_pose_values(json,name)
    return transform_and_transpose(df_raw,label)
df_exercise_1 = model_exercise('<path to json>','<file_name>','<label>')

Putting it all together, this is the function to use to get the final DataFrame.

Let’s see some graphs now:

import plotly.graph_objects as go
from plotly.subplots import make_subplots
def plot_y_features(df):
    fig = make_subplots(rows=3, cols=6, start_cell="top-left")
    r = 1
    c = 1
    X = pd.Series(range(df.shape[0]))
    for feature in df.columns:
        if '_y' in feature:
            fig.add_trace(go.Scatter(x=X, y=df[feature], name=feature),
            row=r, col=c)
            fig.update_xaxes(title_text=feature, row=r, col=c)
            if c < 6:
                c = c + 1
            else:
                c = 1
                r = r + 1
    fig.update_layout(title_text="Exercise y-axis movements breakdown", width=2000, height=1000)
    fig.show()
plot_y_features(df_exercise_1)

This draws the subplots for all the positions.

Now drawing the comparison for two exercises:

def plot_comparison_y_features(df1,df2):
    fig = make_subplots(rows=3, cols=6, start_cell="top-left")
    r = 1
    c = 1
    X1 = pd.Series(range(df1.shape[0]))
    X2 = pd.Series(range(df2.shape[0]))
    for feature in df1.columns:
        if '_y' in feature:
            fig.add_trace(go.Scatter(x=X1, y=df1[feature], name=feature + '_ok'),row=r, col=c)
            fig.add_trace(go.Scatter(x=X2, y=df2[feature], name=feature + '_fail'),row=r, col=c)
            fig.update_xaxes(title_text=feature, row=r, col=c)
            if c < 6:
                c = c + 1
            else:
                c = 1
                r = r + 1
    fig.update_layout(title_text="Exercise y-axis movements breakdown comparison", width=2000, height=1000)
    fig.show()
plot_comparison_y_features(df_exercise_1, df_ok2)

Finally the Dynamic Time Warping part:

from tslearn.metrics import dtw

def evaluate_dtw(df1, df2, feature, plot=False):
    # compare the two univariate series for this feature with Dynamic Time Warping
    dtw_value = dtw(df1[feature].values, df2[feature].values)
    print("dtw_value for feature {} is {}".format(feature, dtw_value))
    return dtw_value

def evaluate_dtw_values(df1, df2, plot=False):
    dtw_values = []
    for feature in df1.columns:
        if '_y' in feature:
            dtw_values.append(evaluate_dtw(df1, df2, feature, plot))
    return pd.DataFrame(dtw_values)
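The dtw_median printed in the article can then be obtained from the per-feature values. A sketch with made-up numbers, assuming a single-column DataFrame shaped like the output of evaluate_dtw_values:

```python
import pandas as pd

# hypothetical per-feature DTW values, one row per y-feature
dtw_values = pd.DataFrame([188.0, 156.0, 18.0, 13.5, 263.0, 192.0])
dtw_median = float(dtw_values[0].median())
print("dtw_median : {}".format(dtw_median))  # dtw_median : 172.0
```

The median is preferred over the mean here because a single noisy keypoint (an occluded ankle, for example) can produce an outlier DTW value.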

That’s all! Thank you.
