Machine learning gesture analysis of yoga for exergame development

Many successful and innovative information technology applications use gestures as input. These programs span a wide variety of genres, platforms and input technologies, from the touch screen of a smartphone to the full-motion, natural input of devices like the Kinect sensor. Visual Gesture Builder, a data-driven machine-learning solution for gesture detection, was used to capture useful yoga gestures with high accuracy. This gesture analysis technology is being explored for incorporation into exergames for personalised medical interventions. The research goal is to test whether a machine learning algorithm in a basic computer video exergame can assess yoga skill acquisition in targeted populations as a means to promote healthy physical activity.


Introduction
New forms of interactive gaming, termed exergaming, serious games or active video gaming, have emerged, and researchers are examining whether these potentially engaging platforms are equivalent to moderate or vigorous physical activity and have the potential to promote healthy active behaviours. Several reviews of intervention studies and laboratory energy expenditure of exergames have been published [1,2]. A recent systematic review [3] focused on children, yet more than half of American adults play games [4], and 29% of Americans older than 50 years play video games [5]. Exergames therefore have the potential to reach a broad audience to promote physical activities (PAs) in the community. The objectives of this study are to develop laboratory measures of yoga as a beneficial physical activity and to assess whether yoga skill acquisition can be measured in an exergame format as a physical activity promotion tool.

Exergames for physical activity
We intend to utilise smart and connected technologies to assist therapeutic intervention using exergames in clinic or home settings, with a focus on improving physical activity to prevent the onset of diabetes, cancer, and other diseases [6]. A recent metasurvey of 28 laboratory studies [1] found that the most commonly used exergames were associated with the Nintendo Wii console and the Wii Balance Board (e.g. Wii Sports and Wii Fit), followed by dance pad-based games (e.g. Dance Dance Revolution and Zigzag Xer-Dance) and games for the PlayStation 2 EyeToy. Other exergame systems include Exerstation, a full-body isometric game controller for the Xbox 360, GameBike, and the XaviX system. Some of the systems provide vigorous physical activity suitable for sports training, while most provide moderate to light-intensity PA suitable for general populations.

Yoga for disease prevention
Several conventional clinical diabetes prevention programs have demonstrated a reduction in the incidence of diabetes in individuals with prediabetes through weight loss. Short-term yoga-based lifestyle intervention programs have been shown to be efficacious in weight loss [7]. Recently, Netam [8] showed that diabetes risk factors in obesity were favourably modified by a short-term yoga-based lifestyle intervention. This study also highlighted the challenges in compliance associated with the follow-up of subjects after an aggressive supervised intervention of 10 days. Others have shown that even a short-term yoga-based lifestyle intervention was efficacious in weight loss, reduced inflammation and stress [9], and positively influenced cardiovascular risk factors [10]. In view of the complexities of treatment plans for the control of diabetes, yoga can be considered a cost-effective and non-invasive therapy.
Yoga therapy (YT) may therefore be helpful for a variety of conditions by reducing sympathetic nervous system activity, inflammation, and stress responses [11]. Our group has demonstrated that YT over a 10-week course improved cardiovascular endurance and significantly reduced inflammatory biomarkers in 40 African American adults with heart failure [12]. A preliminary study of the effects of a yoga intervention in a group of children with sickle cell disease demonstrated improvements in relaxation, yet the participants were not well focused on the activity in a large group setting (Pullen, unpublished report). A protocol designed around a familiar game console capable of social support and peer interaction could improve compliance, with the ultimate goal of contributing a new viable therapeutic protocol to the prevention of diabetes. Serious games in the form of exergames are a relatively new area of academic interest. This research project ultimately aims to (i) build an exergame to deliver YT to improve exercise tolerance, and (ii) derive participant clinical measures such as anthropometric waist circumference, respiration rate, heart rate, and heart rate variability (HRV) from the Kinect data stream to measure the efficacy of therapy. Respiration can be detected from the depth data stream [13], while HRV may be measured from the RGB stream using the open-source motion magnification algorithm of Liu [14]. The basic yoga exergame described here will be developed in future studies to include remote measurement of HRV as a biomarker for the autonomic nervous system response, which would be novel and significant for assessing participants in longitudinal studies from exergame play. Currently, the exergame displays participants as avatars and identifies one of the five yoga gestures along with a prediction confidence value.

Study design
The purpose of this research is to incorporate gesture analysis software into an exergame engine to provide skill improvement feedback to students in a college yoga course setting. The hypothesis tested was that actual training will reduce yoga position variability relative to gestures captured from yoga instructors. This information could be useful to validate data gathered from real-time YT studies. We can also examine whether there is a difference between students with and without prior yoga experience.
The study used a convenience sample of students who were adults (persons 18 years of age or older), male and female, of any race/ethnicity, serving as normal students/controls. People with neurological or motor impairments who were not healthy enough to participate in yoga postures were excluded. We administered a short questionnaire to exclude people who had any degree of difficulty with normal physical activity function. Students completed study-related questionnaires, including an Institutional Review Board (IRB) approved consent form (University of North Georgia IRB#201584), health history, and physical activity. They agreed to participate in a yoga group workshop to learn yoga postures while a three-dimensional (3D) room sensor recorded body and joint positions (Fig. 1).
The students were briefly instructed and shown poses to perform while being recorded by the Kinect attached to a PC. Using two Kinect setups with non-overlapping fields of view, two students could be followed at once as they performed a series of five standing yoga postures as directed by a yoga instructor. The capture took 1-2 min to complete the five yoga postures, resulting in raw files 10-20 GB in size. Three yoga sessions (a pre-test session, a mid-way session, and a post-test session) were captured during the regularly scheduled yoga class. The course met twice weekly for 75 min over a 10-week period.

AdaBoost gesture analysis
Visual Gesture Builder (VGB) is a Microsoft Kinect v2.0 SDK tool that provides a data-driven solution to gesture detection through machine learning (Microsoft, 2014) [15]. VGB generates small database files that applications in C# or C++ use to perform gesture detection at run time. VGB uses one of two detection technologies, AdaBoostTrigger or RFRProgress, to classify each captured frame into gesture categories. AdaBoost, short for 'adaptive boosting', combines the outputs of an ensemble of weak classifiers as a weighted sum. The algorithm can potentially generate tens of thousands of weak classifiers; in our samples, 2000 were examined. The AdaBoost training process selects only those features that improve the predictive power of the model, which reduces dimensionality and improves execution time because irrelevant features do not need to be computed. Two yoga researchers independently tagged the frames in each captured clip that related to a meaningful yoga gesture. At the end of the tagging process, the yoga expert clips were analysed by VGB, which builds a gesture database; with this database, an exergame application can process body input from a user in real time. A filter needs to be applied to the raw per-frame results to reduce noise and jitter in the skeleton. AdaBoostTrigger provides a low-latency filter in the form of a simple sliding window of N (=4) frames, which sums the per-frame results and compares the sum against a threshold value. Filter parameters were sampled to determine optimal values for the yoga postures to reduce false positives (FPs) and false negatives (FNs).
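The two steps described above, a weighted sum over weak classifiers followed by a sliding-window filter, can be sketched in a few lines of Python. This is a minimal illustration and not VGB's implementation: the decision-stump form of the weak classifier, the feature names, and the threshold value are assumptions made for the example; only the window length N = 4 comes from the text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical per-frame features (e.g. joint angles in degrees).
Frame = Dict[str, float]


@dataclass
class WeakClassifier:
    """Assumed weak learner: a decision stump on a single feature."""
    feature: str
    cutoff: float
    alpha: float  # weight assigned to this stump by boosting

    def vote(self, frame: Frame) -> int:
        # +1 votes "in gesture", -1 votes "not in gesture".
        return 1 if frame[self.feature] > self.cutoff else -1


def frame_score(ensemble: Sequence[WeakClassifier], frame: Frame) -> float:
    """Raw AdaBoost output for one frame: alpha-weighted sum of the votes."""
    return sum(c.alpha * c.vote(frame) for c in ensemble)


def sliding_window_detect(scores: Sequence[float],
                          window: int = 4,
                          threshold: float = 0.0) -> List[bool]:
    """Sum the last `window` per-frame scores and compare with a threshold."""
    recent = deque(maxlen=window)  # oldest score drops out automatically
    detections = []
    for score in scores:
        recent.append(score)
        # Report a detection only once the window is full.
        detections.append(len(recent) == window and sum(recent) > threshold)
    return detections
```

With a window of four frames, a single noisy frame inside an otherwise positive run does not interrupt the detection, whereas one strongly negative frame can; raising the threshold trades FPs for FNs.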

Gesture analysis of yoga positions with 3D room sensor
Biomedical research aimed at a deeper understanding of yoga's benefits and physiological mechanisms has become an active area of study. To decrease the disparity between populations who can readily access yoga classes and therapies, benefits of yoga could be implemented in an exergame format in clinical or home environments. This platform could be installed with low-cost hardware using the cloud for analytics and data collection. We analysed yoga posture alignment using a 3D room sensor to produce a physical activity exergame for specific groups, such as young adults. This research utilises gesture analysis software to provide skill improvement feedback to students in a yoga course setting.
We positioned a yoga mat two metres in front of a Kinect sensor in both sagittal (perpendicular) and frontal view orientations [16]. The sagittal view orientation was slightly more accurate than the frontal, but the difference was not significant (Table 1). Inaccuracy was measured as the number of imputed joint positions observed during the pose, out of a total of 20 joints. Standing poses were significantly more accurate than seated or supine body orientations. The Kinect skeleton algorithm becomes confused when the subject's head is below the waist. This is presumably why no commercial yoga product exists, in contrast to exergames such as bowling, dance, Zumba or Chopra meditation. Instability of joint positions in the skeleton data stream was more severe for seated poses, as evidenced by large fluctuations between image frames that resulted in visible displacements of joints, described as jitter. As a result, seven postures were selected from Table 1. In Table 1, inaccuracy is the number of imputed joints in the data stream, with a maximum of 20 for the most inaccurate pose.
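The inaccuracy measure can be illustrated with a short Python sketch. It assumes that each frame carries a per-joint tracking state with the three values the Kinect SDK reports (not tracked, inferred, tracked) and that "imputed" corresponds to the inferred state; the type and function names are hypothetical.

```python
from enum import Enum
from statistics import mean
from typing import Dict, Sequence


class TrackingState(Enum):
    """Per-joint states, modelled on the three states the Kinect SDK reports."""
    NOT_TRACKED = 0
    INFERRED = 1
    TRACKED = 2


# One frame: joint name -> tracking state (20 joints in this study).
JointStates = Dict[str, TrackingState]


def imputed_joint_count(frame: JointStates) -> int:
    """Number of joints whose position was inferred rather than observed."""
    return sum(1 for state in frame.values() if state is TrackingState.INFERRED)


def pose_inaccuracy(frames: Sequence[JointStates]) -> float:
    """Mean imputed-joint count over the frames of one pose (0 = best)."""
    # Assumes at least one frame; statistics.mean raises on empty input.
    return float(mean(imputed_joint_count(f) for f in frames))
```

A pose in which every joint is inferred in every frame scores 20, the worst case for a 20-joint skeleton.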

Training set captured from yoga instructors
We recorded six yoga instructors while they performed a series of yoga postures using Kinect Studio, and then converted the recordings to processed clips using KSConvert. In each recorded clip, all frames that defined a yoga gesture were tagged (labelled) by consensus of two yoga instructors. We recorded 17 separate clips for construction of the training set; AdaBoost performance, as an error percentage, is plotted against training set size in Fig. 2. As more expert clips were added to the training set, the average root mean square (RMS) of the output error eventually decreased, along with FPs and FNs. Default settings in Kinect VGB produced solutions with high true positives (TPs, 99.5%) and low FPs (0.03%) for most yoga postures sampled. Body data were duplicated and mirrored to provide a larger set of training data. Hand states were not used for training or detection.
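Mirroring a skeleton frame to double the training data can be sketched as follows. The source does not describe how the mirroring was implemented, so this is one plausible approach under stated assumptions: the sensor x axis runs left to right, and joint names end in "Left" or "Right" in the style of the Kinect joint naming.

```python
from typing import Dict, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in sensor coordinates
Skeleton = Dict[str, Point]         # joint name -> position


def mirror_skeleton(skeleton: Skeleton) -> Skeleton:
    """Reflect a skeleton left-to-right and swap left/right joint labels."""
    mirrored: Skeleton = {}
    for name, (x, y, z) in skeleton.items():
        if name.endswith("Left"):
            target = name[:-len("Left")] + "Right"
        elif name.endswith("Right"):
            target = name[:-len("Right")] + "Left"
        else:
            target = name  # midline joints such as Head keep their label
        # Negating x reflects the body about the sensor's vertical midplane.
        mirrored[target] = (-x, y, z)
    return mirrored


def augment(frames):
    """Return the original frames followed by their mirrored copies."""
    return list(frames) + [mirror_skeleton(f) for f in frames]
```

Swapping the labels matters: without it, a mirrored right-handed pose would still be stored as right-handed with implausible joint positions.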

Test set analysed from yoga class participants
We measured yoga posture alignment over the course of a 10-week period in an IRB-approved study. A convenience sample of 20 undergraduate students with minimal yoga experience was recruited. Depth stream and skeleton coordinates for the 20 participants were acquired and analysed against the previously trained solution. From each captured frame, the raw true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) counts were used to calculate several statistical measures (sensitivity, specificity, precision, accuracy, informedness, F1, FPR, and MCC) for assessing gesture learning. Measures with larger ranges and consistently increasing values towards the yoga 'expert' or teacher level could track yoga pose performance gains over the study period. Statistical sensitivity for the mountain, forward bend, and upward salute poses showed consistent trends that approached expert levels over the study period (Table 2). Sensitivity (also called the TP rate, the recall, or the probability of detection in some fields) measures the proportion of positives that are correctly identified as such and therefore quantifies the avoidance of FNs. Based on these results, a higher sensitivity score indicates more training and postures closer to the 'gold standard'.
Accuracy was high (above 0.9) for all postures, and specificity was close to 1 for all periods and the expert case. Precision, the proportion of detected positives that are true positives, was lower, with some values near 0.6 but generally near 0.9 for all gestures and periods. This could be due to variability in the frame tagging process, or to differences between the experimental setups such as camera orientation. Statistical informedness, the probability of an informed classification as opposed to a random guess, showed consistent trends similar to those of sensitivity. Informedness, or Youden's index, is a way of summarising the performance of a classification. Its value ranges from −1 to 1 and is zero when the classifier gives the same proportion of positive results for in-pose and not-in-pose frames, i.e. the classifier is useless. A value of 1 indicates that there are no FPs or FNs, i.e. the classification is perfect. The index gives equal weight to FP and FN values, so all trials with the same value of the index give the same proportion of total misclassified results. This index showed a strong, consistent increase over the period of participant yoga training, approaching the expert values of 0.98 to 0.99, with the exception of mountain pose at a lower value of 0.91.
Analysis of summary statistics for mountain pose, comparing the initial, mid-session, and final session captures, showed that sensitivity and informedness gave the strongest t-test results between initial and final sessions (p = 0.10 and 0.08, respectively). Sensitivity went from 0.78 to 0.87 and informedness from 0.70 to 0.79, while the expert test clip (not among the 17 training clips) scored 0.94 and 0.91, respectively. Based on these results, a higher informedness score indicates more training and postures closer to the 'gold standard'. The statistical measures of specificity, precision, accuracy, and F1 generally showed little variation over the course of participant yoga training, and the values were less than but close to those of the test clip from a yoga expert. The yoga experience level was self-assessed on a Likert scale from 1 to 10 (10 being an expert instructor level). Since the level of experience was similar for all student participants, no significant relationship was found between participant experience and any of the statistical measures.
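The statistical measures listed above follow from the four frame counts by their standard definitions, as in the Python sketch below. The function name and the choice to return NaN for an undefined ratio are assumptions of this example; the counts in the usage note are hypothetical and are not study data.

```python
from math import nan, sqrt
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN when the measure is undefined."""
    return numerator / denominator if denominator else nan


def classification_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification measures from per-frame counts."""
    sensitivity = _ratio(tp, tp + fn)  # TP rate, recall
    specificity = _ratio(tn, tn + fp)  # TN rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # Youden's index: sensitivity + specificity - 1.
        "informedness": sensitivity + specificity - 1,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "fpr": _ratio(fp, fp + tn),
        # Matthews correlation coefficient.
        "mcc": _ratio(tp * tn - fp * fn,
                      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }
```

For example, hypothetical counts of 80 TPs, 10 FPs, 90 TNs, and 20 FNs give a sensitivity of 0.8, a specificity of 0.9, and an informedness of about 0.7.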

Building the exergame from trained AdaBoost classifier
The design guidelines for exergame narrative and graphics functions can be made culturally appropriate [17] for the target participants to maximise therapeutic benefit and compliance. The design goals are modularised into components that are written in C#. Exergame core programming uses Microsoft's Kinect SDK 2.0 and the skeleton data stream output for up to four participants, with background scenes showing the desired positions from a yoga instructor. A state-aware programming model uses a skeleton classifier developed from the prior supervised learning of yoga instructor positions [18]. Additional software module add-ins will utilise the depth stream to measure participant heights and limb lengths for assessing stretch and respiration during yoga position holds. Facial recognition application programming interfaces (APIs) [19] could be used to correlate with perceived pain levels if appropriate for selected participants, such as adolescents with sickle cell disease. The final development could use XNA to allow portability and ease of distribution through the Xbox marketplace as independent vendor software. The target market is youth community centres and health clinics with appropriate space.
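One way to picture a state-aware model driven by the classifier output is a small state machine that steps through a sequence of target poses. The sketch below is in Python for brevity, although the exergame components themselves are written in C#. It is a hypothetical illustration, not the exergame's code: the class name, the confidence cut-off, and the hold length in frames are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoseSequence:
    """Advance through target poses as the classifier confirms each one."""
    poses: List[str]
    min_confidence: float = 0.5  # assumed cut-off on prediction confidence
    hold_frames: int = 30        # assumed number of consecutive frames to hold
    index: int = 0               # position of the current target pose
    held: int = 0                # consecutive confirmed frames so far

    @property
    def target(self) -> Optional[str]:
        """Current target pose, or None when the sequence is complete."""
        return self.poses[self.index] if self.index < len(self.poses) else None

    def update(self, detected: Optional[str], confidence: float) -> Optional[str]:
        """Feed one frame's classifier result; return the target afterwards."""
        if self.target is None:
            return None
        if detected == self.target and confidence >= self.min_confidence:
            self.held += 1
        else:
            self.held = 0  # the hold must be continuous
        if self.held >= self.hold_frames:
            self.index += 1  # pose completed, move to the next one
            self.held = 0
        return self.target
```

Requiring a continuous hold means that a brief misclassification restarts the count, which suits postures that are meant to be held rather than passed through.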

Discussion and conclusion
Gesture analysis for yoga alignment training may be a useful tool for the development of home and clinical YT for hard-to-reach populations [20]. The experimental exergame developed here provides a tool that scores the performance of yoga postures and provides improvement metrics [21]. The statistical measurements presented here were able to track changes in yoga posture learning in young adults over a 10-week course. This could be useful for detecting adherence in home-based YT. Prior research by others has shown that even short-term yoga-based lifestyle interventions were efficacious in reducing weight, inflammation and stress, and positively influenced cardiovascular risk factors. We plan to target special populations with YT and to study the potential effects of body mass and age on posture alignment and limb stretch.

Acknowledgments
We acknowledge partial support from grants 8G12MD007602 and 8U54MD007588 from the NIH/NIMHD to Morehouse School of Medicine, Atlanta, GA (to W.S.), and from the University of North Georgia, Gainesville, GA (to P.P.). Seftec (W.S.) is a bioinformatics company transforming healthcare with smart and connected technologies.