Ham2Pose: Animating Sign Language Notation into Pose Sequences

1 Reichman University, 2 Bar-Ilan University

Abstract

Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. As HamNoSys is universal, our proposed method offers a generic solution invariant to the target Sign language. Our method gradually generates pose predictions using transformer encoders that create meaningful representations of the text and poses while considering their spatial and temporal information. We use weak supervision for the training process and show that our method succeeds in learning from partial and inaccurate data. Additionally, we offer a new distance measurement for pose sequences, normalized Dynamic Time Warping (nDTW), based on DTW over normalized keypoint trajectories, and validate its correctness using AUTSL, a large-scale Sign language dataset. We show that it measures the distance between pose sequences more accurately than existing measurements, and we use it to assess the quality of our generated pose sequences. Code for the data pre-processing, the model, and the distance measurement is publicly released for future research.
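To make the idea of "DTW over normalized keypoint trajectories" concrete, here is a minimal sketch in plain Python. It is not the released implementation: the shoulder-width normalization, the keypoint indices, the averaging over keypoints, and the function names (`normalize`, `dtw`, `ndtw_sketch`) are all assumptions made for illustration, and the handling of missing keypoints is not modelled at all.

```python
import math

def normalize(seq, left_shoulder=0, right_shoulder=1):
    """Center each frame on the shoulder midpoint and scale so that the
    shoulder distance is 1. The shoulder indices are assumptions.
    seq: list of frames; each frame is a list of (x, y) keypoints."""
    out = []
    for frame in seq:
        (lx, ly), (rx, ry) = frame[left_shoulder], frame[right_shoulder]
        width = math.hypot(lx - rx, ly - ry) or 1.0  # avoid division by zero
        cx, cy = (lx + rx) / 2, (ly + ry) / 2
        out.append([((x - cx) / width, (y - cy) / width) for x, y in frame])
    return out

def dtw(a, b):
    """Classic dynamic time warping cost between two 2D point trajectories."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def ndtw_sketch(seq_a, seq_b):
    """Average per-keypoint DTW cost over normalized sequences
    (averaging is an assumption of this sketch)."""
    a, b = normalize(seq_a), normalize(seq_b)
    num_keypoints = len(a[0])
    total = 0.0
    for k in range(num_keypoints):
        total += dtw([frame[k] for frame in a], [frame[k] for frame in b])
    return total / num_keypoints
```

Under these assumptions, two sequences that differ only in scale, position, or signing speed get a distance of zero, which is the kind of invariance a pose-sequence distance needs.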

Method

Given a sign written in HamNoSys, our model generates a sequence of pose frames that performs the sign. The model is composed of two parts: the text processor, which encodes the HamNoSys text and predicts the length of the generated pose sequence; and the pose generator, which generates the pose sequence itself. The sign is generated gradually over T steps, starting from a single given reference pose frame, duplicated to the length of the signed video. In each time step t, going from T down to 1, the model predicts the required change from step t to step t-1. After T steps, the pose generator outputs the final pose sequence.
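The gradual generation loop described above can be sketched as follows. This is an illustration only: `generate_pose_sequence`, `predict_change`, and the toy predictor are hypothetical names, the additive update is an assumption about how the predicted change is applied, and the real pose generator is a transformer conditioned on the encoded HamNoSys text rather than the stand-in function used here.

```python
def generate_pose_sequence(reference_pose, seq_len, num_steps, predict_change):
    """Refine a pose sequence over num_steps steps, from t = T down to t = 1.

    reference_pose: flat list of keypoint coordinates for one frame.
    seq_len: sequence length (predicted by the text processor in the paper).
    predict_change: hypothetical callable (poses, t) -> per-frame deltas.
    """
    # Start from the reference frame duplicated to the sequence length.
    poses = [list(reference_pose) for _ in range(seq_len)]
    for t in range(num_steps, 0, -1):
        deltas = predict_change(poses, t)
        # Assumption: the predicted change is added to the current poses.
        poses = [
            [p + d for p, d in zip(frame, delta)]
            for frame, delta in zip(poses, deltas)
        ]
    return poses

# Toy stand-in for the model: move a fraction 1/t of the way to a fixed target,
# so the sequence reaches the target exactly when t = 1.
TOY_TARGET = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]

def toy_predict_change(poses, t):
    return [
        [(goal - p) / t for p, goal in zip(frame, goal_frame)]
        for frame, goal_frame in zip(poses, TOY_TARGET)
    ]

result = generate_pose_sequence([0.0, 0.0], seq_len=3, num_steps=5,
                                predict_change=toy_predict_change)
```

With the toy predictor, every frame starts at the reference pose and converges to its target frame after the final step, mirroring how the sequence is refined from a static pose into a full sign.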

Results

Our prediction from the relevant HamNoSys is presented alongside the ground truth pose extracted from the original video. Note that even though some keypoints are missing or incorrect in the ground truth pose, our prediction is complete and correct.

BibTeX

If you find this research useful, please cite the following:

@article{shalev2022ham2pose,
  title={Ham2Pose: Animating Sign Language Notation into Pose Sequences},
  author={Shalev-Arkushin, Rotem and Moryossef, Amit and Fried, Ohad},
  journal={arXiv preprint arXiv:2211.13613},
  year={2022}
}