LipSpeak

An easy-to-use mobile application that helps people with voice disorders communicate effectively. The project received the Hal R. Varian MIDS Capstone Award, given to the top project among the 19 submitted by the graduating cohort.

Description

LipSpeak is focused on helping people with voice disorders, a condition affecting 7.5 million patients in the US alone. Powered by a novel visual keyword spotting neural network architecture developed by researchers at the University of Oxford [1], our smartphone app MVP converts the user's lip movements into synthesized speech. It targets people who have lost their voice due to illness or surgery but are still capable of articulated lip movements.
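
To make that flow concrete, here is a minimal, hypothetical sketch of the pipeline (the function names and stub bodies are illustrative placeholders, not the app's actual code): extract mouth crops from the recorded clip, score the user's phrasebook with the visual keyword spotter, and voice the best match with text-to-speech.

```python
import numpy as np

def extract_mouth_crops(video_path: str) -> np.ndarray:
    # Stub: the app runs face/landmark detection and returns a
    # (num_frames, H, W) stack of grayscale mouth crops.
    return np.zeros((75, 96, 96), dtype=np.float32)

def spot_keywords(frames: np.ndarray, phrases: list[str]) -> np.ndarray:
    # Stub: the visual keyword-spotting network scores how likely
    # each candidate phrase was mouthed in the clip.
    return np.random.rand(len(phrases))

def synthesize_speech(phrase: str) -> bytes:
    # Stub: any off-the-shelf text-to-speech service can voice the phrase.
    return phrase.encode()

def speak(video_path: str, phrasebook: list[str]) -> bytes:
    scores = spot_keywords(extract_mouth_crops(video_path), phrasebook)
    return synthesize_speech(phrasebook[int(scores.argmax())])
```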

We intend to use our MVP to contribute to peer and industry learning about how silent speech voicing technologies can be applied to improve the quality of life for people suffering from voice impairments.

Demo

Techniques

  • supervised learning (classification)
  • computer vision (CNN and LSTM); see the model sketch after this list
  • image embedding
  • multi-GPU inference
  • cloud deployment
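
For illustration, below is a minimal PyTorch sketch of such a classifier: a small CNN embeds each mouth-crop frame, an LSTM aggregates the frame embeddings over time, and a linear head scores a fixed phrasebook. Layer sizes, input shape, and phrasebook size are assumptions for demonstration; the production model follows the keyword-spotting architecture of Momeni et al. [1].

```python
import torch
import torch.nn as nn

class LipKeywordSpotter(nn.Module):
    """Toy visual keyword-spotting classifier (CNN frame embedding + LSTM)."""

    def __init__(self, num_phrases: int, embed_dim: int = 128):
        super().__init__()
        self.frame_cnn = nn.Sequential(                  # per-frame image embedding
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.lstm = nn.LSTM(embed_dim, 256, batch_first=True)
        self.head = nn.Linear(256, num_phrases)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 1, H, W) grayscale mouth crops
        b, t = frames.shape[:2]
        emb = self.frame_cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(emb)                     # final hidden state summarizes the clip
        return self.head(h_n[-1])                        # (batch, num_phrases) logits

if __name__ == "__main__":
    model = LipKeywordSpotter(num_phrases=10)
    clip = torch.randn(2, 75, 1, 96, 96)                 # 2 clips, 75 frames of 96x96 crops
    print(model(clip).shape)                             # torch.Size([2, 10])
```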

Tools

  • PyTorch
  • TensorFlow
  • scikit-learn
  • AWS
  • Flask (see the endpoint sketch after this list)
  • Firebase
  • Firestore
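
To show how these pieces could fit together on the serving side, below is a minimal Flask endpoint of the kind the mobile client might call. The route name, phrasebook, and placeholder response are hypothetical; the deployed backend runs the keyword-spotting model and stores user phrasebooks in Firebase/Firestore.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical phrasebook; the app keeps the user's saved phrases in Firestore.
PHRASEBOOK = ["water please", "call the nurse", "thank you"]

@app.route("/spot", methods=["POST"])
def spot():
    """Accept an uploaded lip-movement clip and return the best-matching phrase."""
    clip = request.files["clip"]  # raw video bytes uploaded by the mobile client
    # In the real service the clip is decoded and scored by the KWS model;
    # a placeholder response illustrates the request/response contract.
    best = PHRASEBOOK[0]
    return jsonify({"phrase": best, "filename": clip.filename})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```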

More Information

More information can be found at the following links:

References

[1] Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman. Seeing wake words: Audio-visual keyword spotting. 2020.