Speech Therapy app
https://victor-gp.github.io/figmatrix/
Overview
This interactive speech therapy application helps children with speech difficulties practice pronouncing challenging sounds through tongue twisters and targeted sentences. It gives real-time feedback on pronunciation accuracy by comparing the transcribed speech against the expected text.
How It Works
- Sentence Display: The application presents sentences and tongue twisters tailored to specific speech sounds that children struggle with.
- Interactive Practice: Users speak each word into their device’s microphone, with visual guidance showing which word to pronounce next.
- Space Bar Navigation: Press the space bar between words to move the “cursor” to the next word in the sequence.
- Real-time Validation: Each spoken word is transcribed by the ElevenLabs Speech-to-Text API and compared against the expected word.
- Instant Feedback: Receive feedback on whether your pronunciation matches the target word, helping you improve over time.
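The validation step can be pictured as a fuzzy comparison between the expected word and the transcribed one. The sketch below is only an illustration, not the project's actual matching code: it assumes the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function names and the 0.8 default threshold are hypothetical.

```python
import string
from difflib import SequenceMatcher


def normalize(word: str) -> str:
    """Trim whitespace and surrounding ASCII punctuation, then lowercase."""
    return word.strip().strip(string.punctuation).lower()


def similarity(expected: str, heard: str) -> float:
    """Return a ratio between 0.0 and 1.0 for the two normalized words."""
    return SequenceMatcher(None, normalize(expected), normalize(heard)).ratio()


def is_match(expected: str, heard: str, threshold: float = 0.8) -> bool:
    """Accept the attempt when the similarity reaches the threshold.

    The 0.8 default is an assumed value for illustration only.
    """
    return similarity(expected, heard) >= threshold
```

With these assumptions, `is_match("Peter", "peter.")` is accepted because case and trailing punctuation are ignored, while `is_match("piper", "dog")` is rejected. Raising the threshold makes the exercise stricter, and lowering it tolerates more transcription noise.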
Target Audience
This application is specifically designed for:
- Speech-language pathologists and therapists
- Children with speech articulation disorders
Getting Started
Prerequisites
- Python 3.8+
- Node.js (for frontend development)
- ElevenLabs account and API key
Installation
- Clone the repository
- Install backend dependencies:
pip install -r requirements.txt
- Install frontend dependencies:
cd frontend && npm install
- Set up your ElevenLabs API key in .env.example
- Start the backend server:
python main.py
- Start the frontend development server:
cd frontend && npm run dev
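For the API key step above, a backend could read the key once at startup and stop with a clear message if it is missing. This is a minimal sketch under assumptions: the variable name `ELEVENLABS_API_KEY` and the helper `get_api_key` are hypothetical, so check .env.example for the name the project actually uses.

```python
import os
from typing import Mapping


def get_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "ELEVENLABS_API_KEY",  # hypothetical variable name
) -> str:
    """Return the API key from the environment, or raise if it is not set."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; configure it before starting the backend")
    return key
```

Failing at startup surfaces a missing key right away, instead of on the first speech-to-text request.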
Usage
- Open the application in your web browser
- Allow microphone access when prompted
- Select a tongue twister or sentence to practice
- Speak each word clearly into your microphone
- Press the space bar to advance to the next word
- Receive instant feedback on your pronunciation accuracy
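The word-by-word flow above amounts to a cursor moving over the words of a sentence. The following sketch models that state in Python for illustration. The class and method names are hypothetical, and the real application may well keep this state in the frontend instead.

```python
from typing import Optional


class PracticeSession:
    """Tracks which word of a sentence should be spoken next."""

    def __init__(self, sentence: str) -> None:
        self.words = sentence.split()
        self.cursor = 0  # index of the word currently being practiced

    @property
    def current_word(self) -> Optional[str]:
        """The word under the cursor, or None once the sentence is done."""
        if self.cursor < len(self.words):
            return self.words[self.cursor]
        return None

    @property
    def finished(self) -> bool:
        return self.cursor >= len(self.words)

    def advance(self) -> Optional[str]:
        """Move to the next word (a space bar press) and return it."""
        if not self.finished:
            self.cursor += 1
        return self.current_word
```

A session created with `PracticeSession("She sells sea shells")` starts on "She", and each call to `advance()` moves the cursor one word forward until `finished` becomes true.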
Stack
APIs & Services
- ElevenLabs Speech-to-Text API - Converts spoken audio to text for comparison
- Web Audio API - Captures microphone input from the browser
Backend
- ElevenLabs Python SDK - Interface for speech-to-text conversion
- Python 3.8+ - Core programming language
- FastAPI - Modern, fast web framework for building APIs
Frontend
- HTML5 - Structure and semantics
- CSS3 - Styling and responsive design
- JavaScript/TypeScript - Client-side interactivity
- Vite - Build tool and development server
- React - User interface components
Key Features
- Real-time Speech Recognition: Uses the ElevenLabs Speech-to-Text API to transcribe each spoken word
- Pronunciation Comparison: String matching between transcribed and expected words, with configurable similarity thresholds
- Progressive Word Navigation: Space bar controlled word-by-word progression
- Responsive Design: Works across desktop and mobile devices
- Audio Processing: Handles PCM audio encoding and decoding
- Error Handling: Robust error management for various speech recognition scenarios
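As an illustration of the audio processing feature, the sketch below converts floating-point samples to raw PCM bytes and back using only the standard library. The source does not state the sample format, so signed 16-bit little-endian PCM is an assumption here, and the function names are hypothetical.

```python
import struct
from typing import List, Sequence


def encode_pcm16(samples: Sequence[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM.

    Values outside the range are clipped before scaling.
    """
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def decode_pcm16(data: bytes) -> List[float]:
    """Decode signed 16-bit little-endian PCM bytes back to floats.

    A trailing odd byte, if any, is ignored.
    """
    count = len(data) // 2
    ints = struct.unpack(f"<{count}h", data[: count * 2])
    return [i / 32767 for i in ints]
```

Browser audio pipelines usually produce float samples, while speech-to-text services often expect integer PCM, which is why a conversion step like this tends to sit between capture and transcription.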
The Team
- Afaf Driouech
- Daniele Pala
- Rahimakhan Abduqodirova
- Thao Phuong Pham
- Victor Gonzalez Prieto