A Windows accessibility toolkit that enables users with visual or cognitive disabilities to select screen regions and have the text read aloud using text-to-speech technology.
- Screen Region Selection: Capture text from any area on your screen using mouse selection
- OCR (Optical Character Recognition): Extract text from captured images using configurable OCR engines
- Text-to-Speech: Voice synthesis with a choice of TTS engines
- Translation Support: Optional translation of captured text before speech synthesis
- Global Shortcuts: Customizable keyboard shortcuts for quick access
- Settings Management: Comprehensive configuration for all features
- Multi-language Support: Localization and OCR language selection
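Selecting a region with the mouse comes down to turning the press and release points of a drag into a rectangle, whichever direction the user dragged. The following is a minimal sketch of that step, assuming the mouse hook reports physical pixel coordinates. `Point`, `Region`, and `region_from_drag` are hypothetical names for illustration, not Hearpoint's actual API.

```rust
/// A screen position in physical pixels (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// An axis-aligned capture region: top-left corner plus size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Region {
    pub left: i32,
    pub top: i32,
    pub width: u32,
    pub height: u32,
}

/// Builds a region from the mouse-down and mouse-up points of a drag.
/// Works for drags in any direction; returns None for a zero-width or
/// zero-height drag (e.g. a plain click), which has nothing to capture.
pub fn region_from_drag(start: Point, end: Point) -> Option<Region> {
    let left = start.x.min(end.x);
    let top = start.y.min(end.y);
    let width = start.x.abs_diff(end.x);
    let height = start.y.abs_diff(end.y);
    if width == 0 || height == 0 {
        return None;
    }
    Some(Region { left, top, width, height })
}

fn main() {
    // Dragging up and to the left still yields a top-left-anchored region.
    let r = region_from_drag(Point { x: 300, y: 200 }, Point { x: 100, y: 50 });
    // prints "Some(Region { left: 100, top: 50, width: 200, height: 150 })"
    println!("{:?}", r);
}
```

Taking the minimum corner and the absolute differences also handles negative coordinates, which Windows uses for monitors placed to the left of or above the primary display.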
- Backend: Rust with Tauri 2
- Frontend: React 19 + TypeScript + Vite
- UI Framework: Tailwind CSS + shadcn/ui components
- Database: SQLite with SQLx
- Package Manager: Bun
- Platform: Windows only
Before you begin, ensure you have the following installed:
- Node.js (v18 or later) or Bun
- Rust (latest stable)
- Visual Studio Build Tools (for Windows development)
Clone the repository:
git clone https://github.com/your-username/hearpoint.git
cd hearpoint
Install frontend dependencies:
bun install  # or: npm install
Install Rust dependencies and build the project:
bun tauri build # or npm run tauri build
The built application will be available in src-tauri/target/release/.
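To run the app in development mode:
bun tauri dev
Each selection passes through the following pipeline:
- Input Capture: Low-level mouse hooks detect the region the user selects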
- Screen Capture: DXGI captures the selected screen area
- OCR Processing: Text extraction using configurable OCR engines
- Translation (optional): Text translation with caching
- TTS Synthesis: Audio generation using various TTS engines
- Audio Output: Playback through system audio
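The stages above run as one chain for each selection. The following sketch models that chain in Rust with a memoizing translation step, under the assumption that each stage sits behind a trait. `Ocr`, `Translator`, `Tts`, `CachedTranslator`, and `read_region` are hypothetical names, not Hearpoint's actual types, and the stub engines stand in for the real capture, OCR, and TTS back ends.

```rust
use std::collections::HashMap;

/// Extracts text from a captured image (raw bytes in this sketch).
trait Ocr {
    fn recognize(&self, image: &[u8]) -> String;
}

/// Translates text into a target language.
trait Translator {
    fn translate(&mut self, text: &str, target: &str) -> String;
}

/// Turns text into audio samples.
trait Tts {
    fn synthesize(&self, text: &str) -> Vec<i16>;
}

/// Wraps any translator and memoizes results per (text, target language),
/// so re-reading the same region does not repeat the translation call.
struct CachedTranslator<T: Translator> {
    inner: T,
    cache: HashMap<(String, String), String>,
}

impl<T: Translator> CachedTranslator<T> {
    fn new(inner: T) -> Self {
        Self { inner, cache: HashMap::new() }
    }
}

impl<T: Translator> Translator for CachedTranslator<T> {
    fn translate(&mut self, text: &str, target: &str) -> String {
        let key = (text.to_string(), target.to_string());
        if let Some(hit) = self.cache.get(&key) {
            return hit.clone();
        }
        let result = self.inner.translate(text, target);
        self.cache.insert(key, result.clone());
        result
    }
}

/// Runs OCR, then translation if a target language is given, then TTS.
fn read_region(
    image: &[u8],
    ocr: &dyn Ocr,
    translator: &mut dyn Translator,
    target: Option<&str>,
    tts: &dyn Tts,
) -> Vec<i16> {
    let text = ocr.recognize(image);
    let text = match target {
        Some(lang) => translator.translate(&text, lang),
        None => text,
    };
    tts.synthesize(&text)
}

// Stub engines for demonstration only.
struct FakeOcr;
impl Ocr for FakeOcr {
    fn recognize(&self, image: &[u8]) -> String {
        String::from_utf8_lossy(image).into_owned()
    }
}

struct CountingTranslator {
    calls: usize,
}
impl Translator for CountingTranslator {
    fn translate(&mut self, text: &str, target: &str) -> String {
        self.calls += 1; // counts real (uncached) translation calls
        format!("[{target}] {text}")
    }
}

struct FakeTts;
impl Tts for FakeTts {
    fn synthesize(&self, text: &str) -> Vec<i16> {
        // One silent sample per character stands in for real audio.
        text.chars().map(|_| 0).collect()
    }
}

fn main() {
    let mut translator = CachedTranslator::new(CountingTranslator { calls: 0 });
    for _ in 0..2 {
        let audio = read_region(b"Hello", &FakeOcr, &mut translator, Some("fr"), &FakeTts);
        println!("{} samples", audio.len());
    }
    // prints "10 samples" twice, then "backend calls: 1"
    println!("backend calls: {}", translator.inner.calls);
}
```

Keying the cache on both the text and the target language gives the same caption translated into two languages two separate entries. A production cache would also need an eviction policy, which this sketch omits.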
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
- v0.1.0: Core screen capture → OCR → TTS pipeline ✅
- v0.1.1: Translation caching and optimization ✅
- v0.1.2: On-demand service downloads ✅
- v0.2.0: Voice command recognition (ASR)
- v0.3.0: Game template matching for automated capture
- v1.0: Advanced voice commands and accessibility features
