HearPoint

A Windows accessibility toolkit that enables users with visual or cognitive disabilities to select screen regions and have the text read aloud using text-to-speech technology.

Features

  • Screen Region Selection: Capture text from any area on your screen using mouse selection
  • OCR (Optical Character Recognition): Extract text from captured images using configurable OCR engines
  • Text-to-Speech: Voice synthesis with a choice of TTS engines
  • Translation Support: Optional translation of captured text before speech synthesis
  • Global Shortcuts: Customizable keyboard shortcuts for quick access
  • Settings Management: Comprehensive configuration for all features
  • Multi-language Support: Localization and OCR language selection
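
As an illustration of screen region selection, a mouse drag can be reduced to a rectangle no matter which direction the user drags in. The following is a minimal sketch, not HearPoint's actual code: the `Point` and `Region` types and the `region_from_drag` function are hypothetical names introduced for this example.

```rust
/// A screen point in pixels (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// An axis-aligned capture region (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub left: i32,
    pub top: i32,
    pub width: u32,
    pub height: u32,
}

/// Builds a region from the points where a drag started and ended.
/// Returns `None` when the selection has no area.
pub fn region_from_drag(start: Point, end: Point) -> Option<Region> {
    // The top-left corner is the smaller coordinate on each axis,
    // so dragging up or to the left yields the same region.
    let left = start.x.min(end.x);
    let top = start.y.min(end.y);
    // abs_diff avoids overflow and works for negative coordinates,
    // which can occur on multi-monitor desktops.
    let width = start.x.abs_diff(end.x);
    let height = start.y.abs_diff(end.y);
    if width == 0 || height == 0 {
        return None;
    }
    Some(Region { left, top, width, height })
}
```

Returning `None` for a zero-area drag lets the caller ignore accidental clicks instead of passing an empty image on to OCR.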

Technology Stack

  • Backend: Rust with Tauri 2
  • Frontend: React 19 + TypeScript + Vite
  • UI Framework: Tailwind CSS + shadcn/ui components
  • Database: SQLite with SQLx
  • Package Manager: Bun
  • Platform: Windows only

Prerequisites

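Before you begin, ensure you have the following installed:

  • Windows (the only supported platform)
  • Git, to clone the repository
  • Rust toolchain, required by Tauri to build the backend
  • Bun, or Node.js with npm as an alternative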

Installation

  1. Clone the repository:

    git clone https://github.com/7otion/hearpoint.git
    cd hearpoint
  2. Install frontend dependencies:

    bun install
    # or
    npm install
  3. Build the project (this also fetches and compiles the Rust dependencies):

    bun tauri build
    # or
    npm run tauri build

The built application will be available in src-tauri/target/release/.

Development

Start the app in development mode:

    bun tauri dev
    # or
    npm run tauri dev

Architecture

Core Pipeline (v0.1.0)

  1. Input Capture: Low-level mouse hooks detect region selection
  2. Screen Capture: DXGI captures the selected screen area
  3. OCR Processing: Text extraction using configurable OCR engines
  4. Translation (optional): Text translation with caching
  5. TTS Synthesis: Audio generation using various TTS engines
  6. Audio Output: Playback through system audio
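
Steps 3 to 5 above can be sketched as a chain of swappable stages. This is a minimal sketch under stated assumptions, not the project's implementation: the traits `OcrEngine`, `Translator`, and `TtsEngine`, the `CachingTranslator` wrapper, and `run_pipeline` are all hypothetical names, and the in-memory `HashMap` cache stands in for whatever storage the real translation cache uses.

```rust
use std::collections::HashMap;

/// Hypothetical stage interfaces; the project's real traits may differ.
pub trait OcrEngine {
    fn recognize(&self, image: &[u8]) -> Result<String, String>;
}

pub trait Translator {
    fn translate(&self, text: &str, target_lang: &str) -> Result<String, String>;
}

pub trait TtsEngine {
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String>;
}

/// Wraps any translator and remembers results keyed by (text, target language).
pub struct CachingTranslator {
    inner: Box<dyn Translator>,
    cache: HashMap<(String, String), String>,
}

impl CachingTranslator {
    pub fn new(inner: Box<dyn Translator>) -> Self {
        Self { inner, cache: HashMap::new() }
    }

    pub fn translate(&mut self, text: &str, target_lang: &str) -> Result<String, String> {
        let key = (text.to_string(), target_lang.to_string());
        // Serve repeated requests from the cache without calling the engine.
        if let Some(hit) = self.cache.get(&key) {
            return Ok(hit.clone());
        }
        let translated = self.inner.translate(text, target_lang)?;
        self.cache.insert(key, translated.clone());
        Ok(translated)
    }
}

/// Runs OCR, optional translation, and TTS on an already captured image.
/// Returns the synthesized audio bytes, ready for playback.
pub fn run_pipeline(
    image: &[u8],
    ocr: &dyn OcrEngine,
    translator: Option<(&mut CachingTranslator, &str)>,
    tts: &dyn TtsEngine,
) -> Result<Vec<u8>, String> {
    let text = ocr.recognize(image)?;
    // Translation is skipped entirely when no translator is configured.
    let text = match translator {
        Some((t, lang)) => t.translate(&text, lang)?,
        None => text,
    };
    tts.synthesize(&text)
}
```

Passing the stages as trait objects mirrors the idea of configurable engines: a different OCR or TTS backend can be swapped in at runtime without changing `run_pipeline`.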

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Roadmap

  • v0.1.0: Core screen capture → OCR → TTS pipeline ✅
  • v0.1.1: Translation caching and optimization ✅
  • v0.1.2: On-demand service downloads ✅
  • v0.2.0: Voice command recognition (ASR)
  • v0.3.0: Game template matching for automated capture
  • v1.0: Advanced voice commands and accessibility features
