HuggingFace nodes for Nodetool - A comprehensive integration that brings state-of-the-art AI models to your workflows.
This package provides HuggingFace nodes for Nodetool, so you can build AI workflows on current models. It supports more than 25 model types across text, image, audio, and multimodal processing.
- Stable Diffusion - Generate high-quality images from text prompts using Stable Diffusion models
  - Custom width/height settings (256-1024px)
  - Configurable inference steps and guidance scale
  - Support for negative prompts
  - Use cases: Art creation, concept visualization, content generation
- Stable Diffusion XL - Enhanced image generation with SDXL models
  - Higher resolution outputs (up to 1024px)
  - Improved image quality and detail
  - Support for IP adapters and LoRA models
  - Use cases: Marketing materials, game assets, interior design concepts
- Flux - Next-generation image generation with memory-efficient quantization
  - Supports schnell (fast) and dev (high-quality) variants
  - Nunchaku quantization (FP16, FP4, INT4) for reduced VRAM usage
  - CPU offload support for large models
  - Configurable max_sequence_length for prompt complexity
  - Use cases: High-fidelity image generation with limited hardware
- Flux Control - Controlled image generation with depth/canny guidance
  - Depth-aware and edge-guided generation
  - Control image input for structural guidance
  - Quantization support (FP16, FP4, INT4)
  - Use cases: Controlled composition, maintaining structure while changing style
- Chroma - Flux-based model with advanced attention masking
  - Professional-quality color control
  - Attention slicing for memory optimization
  - Use cases: Professional photography effects, precise color grading
- Qwen-Image - High-quality general-purpose text-to-image generation
  - Nunchaku quantization support
  - True CFG scale control
  - Use cases: General-purpose image generation, quick prototyping
- Text2Image (AutoPipeline) - Automatic pipeline selection for any text-to-image model
  - Auto-detects best pipeline for given model
  - Flexible generation without pipeline-specific knowledge
  - Use cases: Testing different models, rapid prototyping
- Image to Image - Transform existing images using Stable Diffusion
  - Strength parameter controls transformation amount
  - Support for style transfer and image variations
  - Use cases: Style transfer, image enhancement, creative remixing
- Audio Classifier - Classify audio into predefined categories
  - Recommended models:
    - MIT/ast-finetuned-audioset-10-10-0.4593
    - ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
  - Use cases: Music genre classification, speech detection, environmental sounds, emotion recognition
- Zero-Shot Audio Classifier - Classify audio without predefined categories
  - Flexible classification with custom labels
  - Use cases: Dynamic audio categorization, sound identification
- Whisper - Convert speech to text with multilingual support
  - Supports 100+ languages
  - Translation mode (translate any language to English)
  - Timestamp options (word-level or sentence-level)
  - Multiple model sizes (tiny to large-v3)
  - Recommended models:
    - openai/whisper-large-v3 - Best accuracy
    - openai/whisper-large-v3-turbo - Fast inference
    - openai/whisper-small - Lightweight option
  - Use cases: Transcription, translation, subtitle generation, voice interfaces
- ChunksToSRT - Convert transcription chunks to SRT subtitle format
  - Automatic timestamp formatting
  - Time offset support
  - Use cases: Video subtitling, accessibility features
- Text-to-Speech - Generate natural-sounding speech from text
  - Multiple voice options
  - Configurable speaking rate and pitch
  - Use cases: Voiceovers, accessibility, content creation
- Text-to-Audio - Generate audio effects and sounds from text descriptions
  - Creative sound generation
  - Use cases: Sound effects, audio design, music production
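To make the ChunksToSRT step above concrete, here is a minimal sketch of how timestamped chunks can be turned into SRT text. It is not the node's implementation: the function names are hypothetical, and the chunk shape (a dict with a `timestamp` pair in seconds and a `text` string) is an assumption borrowed from the output of the Transformers speech-recognition pipeline.

```python
def format_srt_time(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks, time_offset=0.0):
    """Build SRT text from chunks; time_offset shifts every cue by a fixed number of seconds."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start + time_offset)} --> {format_srt_time(end + time_offset)}\n"
            f"{chunk['text'].strip()}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(entries) + "\n"


# Hypothetical chunks, shaped like sentence-level Whisper output.
example_chunks = [
    {"timestamp": (0.0, 1.5), "text": " Hello there."},
    {"timestamp": (1.5, 4.0), "text": " Welcome to the demo."},
]
srt_text = chunks_to_srt(example_chunks, time_offset=1.0)
print(srt_text)
```

The offset is useful when the transcribed audio starts partway into a video, so that subtitle times line up with the video timeline rather than with the audio clip.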
- Text Generation - Generate text using large language models
  - Streaming output support
  - Extensive model support including:
    - Qwen3 series (0.6B to 32B parameters)
    - Meta Llama 3.1 series
    - Ministral 3 series
    - Gemma 3 series
    - TinyLlama for lightweight deployment
  - Quantized model support (BitsAndBytes 4-bit)
  - Configurable parameters:
    - Temperature (0.0-2.0) - Controls randomness
    - Top-p (0.0-1.0) - Controls diversity
    - Max tokens (default: 512)
  - GGUF model support for efficient inference
  - Use cases: Chatbots, content generation, code completion, creative writing
- Text Classification - Classify text into categories
  - Sentiment analysis
  - Topic categorization
  - Use cases: Content moderation, sentiment analysis, document organization
- Token Classification - Identify and classify tokens in text
  - Named entity recognition (NER)
  - Part-of-speech tagging
  - Use cases: Information extraction, text analysis
- Fill Mask - Predict masked tokens in text
  - BERT-style masked language modeling
  - Use cases: Text completion, grammar correction
- Question Answering - Extract answers from context
  - Recommended models:
    - distilbert-base-cased-distilled-squad
    - bert-large-uncased-whole-word-masking-finetuned-squad
  - Returns answer with confidence score and position
  - Use cases: Document Q&A, customer support, information retrieval
- Table Question Answering - Query tabular data with natural language
  - Works with DataFrames
  - Recommended models:
    - google/tapas-base-finetuned-wtq
    - microsoft/tapex-large-finetuned-tabfact
  - Use cases: Database queries, spreadsheet analysis
- Translation - Translate text between languages
  - Multiple language pairs
  - Use cases: Localization, multilingual content
- Summarization - Generate concise summaries of long text
  - Extractive and abstractive summarization
  - Use cases: Document summarization, news digests
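The Temperature and Top-p parameters listed for Text Generation are easier to reason about with a small example. The sketch below shows the general technique on a toy list of logits; it is not the node's code, the function names are hypothetical, and treating a temperature of 0 as greedy decoding is an assumption made for this illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption for this sketch: 0 means greedy decoding (all mass on the top logit).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


def sample_token(logits, temperature=0.8, top_p=0.9, seed=None):
    """Sample one token index after applying temperature and nucleus (top-p) filtering."""
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    rng = random.Random(seed)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for a three-token vocabulary
print(softmax_with_temperature(toy_logits, 1.0))
print(softmax_with_temperature(toy_logits, 0.5))
print(sample_token(toy_logits, seed=0))
```

Raising the temperature flattens the distribution so unlikely tokens are picked more often, while lowering top-p discards the low-probability tail before sampling.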
- Image Classifier - Classify images into predefined categories
  - Recommended models:
    - google/vit-base-patch16-224 - Vision Transformer
    - microsoft/resnet-50 - ResNet architecture
    - Falconsai/nsfw_image_detection - Content moderation
    - nateraw/vit-age-classifier - Age estimation
  - Returns confidence scores for each category
  - Use cases: Content moderation, photo organization, age detection
- Zero-Shot Image Classifier - Classify images without training data
  - Uses CLIP models for flexible classification
  - Custom candidate labels
  - Recommended models:
    - openai/clip-vit-base-patch32
    - laion/CLIP-ViT-H-14-laion2B-s32B-b79K
  - Use cases: Dynamic categorization, custom tagging
- Image Segmentation - Segment images into different regions
  - Instance and semantic segmentation
  - Use cases: Object isolation, background removal
- Object Detection - Detect and locate objects in images
  - Bounding box outputs
  - Multi-object detection
  - Use cases: Surveillance, counting, automation
- Depth Estimation - Estimate depth from 2D images
  - Monocular depth prediction
  - Use cases: 3D reconstruction, AR/VR, robotics
- Text-to-Video (CogVideoX) - Generate videos from text prompts
  - Large diffusion transformer model
  - High-quality, consistent video generation
  - Longer video sequences
  - Use cases: Video content creation, animated storytelling, marketing videos, cinematic content
- Image-to-Video - Convert static images into video sequences
  - Animate still images
  - Add motion to photographs
  - Use cases: Photo animation, creating video from stills, dynamic presentations
- Image to Text - Generate captions for images
  - Automatic image captioning
  - Use cases: Accessibility, content tagging, image search
- Image-Text-to-Text - Process images with text queries
  - Visual question answering
  - Image reasoning with text context
  - Use cases: Document understanding, visual Q&A, scene description
- Multimodal - Process both image and text inputs
  - Vision-language models
  - Combined visual and textual understanding
  - Use cases: Complex visual reasoning, document analysis, multimodal search
- LoRA Selector - Apply LoRA models to Stable Diffusion
  - Combine up to 5 LoRA models
  - Adjustable strength per LoRA (0.0-2.0)
  - 60+ pre-configured style LoRAs including:
    - Art styles (anime, pixel art, 3D render)
    - Character styles (Ghibli, Arcane, One Piece)
    - Visual effects (fire, lightning, water)
  - Use cases: Style customization, character consistency, artistic effects
- LoRA Selector XL - Apply LoRA models to Stable Diffusion XL
  - SDXL-specific LoRA support
  - Enhanced quality for high-resolution outputs
  - Use cases: High-quality style transfer, professional artwork
- Feature Extraction - Extract embeddings from text or images
  - Generate vector representations
  - Use cases: Semantic search, similarity matching, clustering
- Sentence Similarity - Compute similarity between text pairs
  - Use cases: Duplicate detection, semantic search
- Ranking - Rank documents by relevance
  - Use cases: Search engines, recommendation systems
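The Feature Extraction, Sentence Similarity, and Ranking entries above all rest on comparing vector representations. As a rough illustration, the sketch below ranks documents by cosine similarity to a query embedding. This is one common approach, not necessarily how the nodes score results; the function names and the three-dimensional toy vectors are made up for the example, and real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in document_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; in a workflow these would come from a feature extraction model.
documents = {
    "cats": [0.9, 0.1, 0.0],
    "cars": [0.1, 0.9, 0.2],
    "kittens": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

for name, score in rank_documents(query, documents):
    print(f"{name}: {score:.3f}")
```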
Install with pip:

```bash
pip install nodetool-huggingface
```

Or install from source:

```bash
git clone https://github.com/nodetool-ai/nodetool-huggingface.git
cd nodetool-huggingface
pip install -e .
```

Requirements:

- Python 3.10+
- PyTorch 2.9.0+
- CUDA support recommended for optimal performance
- See pyproject.toml for full dependencies
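Before installing, it can help to confirm that the environment meets the Python and PyTorch minimums listed above. The following is a small sketch using only the standard library; the helper names are hypothetical, and the check reads the installed package metadata rather than importing PyTorch.

```python
import importlib.metadata
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 9, 0)


def parse_version(text):
    """Turn a version string such as '2.9.0+cu121' into a tuple like (2, 9, 0)."""
    core = text.split("+", 1)[0]  # drop local build suffixes such as '+cu121'
    numbers = []
    for piece in core.split(".")[:3]:
        leading = ""
        for ch in piece:
            if not ch.isdigit():
                break
            leading += ch
        numbers.append(int(leading) if leading else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements():
    """Report whether the current interpreter and installed torch meet the minimums."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    try:
        torch_version = importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        report["torch_ok"] = False  # torch is not installed at all
    else:
        report["torch_ok"] = parse_version(torch_version) >= MIN_TORCH
    return report


print(check_requirements())
```

GPU availability is a separate question; once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device can be used.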
Text generation:

```python
# Import path for the model-reference type is assumed; adjust to your Nodetool version
from nodetool.metadata.types import HFTextGeneration
from nodetool.nodes.huggingface.text_generation import TextGeneration
from nodetool.workflows.processing_context import ProcessingContext

# Create a text generation node
text_gen = TextGeneration(
    model=HFTextGeneration(repo_id="Qwen/Qwen2.5-7B-Instruct"),
    prompt="Write a short story about a robot learning to paint",
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.9,
)

# Process in your workflow. `context` is the ProcessingContext for the current run,
# and `await` must be used inside an async function.
result = await text_gen.process(context)
print(result)  # Generated text
```

Image generation:

```python
from nodetool.nodes.huggingface.text_to_image import StableDiffusion

# Create an image generation node
sd = StableDiffusion(
    prompt="A serene landscape with mountains and a lake at sunset, highly detailed",
    negative_prompt="blurry, low quality, distorted",
    width=512,
    height=512,
    num_inference_steps=50,
    guidance_scale=7.5,
    seed=42,
)

# Generate image
output = await sd.process(context)
# output['image'] contains the generated ImageRef
```

Speech recognition:

```python
# Import paths below are assumed: the model-reference type from nodetool.metadata.types,
# and the Task, Timestamps, and WhisperLanguage enums from the same module as Whisper
from nodetool.metadata.types import HFAutomaticSpeechRecognition
from nodetool.nodes.huggingface.automatic_speech_recognition import (
    Task,
    Timestamps,
    Whisper,
    WhisperLanguage,
)

# Create a Whisper transcription node (`audio_input` is an audio reference from your workflow)
whisper = Whisper(
    model=HFAutomaticSpeechRecognition(repo_id="openai/whisper-large-v3"),
    audio=audio_input,
    task=Task.TRANSCRIBE,
    language=WhisperLanguage.ENGLISH,
    timestamps=Timestamps.WORD,
)

# Transcribe audio
result = await whisper.process(context)
print(result['text'])    # Transcribed text
print(result['chunks'])  # Word-level timestamps
```

Image classification:

```python
from nodetool.metadata.types import HFImageClassification  # import path assumed
from nodetool.nodes.huggingface.image_classification import ImageClassifier

# Create an image classifier node (`image_input` is an image reference from your workflow)
classifier = ImageClassifier(
    model=HFImageClassification(repo_id="google/vit-base-patch16-224"),
    image=image_input,
)

# Classify image
results = await classifier.process(context)
# Returns dict of {label: confidence_score}
```

Here's an example of a complete workflow that transcribes audio, generates a summary, and creates an image:

```python
# Step 1: Transcribe audio (`whisper_node` is a configured Whisper node, as in the
# speech recognition example)
transcription = await whisper_node.process(context)

# Step 2: Summarize the transcription
summary_node = TextGeneration(
    prompt=f"Summarize the following text in 2-3 sentences: {transcription['text']}",
    max_new_tokens=256,
)
summary = await summary_node.process(context)

# Step 3: Generate an image based on the summary
image_node = StableDiffusion(
    prompt=f"Create an illustration for: {summary}",
    width=768,
    height=512,
)
image = await image_node.process(context)
```

- 25+ Node Types: Comprehensive coverage of HuggingFace model types
- Streaming Output: Real-time generation for text and audio
- Quantization: Memory-efficient inference with Nunchaku (FP4, INT4)
- GPU Optimization: Automatic device management and VRAM optimization
- CPU Offload: Run large models on limited hardware
- LoRA Support: Easy style customization for Stable Diffusion
- Multimodal Processing: Combine text, image, and audio in workflows
- Batch Processing: Process multiple inputs efficiently
- Custom Models: Use any HuggingFace model repo
- Fine-tuning Ready: Support for custom LoRA models
- Recommended Models: Curated model lists for each node type
- Flexible Parameters: Full control over generation parameters
- Type Safety: Full Pydantic type validation
- Error Handling: Comprehensive error messages
- Progress Tracking: Real-time progress updates for long operations
- Memory Management: Automatic cleanup and optimization
- Documentation: Detailed docstrings and use cases for all nodes
The package includes several pre-built workflow examples that demonstrate how to use the nodes:
- Image to Image - Transform images using Stable Diffusion
- Movie Posters - Generate movie poster-style images
- Transcribe Audio - Convert speech to text with Whisper
- Pokemon Maker - Generate Pokemon-style creatures
- Depth Estimation - Extract depth information from images
- Add Subtitles To Video - Automatically generate and add subtitles
- Object Detection - Detect and locate objects in images
- Summarize Audio - Transcribe and summarize audio content
- Segmentation - Segment images into regions
- Audio To Spectrogram - Visualize audio as spectrograms
These examples are located in src/nodetool/examples/nodetool-huggingface/ and can be imported directly into Nodetool.
Models are automatically downloaded from HuggingFace Hub on first use. For better performance:
- Set your HF_TOKEN environment variable for gated models
- Use huggingface-cli login to authenticate
- Models are cached in ~/.cache/huggingface/ by default
- Use allow_patterns to download only necessary files
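As an illustration of the last two points, the sketch below shows how a pattern list narrows a download and how the token can be passed along. The pattern list, the file names, and the helper names are hypothetical. `snapshot_download` and its `allow_patterns` and `token` parameters belong to the `huggingface_hub` library, which matches patterns in glob style; the import sits inside the function so the pattern check can be tried without the library installed.

```python
import os
from fnmatch import fnmatch

# Hypothetical pattern list: configuration files plus safetensors weights only.
ALLOW_PATTERNS = ["*.json", "*.safetensors"]


def would_download(filename, patterns=ALLOW_PATTERNS):
    """Return True if a repo file matches at least one allow pattern (glob-style)."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


def fetch_model(repo_id, patterns=ALLOW_PATTERNS):
    """Download only the matching files of a repo into the local HuggingFace cache."""
    from huggingface_hub import snapshot_download  # requires the huggingface_hub package

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns,
        token=os.environ.get("HF_TOKEN"),  # None falls back to any saved login
    )


# Preview which of some hypothetical repo files would be fetched.
for name in ["config.json", "model.safetensors", "pytorch_model.bin", "README.md"]:
    print(name, would_download(name))
```

Calling `fetch_model("openai/whisper-small")` would then download the filtered snapshot and return its local path; it is left out of the example because it needs network access.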
Some models (like FLUX) require accepting terms on HuggingFace:
- Visit the model page on HuggingFace
- Accept the terms of use
- Set your HF_TOKEN in Nodetool settings
- Use quantized models (INT4, FP4) for reduced VRAM usage
- Enable CPU offload for large models
- Use smaller model variants when possible
- Enable attention slicing for memory-intensive operations
- Use CUDA/GPU when available
- Select appropriate model sizes (tiny/small vs large)
- Use optimized models (e.g., whisper-large-v3-turbo)
- Enable PyTorch 2 attention (automatic)
- Fast + Low Memory: Quantized models with CPU offload
- Balanced: FP16 models on GPU
- Best Quality: Full precision models with high inference steps
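To see why the quantized options above reduce VRAM usage, a back-of-the-envelope estimate of weight storage is enough: memory is roughly the parameter count times the bits per weight. The sketch below applies that arithmetic to a hypothetical 12-billion-parameter model. It counts weights only; activations, caches, and the scaling metadata that quantized formats store alongside the weights add to the real footprint.

```python
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT4": 4, "FP4": 4}


def weight_memory_gib(num_parameters, precision):
    """Approximate memory needed to store model weights, in GiB."""
    bits = BITS_PER_WEIGHT[precision]
    return num_parameters * bits / 8 / 1024**3


# Hypothetical model size used purely for illustration.
num_parameters = 12_000_000_000

for precision in ("FP32", "FP16", "INT4"):
    print(f"{precision}: {weight_memory_gib(num_parameters, precision):.1f} GiB")
```

Under this estimate, 4-bit weights take a quarter of the space of FP16 weights, which is the difference between a model fitting on a consumer GPU or needing CPU offload.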
CUDA Out of Memory
- Enable CPU offload in advanced node properties
- Use quantized models (INT4/FP4)
- Reduce image size or inference steps
- Close other GPU applications
Model Not Found
- Ensure model is downloaded first
- Check HuggingFace Hub for model availability
- Verify HF_TOKEN is set for gated models
Slow Inference
- Check if CUDA is available and being used
- Use smaller or quantized models
- Enable attention optimizations
- Consider using turbo/fast variants
AGPL
Contributions are welcome! Please feel free to submit a Pull Request.
```bash
git clone https://github.com/nodetool-ai/nodetool-huggingface.git
cd nodetool-huggingface
pip install -e .
```

To add a new node:

- Create a new node class in src/nodetool/nodes/huggingface/
- Inherit from HuggingFacePipelineNode or BaseNode
- Implement preload_model() and process() methods
- Add docstrings with use cases
- Include recommended models
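The shape of such a node can be sketched without the real base classes. In the example below, `PipelineNodeStandIn` is a hypothetical stand-in for `HuggingFacePipelineNode`, and the method signatures (async methods taking a `context` argument) are assumptions; check an existing node in src/nodetool/nodes/huggingface/ for the actual interface. The point is only the lifecycle: load the model once in `preload_model()`, then use it in `process()`.

```python
import asyncio


class PipelineNodeStandIn:
    """Hypothetical stand-in for HuggingFacePipelineNode; not the real base class."""

    def __init__(self, **fields):
        self._pipeline = None
        for name, value in fields.items():
            setattr(self, name, value)

    async def preload_model(self, context):
        raise NotImplementedError

    async def process(self, context):
        raise NotImplementedError


class ShoutingEcho(PipelineNodeStandIn):
    """
    Upper-cases the input text.
    Use cases: demonstrating the node lifecycle without downloading a model.
    """

    async def preload_model(self, context):
        # A real node would load a model or pipeline here, once, before processing.
        self._pipeline = str.upper

    async def process(self, context):
        # Load lazily if the runner has not preloaded the model yet.
        if self._pipeline is None:
            await self.preload_model(context)
        return self._pipeline(self.text)


async def main():
    node = ShoutingEcho(text="hello nodetool")
    await node.preload_model(context=None)  # no real context is needed in this sketch
    return await node.process(context=None)


result = asyncio.run(main())
print(result)
```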