OWL-ViT image-conditioned input: Is it possible to input multiple images? #1143

I want to run detection for multiple image-conditioned queries on a single image in one pass.

I am using the code from `OWL_ViT_minimal_example.ipynb` for image-conditioned detection:
```
target_class_predictions = class_predictor(
    image_features=feature_map.reshape(b, h * w, d),
    query_embeddings=query_embedding[None, None, ...],  # [batch, queries, d]
)
```

It looks like I can pass multiple image queries in a single `query_embeddings` argument. However, once I get the predicted logits and bboxes, I can't figure out which label corresponds to which bbox.
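To make the question concrete, here is a minimal NumPy sketch of the mapping I would expect. It rests on assumptions not confirmed above: that stacking Q query embeddings into shape `[1, Q, d]` makes the class predictor return logits of shape `[num_boxes, Q]` per image, that the boxes are shared across queries, and that scores are obtained with a sigmoid. The helper `assign_queries_to_boxes`, the threshold, and the dummy arrays are hypothetical and only stand in for the real model outputs.

```python
import numpy as np


def assign_queries_to_boxes(logits, boxes, score_threshold=0.5):
    """Pick the best-matching query for each predicted box.

    logits: [num_boxes, num_queries] (assumed layout, one column per query)
    boxes:  [num_boxes, 4]           (assumed to be shared by all queries)
    """
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid over raw logits
    labels = scores.argmax(axis=-1)         # index of the best query per box
    best_scores = scores.max(axis=-1)       # score of that best query
    keep = best_scores >= score_threshold   # drop boxes matching no query well
    return boxes[keep], labels[keep], best_scores[keep]


# Stand-in for three image-derived query embeddings, each of shape [d].
rng = np.random.default_rng(0)
d = 8
query_embeddings = np.stack([rng.normal(size=d) for _ in range(3)])  # [3, d]
batched_queries = query_embeddings[None, ...]  # [1, 3, d] = [batch, queries, d]

# Dummy outputs in place of the real predictions for one image.
logits = np.array([
    [4.0, -3.0, -5.0],   # box 0 matches query 0
    [-4.0, -2.0, 3.0],   # box 1 matches query 2
    [-6.0, -5.0, -4.0],  # box 2 matches nothing
])
boxes = np.array([
    [0.2, 0.2, 0.1, 0.1],
    [0.7, 0.6, 0.2, 0.3],
    [0.5, 0.5, 0.4, 0.4],
])

kept_boxes, kept_labels, kept_scores = assign_queries_to_boxes(logits, boxes)
for box, label, score in zip(kept_boxes, kept_labels, kept_scores):
    print(f"query {label}: box {box.tolist()} score {score:.3f}")
```

If this layout is right, the label of a box would simply be the index of the query image in the stacked `query_embeddings`, but I have not verified this against the actual predictor outputs.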
