I want to detect multiple image-conditioned queries on a single image at one time.
I am using the image-conditioned detection code from OWL_ViT_minimal_example.ipynb:
target_class_predictions = class_predictor(
    image_features=feature_map.reshape(b, h * w, d),
    query_embeddings=query_embedding[None, None, ...],  # [batch, queries, d]
)
It looks like I can pass multiple image queries at once by stacking them into a single query_embedding array. However, once I get the predicted logits and bboxes, I can't figure out which query (label) each predicted bbox corresponds to.
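To make the question concrete, here is a minimal sketch of the mapping I am after. It assumes the logits come back with shape [batch, num_boxes, num_queries], so that the last axis indexes the stacked queries in the order they were passed in; I have not confirmed this against the notebook. `fake_class_predictor` and `assign_queries` are hypothetical stand-ins written in NumPy, not part of OWL-ViT, and the random features and boxes are placeholder data.

```python
import numpy as np


def fake_class_predictor(image_features, query_embeddings):
    """Hypothetical stand-in for class_predictor (cosine-similarity logits).

    image_features:   [batch, num_boxes, d]
    query_embeddings: [batch, num_queries, d]
    returns logits:   [batch, num_boxes, num_queries]  (assumed layout)
    """
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    qry = query_embeddings / np.linalg.norm(query_embeddings, axis=-1, keepdims=True)
    return np.einsum("bnd,bqd->bnq", img, qry)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def assign_queries(logits, score_threshold=0.5):
    """Pick the best-scoring query for every box.

    logits: [num_boxes, num_queries]
    returns (labels, scores, keep), each of length num_boxes; labels[i] is
    the index of the query that scores highest for box i.
    """
    scores_all = sigmoid(logits)
    labels = scores_all.argmax(axis=-1)
    scores = scores_all.max(axis=-1)
    keep = scores >= score_threshold
    return labels, scores, keep


# Placeholder data standing in for the real feature map and predicted boxes.
rng = np.random.default_rng(0)
b, h, w, d = 1, 4, 4, 8
feature_map = rng.normal(size=(b, h, w, d))
boxes = rng.uniform(size=(h * w, 4))

# Two pretend query embeddings, stacked along the query axis: [1, 2, d].
query_a = feature_map[0, 0, 1]
query_b = feature_map[0, 2, 3]
query_embeddings = np.stack([query_a, query_b])[None, ...]

logits = fake_class_predictor(feature_map.reshape(b, h * w, d), query_embeddings)
labels, scores, keep = assign_queries(logits[0], score_threshold=0.7)

# Boxes, query indices, and scores that survive the threshold.
kept_boxes, kept_labels, kept_scores = boxes[keep], labels[keep], scores[keep]
```

If that shape assumption holds, `kept_labels[i]` would be the position of the matching query in the stacked array, and a single box could also be matched against several queries by thresholding each column of the scores separately instead of taking the argmax.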