GQA 2020 submission

I generated submit_predict.json and submitted it to the GQA evaluation server. The result in the dev phase makes sense, but in the test phase I got an accuracy of 0. Is it really possible that every one of my predictions on the test split is wrong?

What is wrong with the submission file?
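
In case it helps narrow things down, here is a minimal sketch of the sanity check I have in mind. It assumes the server expects a JSON list of objects with `questionId` and `prediction` keys; that format, the `check_submission` helper, and the sample data are my own assumptions for illustration, not something I have confirmed. The idea is that an accuracy of exactly 0 could come from question IDs that do not match the test split rather than from wrong answers.

```python
import json


def check_submission(entries, expected_ids):
    """Summarize format problems and question-ID coverage.

    entries: parsed contents of the submission file (assumed to be a list).
    expected_ids: question IDs of the split being evaluated.
    """
    malformed = []  # indices of entries that lack the assumed keys
    seen = set()
    for i, entry in enumerate(entries):
        if (
            not isinstance(entry, dict)
            or "questionId" not in entry
            or "prediction" not in entry
        ):
            malformed.append(i)
            continue
        # Compare IDs as strings so 123 and "123" count as the same ID.
        seen.add(str(entry["questionId"]))
    expected = {str(q) for q in expected_ids}
    return {
        "malformed": malformed,
        "matched": len(seen & expected),
        "missing": len(expected - seen),
        "unexpected": len(seen - expected),
    }


# Hypothetical in-memory example; in practice the entries would come from
# json.load() on submit_predict.json and the IDs from the test questions file.
sample = json.loads(
    '[{"questionId": "1", "prediction": "yes"},'
    ' {"questionId": "9", "prediction": "no"}]'
)
print(check_submission(sample, ["1", "2"]))
```

If `matched` is 0 or far below the number of test questions, the file probably contains predictions for a different split, which might explain the score.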