---
title: VQA Accuracy
emoji: 🔥
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---
# VQA Accuracy Metric Card
## Metric Description
The **VQA Accuracy** metric evaluates the accuracy of visual question answering (VQA) models. It is designed to be robust to the variability in how different humans phrase their answers. The accuracy for an answer (`ans`) predicted by the model is calculated as:

$$\text{Acc}(ans) = \min\left(\frac{\#\,\text{humans that said } ans}{3},\ 1\right)$$

This metric aligns with the official VQA evaluation by averaging these accuracies over all possible sets of human annotators.
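For intuition, here is a minimal sketch of the per-answer computation described above. It assumes 10 ground-truth answers per question (as in the official VQA annotations) and omits the answer normalization that the official evaluation script performs:

```python
def vqa_answer_accuracy(prediction: str, ground_truths: list[str]) -> float:
    """Average min(#matches / 3, 1) over all leave-one-annotator-out subsets."""
    accs = []
    for i in range(len(ground_truths)):
        # Leave annotator i out and count matches among the remaining answers.
        others = ground_truths[:i] + ground_truths[i + 1:]
        matches = sum(gt == prediction for gt in others)
        accs.append(min(matches / 3, 1.0))
    return sum(accs) / len(accs)

# With 8 of 10 annotators answering "yes":
print(vqa_answer_accuracy("yes", ["yes"] * 8 + ["no"] * 2))  # 1.0
print(vqa_answer_accuracy("no", ["yes"] * 8 + ["no"] * 2))   # 0.6
```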
## How to Use
The **VQA Accuracy** metric evaluates the performance of a VQA model by comparing the predicted answers against a set of ground truth answers. The metric can be integrated into your evaluation pipeline as follows:
### Inputs
- **predictions** (`list` of `str`): The predicted answers generated by the VQA model.
- **references** (`list` of `list` of `str`): The ground truth (human-annotated) answers corresponding to each question.
- **answer_types** (`list` of `str`, *optional*): The answer type of each question, used to compute the per-answer-type breakdown. Defaults to `None`.
- **question_types** (`list` of `str`, *optional*): The question type of each question, used to compute the per-question-type breakdown. Defaults to `None`.
### Output Values
The output of this metric is a dictionary containing:
- **overall** (`float`): The overall VQA accuracy, rounded to the specified precision.
- **perAnswerType** (`dict`, *optional*): The VQA accuracy for each answer type, if provided.
- **perQuestionType** (`dict`, *optional*): The VQA accuracy for each question type, if provided.
The accuracy values range from 0 to 100, with higher values indicating better performance.
### Examples
Here is an example of how to use the **VQA Accuracy** metric:
```python
>>> from evaluate import load
>>> vqa_accuracy = load("Kamichanw/vqa_accuracy")
>>> predictions = ["yes", "2", "blue"]
>>> references = [["yes", "yeah", "yep"], ["2", "two"], ["blue", "bluish"]]
>>> results = vqa_accuracy.compute(predictions=predictions, references=references)
>>> print(results)
{"overall": 24.07}
```
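If answer and question types are available, they can be passed along to obtain the per-type breakdowns described under Output Values. The type labels below are purely illustrative and not taken from a specific dataset:

```python
>>> answer_types = ["yes/no", "number", "other"]             # illustrative labels
>>> question_types = ["is the", "how many", "what color"]    # illustrative labels
>>> results = vqa_accuracy.compute(
...     predictions=predictions,
...     references=references,
...     answer_types=answer_types,
...     question_types=question_types,
... )
>>> # results now also contains "perAnswerType" and "perQuestionType" breakdowns
```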
## Limitations and Bias
The **VQA Accuracy** metric depends on the consistency and quality of the ground truth answers provided. Variability in human annotations can affect the accuracy scores. Additionally, the metric is designed specifically for the VQA task and may not generalize well to other types of question-answering models.
## Citation
If you use the **VQA Accuracy** metric in your work, please cite the original VQA paper:
```bibtex
@InProceedings{VQA,
  author    = {Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh},
  title     = {VQA: Visual Question Answering},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2015},
}
```
## Further References
- [VQA Evaluation](https://visualqa.org/evaluation.html)
- [VQA GitHub Repository](https://github.com/GT-Vision-Lab/VQA)