
Data Source & Data Science Approach

Our project used the Indoor Scene Recognition Dataset from MIT. Its high-quality indoor images of airports fit our needs well, but they raised an initial concern: the photos are too clean to represent what visually impaired users actually capture. This observation came from our review of the VizWiz dataset, in which many photos taken by visually impaired people are blurred, out of focus, or fail to fully frame the target object.


To address this discrepancy, we applied random Gaussian and motion blur, exposure adjustments, rotations, and cropping to the images to simulate the conditions of photos taken by visually impaired users. This gave us a robust, proprietary dataset tailored to our task, and we evaluated models on these processed images. We also performed additional exploratory data analysis, including t-SNE and PCA projections, to confirm the dataset had enough variety and was suitable for our use case.
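
The sketch below illustrates the kind of degradation pipeline described above. The function name, probability thresholds, and parameter ranges are illustrative assumptions, not the exact values we used.

```python
# Illustrative degradation pipeline for the MIT indoor-scene images.
import random
import cv2
import numpy as np

def degrade(image: np.ndarray) -> np.ndarray:
    """Randomly blur, re-expose, rotate, and crop a BGR uint8 image."""
    h, w = image.shape[:2]

    # Gaussian blur to mimic out-of-focus shots.
    if random.random() < 0.5:
        k = random.choice([5, 9, 13])
        image = cv2.GaussianBlur(image, (k, k), 0)

    # Motion blur: convolve with a horizontal line kernel.
    if random.random() < 0.5:
        k = random.choice([7, 11, 15])
        kernel = np.zeros((k, k), dtype=np.float32)
        kernel[k // 2, :] = 1.0 / k
        image = cv2.filter2D(image, -1, kernel)

    # Exposure adjustment: scale brightness and clip to [0, 255].
    gain = random.uniform(0.6, 1.4)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

    # Small random rotation around the image centre.
    angle = random.uniform(-20, 20)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h), borderMode=cv2.BORDER_REPLICATE)

    # Random crop keeping 70-100% of each side, then resize back.
    ch, cw = int(h * random.uniform(0.7, 1.0)), int(w * random.uniform(0.7, 1.0))
    y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
    image = cv2.resize(image[y0:y0 + ch, x0:x0 + cw], (w, h))
    return image
```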
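For the variety check, a minimal sketch of the PCA and t-SNE projections might look like the following, assuming `features` is an (n_images, n_dims) embedding matrix (for example, features from a pretrained CNN) and `labels` holds the corresponding scene classes.

```python
# Project image embeddings to 2D with PCA and t-SNE and colour by class.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_projections(features, labels):
    pca_2d = PCA(n_components=2).fit_transform(features)
    # Reduce to 50 dimensions first, then run t-SNE on the compressed features.
    tsne_2d = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(
        PCA(n_components=50).fit_transform(features)
    )
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, pts, title in zip(axes, (pca_2d, tsne_2d), ("PCA", "t-SNE")):
        ax.scatter(pts[:, 0], pts[:, 1], c=labels, cmap="tab20", s=8)
        ax.set_title(title)
    plt.show()
```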
Because the output is descriptive text, we generated our own golden annotations. We took 92 images (a random 10% of the dataset) and sent them to GPT-4 for description, then manually reviewed the results and made them more descriptive, paying special attention to accessibility features such as tactile guide strips and signage. With this golden set, we can quickly compute evaluation scores such as ROUGE and BERTScore and compare them across models.
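
A rough sketch of the draft-annotation step is shown below. The model name, prompt wording, and helper function are assumptions for illustration; the actual requests were made through the OpenAI API at the time of the project.

```python
# Ask a GPT-4 vision-capable model for a draft description of one image.
# Drafts are then manually reviewed and enriched with accessibility details.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_description(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any GPT-4 vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this airport scene for a visually impaired "
                         "traveller. Mention signs, tactile guide strips, and "
                         "other accessibility features."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```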
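Scoring against the golden set can then be reduced to a small helper like the one below, assuming `golden` and `generated` are aligned lists of reference and model-produced descriptions; the function name and the choice of ROUGE variants are illustrative.

```python
# Compare generated descriptions against golden annotations with ROUGE and BERTScore.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate_descriptions(golden, generated):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge_l = [
        scorer.score(ref, hyp)["rougeL"].fmeasure
        for ref, hyp in zip(golden, generated)
    ]
    # BERTScore compares contextual embeddings rather than exact n-gram overlap.
    _, _, f1 = bert_score(generated, golden, lang="en")
    return {
        "rougeL_f1": sum(rouge_l) / len(rouge_l),
        "bertscore_f1": float(f1.mean()),
    }
```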
