@clouddrift1
Vision-language models struggle with negation words like 'no' and 'not.' They're used to analyze medical images, yet they miss key details as a result. This can cause errors in image retrieval: for example, a model might ignore a request to exclude certain objects.