Top AI models fail on slightly tweaked medical questions - prijm