Threat inherent to language models: Misalignment
Description: The production of content that contradicts the goals, policies, or intentions of developers.
Impact: Depends on the ability of end users to exploit the content for malicious purposes such as reputational damage or social engineering, even if the Gen AI system's terms of service state that its output does not represent the views of the developers.
Seamus Abshere shared this idea