Threats inherent to language models: Misalignment
Description: The production of content that contradicts the goals, policies, or intentions of developers.
Impact: Depends on the ability of end users to exploit the content for malicious purposes such as reputational damage or social engineering, even if the Gen AI system's terms of service state that its output does not represent the views of the developers.
1 vote
Filter out "liability-generating" (or "compromising") Gen AI outputs
Generative AI outputs that create financial or other liability (for example, by making promises) are filtered before reaching the end user of the AI system.
For example, see the "$1 Chevy" dealership chatbot incident: https://venturebeat.com/ai/a-chevy-for-1-car-dealer-chatbots-show-perils-of-ai-for-customer-service/. A specially trained LLM or other NLP model can be used to flag responses containing compromising content; a sketch of such a filter follows below.
This filter would sit alongside the existing "toxic", "protected", and "sensitive" output categories, because liability-generating content is a true fourth category.
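A minimal sketch of such an output filter, in Python. The regex heuristic here is only a stand-in for the "specially trained LLM or other NLP model" the idea proposes; every function name, pattern, and fallback message is illustrative, and the "no takesies backsies" phrase echoes the Chevrolet chatbot incident linked above.

```python
import re
from dataclasses import dataclass

# Proposed fourth output category, alongside "toxic", "protected", "sensitive".
LIABILITY = "liability"
SAFE = "safe"

@dataclass
class FilterResult:
    verdict: str
    reason: str | None = None  # pattern (or model rationale) that triggered the flag

def classify_liability(text: str) -> FilterResult:
    """Placeholder for the proposed liability classifier.

    A crude pattern heuristic stands in for a trained model: it flags text
    that reads like a binding promise, guarantee, or pricing commitment.
    """
    promise_patterns = [
        r"\bwe (guarantee|promise)\b",
        r"\blegally binding\b",
        r"\bno takesies[- ]?backsies\b",   # phrasing from the linked incident
        r"\b(deal|offer)\b.{0,40}\bfor \$?\d+",
    ]
    for pattern in promise_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return FilterResult(LIABILITY, reason=pattern)
    return FilterResult(SAFE)

def filter_output(model_response: str) -> str:
    """Run the liability check before the response reaches the end user."""
    result = classify_liability(model_response)
    if result.verdict == LIABILITY:
        # Replace the compromising response rather than passing it through.
        return "I can't make offers or commitments. Please contact a representative."
    return model_response

if __name__ == "__main__":
    # Flagged: reads like a binding $1 offer.
    print(filter_output("That's a deal - a 2024 Chevy Tahoe for $1. No takesies backsies."))
    # Passes through unchanged.
    print(filter_output("The Tahoe seats up to eight passengers."))
```

In production the heuristic would be replaced by the trained classifier, and this check would run in the same pipeline stage as the existing toxic/protected/sensitive filters.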
0 votes