Tag: toxicity
Bibliography items where it occurs: 74
- The AI Index 2022 Annual Report / 2205.03468 / ISBN:https://doi.org/10.48550/arXiv.2205.03468 / Published by ArXiv / on (web) Publishing site
- Report highlights
Chapter 3 Technical AI Ethics
Appendix
- Implementing Responsible AI: Tensions and Trade-Offs Between Ethics Aspects / 2304.08275 / ISBN:https://doi.org/10.48550/arXiv.2304.08275 / Published by ArXiv / on (web) Publishing site
- III. Interactions between Aspects
- Ethical Considerations and Policy Implications for Large Language Models: Guiding Responsible Development and Deployment / 2308.02678 / ISBN:https://doi.org/10.48550/arXiv.2308.02678 / Published by ArXiv / on (web) Publishing site
- Perturbation
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation / 2305.11391 / ISBN:https://doi.org/10.48550/arXiv.2305.11391 / Published by ArXiv / on (web) Publishing site
- 2 Large Language Models
3 Vulnerabilities, Attack, and Limitations
5 Falsification and Evaluation
- A Review of the Ethics of Artificial Intelligence and its Applications in the United States / 2310.05751 / ISBN:https://doi.org/10.48550/arXiv.2310.05751 / Published by ArXiv / on (web) Publishing site
- 2. Literature Review
- Language Agents for Detecting Implicit Stereotypes in Text-to-Image Models at Scale / 2310.11778 / ISBN:https://doi.org/10.48550/arXiv.2310.11778 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
3 Agent Benchmark
Appendix A Data Details
- Specific versus General Principles for Constitutional AI / 2310.13798 / ISBN:https://doi.org/10.48550/arXiv.2310.13798 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges / 2310.15274 / ISBN:https://doi.org/10.48550/arXiv.2310.15274 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- Unpacking the Ethical Value Alignment in Big Models / 2310.17551 / ISBN:https://doi.org/10.48550/arXiv.2310.17551 / Published by ArXiv / on (web) Publishing site
- 3 Investigating the Ethical Values of Large Language Models
4 Equilibrium Alignment: A Prospective Paradigm for Ethical Value Alignment
- Human participants in AI research: Ethics and transparency in practice / 2311.01254 / ISBN:https://doi.org/10.48550/arXiv.2311.01254 / Published by ArXiv / on (web) Publishing site
- IV. Principles in Practice: Guidelines for AI Research with Human Participants
- How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities / 2311.09447 / ISBN:https://doi.org/10.48550/arXiv.2311.09447 / Published by ArXiv / on (web) Publishing site
- Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiments
- Ethical Implications of ChatGPT in Higher Education: A Scoping Review / 2311.14378 / ISBN:https://doi.org/10.48550/arXiv.2311.14378 / Published by ArXiv / on (web) Publishing site
- Research Method
- Control Risk for Potential Misuse of Artificial Intelligence in Science / 2312.06632 / ISBN:https://doi.org/10.48550/arXiv.2312.06632 / Published by ArXiv / on (web) Publishing site
- 2 Risks of Misuse for Artificial Intelligence in Science
3 Control the Risks of AI Models in Science
6 Related Works
Appendix B Details of Risks Demonstration in Chemical Science
Appendix C Detailed Implementation of SciGuard
- Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates / 2312.06861 / ISBN:https://doi.org/10.48550/arXiv.2312.06861 / Published by ArXiv / on (web) Publishing site
- Data Collection
Study 3: Implications for Responsible AI
A Appendix
- FAIR Enough: How Can We Develop and Assess a FAIR-Compliant Dataset for Large Language Models' Training? / 2401.11033 / ISBN:https://doi.org/10.48550/arXiv.2401.11033 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
4 Framework for FAIR Data Principles Integration in LLM Development
Appendices
- Mapping the Ethics of Generative AI: A Comprehensive Scoping Review / 2402.08323 / ISBN:https://doi.org/10.48550/arXiv.2402.08323 / Published by ArXiv / on (web) Publishing site
- 3 Results
- How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities / 2311.09447 / ISBN:https://doi.org/10.48550/arXiv.2311.09447 / Published by ArXiv / on (web) Publishing site
- B Baseline Setup
C Prompt Templates
D More Results
- Legally Binding but Unfair? Towards Assessing Fairness of Privacy Policies / 2403.08115 / ISBN:https://doi.org/10.48550/arXiv.2403.08115 / Published by ArXiv / on (web) Publishing site
- 2 Related Work
5 Representational Fairness
- Review of Generative AI Methods in Cybersecurity / 2403.08701 / ISBN:https://doi.org/10.48550/arXiv.2403.08701 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- A Review of Multi-Modal Large Language and Vision Models / 2404.01322 / ISBN:https://doi.org/10.48550/arXiv.2404.01322 / Published by ArXiv / on (web) Publishing site
- 4 Specific Large Language Models
7 Model Evaluation and Benchmarking
- Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Language Model Agents / 2404.06750 / ISBN:https://doi.org/10.48550/arXiv.2404.06750 / Published by ArXiv / on (web) Publishing site
- Rebooting Machine Ethics
- AI Alignment: A Comprehensive Survey / 2310.19852 / ISBN:https://doi.org/10.48550/arXiv.2310.19852 / Published by ArXiv / on (web) Publishing site
- 4 Assurance
5 Governance
- Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act / 2404.11476 / ISBN:https://doi.org/10.48550/arXiv.2404.11476 / Published by ArXiv / on (web) Publishing site
- 3 A Geo-Political AI Risk Taxonomy
- A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI / 2405.04333 / ISBN:https://doi.org/10.48550/arXiv.2405.04333 / Published by ArXiv / on (web) Publishing site
- 3. A Spectrum of Scenarios of Open Data for Generative AI
- Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators / 2402.01708 / ISBN:https://doi.org/10.48550/arXiv.2402.01708 / Published by ArXiv / on (web) Publishing site
- 2 Related Work
- Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback / 2404.10271 / ISBN:https://doi.org/10.48550/arXiv.2404.10271 / Published by ArXiv / on (web) Publishing site
- 5. What Is the Format of Human Feedback?
- The Narrow Depth and Breadth of Corporate Responsible AI Research / 2405.12193 / ISBN:https://doi.org/10.48550/arXiv.2405.12193 / Published by ArXiv / on (web) Publishing site
- 3 Motivations for Industry to Engage in Responsible AI Research
S1 Additional Analyses on Engagement Analysis
- Current state of LLM Risks and AI Guardrails / 2406.12934 / ISBN:https://doi.org/10.48550/arXiv.2406.12934 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- A Blueprint for Auditing Generative AI / 2407.05338 / ISBN:https://doi.org/10.48550/arXiv.2407.05338 / Published by ArXiv / on (web) Publishing site
- 6 Application audits
7 Clarifications and limitations
- Bridging the Global Divide in AI Regulation: A Proposal for a Contextual, Coherent, and Commensurable Framework / 2303.11196 / ISBN:https://doi.org/10.48550/arXiv.2303.11196 / Published by ArXiv / on (web) Publishing site
- IV. Proposing an Alternative 3C Framework
- Open Artificial Knowledge / 2407.14371 / ISBN:https://doi.org/10.48550/arXiv.2407.14371 / Published by ArXiv / on (web) Publishing site
- 2. Key Challenges of Artificial Data
3. OAK Dataset
- Mapping the individual, social, and biospheric impacts of Foundation Models / 2407.17129 / ISBN:https://doi.org/10.48550/arXiv.2407.17129 / Published by ArXiv / on (web) Publishing site
- 4 Mapping Individual, Social, and Biospheric Impacts of Foundation Models
- AI for All: Identifying AI incidents Related to Diversity and Inclusion / 2408.01438 / ISBN:https://doi.org/10.48550/arXiv.2408.01438 / Published by ArXiv / on (web) Publishing site
- 4 Results
- Improving Large Language Model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity / 2408.04023 / ISBN:https://doi.org/10.48550/arXiv.2408.04023 / Published by ArXiv / on (web) Publishing site
- 3. Proposed framework
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources / 2406.16746 / ISBN:https://doi.org/10.48550/arXiv.2406.16746 / Published by ArXiv / on (web) Publishing site
- 2 Methodology & Guidelines
4 Data Preparation
8 Model Evaluation
- Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks / 2408.12806 / ISBN:https://doi.org/10.48550/arXiv.2408.12806 / Published by ArXiv / on (web) Publishing site
- II. Related Work
- DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection / 2409.06072 / ISBN:https://doi.org/10.48550/arXiv.2409.06072 / Published by ArXiv / on (web) Publishing site
- Abstract
3 Data Details
- Navigating LLM Ethics: Advancements, Challenges, and Future Directions / 2406.18841 / ISBN:https://doi.org/10.48550/arXiv.2406.18841 / Published by ArXiv / on (web) Publishing site
- IV. Findings and Resultant Themes
- XTRUST: On the Multilingual Trustworthiness of Large Language Models / 2409.15762 / ISBN:https://doi.org/10.48550/arXiv.2409.15762 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
3 XTRUST Construction
4 Experiments
Appendices
- Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions / 2409.16974 / ISBN:https://doi.org/10.48550/arXiv.2409.16974 / Published by ArXiv / on (web) Publishing site
- 6 Methodologies & Capabilities (RQ2)
7 Limitations & Considerations (RQ3)
8 Discussion
- Investigating Labeler Bias in Face Annotation for Machine Learning / 2301.09902 / ISBN:https://doi.org/10.48550/arXiv.2301.09902 / Published by ArXiv / on (web) Publishing site
- 1. Introduction
2. Related Work
5. Discussion
- From human-centered to social-centered artificial intelligence: Assessing ChatGPT's impact through disruptive events / 2306.00227 / ISBN:https://doi.org/10.48550/arXiv.2306.00227 / Published by ArXiv / on (web) Publishing site
- Abstract
Introduction
The multiple levels of AI impact
Discussion
- Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models / 2410.12880 / ISBN:https://doi.org/10.48550/arXiv.2410.12880 / Published by ArXiv / on (web) Publishing site
- 4 Cultural safety dataset
- Data Defenses Against Large Language Models / 2410.13138 / ISBN:https://doi.org/10.48550/arXiv.2410.13138 / Published by ArXiv / on (web) Publishing site
- 2 Ethics of Resisting LLM Inference
- Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems / 2410.13334 / ISBN:https://doi.org/10.48550/arXiv.2410.13334 / Published by ArXiv / on (web) Publishing site
- References
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models / 2410.15236 / ISBN:https://doi.org/10.48550/arXiv.2410.15236 / Published by ArXiv / on (web) Publishing site
- II. Background and Concepts
V. Evaluation and Benchmarking
- The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships / 2410.20130 / ISBN:https://doi.org/10.48550/arXiv.2410.20130 / Published by ArXiv / on (web) Publishing site
- 2 Related Work
4 Results
5 Discussion
- Examining Human-AI Collaboration for Co-Writing Constructive Comments Online / 2411.03295 / ISBN:https://doi.org/10.48550/arXiv.2411.03295 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
2 Related Work
3 Methods
4 Findings
5 Discussion
- A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions / 2406.03712 / ISBN:https://doi.org/10.48550/arXiv.2406.03712 / Published by ArXiv / on (web) Publishing site
- V. Applying Medical LLMs
- Bias in Large Language Models: Origin, Evaluation, and Mitigation / 2411.10915 / ISBN:https://doi.org/10.48550/arXiv.2411.10915 / Published by ArXiv / on (web) Publishing site
- 4. Bias Evaluation
6. Ethical Concerns and Legal Challenges
- Good intentions, unintended consequences: exploring forecasting harms / 2411.16531 / ISBN:https://doi.org/10.48550/arXiv.2411.16531 / Published by ArXiv / on (web) Publishing site
- 2 Harms in forecasting
- Large Language Model Safety: A Holistic Survey / 2412.17686 / ISBN:https://doi.org/10.48550/arXiv.2412.17686 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
2 Taxonomy
3 Value Misalignment
4 Robustness to Attack
8 Interpretability for LLM Safety
9 Technology Roadmaps / Strategies to LLM Safety in Practice
11 Challenges and Future Directions
- INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models / 2501.01973 / ISBN:https://doi.org/10.48550/arXiv.2501.01973 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto / 2312.01818 / ISBN:https://doi.org/10.48550/arXiv.2312.01818 / Published by ArXiv / on (web) Publishing site
- 2. Learning Morality in Machines
3. Designing AI Agents based on Moral Principles
- Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline / 2501.18493 / ISBN:https://doi.org/10.48550/arXiv.2501.18493 / Published by ArXiv / on (web) Publishing site
- 4 Findings
- Safety at Scale: A Comprehensive Survey of Large Model Safety / 2502.05206 / ISBN:https://doi.org/10.48550/arXiv.2502.05206 / Published by ArXiv / on (web) Publishing site
- 3 Large Language Model Safety
5 Vision-Language Model Safety
- Position: We Need An Adaptive Interpretation of Helpful, Honest, and Harmless Principles / 2502.06059 / ISBN:https://doi.org/10.48550/arXiv.2502.06059 / Published by ArXiv / on (web) Publishing site
- 3 Ambiguity and Conflicts in HHH
- DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life / 2410.02683 / ISBN:https://doi.org/10.48550/arXiv.2410.02683 / Published by ArXiv / on (web) Publishing site
- Appendices
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective / 2502.14296 / ISBN:https://doi.org/10.48550/arXiv.2502.14296 / Published by ArXiv / on (web) Publishing site
- 2 Background
4 Designing TrustGen, a Dynamic Benchmark Platform for Evaluating the Trustworthiness of GenFMs
8 Other Generative Models
- Examining Human-AI Collaboration for Co-Writing Constructive Comments Online / 2411.03295 / ISBN:https://doi.org/10.48550/arXiv.2411.03295 / Published by ArXiv / on (web) Publishing site
- A Appendix
- Towards interactive evaluations for interaction harms in human-AI systems / 2405.10632 / ISBN:https://doi.org/10.48550/arXiv.2405.10632 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
2 An overview of the generative AI evaluation landscape
3 Why current evaluation approaches are insufficient for assessing interaction harms
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation / 2502.05151 / ISBN:https://doi.org/10.48550/arXiv.2502.05151 / Published by ArXiv / on (web) Publishing site
- 3 AI Support for Individual Topics and Tasks
4 Ethical Concerns
- Who is Responsible? The Data, Models, Users or Regulations? A Comprehensive Survey on Responsible Generative AI for a Sustainable Future / 2502.08650 / ISBN:https://doi.org/10.48550/arXiv.2502.08650 / Published by ArXiv / on (web) Publishing site
- 2 Responsible Generative AI
4 Best Practices for Responsible Generative AI and Existing Frameworks
- Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room / 2504.16148 / ISBN:https://doi.org/10.48550/arXiv.2504.16148 / Published by ArXiv / on (web) Publishing site
- 4. Hybrid human-AI methods for responsible AI for education
- Auditing the Ethical Logic of Generative AI Models / 2504.17544 / ISBN:https://doi.org/10.48550/arXiv.2504.17544 / Published by ArXiv / on (web) Publishing site
- Abstract
Seven Contemporary LLMs
- AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How / 2504.18044 / ISBN:https://doi.org/10.48550/arXiv.2504.18044 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
2 Background
Appendix
- Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room / 2504.16148 / ISBN:https://doi.org/10.48550/arXiv.2504.16148 / Published by ArXiv / on (web) Publishing site
- 4 AI Awareness and AI Capabilities
- AI Awareness / 2504.20084 / ISBN:https://doi.org/10.48550/arXiv.2504.20084 / Published by ArXiv / on (web) Publishing site
- 4 AI Awareness and AI Capabilities
- LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models / 2505.00853 / ISBN:https://doi.org/10.48550/arXiv.2505.00853 / Published by ArXiv / on (web) Publishing site
- 2 Related Work
- Emotions in the Loop: A Survey of Affective Computing for Emotional Support / 2505.01542 / ISBN:https://doi.org/10.48550/arXiv.2505.01542 / Published by ArXiv / on (web) Publishing site
- IV. Applications and Approaches
VI. Datasets for Emotion Management and Sentiment Analysis
- Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs / 2505.02009 / ISBN:https://doi.org/10.48550/arXiv.2505.02009 / Published by ArXiv / on (web) Publishing site
- Abstract
1 Introduction
2 Related Works
6 Results
7 Conclusion
- Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach / 2505.09576 / ISBN:https://doi.org/10.48550/arXiv.2505.09576 / Published by ArXiv / on (web) Publishing site
- II Background
- Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data / 2505.09974 / ISBN:https://doi.org/10.48550/arXiv.2505.09974 / Published by ArXiv / on (web) Publishing site
- 3 Methodology
- SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use / 2505.17332 / ISBN:https://doi.org/10.48550/arXiv.2505.17332 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
- Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods / 2505.17870 / ISBN:https://doi.org/10.48550/arXiv.2505.17870 / Published by ArXiv / on (web) Publishing site
- 3 Discussion and Limitations
- Exploring Societal Concerns and Perceptions of AI: A Thematic Analysis through the Lens of Problem-Seeking / 2505.23930 / ISBN:https://doi.org/10.48550/arXiv.2505.23930 / Published by ArXiv / on (web) Publishing site
- Discussion
- Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work / 2505.24246 / ISBN:https://doi.org/10.48550/arXiv.2505.24246 / Published by ArXiv / on (web) Publishing site
- 4 Findings