Tag: toxic
Bibliography items where the tag occurs: 135
- The AI Index 2022 Annual Report / 2205.03468 / ISBN:https://doi.org/10.48550/arXiv.2205.03468 / Published by ArXiv / Version released on 2022-05-02 / on (web) Publishing site
- Implementing Responsible AI: Tensions and Trade-Offs Between Ethics Aspects / 2304.08275 / ISBN:https://doi.org/10.48550/arXiv.2304.08275 / Published by ArXiv / Version released on 2024-09-06 / on (web) Publishing site
- Ethical Considerations and Policy Implications for Large Language Models: Guiding Responsible Development and Deployment / 2308.02678 / ISBN:https://doi.org/10.48550/arXiv.2308.02678 / Published by ArXiv / Version released on 2023-08-01 / on (web) Publishing site
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation / 2305.11391 / ISBN:https://doi.org/10.48550/arXiv.2305.11391 / Published by ArXiv / Version released on 2023-08-27 / on (web) Publishing site
- Getting pwn'd by AI: Penetration Testing with Large Language Models / 2308.00121 / ISBN:https://doi.org/10.48550/arXiv.2308.00121 / Published by ArXiv / Version released on 2023-08-17 / on (web) Publishing site
- The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward / 2308.14253 / ISBN:https://doi.org/10.48550/arXiv.2308.14253 / Published by ArXiv / Version released on 2023-08-28 / on (web) Publishing site
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? / 2308.15399 / ISBN:https://doi.org/10.48550/arXiv.2308.15399 / Published by ArXiv / Version released on 2024-07-01 / on (web) Publishing site
- A Review of the Ethics of Artificial Intelligence and its Applications in the United States / 2310.05751 / ISBN:https://doi.org/10.48550/arXiv.2310.05751 / Published by ArXiv / Version released on 2023-10-09 / on (web) Publishing site
- Language Agents for Detecting Implicit Stereotypes in Text-to-Image Models at Scale / 2310.11778 / ISBN:https://doi.org/10.48550/arXiv.2310.11778 / Published by ArXiv / Version released on 2023-11-02 / on (web) Publishing site
- Specific versus General Principles for Constitutional AI / 2310.13798 / ISBN:https://doi.org/10.48550/arXiv.2310.13798 / Published by ArXiv / Version released on 2023-10-20 / on (web) Publishing site
- Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges / 2310.15274 / ISBN:https://doi.org/10.48550/arXiv.2310.15274 / Published by ArXiv / Version released on 2023-10-23 / on (web) Publishing site
- AI Alignment and Social Choice: Fundamental Limitations and Policy Implications / 2310.16048 / ISBN:https://doi.org/10.48550/arXiv.2310.16048 / Published by ArXiv / Version released on 2023-10-24 / on (web) Publishing site
- Unpacking the Ethical Value Alignment in Big Models / 2310.17551 / ISBN:https://doi.org/10.48550/arXiv.2310.17551 / Published by ArXiv / Version released on 2023-10-26 / on (web) Publishing site
- Human participants in AI research: Ethics and transparency in practice / 2311.01254 / ISBN:https://doi.org/10.48550/arXiv.2311.01254 / Published by ArXiv / Version released on 2024-09-26 / on (web) Publishing site
- How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities / 2311.09447 / ISBN:https://doi.org/10.48550/arXiv.2311.09447 / Published by ArXiv / Version released on 2024-04-02 / on (web) Publishing site
- Assessing AI Impact Assessments: A Classroom Study / 2311.11193 / ISBN:https://doi.org/10.48550/arXiv.2311.11193 / Published by ArXiv / Version released on 2023-11-19 / on (web) Publishing site
- Ethical Implications of ChatGPT in Higher Education: A Scoping Review / 2311.14378 / ISBN:https://doi.org/10.48550/arXiv.2311.14378 / Published by ArXiv / Version released on 2024-06-05 / on (web) Publishing site
- Survey on AI Ethics: A Socio-technical Perspective / 2311.17228 / ISBN:https://doi.org/10.48550/arXiv.2311.17228 / Published by ArXiv / Version released on 2025-11-04 / on (web) Publishing site
- Control Risk for Potential Misuse of Artificial Intelligence in Science / 2312.06632 / ISBN:https://doi.org/10.48550/arXiv.2312.06632 / Published by ArXiv / Version released on 2023-12-11 / on (web) Publishing site
- Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates / 2312.06861 / ISBN:https://doi.org/10.48550/arXiv.2312.06861 / Published by ArXiv / Version released on 2023-12-11 / on (web) Publishing site
- FAIR Enough: How Can We Develop and Assess a FAIR-Compliant Dataset for Large Language Models' Training? / 2401.11033 / ISBN:https://doi.org/10.48550/arXiv.2401.11033 / Published by ArXiv / Version released on 2024-04-03 / on (web) Publishing site
- Beyond principlism: Practical strategies for ethical AI use in research practices / 2401.15284 / ISBN:https://doi.org/10.48550/arXiv.2401.15284 / Published by ArXiv / Version released on 2025-06-20 / on (web) Publishing site
- Commercial AI, Conflict, and Moral Responsibility: A theoretical analysis and practical approach to the moral responsibilities associated with dual-use AI technology / 2402.01762 / ISBN:https://doi.org/10.48550/arXiv.2402.01762 / Published by ArXiv / Version released on 2024-01-30 / on (web) Publishing site
- (A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice / 2402.01864 / ISBN:https://doi.org/10.48550/arXiv.2402.01864 / Published by ArXiv / Version released on 2024-02-02 / on (web) Publishing site
- Mapping the Ethics of Generative AI: A Comprehensive Scoping Review / 2402.08323 / ISBN:https://doi.org/10.48550/arXiv.2402.08323 / Published by ArXiv / Version released on 2024-02-13 / on (web) Publishing site
- Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation / 2402.12590 / ISBN:https://doi.org/10.48550/arXiv.2402.12590 / Published by ArXiv / Version released on 2024-06-18 / on (web) Publishing site
- Legally Binding but Unfair? Towards Assessing Fairness of Privacy Policies / 2403.08115 / ISBN:https://doi.org/10.48550/arXiv.2403.08115 / Published by ArXiv / Version released on 2024-05-08 / on (web) Publishing site
- Review of Generative AI Methods in Cybersecurity / 2403.08701 / ISBN:https://doi.org/10.48550/arXiv.2403.08701 / Published by ArXiv / Version released on 2024-03-19 / on (web) Publishing site
- Trust in AI: Progress, Challenges, and Future Directions / 2403.14680 / ISBN:https://doi.org/10.48550/arXiv.2403.14680 / Published by ArXiv / Version released on 2024-04-04 / on (web) Publishing site
- A Review of Multi-Modal Large Language and Vision Models / 2404.01322 / ISBN:https://doi.org/10.48550/arXiv.2404.01322 / Published by ArXiv / Version released on 2024-03-28 / on (web) Publishing site
- Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Language Model Agents / 2404.06750 / ISBN:https://arxiv.org/abs/2404.06750 / Published by ArXiv / Version released on 2024-10-18 / on (web) Publishing site
- AI Alignment: A Comprehensive Survey / 2310.19852 / ISBN:https://doi.org/10.48550/arXiv.2310.19852 / Published by ArXiv / Version released on 2025-04-04 / on (web) Publishing site
- Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act / 2404.11476 / ISBN:https://doi.org/10.48550/arXiv.2404.11476 / Published by ArXiv / Version released on 2024-04-17 / on (web) Publishing site
- Large Language Model Supply Chain: A Research Agenda / 2404.12736 / ISBN:https://doi.org/10.48550/arXiv.2404.12736 / Published by ArXiv / Version released on 2024-04-19 / on (web) Publishing site
- War Elephants: Rethinking Combat AI and Human Oversight / 2404.19573 / ISBN:https://doi.org/10.48550/arXiv.2404.19573 / Published by ArXiv / Version released on 2024-04-30 / on (web) Publishing site
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law / 2405.01769 / ISBN:https://doi.org/10.48550/arXiv.2405.01769 / Published by ArXiv / Version released on 2024-11-21 / on (web) Publishing site
- A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI / 2405.04333 / ISBN:https://doi.org/10.48550/arXiv.2405.04333 / Published by ArXiv / Version released on 2024-05-07 / on (web) Publishing site
- Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness / 2405.05930 / ISBN:https://doi.org/10.48550/arXiv.2405.05930 / Published by ArXiv / Version released on 2024-05-09 / on (web) Publishing site
- Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic Analysis / 2309.10771 / ISBN:https://doi.org/10.48550/arXiv.2309.10771 / Version released on 2024-05-28 / on (web) Publishing site
- Should agentic conversational AI change how we think about ethics? Characterising an interactional ethics centred on respect / 2401.09082 / ISBN:https://doi.org/10.48550/arXiv.2401.09082 / Published by ArXiv / Version released on 2024-05-16 / on (web) Publishing site
- Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators / 2402.01708 / ISBN:https://doi.org/10.48550/arXiv.2402.01708 / Published by ArXiv / Version released on 2024-05-15 / on (web) Publishing site
- Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback / 2404.10271 / ISBN:https://doi.org/10.48550/arXiv.2404.10271 / Published by ArXiv / Version released on 2024-06-04 / on (web) Publishing site
- Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models / 2405.07076 / ISBN:https://doi.org/10.48550/arXiv.2405.07076 / Published by ArXiv / Version released on 2024-05-14 / on (web) Publishing site
- The Narrow Depth and Breadth of Corporate Responsible AI Research / 2405.12193 / ISBN:https://doi.org/10.48550/arXiv.2405.12193 / Published by ArXiv / Version released on 2024-05-20 / on (web) Publishing site
- The Ethics of Interaction: Mitigating Security Threats in LLMs / 2401.12273 / ISBN:https://doi.org/10.48550/arXiv.2401.12273 / Published by ArXiv / Version released on 2024-07-10 / on (web) Publishing site
- Current state of LLM Risks and AI Guardrails / 2406.12934 / ISBN:https://doi.org/10.48550/arXiv.2406.12934 / Published by ArXiv / Version released on 2024-06-16 / on (web) Publishing site
- Documenting Ethical Considerations in Open Source AI Models / 2406.18071 / ISBN:https://doi.org/10.48550/arXiv.2406.18071 / Published by ArXiv / Version released on 2024-07-03 / on (web) Publishing site
- AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations / 2406.18346 / ISBN:https://doi.org/10.48550/arXiv.2406.18346 / Published by ArXiv / Version released on 2024-06-26 / on (web) Publishing site
- A Blueprint for Auditing Generative AI / 2407.05338 / ISBN:https://doi.org/10.48550/arXiv.2407.05338 / Published by ArXiv / Version released on 2024-07-07 / on (web) Publishing site
- Bridging the Global Divide in AI Regulation: A Proposal for a Contextual, Coherent, and Commensurable Framework / 2303.11196 / ISBN:https://doi.org/10.48550/arXiv.2303.11196 / Published by ArXiv / Version released on 2024-07-15 / on (web) Publishing site
- Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias / 2407.11360 / ISBN:https://doi.org/10.48550/arXiv.2407.11360 / Published by ArXiv / Version released on 2024-07-16 / on (web) Publishing site
- Open Artificial Knowledge / 2407.14371 / ISBN:https://doi.org/10.48550/arXiv.2407.14371 / Published by ArXiv / Version released on 2024-07-19 / on (web) Publishing site
- Mapping the individual, social, and biospheric impacts of Foundation Models / 2407.17129 / ISBN:https://doi.org/10.48550/arXiv.2407.17129 / Published by ArXiv / Version released on 2024-07-24 / on (web) Publishing site
- AI for All: Identifying AI incidents Related to Diversity and Inclusion / 2408.01438 / ISBN:https://doi.org/10.48550/arXiv.2408.01438 / Published by ArXiv / Version released on 2024-07-19 / on (web) Publishing site
- Improving Large Language Model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity / 2408.04023 / ISBN:https://doi.org/10.48550/arXiv.2408.04023 / Published by ArXiv / Version released on 2024-08-07 / on (web) Publishing site
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources / 2406.16746 / ISBN:https://doi.org/10.48550/arXiv.2406.16746 / Published by ArXiv / Version released on 2024-09-03 / on (web) Publishing site
- Promises and challenges of generative artificial intelligence for human learning / 2408.12143 / ISBN:https://doi.org/10.48550/arXiv.2408.12143 / Published by ArXiv / Version released on 2024-09-05 / on (web) Publishing site
- Catalog of General Ethical Requirements for AI Certification / 2408.12289 / ISBN:https://doi.org/10.48550/arXiv.2408.12289 / Published by ArXiv / Version released on 2024-11-15 / on (web) Publishing site
- Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks / 2408.12806 / ISBN:https://doi.org/10.48550/arXiv.2408.12806 / Published by ArXiv / Version released on 2024-08-23 / on (web) Publishing site
- DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection / 2409.06072 / ISBN:https://doi.org/10.48550/arXiv.2409.06072 / Published by ArXiv / Version released on 2024-09-09 / on (web) Publishing site
- LLM generated responses to mitigate the impact of hate speech / 2311.16905 / ISBN:https://doi.org/10.48550/arXiv.2311.16905 / Published by ArXiv / Version released on 2024-10-02 / on (web) Publishing site
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models / 2401.16727 / ISBN:https://doi.org/10.48550/arXiv.2401.16727 / Published by ArXiv / Version released on 2024-10-30 / on (web) Publishing site
- Navigating LLM Ethics: Advancements, Challenges, and Future Directions / 2406.18841 / ISBN:https://doi.org/10.48550/arXiv.2406.18841 / Published by ArXiv / Version released on 2025-06-15 / on (web) Publishing site
- ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs / 2409.09586 / ISBN:https://doi.org/10.48550/arXiv.2409.09586 / Published by ArXiv / Version released on 2025-11-04 / on (web) Publishing site
- Reporting Non-Consensual Intimate Media: An Audit Study of Deepfakes / 2409.12138 / ISBN:https://doi.org/10.48550/arXiv.2409.12138 / Published by ArXiv / Version released on 2024-09-18 / on (web) Publishing site
- XTRUST: On the Multilingual Trustworthiness of Large Language Models / 2409.15762 / ISBN:https://doi.org/10.48550/arXiv.2409.15762 / Published by ArXiv / Version released on 2024-09-24 / on (web) Publishing site
- Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions / 2409.16974 / ISBN:https://doi.org/10.48550/arXiv.2409.16974 / Published by ArXiv / Version released on 2024-09-25 / on (web) Publishing site
- DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life / 2410.02683 / ISBN:https://doi.org/10.48550/arXiv.2410.02683 / Published by ArXiv / Version released on 2025-03-15 / on (web) Publishing site
- Investigating Labeler Bias in Face Annotation for Machine Learning / 2301.09902 / ISBN:https://doi.org/10.48550/arXiv.2301.09902 / Published by ArXiv / Version released on 2024-10-24 / on (web) Publishing site
- From human-centered to social-centered artificial intelligence: Assessing ChatGPT's impact through disruptive events / 2306.00227 / ISBN:https://doi.org/10.48550/arXiv.2306.00227 / Published by ArXiv / Version released on 2024-10-25 / on (web) Publishing site
- Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models / 2410.12880 / ISBN:https://doi.org/10.48550/arXiv.2410.12880 / Published by ArXiv / Version released on 2025-01-24 / on (web) Publishing site
- Data Defenses Against Large Language Models / 2410.13138 / ISBN:https://doi.org/10.48550/arXiv.2410.13138 / Published by ArXiv / Version released on 2024-10-17 / on (web) Publishing site
- Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems / 2410.13334 / ISBN:https://doi.org/10.48550/arXiv.2410.13334 / Published by ArXiv / Version released on 2024-10-23 / on (web) Publishing site
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models / 2410.15236 / ISBN:https://doi.org/10.48550/arXiv.2410.15236 / Published by ArXiv / Version released on 2025-11-25 / on (web) Publishing site
- The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships / 2410.20130 / ISBN:https://doi.org/10.48550/arXiv.2410.20130 / Published by ArXiv / Version released on 2025-01-26 / on (web) Publishing site
- Examining Human-AI Collaboration for Co-Writing Constructive Comments Online / 2411.03295 / ISBN:https://doi.org/10.48550/arXiv.2411.03295 / Published by ArXiv / Version released on 2025-07-30 / on (web) Publishing site
- A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions / 2406.03712 / ISBN:https://doi.org/10.48550/arXiv.2406.03712 / Published by ArXiv / Version released on 2024-12-09 / on (web) Publishing site
- The doctor will polygraph you now: ethical concerns with AI for fact-checking patients / 2408.07896 / ISBN:https://doi.org/10.48550/arXiv.2408.07896 / Published by ArXiv / Version released on 2024-11-11 / on (web) Publishing site
- Bias in Large Language Models: Origin, Evaluation, and Mitigation / 2411.10915 / ISBN:https://doi.org/10.48550/arXiv.2411.10915 / Published by ArXiv / Version released on 2024-11-16 / on (web) Publishing site
- Good intentions, unintended consequences: exploring forecasting harms / 2411.16531 / ISBN:https://doi.org/10.48550/arXiv.2411.16531 / Published by ArXiv / Version released on 2025-03-12 / on (web) Publishing site
- Bots against Bias: Critical Next Steps for Human-Robot Interaction / 2412.12542 / ISBN:https://doi.org/10.1017/9781009386708.023 / Published by ArXiv / Version released on 2024-12-17 / on (web) Publishing site
- Large Language Model Safety: A Holistic Survey / 2412.17686 / ISBN:https://doi.org/10.48550/arXiv.2412.17686 / Published by ArXiv / Version released on 2024-12-23 / on (web) Publishing site
- INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models / 2501.01973 / ISBN:https://doi.org/10.48550/arXiv.2501.01973 / Published by ArXiv / Version released on 2025-01-09 / on (web) Publishing site
- Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto / 2312.01818 / ISBN:https://doi.org/10.48550/arXiv.2312.01818 / Published by ArXiv / Version released on 2025-01-16 / on (web) Publishing site
- Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline / 2501.18493 / ISBN:https://doi.org/10.48550/arXiv.2501.18493 / Published by ArXiv / Version released on 2025-01-30 / on (web) Publishing site
- Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety / 2502.05206 / ISBN:https://doi.org/10.48550/arXiv.2502.05206 / Published by ArXiv / Version released on 2025-08-02 / on (web) Publishing site
- Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles / 2502.06059 / ISBN:https://doi.org/10.48550/arXiv.2502.06059 / Published by ArXiv / Version released on 2025-12-27 / on (web) Publishing site
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective / 2502.14296 / ISBN:https://doi.org/10.48550/arXiv.2502.14296 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- A Peek Behind the Curtain: Using Step-Around Prompt Engineering to Identify Bias and Misinformation in GenAI Models / 2503.15205 / ISBN:https://doi.org/10.48550/arXiv.2503.15205 / Published by ArXiv / Version released on 2025-03-19 / on (web) Publishing site
- Towards interactive evaluations for interaction harms in human-AI systems / 2405.10632 / ISBN:https://doi.org/10.48550/arXiv.2405.10632 / Published by ArXiv / Version released on 2025-07-30 / on (web) Publishing site
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation / 2502.05151 / ISBN:https://doi.org/10.48550/arXiv.2502.05151 / Published by ArXiv / Version released on 2025-04-16 / on (web) Publishing site
- Who is Responsible? The Data, Models, Users or Regulations? A Comprehensive Survey on Responsible Generative AI for a Sustainable Future / 2502.08650 / ISBN:https://doi.org/10.48550/arXiv.2502.08650 / Published by ArXiv / Version released on 2025-04-28 / on (web) Publishing site
- Evaluation Framework for AI Systems in the Wild / 2504.16778 / ISBN:https://doi.org/10.48550/arXiv.2504.16778 / Published by ArXiv / Version released on 2025-04-28 / on (web) Publishing site
- Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room / 2504.16148 / ISBN:https://doi.org/10.48550/arXiv.2504.16148 / Published by ArXiv / Version released on 2025-04-22 / on (web) Publishing site
- Auditing the Ethical Logic of Generative AI Models / 2504.17544 / ISBN:https://doi.org/10.48550/arXiv.2504.17544 / Published by ArXiv / Version released on 2025-04-24 / on (web) Publishing site
- AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How / 2504.18044 / ISBN:https://doi.org/10.48550/arXiv.2504.18044 / Published by ArXiv / Version released on 2025-04-25 / on (web) Publishing site
- Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation / 2504.21574 / ISBN:https://doi.org/10.48550/arXiv.2504.21574 / Published by ArXiv / Version released on 2025-04-30 / on (web) Publishing site
- LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models / 2505.00853 / ISBN:https://doi.org/10.48550/arXiv.2505.00853 / Published by ArXiv / Version released on 2025-05-01 / on (web) Publishing site
- Emotions in the Loop: A Survey of Affective Computing for Emotional Support / 2505.01542 / ISBN:https://doi.org/10.48550/arXiv.2505.01542 / Published by ArXiv / Version released on 2025-05-02 / on (web) Publishing site
- Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs / 2505.02009 / ISBN:https://doi.org/10.48550/arXiv.2505.02009 / Published by ArXiv / Version released on 2025-08-12 / on (web) Publishing site
- Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach / 2505.09576 / ISBN:https://doi.org/10.48550/arXiv.2505.09576 / Published by ArXiv / Version released on 2025-05-14 / on (web) Publishing site
- Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data / 2505.09974 / ISBN:https://doi.org/10.48550/arXiv.2505.09974 / Published by ArXiv / Version released on 2025-05-15 / on (web) Publishing site
- SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use / 2505.17332 / ISBN:https://doi.org/10.48550/arXiv.2505.17332 / Published by ArXiv / Version released on 2025-05-22 / on (web) Publishing site
- Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods / 2505.17870 / ISBN:https://doi.org/10.48550/arXiv.2505.17870 / Published by ArXiv / Version released on 2025-05-23 / on (web) Publishing site
- Unintentional Consequences: Generative AI Use for Cybercrime / 2505.23733 / ISBN:https://doi.org/10.48550/arXiv.2505.23733 / Published by ArXiv / Version released on 2025-12-03 / on (web) Publishing site
- Exploring Societal Concerns and Perceptions of AI: A Thematic Analysis through the Lens of Problem-Seeking / 2505.23930 / ISBN:https://doi.org/10.48550/arXiv.2505.23930 / Published by ArXiv / Version released on 2025-05-29 / on (web) Publishing site
- Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work / 2505.24246 / ISBN:https://doi.org/10.48550/arXiv.2505.24246 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models / 2506.01257 / ISBN:https://doi.org/10.48550/arXiv.2506.01257 / Published by ArXiv / Version released on 2025-06-02 / on (web) Publishing site
- Social Scientists on the Role of AI in Research / 2506.11255 / ISBN:https://doi.org/10.48550/arXiv.2506.11255 / Published by ArXiv / Version released on 2025-06-12 / on (web) Publishing site
- JETHICS: Japanese Ethics Understanding Evaluation Dataset / 2506.16187 / ISBN:https://doi.org/10.48550/arXiv.2506.16187 / Published by ArXiv / Version released on 2025-06-19 / on (web) Publishing site
- When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance / 2507.07748 / ISBN:https://doi.org/10.48550/arXiv.2507.07748 / Published by ArXiv / Version released on 2025-07-10 / on (web) Publishing site
- Artificial Intelligence Governance for Businesses / 2011.10672 / ISBN:https://doi.org/10.48550/arXiv.2011.10672 / Published by ArXiv / Version released on 2025-07-16 / on (web) Publishing site
- Defining ethically sourced code generation / 2507.19743 / ISBN:https://doi.org/10.48550/arXiv.2507.19743 / Published by ArXiv / Version released on 2025-07-26 / on (web) Publishing site
- EthicAlly: a Prototype for AI-Powered Research Ethics Support for the Social Sciences and Humanities / 2508.00856 / ISBN:https://doi.org/10.48550/arXiv.2508.00856 / Published by ArXiv / Version released on 2025-07-15 / on (web) Publishing site
- Development of management systems using artificial intelligence systems and machine learning methods for boards of directors (preprint, unofficial translation) / 2508.03769 / ISBN:https://doi.org/10.48550/arXiv.2508.03769 / Published by ArXiv / Version released on 2025-08-05 / on (web) Publishing site
- Data and AI governance: Promoting equity, ethics, and fairness in large language models / 2508.03970 / ISBN:https://doi.org/10.48550/arXiv.2508.03970 / Published by ArXiv / Version released on 2025-08-05 / on (web) Publishing site
- Towards Assessing Medical Ethics from Knowledge to Practice / 2508.05132 / ISBN:https://doi.org/10.48550/arXiv.2508.05132 / Published by ArXiv / Version released on 2025-08-07 / on (web) Publishing site
- Ethical Concerns of Generative AI and Mitigation Strategies: A Systematic Mapping Study / 2502.00015 / ISBN:https://doi.org/10.48550/arXiv.2502.00015 / Published by ArXiv / Version released on 2025-08-22 / on (web) Publishing site
- Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance / 2508.08789 / ISBN:https://doi.org/10.48550/arXiv.2508.08789 / Published by ArXiv / Version released on 2025-08-18 / on (web) Publishing site
- Can AI be Auditable? / 2509.00575 / ISBN:https://doi.org/10.48550/arXiv.2509.00575 / Published by ArXiv / Version released on 2025-09-14 / on (web) Publishing site
- The User-first Approach to AI Ethics: Preferences for Ethical Principles in AI Systems across Cultures and Contexts / 2508.11327 / ISBN:https://doi.org/10.48550/arXiv.2508.11327 / Published by ArXiv / Version released on 2025-09-17 / on (web) Publishing site
- The AI-Fraud Diamond: A Novel Lens for Auditing Algorithmic Deception / 2508.13984 / ISBN:https://doi.org/10.48550/arXiv.2508.13984 / Published by ArXiv / Version released on 2025-08-19 / on (web) Publishing site
- Bridging Minds and Machines: Toward an Integration of AI and Cognitive Science / 2508.20674 / ISBN:https://doi.org/10.48550/arXiv.2508.20674 / Published by ArXiv / Version released on 2025-08-28 / on (web) Publishing site
- Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI / 2508.21101 / ISBN:https://doi.org/10.48550/arXiv.2508.21101 / Published by ArXiv / Version released on 2025-08-28 / on (web) Publishing site
- Leveraging Imperfection with MEDLEY: A Multi-Model Approach Harnessing Bias in Medical AI / 2508.21648 / ISBN:https://doi.org/10.48550/arXiv.2508.21648 / Published by ArXiv / Version released on 2025-08-29 / on (web) Publishing site
- Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs / 2509.05367 / ISBN:https://doi.org/10.48550/arXiv.2509.05367 / Published by ArXiv / Version released on 2025-09-12 / on (web) Publishing site
- Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models / 2509.16332 / ISBN:https://doi.org/10.48550/arXiv.2509.16332 / Published by ArXiv / Version released on 2025-09-19 / on (web) Publishing site
- The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs / 2506.11094 / ISBN:https://doi.org/10.48550/arXiv.2506.11094 / Published by ArXiv / Version released on 2025-10-30 / on (web) Publishing site
- The Evolving Ethics of Medical Data Stewardship / 2511.15829 / ISBN:https://doi.org/10.48550/arXiv.2511.15829 / Published by ArXiv / Version released on 2025-11-19 / on (web) Publishing site
- Morality in AI. A plea to embed morality in LLM architectures and frameworks / 2511.20689 / ISBN:https://doi.org/10.48550/arXiv.2511.20689 / Published by ArXiv / Version released on 2025-11-21 / on (web) Publishing site
- A Brief History of Digital Twin Technology / 2511.20695 / ISBN:https://doi.org/10.48550/arXiv.2511.20695 / Published by ArXiv / Version released on 2025-11-24 / on (web) Publishing site
- Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs / 2511.21757 / ISBN:https://doi.org/10.48550/arXiv.2511.21757 / Published by ArXiv / Version released on 2025-11-24 / on (web) Publishing site
- SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation / 2512.12501 / ISBN:https://doi.org/10.48550/arXiv.2512.12501 / Published by ArXiv / Version released on 2025-12-14 / on (web) Publishing site
- Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study / 2512.15791 / ISBN:https://doi.org/10.48550/arXiv.2512.15791 / Published by ArXiv / Version released on 2025-12-16 / on (web) Publishing site
- PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI / 2512.24848 / ISBN:https://doi.org/10.48550/arXiv.2512.24848 / Published by ArXiv / Version released on 2025-12-31 / on (web) Publishing site