Tag: redteaming
Bibliography items where this tag occurs: 52
- The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward / 2308.14253 / ISBN:https://doi.org/10.48550/arXiv.2308.14253 / Published by ArXiv / Version released on 2023-08-28 / on (web) Publishing site
- Unpacking the Ethical Value Alignment in Big Models / 2310.17551 / ISBN:https://doi.org/10.48550/arXiv.2310.17551 / Published by ArXiv / Version released on 2023-10-26 / on (web) Publishing site
- How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities / 2311.09447 / ISBN:https://doi.org/10.48550/arXiv.2311.09447 / Published by ArXiv / Version released on 2024-04-02 / on (web) Publishing site
- Ethics and Responsible AI Deployment / 2311.14705 / ISBN:https://doi.org/10.48550/arXiv.2311.14705 / Published by ArXiv / Version released on 2023-11-12 / on (web) Publishing site
- Control Risk for Potential Misuse of Artificial Intelligence in Science / 2312.06632 / ISBN:https://doi.org/10.48550/arXiv.2312.06632 / Published by ArXiv / Version released on 2023-12-11 / on (web) Publishing site
- Commercial AI, Conflict, and Moral Responsibility: A theoretical analysis and practical approach to the moral responsibilities associated with dual-use AI technology / 2402.01762 / ISBN:https://doi.org/10.48550/arXiv.2402.01762 / Published by ArXiv / Version released on 2024-01-30 / on (web) Publishing site
- (A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice / 2402.01864 / ISBN:https://doi.org/10.48550/arXiv.2402.01864 / Published by ArXiv / Version released on 2024-02-02 / on (web) Publishing site
- Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence / 2402.09880 / ISBN:https://doi.org/10.48550/arXiv.2402.09880 / Published by ArXiv / Version released on 2024-10-14 / on (web) Publishing site
- The Journey to Trustworthy AI- Part 1 Pursuit of Pragmatic Frameworks / 2403.15457 / ISBN:https://doi.org/10.48550/arXiv.2403.15457 / Published by ArXiv / Version released on 2024-04-06 / on (web) Publishing site
- AI Alignment: A Comprehensive Survey / 2310.19852 / ISBN:https://doi.org/10.48550/arXiv.2310.19852 / Published by ArXiv / Version released on 2025-04-04 / on (web) Publishing site
- The Necessity of AI Audit Standards Boards / 2404.13060 / ISBN:https://doi.org/10.48550/arXiv.2404.13060 / Published by ArXiv / Version released on 2024-04-11 / on (web) Publishing site
- AI Procurement Checklists: Revisiting Implementation in the Age of AI Governance / 2404.14660 / ISBN:https://doi.org/10.48550/arXiv.2404.14660 / Published by ArXiv / Version released on 2024-04-23 / on (web) Publishing site
- A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions / 2405.14487 / ISBN:https://doi.org/10.48550/arXiv.2405.14487 / Published by ArXiv / Version released on 2024-05-23 / on (web) Publishing site
- Bridging the Global Divide in AI Regulation: A Proposal for a Contextual, Coherent, and Commensurable Framework / 2303.11196 / ISBN:https://doi.org/10.48550/arXiv.2303.11196 / Published by ArXiv / Version released on 2024-07-15 / on (web) Publishing site
- Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models / 2407.13059 / ISBN:https://doi.org/10.48550/arXiv.2407.13059 / Published by ArXiv / Version released on 2024-07-23 / on (web) Publishing site
- Assurance of AI Systems From a Dependability Perspective / 2407.13948 / ISBN:https://doi.org/10.48550/arXiv.2407.13948 / Published by ArXiv / Version released on 2024-08-07 / on (web) Publishing site
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources / 2406.16746 / ISBN:https://doi.org/10.48550/arXiv.2406.16746 / Published by ArXiv / Version released on 2024-09-03 / on (web) Publishing site
- Catalog of General Ethical Requirements for AI Certification / 2408.12289 / ISBN:https://doi.org/10.48550/arXiv.2408.12289 / Published by ArXiv / Version released on 2024-11-15 / on (web) Publishing site
- From human-centered to social-centered artificial intelligence: Assessing ChatGPT's impact through disruptive events / 2306.00227 / ISBN:https://doi.org/10.48550/arXiv.2306.00227 / Published by ArXiv / Version released on 2024-10-25 / on (web) Publishing site
- Data Defenses Against Large Language Models / 2410.13138 / ISBN:https://doi.org/10.48550/arXiv.2410.13138 / Published by ArXiv / Version released on 2024-10-17 / on (web) Publishing site
- Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems / 2410.13334 / ISBN:https://doi.org/10.48550/arXiv.2410.13334 / Published by ArXiv / Version released on 2024-10-23 / on (web) Publishing site
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models / 2410.15236 / ISBN:https://doi.org/10.48550/arXiv.2410.15236 / Published by ArXiv / Version released on 2025-05-08 / on (web) Publishing site
- Standardization Trends on Safety and Trustworthiness Technology for Advanced AI / 2410.22151 / ISBN:https://doi.org/10.48550/arXiv.2410.22151 / Published by ArXiv / Version released on 2024-10-29 / on (web) Publishing site
- Clio: Privacy-Preserving Insights into Real-World AI Use / 2412.13678 / ISBN:https://doi.org/10.48550/arXiv.2412.13678 / Published by ArXiv / Version released on 2024-12-18 / on (web) Publishing site
- Large Language Model Safety: A Holistic Survey / 2412.17686 / ISBN:https://doi.org/10.48550/arXiv.2412.17686 / Published by ArXiv / Version released on 2024-12-23 / on (web) Publishing site
- A Case Study in Acceleration AI Ethics: The TELUS GenAI Conversational Agent / 2501.18038 / ISBN:https://doi.org/10.48550/arXiv.2501.18038 / Published by ArXiv / Version released on 2025-03-26 / on (web) Publishing site
- Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline / 2501.18493 / ISBN:https://doi.org/10.48550/arXiv.2501.18493 / Published by ArXiv / Version released on 2025-01-30 / on (web) Publishing site
- Multi-Agent Risks from Advanced AI / 2502.14143 / ISBN:https://doi.org/10.48550/arXiv.2502.14143 / Published by ArXiv / Version released on 2025-02-19 / on (web) Publishing site
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective / 2502.14296 / ISBN:https://doi.org/10.48550/arXiv.2502.14296 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- DarkBench: Benchmarking Dark Patterns in Large Language Models / 2503.10728 / ISBN:https://doi.org/10.48550/arXiv.2503.10728 / Published by ArXiv / Version released on 2025-03-13 / on (web) Publishing site
- Towards Adaptive AI Governance: Comparative Insights from the U.S., EU, and Asia / 2504.00652 / ISBN:https://doi.org/10.48550/arXiv.2504.00652 / Published by ArXiv / Version released on 2025-04-01 / on (web) Publishing site
- Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice / 2504.00797 / ISBN:https://doi.org/10.48550/arXiv.2504.00797 / Published by ArXiv / Version released on 2025-04-01 / on (web) Publishing site
- Designing AI-Enabled Countermeasures to Cognitive Warfare / 2504.11486 / ISBN:https://doi.org/10.48550/arXiv.2504.11486 / Published by ArXiv / Version released on 2025-04-14 / on (web) Publishing site
- Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation / 2504.21574 / ISBN:https://doi.org/10.48550/arXiv.2504.21574 / Published by ArXiv / Version released on 2025-04-30 / on (web) Publishing site
- From Texts to Shields: Convergence of Large Language Models and Cybersecurity / 2505.00841 / ISBN:https://doi.org/10.48550/arXiv.2505.00841 / Published by ArXiv / Version released on 2025-05-01 / on (web) Publishing site
- Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data / 2505.09974 / ISBN:https://doi.org/10.48550/arXiv.2505.09974 / Published by ArXiv / Version released on 2025-05-15 / on (web) Publishing site
- SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use / 2505.17332 / ISBN:https://doi.org/10.48550/arXiv.2505.17332 / Published by ArXiv / Version released on 2025-05-22 / on (web) Publishing site
- Opacity as a Feature, Not a Flaw: The LoBOX Governance Ethic for Role-Sensitive Explainability and Institutional Trust in AI / 2505.20304 / ISBN:https://doi.org/10.48550/arXiv.2505.20304 / Published by ArXiv / Version released on 2025-05-18 / on (web) Publishing site
- Making Sense of the Unsensible: Reflection, Survey, and Challenges for XAI in Large Language Models Toward Human-Centered AI / 2505.20305 / ISBN:https://doi.org/10.48550/arXiv.2505.20305 / Published by ArXiv / Version released on 2025-05-18 / on (web) Publishing site
- Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions / 2505.20692 / ISBN:https://doi.org/10.48550/arXiv.2505.20692 / Published by ArXiv / Version released on 2025-05-27 / on (web) Publishing site
- Locating Risk: Task Designers and the Challenge of Risk Disclosure in RAI Content Work / 2505.24246 / ISBN:https://doi.org/10.48550/arXiv.2505.24246 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety / 2506.00415 / ISBN:https://doi.org/10.48550/arXiv.2506.00415 / Published by ArXiv / Version released on 2025-05-31 / on (web) Publishing site
- DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models / 2506.01257 / ISBN:https://doi.org/10.48550/arXiv.2506.01257 / Published by ArXiv / Version released on 2025-06-02 / on (web) Publishing site
- Machine vs Machine: Using AI to Tackle Generative AI Threats in Assessment / 2506.02046 / ISBN:https://doi.org/10.48550/arXiv.2506.02046 / Published by ArXiv / Version released on 2025-05-31 / on (web) Publishing site
- On the Ethics of Using LLMs for Offensive Security / 2506.08693 / ISBN:https://doi.org/10.48550/arXiv.2506.08693 / Published by ArXiv / Version released on 2025-06-10 / on (web) Publishing site
- On the Surprising Efficacy of LLMs for Penetration-Testing / 2507.00829 / ISBN:https://doi.org/10.48550/arXiv.2507.00829 / Published by ArXiv / Version released on 2025-07-01 / on (web) Publishing site
- Moral Responsibility or Obedience: What Do We Want from AI? / 2507.02788 / ISBN:https://doi.org/10.48550/arXiv.2507.02788 / Published by ArXiv / Version released on 2025-07-03 / on (web) Publishing site
- Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks / 2507.12185 / ISBN:https://doi.org/10.48550/arXiv.2507.12185 / Published by ArXiv / Version released on 2025-07-16 / on (web) Publishing site
- Redefining Elderly Care with Agentic AI: Challenges and Opportunities / 2507.14912 / ISBN:https://doi.org/10.48550/arXiv.2507.14912 / Published by ArXiv / Version released on 2025-07-20 / on (web) Publishing site
- A Moral Agency Framework for Legitimate Integration of AI in Bureaucracies / 2508.08231 / ISBN:https://doi.org/10.48550/arXiv.2508.08231 / Published by ArXiv / Version released on 2025-08-21 / on (web) Publishing site
- CAI Fluency: A Framework for Cybersecurity AI Fluency / 2508.13588 / ISBN:https://doi.org/10.48550/arXiv.2508.13588 / Published by ArXiv / Version released on 2025-10-07 / on (web) Publishing site
- Digital Sovereignty Control Framework for Military AI-based Cyber Security / 2509.13072 / ISBN:https://doi.org/10.48550/arXiv.2509.13072 / Published by ArXiv / Version released on 2025-09-16 / on (web) Publishing site