if you need more than one keyword, modify and separate by underscore _
the list of search keywords can be up to 50 characters long
if you modify the keywords, press enter within the field to confirm the new search key
Tag: defenses
Bibliography items where occurs: 49
- AI Ethics Issues in Real World: Evidence from AI Incident Database / 2206.07635 / ISBN:https://doi.org/10.48550/arXiv.2206.07635 / Published by ArXiv / on (web) Publishing site
- 2 Related Work
- Implementing Responsible AI: Tensions and Trade-Offs Between Ethics Aspects / 2304.08275 / ISBN:https://doi.org/10.48550/arXiv.2304.08275 / Published by ArXiv / on (web) Publishing site
- References
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation / 2305.11391 / ISBN:https://doi.org/10.48550/arXiv.2305.11391 / Published by ArXiv / on (web) Publishing site
- Reference
- The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward / 2308.14253 / ISBN:https://doi.org/10.48550/arXiv.2308.14253 / Published by ArXiv / on (web) Publishing site
- 4 Integrating red teaming, blue teaming, and ethics with violet teaming
- Security Considerations in AI-Robotics: A Survey of Current Methods, Challenges, and Opportunities / 2310.08565 / ISBN:https://doi.org/10.48550/arXiv.2310.08565 / Published by ArXiv / on (web) Publishing site
- I. Introduction and Motivation
References - Specific versus General Principles for Constitutional AI / 2310.13798 / ISBN:https://doi.org/10.48550/arXiv.2310.13798 / Published by ArXiv / on (web) Publishing site
- 4 Reinforcement Learning with Good-for-Humanity Preference Models
- She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models / 2310.18333 / ISBN:https://doi.org/10.48550/arXiv.2310.18333 / Published by ArXiv / on (web) Publishing site
- References
- Ethics and Responsible AI Deployment / 2311.14705 / ISBN:https://doi.org/10.48550/arXiv.2311.14705 / Published by ArXiv / on (web) Publishing site
- 11. References
- Survey on AI Ethics: A Socio-technical Perspective / 2311.17228 / ISBN:https://doi.org/10.48550/arXiv.2311.17228 / Published by ArXiv / on (web) Publishing site
- 2 Privacy and data protection
References - Deepfakes, Misinformation, and Disinformation in the Era of Frontier AI, Generative AI, and Large AI Models / 2311.17394 / ISBN:https://doi.org/10.48550/arXiv.2311.17394 / Published by ArXiv / on (web) Publishing site
- IX. Discussion
- Autonomous Threat Hunting: A Future Paradigm for AI-Driven Threat Intelligence / 2401.00286 / ISBN:https://doi.org/10.48550/arXiv.2401.00286 / Published by ArXiv / on (web) Publishing site
- Abstract
1. Introduction
2. Foundations of AI-driven threat intelligence
3. Autonomous threat hunting: conceptual framework
6. Case studies and applications
7. Evaluation metrics and performance benchmarks
8. Future directions and emerging trends
9. Conclusion - Commercial AI, Conflict, and Moral Responsibility: A theoretical analysis and practical approach to the moral responsibilities associated with dual-use AI technology / 2402.01762 / ISBN:https://doi.org/10.48550/arXiv.2402.01762 / Published by ArXiv / on (web) Publishing site
- 4 Recommendations to address threats posed by crossover AI technology
- A Survey on Human-AI Teaming with Large Pre-Trained Models / 2403.04931 / ISBN:https://doi.org/10.48550/arXiv.2403.04931 / Published by ArXiv / on (web) Publishing site
- 4 Safe, Secure and Trustworthy AI
- Responsible Artificial Intelligence: A Structured Literature Review / 2403.06910 / ISBN:https://doi.org/10.48550/arXiv.2403.06910 / Published by ArXiv / on (web) Publishing site
- 3. Analysis
References - Towards a Privacy and Security-Aware Framework for Ethical AI: Guiding the Development and Assessment of AI Systems / 2403.08624 / ISBN:https://doi.org/10.48550/arXiv.2403.08624 / Published by ArXiv / on (web) Publishing site
- 4 Results of the Systematic Literature Review
- Debunking Robot Rights Metaphysically, Ethically, and Legally / 2404.10072 / ISBN:https://doi.org/10.48550/arXiv.2404.10072 / Published by ArXiv / on (web) Publishing site
- 7 The Legal Perspective
- Large Language Model Supply Chain: A Research Agenda / 2404.12736 / ISBN:https://doi.org/10.48550/arXiv.2404.12736 / Published by ArXiv / on (web) Publishing site
- References
- AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research / 2405.01859 / ISBN:https://doi.org/10.48550/arXiv.2405.01859 / Published by ArXiv / on (web) Publishing site
- References
- Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness / 2405.05930 / ISBN:https://doi.org/10.48550/arXiv.2405.05930 / Published by ArXiv / on (web) Publishing site
- VII. Challenges and Future Research Directions
- The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative / 2402.14859 / ISBN:https://doi.org/10.48550/arXiv.2402.14859 / Published by ArXiv / on (web) Publishing site
- References
- A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions / 2405.14487 / ISBN:https://doi.org/10.48550/arXiv.2405.14487 / Published by ArXiv / on (web) Publishing site
- I. Introduction
References - Responsible AI for Earth Observation / 2405.20868 / ISBN:https://doi.org/10.48550/arXiv.2405.20868 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
3 Secure AI in EO: Focusing on Defense Mechanisms, Uncertainty Modeling and Explainability
References - Current state of LLM Risks and AI Guardrails / 2406.12934 / ISBN:https://doi.org/10.48550/arXiv.2406.12934 / Published by ArXiv / on (web) Publishing site
- 3 Strategies in Securing Large Language
models
- A Survey on Privacy Attacks Against Digital Twin Systems in AI-Robotics / 2406.18812 / ISBN:https://doi.org/10.48550/arXiv.2406.18812 / Published by ArXiv / on (web) Publishing site
- IV. DT-INTEGRATED ROBOTICS DESIGN
CONSIDERATIONS AND DISCUSSION
REFERENCES - FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare / 2309.12325 / ISBN:https://doi.org/10.48550/arXiv.2309.12325 / Published by ArXiv / on (web) Publishing site
- REFERENCES:
- Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias / 2407.11360 / ISBN:https://doi.org/10.48550/arXiv.2407.11360 / Published by ArXiv / on (web) Publishing site
- Abstract
1 Introduction
3 Giraffe and Acacia: Reciprocal Adaptations and Shaping
4 Generative AI and Humans: Risks and Mitigation
6 Discussion
References - Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies / 2407.14390 / ISBN:https://doi.org/10.48550/arXiv.2407.14390 / Published by ArXiv / on (web) Publishing site
- References
- Deepfake Media Forensics: State of the Art and Challenges Ahead / 2408.00388 / ISBN:https://doi.org/10.48550/arXiv.2408.00388 / Published by ArXiv / on (web) Publishing site
- References
- AI-Driven Chatbot for Intrusion Detection in Edge Networks: Enhancing Cybersecurity with Ethical User Consent / 2408.04281 / ISBN:https://doi.org/10.48550/arXiv.2408.04281 / Published by ArXiv / on (web) Publishing site
- II. Related Work
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources / 2406.16746 / ISBN:https://doi.org/10.48550/arXiv.2406.16746 / Published by ArXiv / on (web) Publishing site
- 8 Model Evaluation
- Neuro-Symbolic AI for Military Applications / 2408.09224 / ISBN:https://doi.org/10.48550/arXiv.2408.09224 / Published by ArXiv / on (web) Publishing site
- References
- Don't Kill the Baby: The Case for AI in Arbitration / 2408.11608 / ISBN:https://doi.org/10.48550/arXiv.2408.11608 / Published by ArXiv / on (web) Publishing site
- 3. Practical and Strategic Benefits of Using AI in Arbitration
- Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks / 2408.12806 / ISBN:https://doi.org/10.48550/arXiv.2408.12806 / Published by ArXiv / on (web) Publishing site
- Abstract
I. Introduction
IV. Attack Methodology - Catalog of General Ethical Requirements for AI Certification / 2408.12289 / ISBN:https://doi.org/10.48550/arXiv.2408.12289 / Published by ArXiv / on (web) Publishing site
- References
- Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure / 2409.19104 / ISBN:https://doi.org/10.48550/arXiv.2409.19104 / Published by ArXiv / on (web) Publishing site
- References
- Data Defenses Against Large Language Models / 2410.13138 / ISBN:https://doi.org/10.48550/arXiv.2410.13138 / Published by ArXiv / on (web) Publishing site
- Abstract
1 Introduction
2 Ethics of Resisting LLM Inference
3 Threat Model
4 LLM Adversarial Attacks as LLM Inference Data Defenses
5 Experiments
6 Discussion
7 Conclusion and Limitations
8 Ethics Considerations and Compliance with the Open Science Policy
References - Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems / 2410.13334 / ISBN:https://doi.org/10.48550/arXiv.2410.13334 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
2 Background and Related Works
Refefences - A Simulation System Towards Solving Societal-Scale Manipulation / 2410.13915 / ISBN:https://doi.org/10.48550/arXiv.2410.13915 / Published by ArXiv / on (web) Publishing site
- Abstract
1 Introduction
5 Future Work and Discussion
6 Social Impact Statement - Jailbreaking and Mitigation of Vulnerabilities in Large Language Models / 2410.15236 / ISBN:https://doi.org/10.48550/arXiv.2410.15236 / Published by ArXiv / on (web) Publishing site
- Abstract
I. Introduction
II. Background and Concepts
III. Jailbreak Attack Methods and Techniques
IV. Defense Mechanisms Against Jailbreak Attacks
V. Evaluation and Benchmarking
VI. Research Gaps and Future Directions
VII. Conclusion
References - Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements / 2410.17141 / ISBN:https://doi.org/10.48550/arXiv.2410.17141 / Published by ArXiv / on (web) Publishing site
- 7 Potential Risks
- The Cat and Mouse Game: The Ongoing Arms Race Between Diffusion Models and Detection Methods / 2410.18866 / ISBN:https://doi.org/10.48550/arXiv.2410.18866 / Published by ArXiv / on (web) Publishing site
- VIII. Research Gaps and Future Directions
- On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet? / 2412.11698 / ISBN:https://doi.org/10.48550/arXiv.2412.11698 / Published by ArXiv / on (web) Publishing site
- III. Results
- Autonomous Vehicle Security: A Deep Dive into Threat Modeling / 2412.15348 / ISBN:https://doi.org/10.48550/arXiv.2412.15348 / Published by ArXiv / on (web) Publishing site
- Abstract
III. Autonomous Vehicle Cybersecurirty Attacks
VI. Comparative Analysis of Threat Modeling Frameworks for Autonomous Vehicles
IX. Conclusions
References - Large Language Model Safety: A Holistic Survey / 2412.17686 / ISBN:https://doi.org/10.48550/arXiv.2412.17686 / Published by ArXiv / on (web) Publishing site
- 5 Misuse
References - Self-Disclosure to AI: The Paradox of Trust and Vulnerability in Human-Machine Interactions / 2412.20564 / ISBN:https://doi.org/10.48550/arXiv.2412.20564 / Published by ArXiv / on (web) Publishing site
- 3 The Psychology of Confiding and Self-Disclosure
- Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto / 2312.01818 / ISBN:https://doi.org/10.48550/arXiv.2312.01818 / Published by ArXiv / on (web) Publishing site
- References
- Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare / 2501.18632 / ISBN:https://doi.org/10.48550/arXiv. / Published by ArXiv / on (web) Publishing site
- Model Guardrail Enhancemen
References - Safety at Scale: A Comprehensive Survey of Large Model Safety / 2502.05206>Publishing site
- Abstract
1 Introduction
2 Vision Foundation Model Safety
3 Large Language Model Safety
4 Vision-Language Pre-Training Model Safety
5 Vision-Language Model Safety
6 Diffusion Model Safety
7 Agent Safety
8 Open Challenges
9 Conclusion
References - Position: We Need An Adaptive Interpretation of Helpful, Honest, and Harmless Principles / 2502.06059>Publishing site
- References