Tag: harmless
Bibliography items where occurs: 13
if you need more than one keyword, add on the URL each keyword prefixed by _ (underscore)- total up to 50 characters
- The AI Index 2022 Annual Report / 2205.03468 / ISBN:https://doi.org/10.48550/arXiv.2205.03468 / Published by ArXiv / on (web) Publishing site
- Appendix
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation / 2305.11391 / ISBN:https://doi.org/10.48550/arXiv.2305.11391 / Published by ArXiv / on (web) Publishing site
- 2 Large Language Models
Reference - Building Trust in Conversational AI: A Comprehensive Review and Solution Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge Graph / 2308.13534 / ISBN:https://doi.org/10.48550/arXiv.2308.13534 / Published by ArXiv / on (web) Publishing site
- References
- Pathway to Future Symbiotic Creativity / 2209.02388 / ISBN:https://doi.org/10.48550/arXiv.2209.02388 / Published by ArXiv / on (web) Publishing site
- Part 5 - 1 Authorship and Ownership of AI-generated Works of Artt
References - Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks / 2310.07879 / ISBN:https://doi.org/10.48550/arXiv.2310.07879 / Published by ArXiv / on (web) Publishing site
- 4 Taxonomy of AI Privacy Risks
- A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics / 2310.05694 / ISBN:https://doi.org/10.48550/arXiv.2310.05694 / Published by ArXiv / on (web) Publishing site
- IV. TRAIN AND USE LLM FOR HEALTHCARE
- STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models / 2310.05563 / ISBN:https://doi.org/10.48550/arXiv.2310.05563 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
References - Specific versus General Principles for Constitutional AI / 2310.13798 / ISBN:https://doi.org/10.48550/arXiv.2310.13798 / Published by ArXiv / on (web) Publishing site
- Abstract
Contents
1 Introduction
2 AI feedback on specific problematic AI traits
4 Reinforcement Learning with Good-for-Humanity Preference Models
6 Discussion
7 Contribution Statement
References
A Model Glossary
D Generalization to Other Traits
F Scaling Trends for GfH PMs - Systematic AI Approach for AGI:
Addressing Alignment, Energy, and AGI Grand Challenges / 2310.15274 / ISBN:https://doi.org/10.48550/arXiv.2310.15274 / Published by ArXiv / on (web) Publishing site
- References
- AI Alignment and Social Choice: Fundamental
Limitations and Policy Implications / 2310.16048 / ISBN:https://doi.org/10.48550/arXiv.2310.16048 / Published by ArXiv / on (web) Publishing site
- 1 Introduction
3 Arrow-Sen Impossibility Theorems for RLHF
References - Unpacking the Ethical Value Alignment in Big Models / 2310.17551 / ISBN:https://doi.org/10.48550/arXiv.2310.17551 / Published by ArXiv / on (web) Publishing site
- 3 Investigating the Ethical Values of
Large Language Models
4 Equilibrium Alignment: A Prospective Paradigm for Ethical Value Alignmen
References - Human Participants in AI Research: Ethics and Transparency in Practice / 2311.01254 / ISBN:https://doi.org/10.48550/arXiv.2311.01254 / Published by ArXiv / on (web) Publishing site
- 4 Principles in Practice: Guidelines for AI Research with Human Participants
- LLMs grasp morality in concept / 2311.02294 / ISBN:https://doi.org/10.48550/arXiv.2311.02294 / Published by ArXiv / on (web) Publishing site
- References