Tag: truthfulness
Bibliography items where the tag occurs: 64
- The AI Index 2022 Annual Report / 2205.03468 / ISBN:https://doi.org/10.48550/arXiv.2205.03468 / Published by ArXiv / Version released on 2022-05-02 / on (web) Publishing site
- Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance / 2206.11922 / ISBN:https://doi.org/10.48550/arXiv.2206.11922 / Published by ArXiv / Version released on 2024-02-19 / on (web) Publishing site
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation / 2305.11391 / ISBN:https://doi.org/10.48550/arXiv.2305.11391 / Published by ArXiv / Version released on 2023-08-27 / on (web) Publishing site
- Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks / 2310.07879 / ISBN:https://doi.org/10.48550/arXiv.2310.07879 / Published by ArXiv / Version released on 2023-10-11 / on (web) Publishing site
- Unpacking the Ethical Value Alignment in Big Models / 2310.17551 / ISBN:https://doi.org/10.48550/arXiv.2310.17551 / Published by ArXiv / Version released on 2023-10-26 / on (web) Publishing site
- How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities / 2311.09447 / ISBN:https://doi.org/10.48550/arXiv.2311.09447 / Published by ArXiv / Version released on 2024-04-02 / on (web) Publishing site
- Generative AI and US Intellectual Property Law / 2311.16023 / ISBN:https://doi.org/10.48550/arXiv.2311.16023 / Published by ArXiv / Version released on 2023-11-27 / on (web) Publishing site
- Beyond principlism: Practical strategies for ethical AI use in research practices / 2401.15284 / ISBN:https://doi.org/10.48550/arXiv.2401.15284 / Published by ArXiv / Version released on 2025-06-20 / on (web) Publishing site
- AGI Artificial General Intelligence for Education / 2304.12479 / ISBN:https://doi.org/10.48550/arXiv.2304.12479 / Published by ArXiv / Version released on 2024-03-13 / on (web) Publishing site
- Responsible Artificial Intelligence: A Structured Literature Review / 2403.06910 / ISBN:https://doi.org/10.48550/arXiv.2403.06910 / Published by ArXiv / Version released on 2024-03-11 / on (web) Publishing site
- A Review of Multi-Modal Large Language and Vision Models / 2404.01322 / ISBN:https://doi.org/10.48550/arXiv.2404.01322 / Published by ArXiv / Version released on 2024-03-28 / on (web) Publishing site
- AI Alignment: A Comprehensive Survey / 2310.19852 / ISBN:https://doi.org/10.48550/arXiv.2310.19852 / Published by ArXiv / Version released on 2025-04-04 / on (web) Publishing site
- Just Like Me: The Role of Opinions and Personal Experiences in The Perception of Explanations in Subjective Decision-Making / 2404.12558 / ISBN:https://doi.org/10.48550/arXiv.2404.12558 / Published by ArXiv / Version released on 2024-04-19 / on (web) Publishing site
- Large Language Model Supply Chain: A Research Agenda / 2404.12736 / ISBN:https://doi.org/10.48550/arXiv.2404.12736 / Published by ArXiv / Version released on 2024-04-19 / on (web) Publishing site
- Modeling Emotions and Ethics with Large Language Models / 2404.13071 / ISBN:https://doi.org/10.48550/arXiv.2404.13071 / Published by ArXiv / Version released on 2024-06-25 / on (web) Publishing site
- Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback / 2404.10271 / ISBN:https://doi.org/10.48550/arXiv.2404.10271 / Published by ArXiv / Version released on 2024-06-04 / on (web) Publishing site
- Responsible AI for Earth Observation / 2405.20868 / ISBN:https://doi.org/10.48550/arXiv.2405.20868 / Published by ArXiv / Version released on 2024-05-31 / on (web) Publishing site
- The Ethics of Interaction: Mitigating Security Threats in LLMs / 2401.12273 / ISBN:https://doi.org/10.48550/arXiv.2401.12273 / Published by ArXiv / Version released on 2024-07-10 / on (web) Publishing site
- AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations / 2406.18346 / ISBN:https://doi.org/10.48550/arXiv.2406.18346 / Published by ArXiv / Version released on 2024-06-26 / on (web) Publishing site
- Staying vigilant in the Age of AI: From content generation to content authentication / 2407.00922 / ISBN:https://doi.org/10.48550/arXiv.2407.00922 / Published by ArXiv / Version released on 2024-07-01 / on (web) Publishing site
- A Blueprint for Auditing Generative AI / 2407.05338 / ISBN:https://doi.org/10.48550/arXiv.2407.05338 / Published by ArXiv / Version released on 2024-07-07 / on (web) Publishing site
- Mapping the individual, social, and biospheric impacts of Foundation Models / 2407.17129 / ISBN:https://doi.org/10.48550/arXiv.2407.17129 / Published by ArXiv / Version released on 2024-07-24 / on (web) Publishing site
- The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources / 2406.16746 / ISBN:https://doi.org/10.48550/arXiv.2406.16746 / Published by ArXiv / Version released on 2024-09-03 / on (web) Publishing site
- CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher / 2408.11650 / ISBN:https://doi.org/10.48550/arXiv.2408.11650 / Published by ArXiv / Version released on 2024-11-06 / on (web) Publishing site
- Catalog of General Ethical Requirements for AI Certification / 2408.12289 / ISBN:https://doi.org/10.48550/arXiv.2408.12289 / Published by ArXiv / Version released on 2024-11-15 / on (web) Publishing site
- DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection / 2409.06072 / ISBN:https://doi.org/10.48550/arXiv.2409.06072 / Published by ArXiv / Version released on 2024-09-09 / on (web) Publishing site
- Safety challenges of AI in medicine / 2409.18968 / ISBN:https://doi.org/10.48550/arXiv.2409.18968 / Published by ArXiv / Version released on 2024-09-11 / on (web) Publishing site
- DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life / 2410.02683 / ISBN:https://doi.org/10.48550/arXiv.2410.02683 / Published by ArXiv / Version released on 2025-03-15 / on (web) Publishing site
- From human-centered to social-centered artificial intelligence: Assessing ChatGPT's impact through disruptive events / 2306.00227 / ISBN:https://doi.org/10.48550/arXiv.2306.00227 / Published by ArXiv / Version released on 2024-10-25 / on (web) Publishing site
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models / 2410.15236 / ISBN:https://doi.org/10.48550/arXiv.2410.15236 / Published by ArXiv / Version released on 2025-11-25 / on (web) Publishing site
- A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions / 2406.03712 / ISBN:https://doi.org/10.48550/arXiv.2406.03712 / Published by ArXiv / Version released on 2024-12-09 / on (web) Publishing site
- Persuasion with Large Language Models: a Survey / 2411.06837 / ISBN:https://doi.org/10.48550/arXiv.2411.06837 / Published by ArXiv / Version released on 2024-11-11 / on (web) Publishing site
- Chat Bankman-Fried: an Exploration of LLM Alignment in Finance / 2411.11853 / ISBN:https://doi.org/10.48550/arXiv.2411.11853 / Published by ArXiv / Version released on 2024-11-21 / on (web) Publishing site
- Generative AI and LLMs in Industry: A text-mining Analysis and Critical Evaluation of Guidelines and Policy Statements Across Fourteen Industrial Sectors / 2501.00957 / ISBN:https://doi.org/10.48550/arXiv.2501.00957 / Published by ArXiv / Version released on 2025-01-08 / on (web) Publishing site
- Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety / 2502.05206 / ISBN:https://doi.org/10.48550/arXiv.2502.05206 / Published by ArXiv / Version released on 2025-08-02 / on (web) Publishing site
- Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles / 2502.06059 / ISBN:https://doi.org/10.48550/arXiv.2502.06059 / Published by ArXiv / Version released on 2025-12-27 / on (web) Publishing site
- Relational Norms for Human-AI Cooperation / 2502.12102 / ISBN:https://doi.org/10.48550/arXiv.2502.12102 / Published by ArXiv / Version released on 2025-02-17 / on (web) Publishing site
- Multi-Agent Risks from Advanced AI / 2502.14143 / ISBN:https://doi.org/10.48550/arXiv.2502.14143 / Published by ArXiv / Version released on 2025-02-19 / on (web) Publishing site
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective / 2502.14296 / ISBN:https://doi.org/10.48550/arXiv.2502.14296 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- Medical Hallucinations in Foundation Models and Their Impact on Healthcare / 2503.05777 / ISBN:https://doi.org/10.48550/arXiv.2503.05777 / Published by ArXiv / Version released on 2025-02-26 / on (web) Publishing site
- BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models / 2503.24310 / ISBN:https://doi.org/10.48550/arXiv.2503.24310 / Published by ArXiv / Version released on 2025-03-31 / on (web) Publishing site
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation / 2502.05151 / ISBN:https://doi.org/10.48550/arXiv.2502.05151 / Published by ArXiv / Version released on 2025-04-16 / on (web) Publishing site
- Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room / 2504.16148 / ISBN:https://doi.org/10.48550/arXiv.2504.16148 / Published by ArXiv / Version released on 2025-04-22 / on (web) Publishing site
- Auditing the Ethical Logic of Generative AI Models / 2504.17544 / ISBN:https://doi.org/10.48550/arXiv.2504.17544 / Published by ArXiv / Version released on 2025-04-24 / on (web) Publishing site
- Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach / 2505.09576 / ISBN:https://doi.org/10.48550/arXiv.2505.09576 / Published by ArXiv / Version released on 2025-05-14 / on (web) Publishing site
- WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models / 2505.09595 / ISBN:https://doi.org/10.48550/arXiv.2505.09595 / Published by ArXiv / Version released on 2025-05-14 / on (web) Publishing site
- Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery / 2505.16477 / ISBN:https://doi.org/10.48550/arXiv.2505.16477 / Published by ArXiv / Version released on 2025-05-22 / on (web) Publishing site
- Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods / 2505.17870 / ISBN:https://doi.org/10.48550/arXiv.2505.17870 / Published by ArXiv / Version released on 2025-05-23 / on (web) Publishing site
- Making Sense of the Unsensible: Reflection, Survey, and Challenges for XAI in Large Language Models Toward Human-Centered AI / 2505.20305 / ISBN:https://doi.org/10.48550/arXiv.2505.20305 / Published by ArXiv / Version released on 2025-05-18 / on (web) Publishing site
- Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety / 2506.00415 / ISBN:https://doi.org/10.48550/arXiv.2506.00415 / Published by ArXiv / Version released on 2025-05-31 / on (web) Publishing site
- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications / 2506.12594 / ISBN:https://doi.org/10.48550/arXiv.2506.12594 / Published by ArXiv / Version released on 2025-06-14 / on (web) Publishing site
- Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems / 2506.22774 / ISBN:https://doi.org/10.48550/arXiv.2506.22774 / Published by ArXiv / Version released on 2025-10-03 / on (web) Publishing site
- Context, Credibility, and Control: User Reflections on AI Assisted Misinformation Tools / 2506.22940 / ISBN:https://doi.org/10.48550/arXiv.2506.22940 / Published by ArXiv / Version released on 2025-06-28 / on (web) Publishing site
- Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance / 2508.08789 / ISBN:https://doi.org/10.48550/arXiv.2508.08789 / Published by ArXiv / Version released on 2025-08-18 / on (web) Publishing site
- Logging Requirement for Continuous Auditing of Responsible Machine Learning-based Applications / 2508.17851 / ISBN:https://doi.org/10.48550/arXiv.2508.17851 / Published by ArXiv / Version released on 2025-08-25 / on (web) Publishing site
- Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models / 2509.16332 / ISBN:https://doi.org/10.48550/arXiv.2509.16332 / Published by ArXiv / Version released on 2025-09-19 / on (web) Publishing site
- TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise / 2509.26482 / ISBN:https://doi.org/10.48550/arXiv.2509.26482 / Published by ArXiv / Version released on 2025-09-30 / on (web) Publishing site
- Fully Autonomous AI Agents Should Not be Developed / 2502.02649 / ISBN:https://doi.org/10.48550/arXiv.2502.02649 / Published by ArXiv / Version released on 2025-10-20 / on (web) Publishing site
- The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs / 2506.11094 / ISBN:https://doi.org/10.48550/arXiv.2506.11094 / Published by ArXiv / Version released on 2025-10-30 / on (web) Publishing site
- On Controlled Change: Generative AI's Impact on Professional Authority in Journalism / 2510.19792 / ISBN:https://doi.org/10.48550/arXiv.2510.19792 / Published by ArXiv / Version released on 2025-10-22 / on (web) Publishing site
- The Making of Digital Ghosts: Designing Ethical AI Afterlives / 2511.20094 / ISBN:https://doi.org/10.48550/arXiv.2511.20094 / Published by ArXiv / Version released on 2025-11-25 / on (web) Publishing site
- Morality in AI. A plea to embed morality in LLM architectures and frameworks / 2511.20689 / ISBN:https://doi.org/10.48550/arXiv.2511.20689 / Published by ArXiv / Version released on 2025-11-21 / on (web) Publishing site
- Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research / 2512.10058 / ISBN:https://doi.org/10.48550/arXiv.2512.10058 / Published by ArXiv / Version released on 2025-12-10 / on (web) Publishing site
- Reliable and Responsible Foundation Models: A Comprehensive Survey / 2602.08145 / ISBN:https://doi.org/10.48550/arXiv.2602.08145 / Published by ArXiv / Version released on 2026-02-04 / on (web) Publishing site