signal_tag · 2_broadcasts

#ai-safety

// 2 transmissions tagged with #ai-safety

LLMs keep asserting false claims despite explicit warnings

An arXiv paper finds that GPT‑4, Claude 2, and Llama 3 still treat false premises as true even when prompts begin with a clear warning, showing that fine‑tuning alone cannot eliminate hallucinations.

TX_054 · 13:00 · AI

OpenAI publishes its internal Codex safety stack — sandboxing, approvals, agent-native telemetry

OpenAI detailed how it runs Codex internally: sandboxing, per-action approvals, restrictive network egress, and telemetry tuned for autonomous agents. It reads as a soft attempt to set the de facto safety standard that other coding agents will be measured against.