signal_tag · 2_broadcasts
#ai-safety
// 2 transmissions tagged with #ai-safety

TX_041728 · AI
LLMs keep asserting false claims despite explicit warnings
An arXiv paper finds that GPT‑4, Claude‑2 and Llama‑3 still treat false premises as true even when the prompt begins with an explicit warning, suggesting that fine‑tuning alone does not eliminate hallucinations.

TX_054 · AI
OpenAI publishes its internal Codex safety stack — sandboxing, approvals, agent-native telemetry
OpenAI detailed how it runs Codex internally: sandboxing, per-action approvals, restrictive network egress, and telemetry tuned for autonomous agents. The move reads as a soft attempt to set the de facto safety standard that other coding agents will be measured against.