How to Detect LLM Hallucinations in Real Time? A 2026 Technical Breakdown of the Sibainu Engine's Geometric Auditing Layer

TL;DR

A 1% overhead geometric auditor that detects 54% of hallucinations with 88% precision on an RTX 3050.

This project is a demonstration of a lightweight auditing layer designed to detect and suppress hallucinations (false outputs) in Transformer-based LLMs in real-time by observing geometric fluctuations in Hidden States during inference.

1. Geometric Detection Overview

The engine statistically evaluates "trajectory distortion" within the model's internal vector space rather than performing semantic content analysis.

Geometric Analysis

Geometric Analysis: Measures the "Anchor Drift", i.e. how far the Hidden State of each generated token deviates from the "semantic anchors" defined by the prompt. Anchor Drift is the engine's core indicator for detecting hallucinations.
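
A minimal sketch of how such a drift score could be computed. The source does not give a formula, so this assumes the anchor is the mean of the prompt's hidden-state vectors and that drift is the Euclidean distance to that anchor, divided by the average prompt-to-anchor distance. The names `mean_vector`, `euclidean`, and `anchor_drift` are hypothetical, and the actual engine may define its anchors differently.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors (the assumed 'semantic anchor')."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anchor_drift(prompt_states, token_state, eps=1e-8):
    """Distance of one generated token's hidden state from the prompt anchor,
    divided by the mean prompt-to-anchor distance so the score is scale-free."""
    anchor = mean_vector(prompt_states)
    spread = sum(euclidean(h, anchor) for h in prompt_states) / len(prompt_states)
    return euclidean(token_state, anchor) / (spread + eps)

# Toy 2-dimensional hidden states (real models use d in the thousands).
prompt_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
near = anchor_drift(prompt_states, [0.5, 0.5])   # exactly at the anchor
far = anchor_drift(prompt_states, [5.0, 5.0])    # far from the anchor
print(near, far)
```

In this toy setup a state sitting on the anchor scores 0, and the score grows with distance from it, which matches the qualitative behavior described for normal versus hallucinated generations.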

Real-time Intervention

Real-time Intervention: Triggers an immediate suppression or control of the generation process the moment the Drift Score exceeds the preset threshold.
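
The control flow can be sketched as a loop that checks each token's drift score before accepting it. This is an illustration only: `audited_generate` is a hypothetical name, the scores are made up, and the threshold of 1.0 is an assumed value rather than the engine's calibrated one.

```python
def audited_generate(token_stream, threshold):
    """Consume (token, drift_score) pairs and stop at the first threshold breach.

    Returns the accepted tokens and a log entry (None if nothing was flagged).
    """
    accepted = []
    for position, (token, score) in enumerate(token_stream):
        if score > threshold:
            # Suppress the offending token and halt generation immediately.
            return accepted, {"position": position, "score": score,
                              "reason": "drift score above threshold"}
        accepted.append(token)
    return accepted, None

# Hypothetical stream: the drift scores are invented for illustration.
stream = [("The", 0.1), ("capital", 0.2), ("of", 0.2), ("Mars", 1.8), ("is", 0.3)]
tokens, flag = audited_generate(stream, threshold=1.0)
print(tokens, flag)
```

Because the check happens per token, everything generated before the breach is kept and nothing after it is emitted.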

Low Computational Cost

Low Computational Cost:
Adds only vector distance calculations costing $O(d)$ per token, where $d$ is the hidden-state dimension. This keeps the impact on inference throughput minimal, even in local environments such as the RTX 3050 (4GB).
Tested on consumer-grade hardware (RTX 3050 4GB). No H100s required for auditing.
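
The $O(d)$ figure can be checked by counting operations, assuming the per-token check is a single Euclidean distance between the token's hidden state $h_t$ and one anchor vector $a$ (the source does not name the distance function):

```latex
\[
\operatorname{dist}(h_t, a) = \sqrt{\sum_{i=1}^{d} (h_{t,i} - a_i)^2}
\]
% d subtractions + d multiplications + (d - 1) additions + 1 square root
% = 3d operations per token, i.e. O(d).
```

For comparison, a single dense $d \times d$ projection inside a Transformer layer already costs on the order of $d^2$ multiply-adds per token, so a constant number of such distance checks is small next to the forward pass itself.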

2. Public Resources

To ensure verification transparency, the following resources are provided:

sibainu_engine_lite.py:
- A demo/trial version with analysis axes limited to "Anchor Drift."
- Allows verification of the fundamental detection logic.
evaluate.py:
- A verification script that automatically measures the engine's performance (Precision/Recall, etc.).
raw_logs.csv:
- Contains the raw verification data (IDs, true labels, predicted labels, and drift scores) generated during performance validation on an RTX 3050 (4GB).

How to Use the Demo Code

This code is designed to run in a Python 3.x environment.

Execution: Run the following command to see the engine in action:
python sibainu_engine_lite.py
Performance Evaluation: To reproduce the benchmark results (ROC-AUC 0.8995), run:
python evaluate.py
- Note: The scripts reference the data in raw_logs.csv.
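
For readers who want to inspect the logs by hand, the following sketch shows one way precision, recall, and ROC-AUC could be derived from a CSV with the four fields listed for raw_logs.csv. It is not the contents of evaluate.py. The column names (`id`, `true_label`, `pred_label`, `drift_score`) and the sample rows are assumptions, so check the real header before adapting it.

```python
import csv
import io

# Inline stand-in for raw_logs.csv; the column names and rows are made up.
SAMPLE = """id,true_label,pred_label,drift_score
1,0,0,0.05
2,0,0,0.20
3,1,1,1.40
4,1,0,0.60
5,0,1,1.10
6,1,1,2.30
"""

def load_rows(handle):
    """Parse rows into (true_label, pred_label, drift_score) tuples."""
    return [(int(r["true_label"]), int(r["pred_label"]), float(r["drift_score"]))
            for r in csv.DictReader(handle)]

def roc_auc(rows):
    """Probability that a random hallucination outscores a random normal sample
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for y, _, s in rows if y == 1]
    neg = [s for y, _, s in rows if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rows = load_rows(io.StringIO(SAMPLE))
tp = sum(1 for y, p, _ in rows if y == 1 and p == 1)
fp = sum(1 for y, p, _ in rows if y == 0 and p == 1)
fn = sum(1 for y, p, _ in rows if y == 1 and p == 0)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
auc = roc_auc(rows)
print(precision, recall, auc)
```

To run it against the published file, replace `io.StringIO(SAMPLE)` with a handle from `open("raw_logs.csv", newline="")` and adjust the column names if they differ.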

3. Performance Evaluation (Internal Benchmark)

These metrics are achieved by the full 4-axis engine. The Lite version (1-axis) provided here is for fundamental logic verification.

Validation Process

Calibration: Determined the threshold that maximizes F1-Score on a validation set of 200 samples.
Blind Test: Conducted independent testing on the remaining 800 samples using the fixed threshold.
Data Source: Evaluations are based strictly on actual Hidden State measurements on an RTX 3050, not theoretical predictions.
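
The calibrate-then-freeze procedure can be sketched as a simple threshold sweep. The data below is invented and far smaller than the 200/800 split described, and `f1_at` and `calibrate` are hypothetical helper names. The source does not say how candidate thresholds were enumerated, so trying each observed score is an assumption.

```python
def f1_at(threshold, scores, labels):
    """F1 when every score strictly above `threshold` is flagged as a hallucination."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def calibrate(scores, labels):
    """Try each observed score as a candidate threshold and keep the best F1."""
    return max(sorted(set(scores)), key=lambda t: f1_at(t, scores, labels))

# Made-up calibration split (the source uses 200 samples).
val_scores = [0.1, 0.3, 0.4, 0.9, 1.2, 1.5, 2.0, 0.2]
val_labels = [0, 0, 1, 0, 1, 1, 1, 0]
threshold = calibrate(val_scores, val_labels)

# The threshold is then frozen and applied to held-out data unchanged.
test_scores = [0.05, 1.7, 0.8, 3.1]
test_flags = [s > threshold for s in test_scores]
print(threshold, test_flags)
```

Fixing the threshold before touching the test split is what makes the second stage a blind test: no test label influences the decision boundary.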

Evaluation Metrics

Metric	Value	Technical Characteristics
ROC-AUC	0.8995	Confirmed strong correlation between geometric fluctuations and hallucinations.
Precision	88.52%	High precision. Conservative design minimizing false positives.
Recall	53.89%	Captures approx. half of the hallucination cases.
False Stop Rate (FSR)	7.01%	Share of valid responses stopped in error. Conservative design minimizes interruption of valid responses.
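
For reference, these metrics can be written in terms of confusion-matrix counts, with hallucination as the positive class. Precision and recall are the standard definitions. The FSR formula is an assumption inferred from the metric's name (the fraction of valid responses that were stopped), since the source does not state one.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{FSR} = \frac{FP}{FP + TN}
\]
```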

Observation: Normal generations (Blue) are densely clustered near a drift score of 0. In contrast, hallucinations (Red) exhibit a distinct shift toward higher values, typically above 1.0.

Methodology: To keep the plot readable and focused on the primary data distribution, the x-axis is clipped to exclude the top 1% of extreme outliers (e.g., scores up to 1200).
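
One simple way to apply this kind of clipping before plotting is sketched here. The helper name `clip_top_fraction` and the data are hypothetical, and the original plot may instead have set an axis limit at the 99th percentile without dropping points.

```python
import math

def clip_top_fraction(scores, fraction=0.01):
    """Drop the largest `fraction` of scores (rounded up to at least one value)."""
    ordered = sorted(scores)
    n_drop = max(1, math.ceil(len(ordered) * fraction))
    return ordered[:-n_drop]

# 99 ordinary scores plus one extreme outlier, all made up.
scores = [i / 100 for i in range(99)] + [1200.0]
plotted = clip_top_fraction(scores)
print(len(plotted), max(plotted))
```

Clipping only affects the visualization; metrics such as ROC-AUC should still be computed on the full set of scores.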

Fact: This separation demonstrates that the geometric drift captured by SIB-ENGINE serves as a statistically significant indicator for structural collapses in latent space before they manifest as textual hallucinations.

4. Validation Case Studies

Based on the engine protocol, logical deviations were successfully neutralized in the following cases. For details, refer to the Demonstration Video.

Neutralized Examples:

Non-existent Entity Assertion: e.g., "The Capital of Mars."
Fabricated Authority Attribution: e.g., using fake names (Dr. George T. Hems).
Chronological Anachronism: e.g., "Silicon Valley" in a 1930s setting.

5. Implementation Characteristics

External Auditing Layer: Can be integrated as an external module into existing LLM pipelines.
Black-box Agnostic: Detects anomalies based solely on the geometric behavior of vector data without accessing the LLM's internal weights or hidden logic.
Diagnostic Logging: When a hallucination is detected, the engine outputs the score and the reason for detection (Reason) in the logs for post-analysis.
Dynamic Resampling: Lowers the sampling temperature upon threshold breach to trigger deterministic regeneration.
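
The effect of lowering the temperature in Dynamic Resampling can be illustrated with a temperature-scaled softmax. This is a sketch under assumptions: the logits, the two temperature values, and the helper name `resample_temperature` are invented, and the engine's actual schedule is not documented in the source.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def resample_temperature(drift_score, threshold, base_temp=0.8, low_temp=0.1):
    """Return the lowered temperature once the drift score breaches the threshold."""
    return low_temp if drift_score > threshold else base_temp

logits = [2.0, 1.0, 0.5]                    # made-up next-token logits
normal = softmax(logits, resample_temperature(0.2, threshold=1.0))
after_breach = softmax(logits, resample_temperature(1.8, threshold=1.0))
print(max(normal), max(after_breach))
```

At the lower temperature almost all probability mass lands on the top-scoring token, so regeneration becomes close to deterministic.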

6. Roadmap

Multidimensional Integration: Incorporating Layer Dissonance to aim for 80% Recall.
Geometric Data Cleansing: Applying detection logic to pre-processing to improve "intellectual purity."
Optimization of Learning Process: Aiming for a 15% reduction in training resource costs through dynamic gradient control.

7. License / Contact

License: All Rights Reserved (Proprietary)
(C) 2026 sibainu.
This release is for technical demonstration purposes only. Commercial use, reproduction, or redistribution of the code and algorithms without permission is prohibited.

Developer: yubainu
YouTube: Demonstration Video
Contact: yubainu98(at)gmail.com
Hardware Tested: NVIDIA GeForce RTX 3050 (4GB)