Ranked #8 of 8 in this category

error analysis

by hamelsmu · 18d ago


Claude Code · Medium risk · Open source
Editor's verdict

Help the user systematically identify and categorize failure modes in an LLM pipeline by reading traces. Use when starting a new eval project, after significant pipeline changes (new features, model switches, prompt rewrites), when production metrics drop, or after incidents.

- Editorial team

Install via Skills CLI

Use npx skills add to install this skill into the selected agent. Phase 0 commands are generated from source rules, not verified.

Codex
npx skills add https://github.com/hamelsmu/evals-skills/blob/main/skills/error-analysis/SKILL.md -g -a codex -y

Drop `-g` to install project-locally

Best for

  • Deep research
  • Information retrieval
  • Trend analysis

Not for

  • Workflows that require stronger human review than this catalog entry documents.

vs alternatives

#1 GStack Investigate

No fixes until the root cause is real.

4.8 · 80k stars
Diff: Best when the bug lives inside the code itself, not in operational state. Same "no fixes until the root cause is real" discipline as incident-investigate, but biased toward static code investigation: reads suspect modules, builds a hypothesis tree, asks for a failing test or repro before proposing a change. Strongest on flaky tests and intermittent failures where shallow patches make things worse. For ops-side incidents (logs, traffic, infra), incident-investigate fits better.
#2 Incident Investigate

No fixes until the cause is real.

4.8 · 10k stars
Diff: Best for incidents where the fastest reflex would be the wrong fix. Forces an evidence-before-action loop: collect logs, list candidate hypotheses, verify each, only then propose a change. Pays for itself the moment it catches the kind of incident where "just restart it" would have masked a real data-integrity problem. Skip it for obviously cosmetic regressions; there, the cost of slowing down outweighs the cost of a re-deploy.
#3 incident response


14k stars
Diff: Use when a security incident has been detected or declared and needs classification, triage, escalation path determination, and forensic evidence collection. Covers SEV1-SEV4 classification, false positive filtering, incident taxonomy, and NIST SP 800-61 lifecycle.

Side-by-side compare

Key differences with same-lane alternatives
| | error analysis (this skill) | GStack Investigate | Incident Investigate | incident response |
| Rating | | 4.8 | 4.8 | |
| Stars | 1.3k | 80k | 10k | 14k |
| Risk | Medium risk | Low risk | Low risk | Medium risk |
| Best for | Deep research | No fixes until the root cause is real. | Root-cause analysis | Use when a security incident has been detected or declared and needs classification, triage, escalation path determinat… |
| Not for | Workflows that require stronger human review than this catalog entry documents. | Workflows that require stronger human review than this catalog entry documents. | Quick cosmetic fixes | Workflows that require stronger human review than this catalog entry documents. |

Audit notes

Last reviewed: 2026-06-03
Source: open on GitHub
Author: community (!)
Network: network access required
Filesystem: writes to repo (!)
Dependencies: many deps
Telemetry: none
Skill Market
Find the best AI skills for the job
v0.4 · 1252 skills indexed · last review 2026-06-03