🔴 LIVE AUDIT
AAMOS · OUROBOROS
Red Team + Security Audit · 2026-05-04 · Confidential

🚨 TL;DR: Brutal Honesty

AAMOS/OUROBOROS is a sophisticated prototype with a catastrophic security flaw at its core: JWT authentication is never cryptographically verified. Any attacker can forge admin credentials and access all protected pages and many APIs. Beyond auth, the gold-set evaluation shows 0 correct answers out of 60 and 13.3% factual accuracy: the knowledge graph system cannot reliably answer basic regulatory questions. The system has restarted 7,894 times, indicating deep instability. Knowledge data is tiny (91 entities) compared to production competitors (Sayari: 600M+). Readiness is 21%. There is no rate limiting, no row-level security, no FK constraint on the claims table's predicate column, no penetration test history, and no incident response runbook. This system is not production-ready. It is an ambitious, well-architected prototype with the right vision but insufficient execution depth.

Critical Vulns: P0
Gold Set Pass Rate: 0/60
Factual Accuracy: 13.3%
PM2 Restarts: 7,894
Readiness: 21%
Entities (vs 600M+): 91
Claims (active): 4,393
Sanctions Coverage: 0

🔴 P0: Authentication Bypass (CONFIRMED)

Any attacker can construct a JWT with arbitrary admin claims and a fake signature. auth.js uses jwt.decode() instead of jwt.verify(). Signature is never checked. Confirmed bypass of: /desktop, /api/training/status, /api/cmdk/search, and all requirePageAuth-protected routes. This is a textbook CVE-level authentication bypass.

🔵 Live System State (verified 2026-05-04 23:55 UTC)

AMOS Server: Online
Last Uptime: 45s
PM2 Restarts: 7,894
Memory Usage: 209MB
Disk Used (156/484G): 33%
Swap (NONE): 0MB

📊 Database State

Schema | Table | Count | Notes
knowledge_atlas | claims | 4,393 | All active (0 historical)
knowledge_atlas | entities | 91 | ⚠️ WAY too few
knowledge_atlas | sources | 136 | OK
knowledge_atlas | predicates | 30 | Free-text, no FK enforcement
knowledge_atlas | identifier_mappings | 2,279 | No LEI checksum validation
knowledge_atlas | contradictions | 309 | Detected, unresolved
aamos_rag | documents | 1,948 | 1,796 with embeddings (BGE-M3 1024-dim)
gold_set | questions | 60 | 0 answered correctly
rexo_docs | manuals | 98 | Avg quality: 80.8%

⚙️ Process Health

Service | PID | Uptime | Restarts | Memory | Status
amos (main) | 3025190 | ~45s | 7,894 | 209MB | UNSTABLE
prexo-marketing | 107122211h2183MB | OK
wavult-konga | 15839788h176MB | OK

⚠️ AMOS main process crashes repeatedly with EADDRINUSE: a port conflict on 3100. There is no process supervisor with restart limits or a circuit breaker.

🔴 ATTACK SURFACE MAP

Test | Command | Result | Verdict | CVSS
No JWT | GET /api/aamos/profile | HTTP 401 | ✅ PASS | N/A
Malformed JWT | Bearer INVALID.TOKEN.HERE | HTTP 401 | ✅ PASS | N/A
Expired JWT | exp: 1600000000 | HTTP 401 | ✅ PASS | N/A
None-algorithm JWT | alg: none | HTTP 401 | ✅ PASS (on /profile) | N/A
Path traversal | GET /api/../etc/passwd | HTTP 404 | ✅ PASS | N/A
SQL injection | /api/cmdk/search?q=';DROP TABLE-- | HTTP 200 (no error) | ⚠️ PASS (likely parameterized) | N/A
Forged JWT: Page Auth Bypass | FAKESIG on /desktop | HTTP 200: FULL ACCESS | 🔴 CRITICAL FAIL | 9.8
Forged JWT: API Status | FAKESIG on /api/training/status | HTTP 200: Data Exposed | 🔴 CRITICAL FAIL | 9.1
Identity Spoofing (/api/auth/me) | Any forged JWT | Returns forged identity as legitimate | 🔴 CRITICAL FAIL | 8.5
100 concurrent requests | GET /api/training/status x100 | All 200 OK (no throttle) | 🔴 No rate limiting | 5.3

🔴 P0: JWT Authentication Bypass, Root Cause

api/auth.js uses jwt.decode() which NEVER verifies the cryptographic signature. It only base64-decodes the payload:

// VULNERABLE (api/auth.js lines 52, 115)
const payload = jwt.decode(token);  // ❌ NO SIGNATURE CHECK
if (!payload) return { ok: false, reason: 'invalid_token' };
// ... checks exp, iss, roles: all can be forged

// FIX (verify() throws on a bad signature, so handle the error rather than testing for a falsy payload):
const payload = jwt.verify(token, process.env.AMOS_JWT_SECRET);  // ✅

Result: Any attacker crafts header+payload with roles:["admin"], appends any fake signature, gets full page access. Confirmed on: /desktop, /api/training/status, /api/cmdk/search, all requirePageAuth routes.
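To make the difference concrete, the sketch below reproduces the attack with Node's built-in crypto module only. It illustrates what signature verification adds and is not a replacement for the JWT library's own verify(). The helper names (signHs256, decodeOnly, verifyHs256) and the demo secret are made up for this example, and HS256 is assumed because it is the algorithm named elsewhere in this audit.

```javascript
const crypto = require('node:crypto');

const b64url = (text) => Buffer.from(text).toString('base64url');

// Build an HS256 token; used here only to create inputs for the demo.
function signHs256(payload, key) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac('sha256', key).update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

// What a decode-only check does: parse the payload and ignore the signature.
function decodeOnly(token) {
  const [, body] = token.split('.');
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
}

// What verification adds: recompute the HMAC and compare in constant time.
function verifyHs256(token, key) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  let header;
  try {
    header = JSON.parse(Buffer.from(head, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (header.alg !== 'HS256') return null; // pin the algorithm
  const expected = crypto.createHmac('sha256', key).update(`${head}.${body}`).digest();
  const given = Buffer.from(sig, 'base64url');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
}

const secret = 'demo-secret-for-illustration-only'; // hypothetical value
const forged = signHs256({ sub: 'attacker', roles: ['admin'] }, 'not-the-real-key');

console.log(decodeOnly(forged).roles);    // the admin role is readable, so a decode-only check passes
console.log(verifyHs256(forged, secret)); // null: the signature does not match
```

The same null result is returned when the third segment is replaced by an arbitrary string such as FAKESIG, which is the case tested in the attack surface map.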

🔴 P0: AMOS_JWT_SECRET Hardcoded Fallback

// api/auth-routes.js line 19
const AMOS_JWT_SECRET = process.env.AMOS_JWT_SECRET || 'amos-fallback-secret';
// ❌ If env var missing, KNOWN secret is used, so all tokens are forgeable

If the environment variable is not set, the fallback secret 'amos-fallback-secret' is used. It does not even need to be brute-forced: it is in the source code, and anyone with this string can sign valid tokens. The server must fail hard if AMOS_JWT_SECRET is not set.
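A fail-hard check could look like the sketch below. requireSecret is a hypothetical helper, and the 32-character minimum is an assumption chosen for this example rather than a value taken from the codebase.

```javascript
// Hypothetical helper: return a required secret or throw so startup aborts.
function requireSecret(env, name, minLength = 32) {
  const value = env[name];
  if (typeof value !== 'string' || value.length < minLength) {
    throw new Error(`${name} is missing or shorter than ${minLength} characters`);
  }
  return value;
}

// At process start one would call:
//   const AMOS_JWT_SECRET = requireSecret(process.env, 'AMOS_JWT_SECRET');
// and let the uncaught error terminate the process before any route is served.
```

With this shape there is no `||` fallback left to fall into, and the known string 'amos-fallback-secret' would itself be rejected because it is shorter than the assumed minimum.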

🟠 P1: No Rate Limiting on Public-Facing APIs

100 concurrent requests to /api/training/status: all returned 200. No 429. No backpressure. Rate limiting exists only in api/compliance/routes.mjs. Main API routes are unprotected. One unauthenticated actor can exhaust the Node.js event loop.
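The missing 429 behaviour can be sketched with a small fixed-window counter. This is an in-memory stand-in written for illustration, not the express-rate-limit setup proposed later in this audit; createRateLimiter and rateLimitMiddleware are hypothetical names, and the wrapper assumes Express-style req.ip, res.status() and next().

```javascript
// Fixed-window limiter: at most `limit` requests per key in each `windowMs` window.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key) {
    const t = now();
    const current = windows.get(key);
    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= limit;
  };
}

// Express-style wrapper: reject with 429 once the caller's window is exhausted.
function rateLimitMiddleware(allow) {
  return (req, res, next) => {
    if (allow(req.ip)) return next();
    res.status(429).json({ error: 'rate_limited' });
  };
}
```

The Map lives in one process and is never pruned, so this only shows the mechanism. A shared store such as Redis is still needed once more than one instance serves traffic.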

🟠 P1: CSP Allows unsafe-inline + unsafe-eval

script-src 'self' 'unsafe-inline' 'unsafe-eval' ...

This negates CSP protection. Any XSS injection can execute inline scripts. unsafe-eval means eval(), Function(), and dynamic code execution are permitted. Combined with the JWT bypass, XSS + forged auth = full takeover.
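A nonce-based policy avoids both keywords. The sketch below builds one header per response with Node's crypto module. The directive list is an assumption for illustration (the real policy is truncated above), and buildCsp is a hypothetical name.

```javascript
const crypto = require('node:crypto');

// Build a fresh nonce and a policy whose script-src omits 'unsafe-inline' and 'unsafe-eval'.
function buildCsp() {
  const nonce = crypto.randomBytes(16).toString('base64');
  const header = [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
    "base-uri 'self'",
  ].join('; ');
  return { nonce, header };
}

// Usage: send `header` as Content-Security-Policy and render the same value
// into each legitimate inline tag as <script nonce="...">.
```

An injected inline script carries no valid nonce and is refused by the browser, which is the protection that 'unsafe-inline' switches off.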

🟡 P2: Server Version Exposed

server: nginx/1.24.0 (Ubuntu)
x-powered-by: Express

Both nginx version and Express framework exposed. Attackers can target known CVEs for specific versions. Should be hidden via server_tokens off; and app.disable('x-powered-by').

📋 Data Integrity Attack Results

Test | Result | Verdict
Insert claim with valid_from > valid_to | ERROR: violates check constraint "valid_temporal" | ✅ PASS: CHECK enforced
Insert claim with fake predicate (not in predicates table) | INSERT 0 1 (success) | 🔴 FAIL: No FK constraint on predicate column
Audit trail on claim modification | No triggers on claims table | 🔴 FAIL: No immutable audit log
Hash chain verification | No hash chain on claims (only content_hash on documents) | 🔴 FAIL: No chain of custody
Delete from claims directly | Allowed (no RLS, no soft-delete enforcement) | 🔴 FAIL: WORM not enforced
Historical claim records (bitemporal) | 0 rows with transaction_to ≠ NULL (all 4,393 "active") | 🔴 FAIL: Bitemporal design unused
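The hash chain row fails because claims carry no chained hash. A minimal in-memory sketch of such a chain follows, hashing the fields that improvement #012 lists (claim_id, predicate, object_value, source_id, valid_from) together with the previous link. The chain_hash field, the all-zero genesis value and the function names are assumptions made for this example.

```javascript
const crypto = require('node:crypto');

const GENESIS = '0'.repeat(64); // assumed starting value for the first link

// SHA-256 over the previous link plus the claim's identifying fields.
function claimHash(claim, prevHash) {
  const fields = [claim.claim_id, claim.predicate, claim.object_value, claim.source_id, claim.valid_from];
  return crypto.createHash('sha256').update(JSON.stringify([prevHash, ...fields])).digest('hex');
}

// Attach a chain_hash to each claim, in order.
function buildChain(claims) {
  let prev = GENESIS;
  return claims.map((claim) => {
    prev = claimHash(claim, prev);
    return { ...claim, chain_hash: prev };
  });
}

// Recompute every link; any edited or removed row breaks the chain from that point on.
function verifyChain(chained) {
  let prev = GENESIS;
  for (const row of chained) {
    if (claimHash(row, prev) !== row.chain_hash) return false;
    prev = row.chain_hash;
  }
  return true;
}
```

Because each hash includes its predecessor, changing one historical claim invalidates every later link, which is what makes silent edits detectable.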

🔴 Architecture Flaw #1: Split Auth Strategy

Two incompatible auth layers coexist: auth.js (jwt.decode, INSECURE) and aamos-pro.mjs authn (jwt.verify, SECURE). Routes inconsistently use one or the other, and there is no unified auth middleware. This is a maintenance nightmare and a security gap: some endpoints are secure, others are not, with no clearly documented boundary.

🟠 Architecture Flaw #2: Bitemporal Design Not Enforced

The claims table has transaction_from and transaction_to columns, i.e. a bitemporal design. However, all 4,393 claims have transaction_to = NULL, and there are zero historical records. This means updates likely overwrite data rather than create versioned history. The bitemporal design is a conceptual ghost: it exists in the schema but not in practice. This is critical for regulatory data integrity.
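What a versioned update means in practice can be shown with an in-memory sketch: close the active row by setting transaction_to, then append a new row. supersedeClaim is a hypothetical name, and the claim_id and object_value fields are borrowed from the improvement list. A real implementation would live in a database trigger, as proposed later in this audit.

```javascript
// Append-only update: close the current version, then add the new one.
function supersedeClaim(rows, claimId, newValue, now = new Date()) {
  const current = rows.find((r) => r.claim_id === claimId && r.transaction_to === null);
  if (!current) throw new Error(`no active version for claim ${claimId}`);
  const closed = { ...current, transaction_to: now };
  const next = { ...current, object_value: newValue, transaction_from: now, transaction_to: null };
  // The input array is left untouched; callers get a new history.
  return rows.map((r) => (r === current ? closed : r)).concat(next);
}
```

After one such update the history holds two rows for the claim, exactly one of them with transaction_to set. That is the pattern the 4,393 all-active rows show never happens today.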

🟠 Architecture Flaw #3: Predicate as Free Text (No FK)

The knowledge_atlas.claims.predicate column is TEXT NOT NULL with no FK to knowledge_atlas.predicates. This means: any string can be used as predicate, knowledge graph is structurally incoherent, claims cannot be reliably queried by predicate type, and no vocabulary enforcement. Confirmed: insert with 'FAKE_NONEXISTENT_PREDICATE_XYZ' succeeded.
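Before a foreign key can be added, the existing rows have to be checked against the vocabulary. The sketch below shows that check on plain arrays. findUnknownPredicates is a hypothetical helper, and in practice the two inputs would come from queries against the claims and predicates tables.

```javascript
// Return the distinct predicate strings used by claims but absent from the vocabulary.
function findUnknownPredicates(claims, vocabulary) {
  const known = new Set(vocabulary);
  const unknown = new Set();
  for (const claim of claims) {
    if (!known.has(claim.predicate)) unknown.add(claim.predicate);
  }
  return [...unknown].sort();
}

const report = findUnknownPredicates(
  [
    { predicate: 'has_vat_rate_standard' },
    { predicate: 'FAKE_NONEXISTENT_PREDICATE_XYZ' },
    { predicate: 'vat_standard' },
  ],
  ['has_vat_rate_standard'],
);
console.log(report); // the two strings that a foreign key would reject
```

Every string in the report must be mapped to a vocabulary entry or removed before the constraint can be created without errors.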

🟠 Architecture Flaw #4: Single-Node, No HA

Everything runs on one EC2 instance: AMOS (Node.js), Konga, and prexo-marketing all share the same host. The database is on RDS (good), but the app is a single point of failure (SPOF). There is no load balancer (ALB), no Auto Scaling, no failover, and the deployment is single-AZ. One hardware failure means total outage. AMOS runs as root (PID owned by root), which violates the least-privilege principle.

🟡 Architecture Flaw #5: 14 Schemas, No Governance

14 database schemas with inconsistent naming (some rexo_*, some aamos_*, some knowledge_*, some in public). No schema migration framework (no Flyway/Liquibase/db-migrate). No documented data dictionary. Schema evolution is ad-hoc. RLS is not used on any table. DB user wavult_admin has the CREATEDB privilege, which is excessive.

🟡 Architecture Flaw #6: Monolithic Node.js Server (2700+ lines)

scripts/server.mjs is a monolithic 2700+ line file importing 40+ route modules. No service isolation. One crash affects all services. Memory leaks in any module affect the whole system. The 7,894 restart count is partly attributable to this architecture. Should be decomposed into microservices or at minimum use PM2 cluster mode with proper crash boundaries.

🟡 Architecture Flaw #7: No Message Queue / Event Streaming

Pipeline processing is synchronous or uses ad-hoc polling. No Kafka/Redis Streams/SQS for decoupled ingestion. If a pipeline step fails, there's no replay mechanism. No dead-letter queue. No idempotency guarantees on claim extraction. Exactly-once semantics are not implemented anywhere.
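The idempotency gap can be illustrated with the key that improvement #015 proposes, SHA256(document_id + predicate + extraction_run_id). In the sketch below a Map stands in for a table with a unique key column. The function names are hypothetical, and JSON-encoding the tuple instead of plain concatenation is a choice made here to avoid ambiguous joins.

```javascript
const crypto = require('node:crypto');

// Deterministic key for one extracted claim.
function idempotencyKey(documentId, predicate, extractionRunId) {
  // Encoding the tuple as JSON keeps ('ab', 'c') and ('a', 'bc') distinct.
  const material = JSON.stringify([documentId, predicate, extractionRunId]);
  return crypto.createHash('sha256').update(material).digest('hex');
}

// In-memory stand-in for an upsert keyed on the idempotency key.
function upsertClaim(store, claim) {
  const key = idempotencyKey(claim.document_id, claim.predicate, claim.extraction_run_id);
  store.set(key, claim); // replaying the same claim overwrites instead of duplicating
  return key;
}
```

Replaying a failed step with the same inputs then leaves the row count unchanged, which is the property the current pipeline cannot guarantee.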

🔴 Operational Gap #1: No Incident Response Runbook

When AMOS goes down (7,894 times), there is no documented runbook. No on-call rotation. No escalation path. No defined RTO/RPO. No incident severity classification. For a system targeting enterprise customers and regulatory use, this is disqualifying.

🟠 Operational Gap #2: No Observability Stack

No Prometheus metrics endpoint. No Grafana dashboard. No distributed tracing (no Jaeger/Zipkin). No APM (no Datadog/NewRelic). No structured log aggregation (no ELK/CloudWatch Logs Insights). Log files exist but are not centralized. MTTR (Mean Time to Recovery) is currently unquantifiable because there's no visibility into what's failing.

🟠 Operational Gap #3: No Swap Memory

System has 0MB swap. With 63GB RAM and no swap, when AMOS hits an OOM event, the OOM killer will terminate processes without warning. For a Node.js app with unpredictable memory usage (LLM calls, embedding generation), this is a reliability risk. Add at minimum 8GB swap file.

🟠 Operational Gap #4: PM2 No Restart Limit

PM2 runs in fork mode with no max_restarts or restart_delay configured. 7,894 restarts without circuit breaker means AMOS has been in a crash loop at various points. This burns CPU, creates database connection churn, and makes logs unreadable. Should use PM2 cluster mode with restart throttling and dead-process alerting.

🟡 Operational Gap #5: IMAP Authentication Failing

Error: [imap] No supported authentication method(s) available. Unable to login.

IMAP integration is broken in production (visible in error logs). It fails silently: invoice inbox reading is not working, and nothing alerts on the failure. Silent breakage is dangerous in production financial integrations.

🟡 Operational Gap #6: Pipeline Errors Not Tracked

The KPI endpoint reports pipeline_errors_24h: 87 but there's no alerting on this. 87 pipeline errors in 24h with no ticket, no escalation, no trend analysis. Error rate is treated as a background fact rather than an actionable metric.

🟡 Operational Gap #7: No Backup Verification

Backup scripts exist (scripts/backup/, scripts/daily-backup.sh) but there's no documented backup restoration test. Backup integrity is unverified. For a knowledge graph storing regulatory data, "we backup but never tested restore" is not compliant with ISO 22301 or basic resilience standards.

⚔️ Competitive Reality: Where We Are vs Market

Dimension | AAMOS | Sayari | Kharon | Wolters Kluwer | Refinitiv/LSEG | DJ R&C
Entity Coverage | 91 entities | 600M+ entities | ~50M entities | 200M+ legal entities | 300M+ | 100M+
Sanctions/PEP | None | Basic | Core product | Via Cheetah | World-Check (PEP flagship) | Integrated
Legal Corpus | Handful of ECLI | None | Sanctions law | Millions of cases | Regulatory corpus | Thousands of sources
Factual Accuracy | 13.3% | ~85-95% | ~90% | ~95% | ~90% | ~90%
LLM Integration | Multi-model (Claude, GPT, Gemini) | Limited | None native | Cheetah AI (new) | Some AI | Limited AI
Swedish Regulatory | Building | None | None | Limited | Via partners | Limited
Nordic Focus | Core | None | None | Via Scandinavian subs | Generic | Generic
API-First | Yes | Yes | Yes | Yes | Yes | Yes
SOC 2 / ISO 27001 | None | SOC 2 Type II | SOC 2 Type II | ISO 27001 | ISO 27001 + SOC 2 | ISO 27001
Pentest History | None | Annual | Annual | Annual | Continuous | Annual
Price | TBD (competitive) | $50K-500K/yr | $30K-200K/yr | $100K+/yr | $50K+/yr | $50K+/yr

Where AAMOS Is 1000x Behind

  • Entity coverage: 91 vs 600M (6.6 million times less)
  • Factual accuracy: 13.3% vs ~90% industry standard
  • Sanctions: Zero vs production-grade real-time screening
  • Compliance certifications: None vs SOC2/ISO27001
  • Pentest history: None vs annual audits

Where AAMOS Has Potential Edge

  • Nordic regulatory depth: No competitor owns Sweden/Norway natively
  • Multi-LLM routing: More sophisticated than most competitors
  • Price: Can undercut established players 5-10x
  • Modern stack: Faster iteration than legacy Java/Oracle platforms
  • AI-native: Built from day 0 with LLMs vs bolted on

🔴 Code Issue #1: jwt.decode() Instead of jwt.verify()

Already covered in §03. This is a critical security anti-pattern. The code was likely written this way to "avoid needing the secret on every middleware", but that is exactly the wrong tradeoff. CVSS 9.8.

// api/auth.js line 52: CRITICAL BUG
const payload = jwt.decode(token);  // ← bypass-able by anyone

// MUST be:
const payload = jwt.verify(token, process.env.AMOS_JWT_SECRET, {
  algorithms: ['HS256'],
  issuer: 'amos.wavult.com'
});

🟠 Code Issue #2: Hardcoded Fallback Secrets

const AMOS_JWT_SECRET = process.env.AMOS_JWT_SECRET || 'amos-fallback-secret';

Must throw on startup if required secrets are missing. Never have a fallback for auth secrets. Same pattern likely repeated in other service files.

🟠 Code Issue #3: 2700+ Line Monolithic Server

scripts/server.mjs imports 40+ route modules and registers hundreds of routes in a single file. No separation of concerns. Impossible to test in isolation. Hard to review for security. Leads to the crash-loop problem (one bad route breaks everything).

🟡 Code Issue #4: No Input Validation Framework

The SQL injection test returned 200 (likely parameterized queries, which is good), but there is no schema validation on API inputs (no Joi/Zod/AJV). Arbitrary JSON payloads are accepted at express.json({ limit: '256kb' }) with no structural validation. This leaves a prototype pollution risk on object-shaped payloads.
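Until a schema library is adopted, the kind of check that is missing can be sketched by hand. The request shape below ({ q, limit }) and both function names are hypothetical and are not taken from the actual API. A library such as Zod or AJV would express the same rules declaratively.

```javascript
// Keys commonly abused for prototype pollution.
const FORBIDDEN_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Recursively look for forbidden own keys anywhere in a parsed JSON value.
function hasForbiddenKey(value) {
  if (value === null || typeof value !== 'object') return false;
  for (const key of Object.keys(value)) {
    if (FORBIDDEN_KEYS.has(key) || hasForbiddenKey(value[key])) return true;
  }
  return false;
}

// Hypothetical validator for a search body: { q: string, limit?: integer 1..100 }.
function validateSearchBody(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['body must be an object'];
  }
  const errors = [];
  if (hasForbiddenKey(body)) errors.push('forbidden key in payload');
  if (typeof body.q !== 'string' || body.q.length === 0 || body.q.length > 200) {
    errors.push('q must be a string of 1..200 characters');
  }
  if (body.limit !== undefined && (!Number.isInteger(body.limit) || body.limit < 1 || body.limit > 100)) {
    errors.push('limit must be an integer in 1..100');
  }
  return errors;
}
```

A handler would return HTTP 400 with the error list whenever it is non-empty, so malformed payloads never reach business logic.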

🟡 Code Issue #5: Multiple .bak Files in Production

Production directory contains dozens of .bak files: aamos-pro.mjs.bak-pre-auth-20260504-201613, auth-routes.js.bak-ratelimit-fix, etc. These contain previous versions of security-sensitive code in the production filesystem. Potential source of confusion and accidental rollback. Use git for this.

🟡 Code Issue #6: EADDRINUSE Crash Loop

Error: listen EADDRINUSE: address already in use 127.0.0.1:3100

Server crashes on startup when previous instance hasn't released port 3100. This causes the 7,894 restart cascade. PM2 should force-kill on exit and use kill_timeout. The server should handle SIGTERM gracefully with connection draining.
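Graceful shutdown can be sketched as follows. installGracefulShutdown is a hypothetical helper, the 10-second drain window is the figure suggested in improvement #022, and both SIGTERM and SIGINT are handled because PM2 sends SIGINT by default when it stops a process. The server argument is anything with a close(callback) method, such as a Node http.Server.

```javascript
// Close the listener on a signal so the port is released before the next start.
function installGracefulShutdown(server, options = {}) {
  const { drainMs = 10000, exit = process.exit, signals = ['SIGTERM', 'SIGINT'] } = options;
  let shuttingDown = false;

  function shutdown() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Force a non-zero exit if connections have not drained in time.
    const timer = setTimeout(() => exit(1), drainMs);
    timer.unref();
    // close() stops accepting connections and calls back once open ones finish.
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
  }

  for (const signal of signals) process.on(signal, shutdown);
  return shutdown;
}
```

The drain window should stay below PM2's kill_timeout so the process exits on its own terms before SIGKILL arrives.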

🔴 Gold Set: 0/60 Correct, 13.3% Factual Accuracy

The most damning metric: the gold set evaluation ran multiple times today, always returning 0 correct answers out of 60 and 13.3% factual accuracy. This means the system cannot reliably answer simple Swedish regulatory questions that are within its stated domain. Examples:

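  • "Vilken momssats gäller för digitala tjänster i Sverige?" ("What VAT rate applies to digital services in Sweden?") (Easy)
  • "Vilket organisationsnummer har LandveX AB?" ("What is the organisation number of LandveX AB?") (Easy)
  • "Vad är arbetsgivaravgiften i Sverige 2024?" ("What is the employer contribution rate in Sweden in 2024?") (Medium)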

13.3% accuracy on easy domain questions is below the 25% that random guessing would score on a four-option multiple-choice test. The system is not functional as a knowledge retrieval tool in its current state.

🟠 Data Issue #2: Only 91 Entities

The knowledge graph has 91 entities. For a compliance intelligence platform, this is proof-of-concept scale. Sayari covers 600M+ entities. The KPI target is 50,000 entities, which requires a roughly 550x increase; the current count is 99.8% below target.

0.18% of target entity coverage

🟠 Data Issue #3: 309 Unresolved Contradictions

The knowledge_atlas.contradictions table has 309 detected contradictions. None appear to be resolved (no resolution status column). Contradictions in regulatory data that are served to users create legal liability. A financial institution using this data for compliance decisions could act on contradictory claims.

🟡 Data Issue #4: Embeddings Missing for 152 Documents

1,948 documents exist; 1,796 have BGE-M3 embeddings (1024-dim). The remaining 152 documents (7.8%) are not embedded and cannot be retrieved semantically. There is no process to detect and fix missing embeddings. There is a single embedding model (BGE-M3); the OpenAI fallback mentioned in the spec is not visible in the data.

🟡 Data Issue #5: No LEI Checksum Validation

2,279 identifier_mappings exist but no code validates LEI checksums (ISO 17442). Invalid LEIs could map entities incorrectly, creating false linkages in compliance data. An LEI ends in two check digits (ISO 7064 MOD 97-10, the scheme also used by IBAN), and these are never verified.
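A check-digit validator is small. The sketch below follows the ISO 7064 MOD 97-10 rule used by ISO 17442: letters map to 10..35, and the resulting number must leave remainder 1 when divided by 97. isValidLei is a hypothetical name, and the sample string was chosen only because its check digits satisfy the rule.

```javascript
// ISO 17442 LEI: 18 alphanumeric characters followed by two numeric check digits.
function isValidLei(lei) {
  if (typeof lei !== 'string' || !/^[A-Z0-9]{18}[0-9]{2}$/.test(lei)) return false;
  let remainder = 0;
  for (const ch of lei) {
    // '0'-'9' map to 0-9 and 'A'-'Z' map to 10-35; feed each decimal digit into the running modulus.
    for (const digit of parseInt(ch, 36).toString()) {
      remainder = (remainder * 10 + Number(digit)) % 97;
    }
  }
  return remainder === 1;
}

console.log(isValidLei('HWUPKR0MPOU8FGXBT394')); // true
console.log(isValidLei('HWUPKR0MPOU8FGXBT395')); // false: check digits no longer match
```

Running the same function over the existing identifier_mappings rows would also serve as the retroactive audit.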

🟡 Data Issue #6: Predicate Vocabulary Drift

30 predicates are defined in the predicates table. Claims use a free-text predicate column with no FK. Confirmed: claims can use predicates not in the vocabulary. This creates uncontrolled ontology drift: the knowledge graph loses semantic coherence over time. Example: "has_vat_rate_standard" vs "vat_rate_standard" vs "vat_standard" could all exist as different predicates for the same concept.

Sorted by severity.
🔴 Security: P0/P1
P0 #001 Replace jwt.decode() with jwt.verify() in auth.js

Current: jwt.decode() performs no signature verification. Any forged JWT with admin roles passes.

Target: jwt.verify(token, secret, {algorithms:['HS256'], issuer:'amos.wavult.com'}) on every request.

Effort: S (2h) · Impact: Critical · Blockers: AMOS_JWT_SECRET must be set in prod

Security · Auth
P0 #002 Remove hardcoded JWT secret fallback: fail on startup if missing

Current: 'amos-fallback-secret' if env var missing.

Target: if(!process.env.AMOS_JWT_SECRET) { console.error('FATAL'); process.exit(1); }

Effort: S (30min) · Impact: Critical

Security · Secrets
P0 #003 Unify auth middleware: single jwt.verify() layer for all routes

Current: Two auth systems (auth.js decode vs aamos-pro.mjs verify). Inconsistent security.

Target: Single requireAuth middleware using jwt.verify(), applied globally. Remove requirePageAuth/requireApiAuth split.

Effort: M (1-2 days) · Impact: Critical

Security · Architecture
P1 #004 Add global rate limiting (express-rate-limit) on all API routes

Current: Rate limiting only on compliance routes. Main APIs unprotected.

Target: Global rate limiter: 100 req/min per IP for unauthenticated, 1000 req/min for authenticated. Redis-backed for distributed deployment.

Effort: S (4h) · Impact: High

Security · Ops
P1 #005 Remove unsafe-inline and unsafe-eval from CSP

Current: CSP allows 'unsafe-inline' and 'unsafe-eval', which negates XSS protection.

Target: Use nonces for inline scripts. Remove eval-based code. CSP score: A on securityheaders.com.

Effort: M (1 week, requires frontend audit) · Impact: High

Security · CSP
P1 #006 Hide server version headers (nginx + Express)

Current: server: nginx/1.24.0 (Ubuntu) + x-powered-by: Express

Target: server_tokens off; in nginx. app.disable('x-powered-by') in Express.

Effort: S (30min) · Impact: Medium

Security
P1 #007 Commission first external penetration test

Current: Zero penetration test history. The auth.js flaw would have been found in a first pen test.

Target: Annual CREST-certified penetration test. Fix all Critical/High findings within 30 days.

Effort: L (2-4 weeks, ~50-100K SEK) · Impact: Critical for enterprise sales

Security · Compliance
P1 #008 Enable Row-Level Security (RLS) on sensitive tables

Current: Zero RLS policies. Any authenticated DB user sees all data across all organizations.

Target: RLS on claims, documents, manuals, financeco tables. org_id-based isolation.

Effort: M (1 week) · Impact: High (GDPR, multi-tenancy)

Security · Data Integrity · GDPR
🟠 Data Integrity: P1
P1 #009 Add FK constraint: claims.predicate → predicates.predicate_code

Current: Predicate is free text. Any string accepted. Knowledge graph vocabulary uncontrolled.

Target: ALTER TABLE knowledge_atlas.claims ADD CONSTRAINT fk_predicate FOREIGN KEY (predicate) REFERENCES knowledge_atlas.predicates(predicate_code);

Effort: S (2h, need to clean existing data first) · Impact: High

Data Integrity · Knowledge Graph
P1 #010 Implement bitemporal versioning: enforce transaction_to on updates

Current: transaction_to always NULL (0 historical records). Claims are overwritten, not versioned.

Target: Trigger on claims: UPDATE sets transaction_to=now() on old row, inserts new row with transaction_from=now(). WORM guarantee via trigger + RLS.

Effort: M (3 days) · Impact: High (regulatory compliance)

Data Integrity · Compliance
P1 #011 Add audit trigger on claims (INSERT/UPDATE/DELETE → audit_log)

Current: Zero triggers on claims table. No audit trail. Deletes are silent.

Target: PostgreSQL AFTER trigger writes to immutable audit_log with: table, operation, old_row, new_row, performed_by, timestamp. Prevent DELETE via trigger RAISE EXCEPTION.

Effort: S (1 day) · Impact: High

Data Integrity · Audit
P1 #012 Implement hash chain for claim provenance

Current: content_hash exists on documents but not on claims. No chain of custody.

Target: Each claim gets SHA-256 hash of (claim_id + predicate + object_value + source_id + valid_from). Hash chain links sequential claims. Verifiable in court/audit.

Effort: M (3 days) · Impact: High (TÜV, legal admissibility)

Data Integrity · Compliance
P1 #013 Resolve 309 contradictions: build contradiction resolution workflow

Current: 309 contradictions detected, unresolved. Served to users.

Target: Contradiction resolution UI. Human curator workflow. SLA: no contradiction older than 48h unresolved in production data.

Effort: M (1 week) · Impact: High (data quality, legal liability)

Data Integrity · Knowledge Graph
P1 #014 Add LEI checksum validation on identifier_mappings

Current: 2,279 identifier_mappings with no LEI (ISO 17442) checksum validation.

Target: Validate LEI format + ISO 7064 MOD 97-10 check digits on insert. Reject invalid LEIs. Retroactively audit existing mappings.

Effort: S (1 day) · Impact: Medium

Data Integrity · Identifiers
🟠 Pipeline Robustness: P1/P2
P1 #015 Implement idempotency on claim extraction pipeline

Current: No exactly-once semantics. Document reprocessing likely creates duplicate claims.

Target: Idempotency key = SHA256(document_id + predicate + extraction_run_id). Upsert instead of insert. No duplicates on replay.

Effort: M (3 days) · Impact: High

Pipeline · Data Integrity
P1 #016 Add dead-letter queue for failed pipeline steps

Current: 87 pipeline errors/24h with no DLQ. Failed jobs are lost.

Target: Failed pipeline jobs → DLQ table with retry count, last_error, next_retry_at. Auto-retry 3x with exponential backoff. Alert on DLQ growth.

Effort: M (3 days) · Impact: High

Pipeline · Ops
P2 #017 Introduce message queue (BullMQ/Redis) for async processing

Current: Synchronous pipeline processing. One slow step blocks everything.

Target: BullMQ with Redis. Separate queues for: document ingestion, claim extraction, embedding generation, verification. Each step independently scalable.

Effort: L (2 weeks) · Impact: High

Pipeline · Architecture
P2 #018 Fix missing embeddings for 152 documents

Current: 152 documents unembedded and therefore invisible to semantic search.

Target: Backfill job. Health check: alert if embedding_v2 IS NULL for any document older than 24h.

Effort: S (2h) · Impact: Medium

Pipeline · Data Quality
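The retry policy described in item #016 (three automatic retries with exponential backoff, then the dead-letter queue) can be sketched as a pure function over a job record. The retry_count, last_error and next_retry_at fields come from that item. planRetry, the dead flag and the 1-second base delay are assumptions made for this example.

```javascript
// Decide what happens to a failed job: schedule a retry or park it as dead.
function planRetry(job, error, options = {}) {
  const { maxRetries = 3, baseDelayMs = 1000, now = Date.now() } = options;
  const attempt = job.retry_count + 1;
  if (attempt > maxRetries) {
    // Retries exhausted: keep the record for inspection, schedule nothing.
    return { ...job, last_error: String(error), next_retry_at: null, dead: true };
  }
  const delayMs = baseDelayMs * 2 ** (attempt - 1); // 1x, 2x, 4x the base delay
  return { ...job, retry_count: attempt, last_error: String(error), next_retry_at: now + delayMs, dead: false };
}
```

Counting rows where dead is true gives the DLQ growth figure that the item says should trigger an alert.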
🟠 Evidence Quality: P1/P2
P1 #019 Fix gold set evaluation: 0/60 correct is broken, not just bad data

Current: Multiple evaluation runs all return 0/60 passed. This is a broken evaluator, not just poor accuracy.

Target: Debug evaluation pipeline. 60/60 questions must get candidate answers before accuracy is even measurable. Target: >70% factual accuracy on easy questions within 30 days.

Effort: M (1 week) · Impact: Critical

Evidence Quality · Data Quality
P1 #020 Expand gold set to 500+ questions with inter-rater agreement

Current: 60 questions. Too small for statistical significance. No inter-rater agreement score.

Target: 500 questions across all domains. 3 raters per question. Cohen's kappa > 0.8. Separate train/eval/test splits.

Effort: L (4 weeks, human labeling) · Impact: High

Evidence Quality
P2 #021 Implement citation reproducibility check

Current: Citation accuracy: 0% in gold set. Claims cite documents but reproducibility is not verified.

Target: For each claim, verify: source_document still accessible, hash matches, claim text extractable from source. Automated nightly check.

Effort: M (3 days) · Impact: High (legal admissibility)

Evidence Quality
🟠 Operational: P1/P2
P1 #022 Fix EADDRINUSE crash loop: graceful shutdown + port release

Current: 7,894 PM2 restarts. Port 3100 not released on crash. EADDRINUSE on restart.

Target: server.close() on SIGTERM with 10s drain. PM2 kill_timeout: 15000. PM2 max_restarts: 10 with restart_delay: 5000. Alert if restart count > 3 in 1h.

Effort: S (4h) · Impact: High

Operational · Stability
P1 #023 Add Prometheus + Grafana observability stack

Current: No metrics. No dashboards. MTTR is unmeasurable.

Target: prom-client in Node.js. Prometheus scraping on :9090. Grafana dashboards for: request latency p50/p95/p99, error rates, pipeline throughput, DB query times, memory/CPU.

Effort: M (1 week) · Impact: High

Operational · Observability
P1 #024 Write incident response runbook

Current: No runbook. No on-call rotation. No escalation path.

Target: Runbook for: AMOS down, DB unreachable, pipeline stuck, security incident, data corruption. Each with: detection → impact → steps → escalation → post-mortem.

Effort: M (3 days) · Impact: High

Operational · Compliance
P1 #025 Add 8GB swap file

Current: 0MB swap. OOM killer can terminate AMOS without warning.

Target: fallocate -l 8G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile. Add to /etc/fstab.

Effort: S (15min) · Impact: Medium

Operational · Stability
P2 #026 Move AMOS to non-root user

Current: AMOS runs as root. Any RCE = full system compromise.

Target: Create amos user. chown -R amos:amos /opt/amos. Run as amos in PM2.

Effort: S (2h) · Impact: Medium

Security · Operational
P2 #027 Test backup restoration (verify backups actually work)

Current: Backups exist but restoration has never been tested.

Target: Monthly restore drill to separate environment. Document RTO (target: <4h). Automated backup integrity check (pg_dump + pg_restore + count verification).

Effort: M (1 day + monthly) · Impact: High

Operational · Resilience
P2 #028 Alert on IMAP authentication failure

Current: IMAP auth fails silently. Invoice inbox not working.

Target: Alert (Telegram/email) on IMAP auth failure. Fix credential rotation. Test IMAP connection on startup.

Effort: S (2h) · Impact: Medium

Operational
P2 #029 Add ALB + Auto Scaling for HA

Current: Single EC2 instance. One failure = total outage.

Target: AWS ALB + Target Group. 2+ EC2 in different AZs. Auto Scaling: min 2, max 4, scale on CPU >70%. Session state via Redis (not local memory).

Effort: L (1-2 weeks) · Impact: High

Operational · Architecture
P2 #030 Delete .bak files from production filesystem

Current: Dozens of .bak files with previous security-sensitive code in production.

Target: Remove all .bak files. Use git branches for this purpose. git is already present.

Effort: S (1h) · Impact: Low-Medium

Security · Code Quality
🟡 Competitive Position: P2
P2 #031 Entity coverage: 91 → 10,000 (Phase 1 target)

Current: 91 entities. 0.18% of internal target (50K). Uncompetitive vs Sayari (600M).

Target: Ingest all Swedish Bolagsverket registered companies (~600K). Phase 1: top 10,000 by revenue/relevance. Source: Bolagsverket open data + BisNode/UC licensed data.

Effort: L (4-6 weeks) · Impact: Critical for sales

Competitive · Data Quality
P2 #032 Add sanctions screening (OFAC, EU, UN lists)

Current: Zero sanctions data. Kharon's core product is this.

Target: Ingest OFAC SDN List (free), EU Consolidated Sanctions List (free), UN Security Council list. Nightly update. API endpoint: POST /api/sanctions/screen with name+identifiers.

Effort: M (1 week) · Impact: High (compliance sales)

Competitive · Data Quality
P2 #033 PEP (Politically Exposed Person) database

Current: No PEP data. Refinitiv World-Check is the market leader here.

Target: OpenSanctions (open-source, free) has PEP + sanctions in one dataset. Integrate and update weekly. This gives PEP coverage for 240+ countries.

Effort: M (1 week) · Impact: High (KYC/AML sales)

Competitive · Data Quality
P2 #034 Nordic regulatory corpus: SFS, DVFS, Finansinspektionen circulars

Current: Handful of ECLI documents. Wolters Kluwer has millions.

Target: Ingest: all SFS regulations (free from riksdagen.se), Finansinspektionen publications, Skatteverket guidance, Bolagsverket rules. Target: 100K+ Swedish regulatory documents.

Effort: L (2-3 months) · Impact: Critical differentiation

Competitive · Data Quality
P2 #035 Beneficial ownership graph (UBO)

Current: No ownership chain data. Sayari's core moat is its ownership graph.

Target: Ingest OpenCorporates beneficial ownership data. Build UBO traversal API: given company → return full beneficial ownership tree to natural persons. AMLD5 compliance feature.

Effort: XL (2+ months) · Impact: Critical for differentiation

Competitive · Data Quality
🟡 Compliance: P2/P3
P2 #036 Begin ISO 27001 certification process

Current: No certification. Every enterprise customer will ask for it.

Target: Engage ISO 27001 consultant. Gap analysis → ISMS implementation → internal audit → certification. Timeline: 9-12 months. Prioritize: access control (just fixed JWT issue), incident management, asset management.

Effort: XL (9-12 months, ~200-400K SEK) · Impact: Critical for enterprise

Compliance · Security
P2 #037 GDPR Article 30 record of processing activities

Current: Processing personal data (employees, contacts) without documented Article 30 record.

Target: Document all processing activities: purpose, legal basis, data subjects, retention, recipients. DPO designation (if needed). GDPR-compliant deletion workflow (right to erasure).

Effort: M (1-2 weeks legal + dev) · Impact: High (legal risk)

Compliance · GDPR
P2 #038 EU AI Act compliance assessment

Current: AI system used for compliance/regulatory decisions. EU AI Act risk classification needed.

Target: Legal assessment of AI Act risk category. If high-risk: conformity assessment, human oversight mechanisms, transparency obligations, robustness testing. Documentation for AI system registry.

Effort: L (legal + dev, 1-2 months) · Impact: High (regulatory)

Compliance · AI Governance
🟡 Architecture Debt: P2/P3
P2 #039 Break up monolithic server.mjs (2700+ lines)

Current: Single 2700+ line file importing 40+ modules. One crash = all services down.

Target: Extract into: core-api service, knowledge-api service, compliance-api service, finance-api service. Each runs in separate PM2 process. Shared: auth middleware, DB pool.

Effort: XL (2-3 months) · Impact: High

Architecture · Stability
P2 #040 Implement database migration framework (node-pg-migrate)

Current: No migration framework. Schema changes are ad-hoc. No rollback capability.

Target: node-pg-migrate or Flyway. All schema changes as numbered migrations. CI/CD runs migrations before deployment. Rollback for failed deployments.

Effort: M (1 week) · Impact: Medium

Architecture · Ops
P2 #041 Reduce DB user privilege (remove CREATEDB from wavult_admin)

Current: wavult_admin has the CREATEDB privilege, which is excessive.

Target: Application user should have: SELECT, INSERT, UPDATE, DELETE on specific schemas only. No CREATE TABLE, no CREATEDB. Separate migration user with DDL rights.

Effort: S (1 day) · Impact: Medium

Security · Architecture
P2 · #042 · Add API input validation (Zod/AJV) on all endpoints

Current: No schema validation on API inputs. Arbitrary JSON accepted.

Target: Zod schemas for all POST/PUT bodies. Validate on entry. Return 400 with specific field errors. Helps prevent prototype pollution and type confusion, and aids debugging.

Effort: L (2 weeks) · Impact: Medium

Security · Architecture
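The validate-on-entry pattern can be illustrated without any dependency. This is a hand-rolled stand-in, not Zod itself; with Zod the schema would be a `z.object(...)` checked via `safeParse`. The field names (`entityId`, `predicate`, `confidence`) are hypothetical and do not describe the real AAMOS API.

```javascript
// Dependency-free sketch of validate-on-entry for a POST body.

// Keys commonly abused for prototype pollution; rejected outright.
const FORBIDDEN_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Each checker returns null when the value is valid, else an error message.
const claimSchema = {
  entityId: (v) => (Number.isInteger(v) && v > 0 ? null : 'must be a positive integer'),
  predicate: (v) => (typeof v === 'string' && v.length > 0 ? null : 'must be a non-empty string'),
  confidence: (v) => (typeof v === 'number' && v >= 0 && v <= 1 ? null : 'must be a number between 0 and 1'),
};

function validateBody(schema, body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return { status: 400, errors: [{ field: '(body)', message: 'must be a JSON object' }] };
  }
  const errors = [];
  // Reject forbidden and unknown keys before looking at any values.
  for (const key of Object.keys(body)) {
    if (FORBIDDEN_KEYS.has(key)) errors.push({ field: key, message: 'key is not allowed' });
    else if (!Object.hasOwn(schema, key)) errors.push({ field: key, message: 'unknown field' });
  }
  // Check every declared field; a missing field is passed as undefined.
  for (const [field, check] of Object.entries(schema)) {
    const value = Object.hasOwn(body, field) ? body[field] : undefined;
    const message = check(value);
    if (message) errors.push({ field, message });
  }
  return errors.length > 0 ? { status: 400, errors } : { status: 200, value: body };
}
```

A route handler would call `validateBody(claimSchema, req.body)` first and respond with the returned `errors` array and HTTP 400 when `status` is 400.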
P3 · #043 · Add PM2 cluster mode for multi-core utilization

Current: PM2 fork mode (single process). 63GB RAM / multiple CPUs underutilized.

Target: PM2 cluster mode with instances: "max". Requires stateless design (Redis sessions, no in-memory state).

Effort: M (3 days + stateless refactor) · Impact: Medium

Architecture · Performance
P3 · #044 · Implement OpenTelemetry distributed tracing

Current: No distributed tracing. Debugging multi-step pipeline failures is guesswork.

Target: @opentelemetry/sdk-node. Trace from API request through LLM call through DB write. Export to Jaeger or AWS X-Ray. Correlate with error logs.

Effort: M (1 week) · Impact: Medium

Observability · Architecture
P3 · #045 · Add CI/CD pipeline with automated security scanning

Current: No CI/CD. Deploy = manual push. No automated tests.

Target: GitHub Actions/Gitea CI: npm test → SAST (Semgrep) → dependency audit (npm audit) → deploy. Block deploy on critical security findings. A SAST rule at this stage would likely have caught the jwt.decode issue.

Effort: M (1 week) · Impact: High

Security · Ops
P3 · #046 · Automated dependency vulnerability scanning (npm audit)

Current: Unknown CVE exposure in node_modules.

Target: npm audit in CI. Dependabot for automated PRs on vulnerable deps. Target: zero high/critical CVEs in production deps.

Effort: S (1 day) · Impact: Medium

Security
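The "zero high/critical" target can be enforced with a small gate over the parsed output of `npm audit --json`. This sketch reads only the severity counts under `metadata.vulnerabilities`; the choice of which severities block a deploy, and failing closed on an unreadable report, are assumptions.

```javascript
// Decides whether a deploy should be blocked, given parsed `npm audit --json`.

const BLOCKING_SEVERITIES = ['high', 'critical'];

function auditGate(report) {
  const counts = report && report.metadata && report.metadata.vulnerabilities;
  if (!counts || typeof counts !== 'object') {
    // Fail closed: an unreadable report must not let a deploy through.
    return { blockDeploy: true, reason: 'audit report has no severity counts', blocking: {} };
  }
  const blocking = {};
  let total = 0;
  for (const severity of BLOCKING_SEVERITIES) {
    const count = Number(counts[severity]) || 0;
    blocking[severity] = count;
    total += count;
  }
  return {
    blockDeploy: total > 0,
    reason: total > 0 ? `${total} high/critical vulnerabilities` : 'no high/critical vulnerabilities',
    blocking,
  };
}
```

A CI step would parse the audit output, call `auditGate`, and exit non-zero when `blockDeploy` is true. Running `npm audit --audit-level=high` gives a similar gate through its exit code alone.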
P3 · #047 · Add Helmet.js / OWASP security headers scan baseline

Current: HSTS present. CSP present but weak. Missing: Permissions-Policy, cross-origin headers.

Target: Score A on securityheaders.com. Add: Permissions-Policy, Cross-Origin-Opener-Policy, Cross-Origin-Resource-Policy, Cross-Origin-Embedder-Policy.

Effort: S (1 day) · Impact: Low-Medium

Security
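The four missing headers can be set with a few lines of middleware, sketched here without any dependency. The header values are a strict starting point and an assumption, not the audited configuration; `Cross-Origin-Embedder-Policy: require-corp` in particular can break cross-origin embeds and should be tested before rollout.

```javascript
// Adds the headers listed as missing in #047 to every response.

const EXTRA_SECURITY_HEADERS = {
  'Permissions-Policy': 'camera=(), microphone=(), geolocation=()',
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Resource-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Usable as Express/Connect-style middleware, or called directly with a
// plain http.ServerResponse (next is then omitted).
function securityHeaders(req, res, next) {
  for (const [name, value] of Object.entries(EXTRA_SECURITY_HEADERS)) {
    res.setHeader(name, value);
  }
  if (typeof next === 'function') next();
}
```

Registering it before the route handlers keeps the headers on error responses as well.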
P3 · #048 · Implement SOC 2 Type I audit preparation

Current: No SOC 2. All major competitors have SOC 2 Type II.

Target: Engage Vanta/Drata for continuous compliance monitoring. SOC 2 Type I within 6 months. Type II within 12 months. Focus: Security + Availability trust service criteria.

Effort: XL (12 months, ~300-600K SEK) · Impact: Critical for enterprise sales

Compliance · Security
P3 · #049 · Document data lineage for all claims

Current: source_document_id present but no full lineage DAG. Cannot trace claim → extraction step → source doc → source institution → original URL.

Target: Full provenance graph. For each claim: pipeline version used, prompt used, model version, extraction timestamp, source URL, source access date, retrieval hash.

Effort: L (2-3 weeks) · Impact: High (legal, TÜV)

Data Integrity · Compliance
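One provenance record per claim, carrying the fields listed in the target, could be built as in this sketch. The property names are hypothetical, and it assumes the retrieval hash is a SHA-256 digest of the retrieved source content, which the audit does not specify.

```javascript
const crypto = require('node:crypto');

// Builds an immutable provenance record for one claim.
function buildProvenance({
  claimId,
  pipelineVersion,
  prompt,
  modelVersion,
  sourceUrl,
  sourceAccessedAt,
  retrievedContent,
  extractedAt = new Date(),
}) {
  // Hash of the exact bytes retrieved, so later tampering or source drift
  // can be detected by re-fetching and comparing digests.
  const retrievalHash = crypto
    .createHash('sha256')
    .update(retrievedContent, 'utf8')
    .digest('hex');

  return Object.freeze({
    claimId,
    pipelineVersion,
    prompt,
    modelVersion,
    extractedAt: new Date(extractedAt).toISOString(),
    sourceUrl,
    sourceAccessedAt: new Date(sourceAccessedAt).toISOString(),
    retrievalHash,
  });
}
```

Storing the record alongside the existing source_document_id would give each claim a traceable path back to its original URL.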
P3 · #050 · Expand from Swedish to Nordic (NO, DK, FI) regulatory coverage

Current: Sweden-only regulatory focus. Nordic competitors and customers exist in all 4 countries.

Target: Ingest: Norwegian Lovdata (open), Danish Retsinformation (open), Finnish Finlex (open). Add language-specific embedding models for each. Nordic coverage = defensible moat vs US competitors.

Effort: XL (3-6 months) · Impact: High (market expansion)

Competitive · Data Quality

🚫 Issue #1: JWT Authentication Bypass (Blocks ALL enterprise deals)

Confirmed authentication bypass via forged JWT. Any TÜV, SOC 2, ISO 27001, or enterprise security review would immediately disqualify AAMOS. This is a Day 1 finding in any security audit. Must be fixed before ANY enterprise demo or pilot.

Fix time: 2 hours. Blocker for: every enterprise customer, every investor due diligence, every partnership.
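To show what the missing check does, here is a dependency-free sketch of HS256 signature verification using only `node:crypto`. It assumes HS256 with a shared secret, which the audit does not state, and it is an illustration rather than the fix itself; the fix in auth.js would presumably be to call the JWT library's verify function with the secret where decode is used today.

```javascript
const crypto = require('node:crypto');

// Test helper: issues an HS256 token. Only used to demonstrate verification.
function signHs256(payload, secret) {
  const header = Buffer.from(JSON.stringify({ alg: 'HS256', typ: 'JWT' })).toString('base64url');
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const sig = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest('base64url');
  return `${header}.${body}.${sig}`;
}

// Returns the payload only if the signature is valid, else null.
// Merely decoding the token would return the payload unconditionally.
function verifyHs256(token, secret) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;

  let parsedHeader;
  let payload;
  try {
    parsedHeader = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  // Pin the algorithm so a token cannot downgrade itself (e.g. alg "none").
  if (!parsedHeader || parsedHeader.alg !== 'HS256') return null;
  if (payload === null || typeof payload !== 'object') return null;

  const expected = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, 'base64url');
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;

  // Reject expired tokens when an exp claim (seconds since epoch) is present.
  if (typeof payload.exp === 'number' && payload.exp <= Math.floor(Date.now() / 1000)) return null;
  return payload;
}
```

A token whose signature segment is replaced with arbitrary text, as in the confirmed bypass, returns `null` here instead of admin claims.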

🚫 Issue #2: 0% Gold Set Pass Rate (Blocks product claims)

AAMOS cannot be marketed as a knowledge retrieval or compliance intelligence system while 0/60 questions are answered correctly. This is not "needs improvement"; it is a fundamentally broken pipeline. Any technical evaluation by a customer would find this immediately.

🚫 Issue #3: No Compliance Certifications (Blocks regulated industries)

Financial services (primary target: KYC/AML/compliance) require vendors to hold ISO 27001 or SOC 2. Without either, AAMOS cannot sell to banks, insurance companies, law firms, or any regulated entity. Getting SOC 2 Type I takes 6+ months minimum. This is a long-lead blocker that must start immediately.

🚫 Issue #4: 7,894 Crashes (Blocks SLA commitments)

No enterprise customer will sign an SLA with a system that has crashed 7,894 times. MTBF (Mean Time Between Failures) is currently measured in seconds, not hours. This blocks any SLA-backed product offering.
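MTBF is total operating time divided by the number of failures. The audit reports the restart count but not the observation window, so the 30-day window in this sketch is purely illustrative and not a measured figure.

```javascript
// MTBF = total operating time / number of failures.
function mtbfSeconds(operatingSeconds, failures) {
  if (failures <= 0) return Infinity; // no failures observed in the window
  return operatingSeconds / failures;
}

// Hypothetical observation window of 30 days, in seconds.
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// 7,894 is the PM2 restart count reported in this audit.
const estimate = mtbfSeconds(THIRTY_DAYS_SECONDS, 7894);
console.log(`${estimate.toFixed(0)} s between restarts`);
```

Under that assumed window the result is about 328 seconds between restarts. The real figure depends on how long PM2 has been counting, which should be read from the process start time.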

🟠 Issue #5: No Penetration Test (Blocks security due diligence)

Every B2B enterprise security questionnaire asks "When was your last penetration test? By whom? Results?" The answer "never" is disqualifying. Even mid-market customers require this for vendor approval.

🟠 Issue #6: GDPR Compliance Gaps (Legal risk)

AAMOS processes employee and customer personal data without Article 30 records, a DPIA for AI processing, or documented deletion procedures. GDPR fines reach up to EUR 20 million or 4% of global annual turnover, whichever is higher. For a Swedish company, this is a serious legal exposure.

🟠 Issue #7: EU AI Act Classification Gap

AAMOS processes compliance/legal data to support business decisions. This may qualify as a high-risk AI system under EU AI Act Article 6 and Annex III (administration of justice, access to essential private services). If classified high-risk, conformity assessment, technical documentation, and human oversight obligations must be met before deployment in the EU.

✅ What's Blocking Audit (Current: 21%)

The readiness at 21% reflects the real state. Key blockers to reach certification readiness (>80%):

  • Fix JWT auth bypass (#001-003)
  • Implement bitemporal versioning (#010)
  • Add audit triggers (#011)
  • Add hash chain (#012)
  • Incident response runbook (#024)
  • Fix gold set pipeline (#019)
  • Resolve contradictions (#013)
  • Begin ISO 27001 (#036)
  • Penetration test (#007)
  • Document data lineage (#049)