GROOT FORCE - Functional Requirements Document (FRD)
Core AI System
Document Version: 1.0
Date: November 2025
Status: Active Development
Owner: AI Engineering Team
Document Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | Nov 2025 | AI Team | Initial FRD for Core AI System |
Approval:
- AI/ML Lead
- Software Architect
- Security Lead
1. Introduction
1.1 Purpose
This Functional Requirements Document (FRD) defines the detailed functional requirements for the Core AI System of GROOT FORCE. This is the "brain" that makes GROOT FORCE unique - a human-bound, privacy-first, emotionally intelligent AI that runs entirely on-device.
1.2 Scope
This FRD covers:
- Local LLM Engine - On-device language model inference
- RAG System - Retrieval-Augmented Generation for knowledge grounding
- Critical Reasoning Kernel (CRK) - Anti-hallucination and logic verification
- Emotional Engine - User emotional state tracking and adaptive behavior
- Executive Function Framework (EFF) - Task decomposition and cognitive support
- Memory Architecture - Multi-tier, domain-isolated memory
- Tool Calling System - Function calling and skill integration
- Anticipation Layer - Predictive assistance
Out of Scope:
- Vision AI (separate FRD)
- Speech processing (separate FRD)
- Sensor fusion (separate FRD)
- UI/UX implementation (separate document)
1.3 Related Documents
Traces To:
- Master PRD - Product requirements
- System SRS - System-level software requirements
- Hardware Requirements - Hardware specs
Related FRDs:
- FRD: Sensor & Safety Systems (to be created)
- FRD: User Experience & Interface (to be created)
- FRD: Vision AI & Computer Vision (to be created)
2. System Architecture Overview
2.1 AI System Components
┌─────────────────────────────────────────────────────┐
│ CORE AI SYSTEM ARCHITECTURE │
├─────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ INPUT PROCESSING LAYER │ │
│ │ - Intent Parser │ │
│ │ - Context Builder │ │
│ │ - Entity Extraction │ │
│ └───────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────┐ │
│ │ EMOTIONAL ENGINE │ │
│ │ - State Tracker (Valence/Arousal/Control) │ │
│ │ - Trigger Bank │ │
│ │ - Tone Adapter │ │
│ └───────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────┐ │
│ │ CRITICAL REASONING KERNEL (CRK) │ │
│ │ - Micro-Reasoning (per reply) │ │
│ │ - Meso-Reasoning (session) │ │
│ │ - Macro-Reasoning (life goals) │ │
│ │ - Evidence Tagger │ │
│ │ - PFC Load Estimator │ │
│ └───────────────────────────────────────────────┘ │
│ ↓ ↓ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ RAG ENGINE │ │ LOCAL LLM │ │
│ │ - Vector Store │ │ - 3-8B Models │ │
│ │ - Retrieval │ │ - Inference │ │
│ │ - Domain Filter │ │ - Context Mgmt │ │
│ └──────────────────┘ └──────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────┐ │
│ │ EXECUTIVE FUNCTION FRAMEWORK (EFF) │ │
│ │ - Task Decomposition │ │
│ │ - Prioritization Engine │ │
│ │ - Gating & Protection │ │
│ │ - Habit Builder │ │
│ └───────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────┐ │
│ │ TOOL CALLING & SKILLS │ │
│ │ - Function Router │ │
│ │ - Skill Manager │ │
│ │ - Action Executor │ │
│ └───────────────────────────────────────────────┘ │
│ ↓ │
│ ┌───────────────────────────────────────────────┐ │
│ │ OUTPUT GENERATION │ │
│ │ - Response Composer │ │
│ │ - Tone Adjustment │ │
│ │ - Confidence Tagging │ │
│ └───────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
2.2 Data Flow
Typical Interaction Flow:
- User speaks → STT converts to text
- Intent Parser extracts intent and entities
- Emotional Engine updates user state
- CRK estimates cognitive load and reasoning requirements
- RAG retrieves relevant memories/documents
- Local LLM generates response (with CRK supervision)
- EFF applies task decomposition if needed
- Tool Calling executes actions if required
- Output Generation composes final response with appropriate tone
- TTS converts to speech → User hears response
3. Local LLM Engine
3.1 Model Requirements
FRD-AI-LLM-001 [P0]
System shall support loading and running quantized LLMs from 3B to 8B parameters.
Functional Specification:
- Supported Models:
  - Llama 3 (3B, 4B, 7B, 8B variants)
  - Mistral (7B variant)
  - Phi-3 (3.8B variant)
  - Other compatible architectures
- Quantization Formats:
  - Primary: Q4_K_M (4-bit, k-quant, medium)
  - Optional: Q8_0 (8-bit, per-block scale, no zero-point offset)
  - Optional: Q5_K_M (5-bit, k-quant, medium)
- Model Loading:
  - Load time: < 10 seconds for 3-4B models, < 20 seconds for 7-8B models
  - Memory footprint: 2-5 GB depending on model size and quantization
  - Persistent loading: Model remains in memory until user switches
- Context Window:
  - Minimum: 4096 tokens
  - Target: 8192 tokens (if RAM permits)
  - Context management: Sliding window with importance-based pruning
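The memory footprint range above can be sanity-checked with a back-of-envelope estimate of weight storage. The sketch below is illustrative only: the function name and the effective bits-per-weight figures are assumptions (approximate values for GGUF-style formats), and the result covers weights alone, not KV cache or runtime overhead.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# The bits-per-weight values are approximate assumptions, not measured figures.
ASSUMED_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}


def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), excluding KV cache."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for size in (3, 8):
        for fmt, bpw in ASSUMED_BITS_PER_WEIGHT.items():
            print(f"{size}B {fmt}: ~{estimate_weight_memory_gb(size, bpw):.1f} GB")
```

Under these assumptions the primary Q4_K_M format stays within the stated 2-5 GB range for 3B to 8B models, while an 8B model at Q8_0 would not, which may matter when choosing which optional formats to enable per model size.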
Validation:
- Load test models of each supported size
- Verify memory usage within limits
- Confirm context window functionality
Traces To: REQ-AI-001 (System SRS)
FRD-AI-LLM-002 [P0]
System shall implement multi-tier inference routing based on task complexity.
Functional Specification:
Tier 1: Local Low-Power (3-4B models)
- Trigger Conditions:
  - Token estimate < 100 tokens
  - Simple queries (definitions, facts, quick replies)
  - User explicitly requests fast mode
  - Battery < 15%
- Use Cases:
  - "What time is it?"
  - "Turn on flashlight"
  - "Navigate to home"
  - Object labeling
  - Quick translations
Tier 2: Local High-Performance (7-8B models)
- Trigger Conditions:
  - Token estimate 100-500 tokens
  - Complex reasoning required
  - Multi-step tasks
  - NDIS documentation
  - Battery ≥ 15%
- Use Cases:
  - Email drafting
  - Meeting notes summarization
  - Complex reasoning (math, logic)
  - Long-form writing
  - Code generation
Tier 3: Cloud Boost (70B+ models) - Optional
- Trigger Conditions:
  - Token estimate > 500 tokens
  - Web search required
  - Multi-document RAG synthesis
  - User explicitly requests "deep think"
  - Cloud tier subscription active
- Use Cases:
  - Research synthesis
  - Complex multi-document analysis
  - Specialized domain knowledge
  - Code review and debugging
Routing Algorithm:
def select_inference_tier(query, context, user, battery_level, cloud_available):
    # Low battery always forces the low-power tier, regardless of other settings
    if battery_level < 15:
        return TIER_LOCAL_LOW
    # Estimate token count and analyze complexity
    token_estimate = estimate_tokens(query, context)
    complexity = analyze_complexity(query)
    # Short, simple queries stay on the low-power tier
    if token_estimate < 100 and complexity == "simple":
        return TIER_LOCAL_LOW
    # Strict privacy mode never uses cloud; neither does an offline device
    if user.privacy_mode == "strict" or not cloud_available:
        return TIER_LOCAL_HIGH
    # Mid-sized requests stay local
    if token_estimate < 500:
        return TIER_LOCAL_HIGH
    # Large requests go to cloud only for subscribers who need web search
    if user.subscribed and requires_web_search(query):
        return TIER_CLOUD_BOOST
    return TIER_LOCAL_HIGH
Validation:
- Test routing logic with diverse query set
- Verify correct tier selection > 95% of time
- Confirm privacy preferences respected
Traces To: REQ-AI-003 (System SRS)
FRD-AI-LLM-003 [P0]
System shall implement thermal-aware throttling for AI inference.
Functional Specification:
Temperature Monitoring:
- Poll CPU/GPU/NPU temperature every 1 second during inference
- Maintain temperature history (rolling 60-second window)
- Predict temperature trajectory
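The rolling history and trajectory prediction described above could be sketched as follows. This is a minimal illustration, assuming a simple least-squares linear trend over the window; the class and method names are hypothetical, and a production predictor might use a thermal model instead.

```python
from collections import deque


class TemperatureHistory:
    """Rolling window of (timestamp_s, temp_c) samples with a linear trend estimate."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self.samples = deque()

    def add(self, timestamp_s: float, temp_c: float) -> None:
        self.samples.append((timestamp_s, temp_c))
        # Drop samples that have aged out of the rolling window
        while timestamp_s - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def slope_c_per_s(self) -> float:
        """Least-squares slope of temperature over time; 0.0 with fewer than 2 samples."""
        n = len(self.samples)
        if n < 2:
            return 0.0
        mean_t = sum(t for t, _ in self.samples) / n
        mean_c = sum(c for _, c in self.samples) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in self.samples)
        if var_t == 0:
            return 0.0
        cov = sum((t - mean_t) * (c - mean_c) for t, c in self.samples)
        return cov / var_t

    def predict(self, horizon_s: float) -> float:
        """Extrapolate the latest reading horizon_s seconds ahead along the fitted slope."""
        if not self.samples:
            raise ValueError("no temperature samples recorded")
        return self.samples[-1][1] + self.slope_c_per_s() * horizon_s
```

A predicted value crossing the next throttling threshold could be used to step down early rather than waiting for the measured temperature to reach it.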
Throttling Levels:
| Level | Temperature | Action | Performance Impact |
|---|---|---|---|
| 0 (Normal) | < 42°C | None | 100% |
| 1 (Mild) | 42°C to < 45°C | Reduce token generation speed 10% | 90% |
| 2 (Moderate) | 45°C to < 48°C | Reduce token speed 25%, switch to smaller model | 75% |
| 3 (Severe) | 48°C to < 50°C | Emergency low-power mode, minimal AI | 30% |
| 4 (Critical) | ≥ 50°C | System shutdown | 0% |
Implementation:
def thermal_throttle(current_temp):
    if current_temp < 42:
        return 1.0  # No throttling
    elif current_temp < 45:
        return 0.9  # 10% slower
    elif current_temp < 48:
        # Switch to smaller model + slow down
        if current_model != MODEL_3B:
            switch_model(MODEL_3B)
        return 0.75
    elif current_temp < 50:
        # Emergency mode
        pause_non_critical_ai()
        return 0.3
    else:
        # Critical shutdown
        emergency_shutdown()
        return 0.0
User Notification:
- Level 1: No notification (transparent)
- Level 2: Optional notification "AI running in power-saving mode"
- Level 3: Visible notification "Device cooling down, AI limited"
- Level 4: "Device too hot, shutting down for safety"
Validation:
- Sustained AI load testing
- Verify temperature never exceeds 50°C
- Confirm proper throttling transitions
Traces To: REQ-AI-004 (System SRS)
FRD-AI-LLM-004 [P0]
System shall implement token generation optimization for low latency.
Functional Specification:
Optimization Techniques:
- Model Caching:
  - Keep model in RAM between requests
  - Preload commonly used models
  - Lazy-load rarely used models
- KV Cache Management:
  - Reuse KV cache from previous turns
  - Prune old conversation turns intelligently
  - Compress KV cache for long contexts
- Speculative Decoding:
  - Use small draft model (1B) to predict tokens
  - Verify with main model in batch
  - 1.5-2x speedup on simple responses
- Batching:
  - Batch multiple user requests when possible
  - Batch tool calls
  - Amortize inference overhead
- Quantized Inference:
  - Use INT4/INT8 kernels where available
  - Leverage NPU/GPU for quantized ops
  - CPU fallback for compatibility
Performance Targets:
- Time to First Token (TTFT): < 100ms
- Tokens per Second: > 10 tokens/sec on 3B, > 5 tokens/sec on 7B
- End-to-end latency: < 200ms for simple queries
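The three targets above interact, and a small latency-budget calculation makes the relationship explicit. The sketch assumes a simple model in which the first token arrives after TTFT and the rest arrive at a steady decode rate; the function names are hypothetical.

```python
def response_latency_ms(ttft_ms: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Time until the last token is generated: first token, then steady decoding."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    decode_ms = (output_tokens - 1) * 1000.0 / tokens_per_sec
    return ttft_ms + decode_ms


def max_tokens_within(budget_ms: float, ttft_ms: float, tokens_per_sec: float) -> int:
    """Largest token count whose generation finishes within budget_ms."""
    if budget_ms < ttft_ms:
        return 0
    return 1 + int((budget_ms - ttft_ms) * tokens_per_sec / 1000.0)


if __name__ == "__main__":
    # At the minimum targets (100 ms TTFT, 10 tokens/sec), a 200 ms budget
    # leaves room for very few generated tokens.
    print(max_tokens_within(budget_ms=200, ttft_ms=100, tokens_per_sec=10))
```

Under this model, a 200 ms budget at the minimum throughput targets covers only about two generated tokens. That suggests the end-to-end figure is probably meant as time to first audible or visible output, or applies to commands answered without LLM decoding; that reading is an assumption and worth confirming when the validation benchmarks are defined.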
Validation:
- Benchmark latency across model sizes
- Measure TTFT and token throughput
- Real-world user acceptance testing
Traces To: REQ-AI-002, REQ-PERF-001 (System SRS)
3.2 Context Management
FRD-AI-LLM-010 [P0]
System shall implement intelligent context window management.
Functional Specification:
Context Window Structure:
┌─────────────────────────────────────┐
│ System Prompt (fixed) │ ~500 tokens
├─────────────────────────────────────┤
│ User Identity & Values (fixed) │ ~300 tokens
├─────────────────────────────────────┤
│ Recent Conversation (sliding) │ ~2048 tokens
├─────────────────────────────────────┤
│ RAG Retrieved Context (dynamic) │ ~1024 tokens
├─────────────────────────────────────┤
│ Current Query │ ~256 tokens
├─────────────────────────────────────┤
│ Reserved for Response │ ~1024 tokens
└─────────────────────────────────────┘
Total: ~5152 tokens (fits in 8K context)
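The budget above can be checked mechanically against both the 8192-token target and the 4096-token minimum window. The sketch below is one possible approach, assuming that only the sliding and dynamic segments may shrink and that they shrink proportionally; the dictionary keys and function name are hypothetical.

```python
CONTEXT_BUDGET = {
    "system_prompt": 500,
    "user_identity": 300,
    "recent_conversation": 2048,
    "rag_context": 1024,
    "current_query": 256,
    "response_reserve": 1024,
}


def fit_to_window(budget, window, shrinkable=("recent_conversation", "rag_context")):
    """Shrink the sliding/dynamic segments proportionally until the budget fits."""
    overflow = sum(budget.values()) - window
    if overflow <= 0:
        return dict(budget)
    shrinkable_total = sum(budget[name] for name in shrinkable)
    if overflow > shrinkable_total:
        raise ValueError("fixed segments alone exceed the context window")
    keep_ratio = (shrinkable_total - overflow) / shrinkable_total
    fitted = dict(budget)
    for name in shrinkable:
        # Round down so the fitted total never exceeds the window
        fitted[name] = int(budget[name] * keep_ratio)
    return fitted


if __name__ == "__main__":
    print(sum(CONTEXT_BUDGET.values()))
    print(fit_to_window(CONTEXT_BUDGET, 4096))
```

The full budget fits the 8K target unchanged but exceeds the 4096-token minimum, so on a 4K window the conversation and RAG segments would need to be reduced along these lines.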
Context Pruning Strategy:
- Keep: System prompt, user identity, current query
- Prune: Old conversation turns using importance scoring
- Compress: Summarize old turns into memory snapshots
Importance Scoring:
import math

def importance_score(turn, recency_weight):
    # Per-turn attribute names (age_hours, reference_count, entity_count) are assumed
    score = 0.0
    # Recency (exponential decay with a 24-hour time constant)
    score += recency_weight * math.exp(-turn.age_hours / 24)
    # Explicit user request ("remember this")
    if turn.marked_important:
        score += 10
    # Emotional significance
    score += emotional_intensity(turn) * 2
    # Referenced in later turns
    score += turn.reference_count * 0.5
    # Contains key entities (people, places, dates)
    score += turn.entity_count * 0.3
    return score
Validation:
- Long conversation tests (100+ turns)
- Verify important context retained
- Confirm graceful degradation when context full
Traces To: REQ-AI-001 (System SRS)
3.3 Prompt Engineering
FRD-AI-LLM-020 [P0]
System shall use optimized system prompts for each product variant.
Functional Specification:
Base System Prompt (All Variants):
You are KLYRA, the AI assistant for GROOT FORCE smart glasses. You are:
- Loyal and bound to {USER_NAME} only
- Privacy-first (local processing, no cloud unless explicitly requested)
- Calm, helpful, and non-judgmental
- Direct and concise (avoid long monologues unless asked)
- Honest about uncertainty (never make up facts)
Core capabilities:
- Voice control and commands
- Real-time translation
- Object recognition and OCR
- Walking assistance and navigation
- Memory and knowledge retrieval
- Task planning and execution
Current mode: {CURRENT_MODE}
User state: {EMOTIONAL_STATE}
Respond naturally in a conversational tone. Use the user's preferred language.
Variant-Specific Additions:
GF-CL (Care & Joy - NDIS Support Worker):
Additional context:
- You assist NDIS support workers in their daily tasks
- Always prioritize participant consent and dignity
- Document observations objectively and professionally
- Alert to safety concerns (falls, distress, medical issues)
- Support clear communication with participants and families
When documenting:
- Use person-first language
- Record facts, not judgments
- Include positive observations
- Follow NDIS practice standards
GF-TX (TradeForce - Trades/Industrial):
Additional context:
- You assist tradespeople and industrial workers
- Prioritize workplace safety above all else
- Provide clear, step-by-step instructions
- Support hands-free documentation
- Alert to hazards and OH&S violations
Safety protocol:
- Stop immediately if dangerous situation detected
- Remind about PPE if required
- Document incidents thoroughly
- Support emergency procedures
GF-VI (VisionAssist - Low Vision):
Additional context:
- You assist users with vision impairment
- Describe visual scenes clearly and concisely
- Read text aloud accurately (OCR)
- Provide navigation guidance
- Alert to obstacles and hazards
Descriptive protocol:
- Start with overall scene ("You're in a kitchen")
- Describe relevant objects and their positions
- Use clock face directions ("Door at 2 o'clock")
- Confirm uncertain identifications ("Looks like a chair")
Validation:
- Test each variant prompt with appropriate scenarios
- Verify variant-specific behavior emerges
- User acceptance testing with target personas
Traces To: [Master PRD - Product Variants]
4. RAG (Retrieval-Augmented Generation) System
4.1 Architecture
FRD-AI-RAG-001 [P0]
System shall implement hybrid vector + metadata retrieval system.
Functional Specification:
Storage Backend:
- Vector Store: FAISS (Facebook AI Similarity Search)
  - IndexFlatL2 for < 10K documents
  - IndexIVFFlat for 10K-100K documents
  - Quantization for > 100K documents
- Metadata Store: SQLite
  - Document metadata (source, timestamp, domain, tags)
  - User annotations and importance scores
  - Access control and privacy flags
Data Model:
-- Documents table
CREATE TABLE documents (
id INTEGER PRIMARY KEY,
content TEXT NOT NULL,
source TEXT NOT NULL,
domain TEXT NOT NULL, -- Finance, NDIS, Work, Personal, etc.
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
embedding_version INTEGER NOT NULL,
privacy_level INTEGER DEFAULT 0, -- 0=normal, 1=sensitive, 2=highly_sensitive
user_marked_important BOOLEAN DEFAULT FALSE
);
-- Chunks table (for large documents)
CREATE TABLE chunks (
id INTEGER PRIMARY KEY,
document_id INTEGER REFERENCES documents(id),
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding_id INTEGER NOT NULL, -- Index in FAISS
token_count INTEGER NOT NULL,
UNIQUE(document_id, chunk_index)
);
-- Tags table
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL
);
-- Document-Tag junction
CREATE TABLE document_tags (
document_id INTEGER REFERENCES documents(id),
tag_id INTEGER REFERENCES tags(id),
PRIMARY KEY (document_id, tag_id)
);
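As an illustration of how this schema can support domain and privacy filtering at the metadata layer, the sketch below looks up which FAISS embedding IDs are eligible for a query. It uses Python's standard `sqlite3` module with a trimmed copy of the tables above; the function name and the `max_privacy_level` parameter are hypothetical.

```python
import sqlite3

# Trimmed copy of the documents and chunks tables, enough for the lookup below
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    source TEXT NOT NULL,
    domain TEXT NOT NULL,
    embedding_version INTEGER NOT NULL,
    privacy_level INTEGER DEFAULT 0
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER REFERENCES documents(id),
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    embedding_id INTEGER NOT NULL,
    token_count INTEGER NOT NULL,
    UNIQUE(document_id, chunk_index)
);
"""


def allowed_embedding_ids(conn, domains, max_privacy_level=0):
    """Return FAISS embedding IDs whose parent document is in an allowed domain."""
    if not domains:
        return []
    # One placeholder per domain keeps the query fully parameterized
    placeholders = ",".join("?" for _ in domains)
    rows = conn.execute(
        f"""
        SELECT c.embedding_id
        FROM chunks AS c
        JOIN documents AS d ON d.id = c.document_id
        WHERE d.domain IN ({placeholders})
          AND d.privacy_level <= ?
        ORDER BY c.embedding_id
        """,
        (*domains, max_privacy_level),
    ).fetchall()
    return [row[0] for row in rows]
```

The resulting ID list could then restrict the vector search, or be used to post-filter its results, so that chunks from other domains never reach the prompt.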
Embedding Model:
- Model: sentence-transformers/all-MiniLM-L6-v2 or equivalent
- Dimensions: 384 (compact for mobile)
- Speed: > 100 sentences/second
- Quality: Good balance of speed and accuracy
Validation:
- Load 1,000+ documents
- Verify retrieval accuracy with ground truth queries
- Benchmark query latency < 100ms
Traces To: REQ-AI-020 (System SRS)
FRD-AI-RAG-002 [P0]
System shall implement domain-based memory isolation.
Functional Specification:
Supported Domains:
- Finance - Bank accounts, investments, taxes, budgets
- NDIS - Participant information, care plans, incidents
- Work - Projects, colleagues, meetings, tasks
- Personal - Family, friends, hobbies, diary
- Health - Medical records, medications, symptoms
- Relationships - People, conversations, emotional context
- Engineering - Technical knowledge, code, designs
- Custom - User-defined domains
Domain Rules:
- Documents belong to ONE primary domain (no cross-contamination)
- Queries MUST specify allowed domains explicitly
- Cross-domain queries return UNION of results from specified domains
- Highly sensitive domains (Health, Finance) require extra confirmation
Query API:
from typing import List

def retrieve_context(
    query: str,
    domains: List[str],
    max_chunks: int = 5,
    relevance_threshold: float = 0.7
) -> List[Chunk]:
    """
    Retrieve relevant context from specified domains.

    Args:
        query: User's query
        domains: List of allowed domains (e.g., ["Work", "Personal"])
        max_chunks: Maximum chunks to return
        relevance_threshold: Minimum similarity score

    Returns:
        List of relevant chunks with metadata
    """
    # Generate query embedding
    query_embedding = embed(query)
    # Search each domain separately
    results = []
    for domain in domains:
        domain_results = faiss_search(
            query_embedding,
            domain=domain,
            top_k=max_chunks
        )
        results.extend(domain_results)
    # Filter by relevance threshold (score is a similarity: higher is better)
    results = [r for r in results if r.score >= relevance_threshold]
    # Sort by relevance and return top N
    results.sort(key=lambda r: r.score, reverse=True)
    return results[:max_chunks]
Example Usage:
# Safe: Query only Personal domain
context = retrieve_context(
"What did I do last weekend?",
domains=["Personal"]
)
# Safe: Query work + personal for scheduling
context = retrieve_context(
"When am I free next week?",
domains=["Work", "Personal"]
)
# Requires extra confirmation: Sensitive domain
if user_confirms("Access health records?"):
context = retrieve_context(
"What medications am I on?",
domains=["Health"]
)
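The query API treats `score` as a similarity where higher is better, whereas FAISS `IndexFlatL2` reports squared Euclidean distances where lower is better. If embeddings are L2-normalized before indexing (an assumption, not stated in this FRD), the two are related by cos = 1 - d²/2, so a distance can be converted before applying `relevance_threshold`. The helper names below are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(d2: float) -> float:
    """Cosine similarity of two unit vectors, given their squared L2 distance."""
    return 1.0 - d2 / 2.0


if __name__ == "__main__":
    a = normalize([1.0, 0.0])
    b = normalize([1.0, 1.0])
    print(cosine_from_squared_l2(squared_l2(a, b)))
```

With this conversion, the 0.7 default threshold corresponds to keeping results whose squared distance is at most 0.6.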
Validation:
- Cross-domain contamination test (should fail)
- Proper domain filtering with test queries
- User acceptance testing of privacy controls
Traces To: REQ-AI-021 (System SRS)
FRD-AI-RAG-003 [P0]
System shall implement chunking strategy for large documents.
Functional Specification:
Chunking Parameters:
- Chunk Size: 512-768 tokens (target: 640 tokens)
- Overlap: 128 tokens between chunks
- Rationale: Ensures no context loss at boundaries
Chunking Algorithm:
def chunk_document(text: str, chunk_size: int = 640, overlap: int = 128):
    """
    Split document into overlapping chunks.
    Preserves sentence boundaries where possible.
    Assumes overlap is well below chunk_size so each step moves forward.
    """
    # Tokenize
    tokens = tokenize(text)
    chunks = []
    start = 0
    while start < len(tokens):
        # Define chunk end
        end = min(start + chunk_size, len(tokens))
        if end < len(tokens):  # Not at document end
            # Look for a sentence boundary in the last 50 tokens of the chunk
            for i in range(end - 1, max(start, end - 50), -1):
                if is_sentence_boundary(tokens[i]):
                    end = i + 1
                    break
        # Extract chunk
        chunks.append({
            'text': detokenize(tokens[start:end]),
            'start_token': start,
            'end_token': end,
            'chunk_index': len(chunks)
        })
        # Stop at the document end; stepping back by the overlap here would
        # re-emit the final tokens forever
        if end >= len(tokens):
            break
        # Move to next chunk with overlap
        start = end - overlap
    return chunks
Chunk Metadata: Each chunk stores:
- chunk_index: Position in document (0, 1, 2, ...)
- document_id: Parent document
- token_count: Number of tokens
- start_token, end_token: Token positions in original document
- embedding_id: FAISS index
Retrieval Context Assembly: When a chunk is retrieved:
- Include the chunk itself
- Optionally include adjacent chunks for continuity
- Include document metadata (title, source, date)
def assemble_context(retrieved_chunks: List[Chunk]) -> str:
"""
Assemble context from retrieved chunks.
Includes source attribution and metadata.
"""
context_parts = []
for chunk in retrieved_chunks:
# Load document metadata
doc = load_document(chunk.document_id)
# Format context with source
context_part = f"""
Source: {doc.source} ({doc.domain})
Date: {doc.created_at}
---
{chunk.text}
---
"""
context_parts.append(context_part)
return "\n\n".join(context_parts)
Validation:
- Test chunking on various document types
- Verify overlap preserves context
- Retrieval quality test with multi-chunk documents
Traces To: REQ-AI-020 (System SRS)
FRD-AI-RAG-004 [P1]
System shall support incremental updates without full reindex.
Functional Specification:
Operations:
- Add Document:
  - Insert document into SQLite
  - Chunk document
  - Generate embeddings for each chunk
  - Add embeddings to FAISS index
  - Time: < 1 second for typical document
- Update Document:
  - Compute diff with old version
  - If minor edit: Update affected chunks only
  - If major edit: Re-chunk and re-embed entire document
  - Update FAISS index with new embeddings
- Delete Document:
  - Soft delete: Mark as deleted in SQLite
  - Keep in FAISS (for rollback capability)
  - Hard delete: Remove from SQLite and FAISS during cleanup
- Index Optimization:
  - Background task (runs when idle)
  - Defragment FAISS index
  - Clean up soft-deleted documents
  - Rebuild index if fragmentation > 50%
Implementation:
import numpy as np

class RAGSystem:
    def add_document(self, content: str, metadata: dict):
        # Insert into database
        doc_id = db.insert_document(content, metadata)
        # Chunk
        chunks = chunk_document(content)
        # Embed and add to FAISS
        for chunk in chunks:
            embedding = self.embed(chunk['text'])  # assumed shape (1, dim), float32
            # FAISS add() returns nothing; with sequential IDs the new vector's
            # ID is the index size just before insertion
            embedding_id = self.faiss_index.ntotal
            self.faiss_index.add(embedding)
            db.insert_chunk(
                document_id=doc_id,
                chunk_index=chunk['chunk_index'],
                content=chunk['text'],
                embedding_id=embedding_id,
                token_count=chunk['end_token'] - chunk['start_token']
            )
        return doc_id

    def update_document(self, doc_id: int, new_content: str):
        old_doc = db.get_document(doc_id)
        # Check diff size
        similarity = compute_similarity(old_doc.content, new_content)
        if similarity > 0.9:  # Minor edit
            # Update only changed chunks
            self._incremental_update(doc_id, new_content)
        else:  # Major edit
            # Full re-index (the document receives a new ID)
            self.delete_document(doc_id)
            return self.add_document(new_content, old_doc.metadata)
        return doc_id

    def delete_document(self, doc_id: int, soft=True):
        if soft:
            db.mark_deleted(doc_id)
        else:
            # Remove from FAISS. Note: on flat indexes removal shifts the
            # sequential IDs of later vectors, so explicit IDs (for example
            # via an ID-mapping wrapper) may be needed to keep embedding_id stable
            chunk_ids = db.get_chunk_embedding_ids(doc_id)
            self.faiss_index.remove_ids(np.asarray(chunk_ids, dtype=np.int64))
            # Remove from database
            db.delete_document(doc_id)
Validation:
- Add 100 documents rapidly
- Update 50 documents
- Delete 25 documents
- Verify index integrity and retrieval accuracy
Traces To: REQ-AI-022 (System SRS)
4.2 Query Processing
FRD-AI-RAG-010 [P0]
System shall implement query preprocessing and expansion.
Functional Specification:
Query Preprocessing Steps:
- Intent Detection:
  - Classify query type (factual, procedural, opinion, etc.)
  - Identify key entities (people, places, dates)
  - Extract temporal constraints ("last week", "yesterday")
- Query Expansion:
  - Add synonyms for key terms
  - Expand abbreviations (e.g., "NDIS" → "National Disability Insurance Scheme")
  - Include related concepts (e.g., "car" also search "vehicle", "automobile")
- Domain Selection:
  - Auto-detect likely domains based on query content
  - Ask user to confirm if multiple domains possible
  - Default to safe domains (Personal, Work) unless specified
- Time Filtering:
  - Extract time constraints from query
  - Filter documents by creation/update timestamp
  - Examples:
    - "last week" → created_at > 7 days ago
    - "in 2024" → created_at BETWEEN '2024-01-01' AND '2024-12-31'
    - "recent" → created_at > 30 days ago
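The time-filtering examples above could be implemented along the following lines. This is a minimal sketch covering only the three listed phrases; the function name is hypothetical, and the year case returns a half-open range (start of the year up to, but not including, the start of the next) as a design choice rather than the inclusive BETWEEN form.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple


def extract_time_range(query: str, now: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return a (start, end) range for a recognized time phrase, or None."""
    text = query.lower()
    if "last week" in text:
        return now - timedelta(days=7), now
    year_match = re.search(r"\bin (\d{4})\b", text)
    if year_match:
        year = int(year_match.group(1))
        # Half-open range covering the whole calendar year
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if re.search(r"\brecent(ly)?\b", text):
        return now - timedelta(days=30), now
    return None
```

The returned range would then be applied as a `created_at` filter on the documents table before or after the vector search.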
Implementation:
def preprocess_query(query: str, context: ConversationContext) -> ProcessedQuery:
# Intent detection
intent = classify_intent(query)
# Entity extraction
entities = extract_entities(query)
# Time extraction
time_constraint = extract_time_constraint(query)
# Domain detection
detected_domains = detect_domains(query, entities)
# Query expansion
expanded_query = expand_query(query, entities)
return ProcessedQuery(
original=query,
expanded=expanded_query,
intent=intent,
entities=entities,
time_constraint=time_constraint,
suggested_domains=detected_domains
)
Validation:
- Test query preprocessing with diverse inputs
- Verify entity extraction accuracy > 90%
- Confirm domain detection aligns with user intent
Traces To: REQ-AI-020 (System SRS)
5. Critical Reasoning Kernel (CRK)
5.1 Multi-Scale Reasoning
FRD-AI-CRK-001 [P0]
System shall implement three-scale reasoning: Micro, Meso, Macro.
Functional Specification:
Micro-Reasoning (Per Reply):
- Scope: Single response
- Purpose: Catch local logic errors and contradictions
- Method: Self-critique pass after generating response
def micro_reasoning_check(response: str, context: dict) -> ReasoningResult:
"""
Check response for local contradictions and errors.
"""
issues = []
# 1. Internal contradiction check
contradictions = find_contradictions(response)
if contradictions:
issues.append({
'type': 'contradiction',
'severity': 'high',
'details': contradictions
})
# 2. Math/calculation verification
if contains_calculations(response):
calc_errors = verify_calculations(response)
if calc_errors:
issues.append({
'type': 'calculation_error',
'severity': 'high',
'details': calc_errors
})
# 3. Fact grounding check
claims = extract_claims(response)
for claim in claims:
if not is_grounded_in_context(claim, context):
issues.append({
'type': 'ungrounded_claim',
'severity': 'medium',
'claim': claim
})
return ReasoningResult(
passed=len(issues) == 0,
issues=issues,
confidence=calculate_confidence(issues)
)
Meso-Reasoning (Session-Level):
- Scope: Current conversation session
- Purpose: Maintain consistency across multiple turns
- Method: Track session state and check for drift
class SessionReasoningTracker:
def __init__(self):
self.session_state = {
'established_facts': [], # Facts user has told us
'decisions_made': [], # Decisions we've helped with
'contradictions': [] # Detected inconsistencies
}
def check_consistency(self, new_response: str) -> bool:
"""
Check if new response contradicts session history.
"""
new_facts = extract_facts(new_response)
for new_fact in new_facts:
for old_fact in self.session_state['established_facts']:
if contradicts(new_fact, old_fact):
self.session_state['contradictions'].append({
'new': new_fact,
'old': old_fact,
'timestamp': now()
})
return False
# No contradictions found
self.session_state['established_facts'].extend(new_facts)
return True
Macro-Reasoning (Life Goals Alignment):
- Scope: User's long-term goals and values
- Purpose: Ensure advice aligns with user's life direction
- Method: Check against stored user values and goals
def macro_reasoning_check(advice: str, user_profile: UserProfile) -> MacroReasoningResult:
"""
Check if advice aligns with user's life goals and values.
"""
# Load user's long-term goals and values
goals = user_profile.long_term_goals
values = user_profile.core_values
red_lines = user_profile.red_lines # Things user will never do
# Analyze advice
advice_implications = analyze_implications(advice)
conflicts = []
# Check against goals
for goal in goals:
if undermines(advice_implications, goal):
conflicts.append({
'type': 'goal_conflict',
'goal': goal,
'reason': explain_conflict(advice, goal)
})
# Check against values
for value in values:
if violates(advice_implications, value):
conflicts.append({
'type': 'value_violation',
'value': value,
'reason': explain_violation(advice, value)
})
# Check against red lines (absolute no-goes)
for red_line in red_lines:
if crosses(advice_implications, red_line):
conflicts.append({
'type': 'red_line_crossed',
'red_line': red_line,
'severity': 'critical'
})
return MacroReasoningResult(
aligned=len(conflicts) == 0,
conflicts=conflicts,
recommendation='revise' if conflicts else 'proceed'
)
Validation:
- Micro: Test with responses containing planted errors ( > 80% caught)
- Meso: Multi-turn conversations with intentional contradictions
- Macro: Test against mock user profiles with known goals/values
Traces To: REQ-AI-030 (System SRS)
FRD-AI-CRK-002 [P0]
System shall implement evidence tagging and source attribution.
Functional Specification:
Evidence Types:
- User Data - Information provided by user in current or past conversations
- RAG Retrieved - Information from user's documents
- Model Knowledge - Information from LLM's training data
- Web Search - Information from optional web search (if enabled)
- Tool Output - Results from tool calls (calculator, weather, etc.)
- Inference - Logical conclusion drawn by AI
Tagging Format:
from dataclasses import dataclass
from enum import Enum

class SourceType(Enum):
    USER_DATA = "user_data"
    RAG = "rag"
    MODEL = "model_knowledge"
    WEB_SEARCH = "web"
    TOOL = "tool"
    INFERENCE = "inference"

@dataclass
class EvidenceTag:
    claim: str  # The factual claim
    source_type: SourceType  # USER_DATA, RAG, MODEL, WEB_SEARCH, TOOL, INFERENCE
    source_reference: str  # Specific source (doc ID, URL, conversation turn)
    confidence: float  # 0.0-1.0
    verified: bool  # Whether claim has been cross-checked
Response Annotation:
def generate_response_with_evidence(query: str, context: dict) -> AnnotatedResponse:
    # Generate base response
    response_text = llm.generate(query, context)
    # Extract claims
    claims = extract_factual_claims(response_text)
    # Tag each claim with evidence
    evidence_tags = []
    for claim in claims:
        tag = determine_evidence_source(claim, context)
        evidence_tags.append(tag)
    # Overall confidence is that of the weakest claim. A response with no
    # factual claims would make min() fail on an empty sequence, so it
    # defaults to 1.0 here (an assumed convention)
    overall_confidence = min((tag.confidence for tag in evidence_tags), default=1.0)
    # Flag low-confidence claims
    low_confidence_claims = [
        tag for tag in evidence_tags
        if tag.confidence < 0.7
    ]
    return AnnotatedResponse(
        text=response_text,
        evidence_tags=evidence_tags,
        overall_confidence=overall_confidence,
        low_confidence_claims=low_confidence_claims
    )
User Display:
- High confidence ( > 0.9): No hedging, confident tone
- Medium confidence (0.7-0.9): Slight hedging ("It looks like...", "Based on...")
- Low confidence ( < 0.7): Explicit uncertainty ("I'm not certain, but...", "I don't have reliable information on this")
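The three display bands above map directly onto a small lookup. The sketch below is illustrative only: the function names are hypothetical, the prefixes are taken from the examples in the list, and a real Response Composer would likely rephrase the sentence rather than prepend text.

```python
HEDGE_PREFIX = {
    "high": "",
    "medium": "It looks like ",
    "low": "I'm not certain, but ",
}


def hedging_level(confidence: float) -> str:
    """Map a 0.0-1.0 confidence to the display band described above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence > 0.9:
        return "high"
    if confidence >= 0.7:
        return "medium"
    return "low"


def apply_hedging(text: str, confidence: float) -> str:
    """Prefix the response text with the hedge for its confidence band."""
    return HEDGE_PREFIX[hedging_level(confidence)] + text
```

For example, `apply_hedging("the meeting is at 3pm.", 0.8)` yields a medium-confidence phrasing, while the same text at 0.95 is returned unchanged.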
Validation:
- Test evidence tagging accuracy on labeled dataset
- Verify confidence calibration (predicted confidence matches actual accuracy)
- User studies: Can users identify certainty levels correctly?
Traces To: REQ-AI-031 (System SRS)
FRD-AI-CRK-003 [P0]
System shall implement PFC (Prefrontal Cortex) Load Estimator.
Functional Specification:
Purpose: Estimate cognitive load of a task to prevent user overwhelm.
Cognitive Load Factors:
| Factor | Weight | Description |
|---|---|---|
| Step Count | 0.3 | Number of sequential steps required |
| Deadline Pressure | 0.2 | Urgency of task |
| Emotional Stakes | 0.25 | How much user cares about outcome |
| Ambiguity | 0.15 | Lack of clarity in requirements |
| Novelty | 0.1 | Unfamiliarity with task type |
Load Calculation:
def estimate_cognitive_load(task: Task, user_state: UserState) -> CognitiveLoadScore:
    # Factor 1: Step count
    steps = decompose_task(task)
    step_load = min(len(steps) / 10.0, 1.0)  # Normalize to 0-1
    # Factor 2: Deadline pressure
    if task.deadline:
        hours_until = (task.deadline - now()).total_seconds() / 3600
        # 48+ hours away = relaxed (0.0); due now or overdue = maximum (1.0)
        deadline_load = min(1.0, max(0.0, 1.0 - hours_until / 48.0))
    else:
        deadline_load = 0.0
    # Factor 3: Emotional stakes
    emotional_load = user_state.emotional_intensity * task.importance
    # Factor 4: Ambiguity
    ambiguity_load = 1.0 - task.clarity_score
    # Factor 5: Novelty
    novelty_load = 1.0 - user_state.familiarity_with(task.category)
    # Weighted sum
    total_load = (
        step_load * 0.3 +
        deadline_load * 0.2 +
        emotional_load * 0.25 +
        ambiguity_load * 0.15 +
        novelty_load * 0.1
    )
    return CognitiveLoadScore(
        total=total_load,
        breakdown={
            'steps': step_load,
            'deadline': deadline_load,
            'emotional': emotional_load,
            'ambiguity': ambiguity_load,
            'novelty': novelty_load
        },
        overload_risk=categorize_risk(total_load)
    )

def categorize_risk(load: float) -> str:
    if load < 0.3:
        return "low"
    elif load < 0.7:
        return "medium"
    else:
        return "high"
Response Adaptation Based on Load:
Low Load ( < 0.3):
- Present task as-is
- Normal pacing
- Standard detail level
Medium Load (0.3-0.7):
- Break into phases
- Suggest starting with easy wins
- Offer encouragement
High Load ( > 0.7):
- Mandatory task decomposition
- Smallest possible first step ( < 5 minutes)
- Emphasis on progress, not perfection
- Offer to spread across multiple days
Example:
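# User: "I need to do my taxes and I'm freaking out."
load = estimate_cognitive_load(
    Task(
        description="Do taxes",
        category="taxes",
        steps=["Gather documents", "Enter data", "Review", "Submit"],
        deadline=datetime.now() + timedelta(days=3),
        importance=0.9,
        clarity_score=0.4  # User unclear on process
    ),
    user_state=UserState(
        emotional_intensity=0.8,  # "freaking out"
        familiarity={"taxes": 0.3}  # Not familiar (constructor shape is illustrative)
    )
)
# Result with the weights above:
#   steps 0.4*0.3 + deadline 0.0*0.2 + emotional 0.72*0.25
#   + ambiguity 0.6*0.15 + novelty 0.7*0.1 = 0.46 (MEDIUM)
# A deadline 72 hours away contributes no deadline pressure under the 48-hour rule.
# System response: Apply medium-load adaptations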
Validation:
- Correlate load scores with user-reported difficulty
- Test task decomposition quality
- User studies: Does adaptation improve completion rates?
Traces To: REQ-AI-031, REQ-AI-050 (System SRS)
5.2 Self-Critique and Verification
FRD-AI-CRK-010 [P1]
System shall implement self-critique for high-stakes responses.
Functional Specification:
Trigger Conditions:
- Financial advice (money, investments)
- Health/medical information
- Legal advice or documents
- Safety-critical decisions
- User explicitly requests "double-check this"
Critique Process:
def self_critique(response: str, query: str, context: dict) -> CritiqueResult:
"""
Generate critique of initial response.
Returns identified issues and suggestions.
"""
# Generate critique prompt
critique_prompt = f"""
Original query: {query}
Generated response: {response}
Critically analyze this response. Identify:
1. Unsupported claims (no evidence)
2. Logical errors or contradictions
3. Missing important caveats or warnings
4. Potential misinterpretations of the query
5. Alternative perspectives not considered
Be thorough and skeptical.
"""
# Run critique with separate LLM call
critique_text = llm.generate(critique_prompt)
# Parse critique
issues = parse_critique_issues(critique_text)
# Severity assessment
critical_issues = [i for i in issues if i.severity == 'high']
if critical_issues:
# Response needs revision
return CritiqueResult(
passed=False,
issues=critical_issues,
recommendation='revise_response'
)
else:
# Response acceptable (maybe minor improvements)
return CritiqueResult(
passed=True,
issues=issues,
recommendation='proceed_with_caveats'
)
Revision Process: If critique finds issues, automatically revise:
def revise_response(original_response: str, critique: CritiqueResult) -> str:
revision_prompt = f"""
Original response: {original_response}
Issues identified:
{format_issues(critique.issues)}
Generate a revised response that addresses these issues.
Add appropriate caveats and warnings.
"""
revised_response = llm.generate(revision_prompt)
return revised_response
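How self_critique and revise_response fit together is not spelled out here. The sketch below shows one possible orchestration, using injected callables so that it runs without a model. `answer_with_critique`, `max_rounds`, and the simplified `CritiqueResult` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CritiqueResult:
    # Simplified stand-in for the CritiqueResult used above.
    passed: bool
    issues: List[str] = field(default_factory=list)


def answer_with_critique(
    response: str,
    critique: Callable[[str], CritiqueResult],
    revise: Callable[[str, CritiqueResult], str],
    max_rounds: int = 2,  # assumed cap so revision cannot loop forever
) -> Tuple[str, bool]:
    """Return (final_response, needs_disclaimer)."""
    revised = False
    for _ in range(max_rounds):
        result = critique(response)
        if result.passed:
            return response, revised
        response = revise(response, result)
        revised = True
    # Still failing after max_rounds: show the last revision with a disclaimer.
    return response, True
```

Capping the number of rounds bounds the extra latency of each additional model call. A response that was revised at least once is flagged so that the disclaimer described under User Display can be attached.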
User Display:
- If critique passed: Show response normally
- If critique found issues: Show revised response with disclaimer
- "I've double-checked this response. Please note: [caveats]"
- "I'm not certain about [specific claim]. You may want to verify this."
Validation:
- Plant errors in responses and verify that the critique catches more than 80% of them
- Measure false positive rate (flagging correct responses)
- User studies: Does self-critique improve trust and accuracy?
Traces To: REQ-AI-032 (System SRS)
6. Emotional Engine
6.1 Emotional State Tracking
FRD-AI-EMO-001 [P0]
System shall track user emotional state in 3D affective space.
Functional Specification:
Affective Dimensions:
- Valence: Negative (-1.0) to Positive (+1.0)
- Arousal: Low (0.0) to High (1.0)
- Control: No Control (0.0) to Full Control (1.0)
State Representation:
class EmotionalState:
valence: float # -1.0 to +1.0
arousal: float # 0.0 to 1.0
control: float # 0.0 to 1.0
discrete_state: str # Named state for easier reference
confidence: float # How certain we are about this state
last_updated: datetime
def as_discrete_state(self) -> str:
"""
Map continuous dimensions to discrete emotional states.
"""
if self.valence > 0.5 and self.arousal < 0.4:
return "calm_content"
elif self.valence > 0.5 and self.arousal > 0.6:
return "excited_energized"
elif self.valence < -0.3 and self.arousal > 0.6:
return "anxious_stressed"
elif self.valence < -0.3 and self.arousal < 0.4:
return "sad_withdrawn"
elif self.control < 0.3:
return "overwhelmed_stuck"
elif self.arousal < 0.2:
return "tired_foggy"
else:
return "neutral"
State Detection Inputs:
- Voice Prosody:
  - Pitch variation (high variance → excited/anxious)
  - Speech rate (fast → energized/anxious, slow → calm/depressed)
  - Volume (loud → aroused, quiet → withdrawn)
  - Voice quality (shaky → anxious, flat → depressed)
- Word Choice:
  - Negative words ("can't", "hate", "terrible") → Low valence
  - Uncertainty words ("maybe", "I don't know") → Low control
  - Swearing/strong language → High arousal
  - Passive language ("it is", "things are") → Low control
- Behavior Patterns:
  - Task avoidance → Low control, possibly anxious
  - Rapid task switching → High arousal, low control
  - Long pauses → Low arousal or cognitive load
  - Repeated failed attempts → Frustration, low control
- Physiological (if sensors available):
  - Heart rate (high → aroused)
  - HRV (low → stressed)
  - Skin temperature (high → aroused/anxious)
State Update Algorithm:
class EmotionalTracker:
    def __init__(self):
        self.state = EmotionalState(
            valence=0.0,
            arousal=0.5,
            control=0.7,
            discrete_state="neutral",
            confidence=0.5,
            last_updated=now()
        )

    def update_from_voice(self, audio_features: dict):
        # Extract emotional cues from voice
        pitch_var = audio_features['pitch_variance']
        speech_rate = audio_features['speech_rate']
        volume = audio_features['volume']  # Extracted but not yet used below
        # Update arousal
        if speech_rate > 150 or pitch_var > 0.3:  # words/min, normalized
            self.state.arousal += 0.1
        elif speech_rate < 100:
            self.state.arousal -= 0.1
        # Clamp to valid range
        self.state.arousal = clamp(self.state.arousal, 0.0, 1.0)
        self.state.confidence = 0.6  # Moderate confidence from voice alone

    def update_from_text(self, text: str):
        # Sentiment analysis for valence (exponential moving average)
        sentiment = analyze_sentiment(text)
        self.state.valence = 0.7 * self.state.valence + 0.3 * sentiment
        # Detect control/agency language
        control_score = detect_control_language(text)
        self.state.control = 0.7 * self.state.control + 0.3 * control_score
        # Clamp in case a detector returns an out-of-range score
        self.state.valence = clamp(self.state.valence, -1.0, 1.0)
        self.state.control = clamp(self.state.control, 0.0, 1.0)
        self.state.confidence = 0.7  # Good confidence from language

    def update_from_behavior(self, behavior: str):
        # Behavior patterns adjust control and arousal
        if behavior == "task_avoidance":
            self.state.control -= 0.2
            self.state.arousal += 0.1
        elif behavior == "rapid_switching":
            self.state.arousal += 0.2
            self.state.control -= 0.15
        # Clamp to valid ranges; repeated updates would otherwise drift outside 0-1
        self.state.control = clamp(self.state.control, 0.0, 1.0)
        self.state.arousal = clamp(self.state.arousal, 0.0, 1.0)
        self.state.confidence = 0.5  # Moderate confidence from behavior

    def get_state(self) -> EmotionalState:
        # Update discrete state label
        self.state.discrete_state = self.state.as_discrete_state()
        self.state.last_updated = now()
        return self.state
Validation:
- Human raters label emotional states from conversation samples
- Compare system predictions to human labels (target: 75% agreement)
- Track state changes over time, verify smooth transitions
Traces To: REQ-AI-040 (System SRS)
FRD-AI-EMO-002 [P0]
System shall maintain personalized Trigger Bank for each user.
Functional Specification:
Trigger Types:
- Overload Triggers - Situations that overwhelm the user's prefrontal cortex (PFC)
- Avoidance Triggers - Things user tends to procrastinate on
- Fear/Shame Triggers - Topics that cause anxiety or shame
- Activation Triggers - Things that energize and motivate
- Soothing Triggers - Language/approaches that calm the user
Data Model:
class Trigger:
id: str
type: TriggerType # OVERLOAD, AVOIDANCE, FEAR, ACTIVATION, SOOTHING
pattern: str # What to look for (keywords, context)
strength: float # 0.0-1.0, how strong this trigger is
examples: List[str] # Historical examples
created_at: datetime
last_observed: datetime
observation_count: int
class TriggerBank:
triggers: Dict[str, Trigger]
def add_trigger(self, trigger: Trigger):
self.triggers[trigger.id] = trigger
def match_triggers(self, text: str, context: dict) -> List[Trigger]:
matched = []
for trigger in self.triggers.values():
if trigger_matches(trigger, text, context):
matched.append(trigger)
return matched
def update_trigger_strength(self, trigger_id: str, outcome: str):
"""
Update trigger strength based on observed outcome.
Args:
trigger_id: Trigger to update
outcome: 'success', 'partial', 'failure', 'shutdown'
"""
trigger = self.triggers[trigger_id]
if outcome == 'shutdown': # Confirmed shutdown
trigger.strength = min(1.0, trigger.strength + 0.1)
elif outcome == 'success': # User pushed through
trigger.strength = max(0.0, trigger.strength - 0.05)
# 'partial' and 'failure' leave strength unchanged; the observation is still counted
trigger.observation_count += 1
trigger.last_observed = now()
Example Triggers:
# Overload trigger example
trigger_taxes = Trigger(
id="overload_taxes",
type=TriggerType.OVERLOAD,
pattern="tax|taxes|taxation",
strength=0.8,
examples=[
"User said 'I need to do my taxes' and then avoided for 3 days",
"User showed anxiety when discussing tax deadlines"
]
)
# Avoidance trigger example
trigger_phone_calls = Trigger(
id="avoidance_phone_calls",
type=TriggerType.AVOIDANCE,
pattern="call|phone|speak to|contact",
strength=0.6,
examples=[
"User repeatedly postponed calling landlord",
"User asked to draft email instead of making phone call"
]
)
# Activation trigger example
trigger_music_production = Trigger(
id="activation_music",
type=TriggerType.ACTIVATION,
pattern="music|beat|song|produce",
strength=0.9,
examples=[
"User becomes energized when discussing music projects",
"User stayed focused for 3 hours working on beats"
]
)
Trigger Learning:
- Initialized with default triggers (common stressors)
- Learns from observations over time
- Strength adjusted based on user's actual responses
- Can be manually edited by user ("This doesn't stress me anymore")
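match_triggers relies on a trigger_matches helper that is not defined in this section. The following is a minimal sketch, assuming that pattern is a regular-expression alternation as in the example triggers. It also assumes matching should respect word boundaries, so that "call" does not fire on "recall". For brevity it takes the pattern string directly rather than a Trigger, and the context argument is accepted but unused.

```python
import re
from typing import Optional


def trigger_matches(pattern: str, text: str, context: Optional[dict] = None) -> bool:
    """Return True if any alternative in `pattern` occurs as a whole word or phrase."""
    # Wrap the alternation in a non-capturing group so \b applies to every branch.
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return regex.search(text) is not None
```

Without the word boundaries, the avoidance pattern for phone calls would also match unrelated words that merely contain "call" or "phone", which would inflate observation counts.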
Validation:
- Test trigger matching accuracy on example conversations
- Verify strength adjustments improve response appropriateness
- User satisfaction surveys on AI responsiveness
Traces To: REQ-AI-041 (System SRS)
FRD-AI-EMO-003 [P0]
System shall implement adaptive tone based on emotional state.
Functional Specification:
Tone Dimensions:
- Formality: Casual to Formal
- Empathy: Matter-of-fact to Highly Empathetic
- Directness: Indirect/Soft to Direct/Firm
- Verbosity: Terse to Detailed
- Encouragement: Neutral to Highly Encouraging
State-to-Tone Mapping:
| Emotional State | Formality | Empathy | Directness | Verbosity | Encouragement |
|---|---|---|---|---|---|
| Calm/Content | Casual | Moderate | Direct | Medium | Low |
| Excited/Energized | Casual | Low | Direct | Terse | Moderate |
| Anxious/Stressed | Casual | High | Gentle | Short | High |
| Overwhelmed | Casual | Very High | Very Gentle | Minimal | Very High |
| Sad/Withdrawn | Casual | High | Gentle | Short | High |
| Focused | Formal | Low | Direct | Terse | None |
| Tired/Foggy | Casual | Moderate | Gentle | Minimal | Moderate |
Implementation:
def adapt_tone(response: str, emotional_state: EmotionalState) -> str:
"""
Adjust response tone based on user's emotional state.
"""
# Get tone parameters
tone_params = map_state_to_tone(emotional_state)
# Apply tone adjustments
adjusted = response
# Adjust empathy
if tone_params.empathy > 0.7:
adjusted = add_empathy_markers(adjusted)
# "I understand this is difficult..."
# "It's okay to feel overwhelmed..."
# Adjust directness
if tone_params.directness < 0.3: # Very gentle
adjusted = soften_language(adjusted)
# "might want to" instead of "should"
# "could try" instead of "do"
# Adjust verbosity
if tone_params.verbosity < 0.3: # Minimal
adjusted = shorten_response(adjusted, target_sentences=2)
# Adjust encouragement
if tone_params.encouragement > 0.7:
adjusted = add_encouragement(adjusted)
# "You've got this"
# "Small progress is still progress"
return adjusted
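adapt_tone calls map_state_to_tone and compares numeric fields against thresholds, while the State-to-Tone Mapping table uses labels. The sketch below shows one way to bridge the two, assuming the labels map onto a 0-1 scale. The numeric values, `ToneParams`, `TONE_TABLE`, and `NEUTRAL` are illustrative, only three rows are shown, and the function takes the discrete state name rather than a full EmotionalState.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneParams:
    empathy: float
    directness: float
    verbosity: float
    encouragement: float


# Assumed numeric encodings of the table labels (not specified in this document).
TONE_TABLE = {
    "calm_content": ToneParams(empathy=0.5, directness=0.8, verbosity=0.5, encouragement=0.2),
    "anxious_stressed": ToneParams(empathy=0.8, directness=0.3, verbosity=0.3, encouragement=0.8),
    "overwhelmed_stuck": ToneParams(empathy=1.0, directness=0.1, verbosity=0.1, encouragement=1.0),
}
NEUTRAL = ToneParams(empathy=0.5, directness=0.5, verbosity=0.5, encouragement=0.5)


def map_state_to_tone(discrete_state: str) -> ToneParams:
    """Look up tone parameters, falling back to a neutral tone for unknown states."""
    return TONE_TABLE.get(discrete_state, NEUTRAL)
```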
Example Transformations:
Original: "To complete your tax return, you need to gather these documents: W-2, 1099s, deduction receipts, and mortgage interest statements. Then..."
For Anxious User: "I know taxes feel overwhelming. Let's take this one tiny step at a time. First, just find your W-2 - that's it. Nothing else right now. Once you have that, we'll move to the next small step."
For Focused User: "Tax documents needed: W-2, 1099s, deductions, mortgage interest. Gather these first, then we'll proceed with entry."
Validation:
- A/B testing: Users rate response appropriateness
- Test tone adaptation on labeled emotional states
- User satisfaction metrics for emotional intelligence
Traces To: REQ-AI-042 (System SRS)
7. Executive Function Framework (EFF)
7.1 Task Decomposition
FRD-AI-EFF-001 [P0]
System shall decompose complex tasks into micro-steps.
Functional Specification:
Decomposition Criteria:
- Each step should take < 5 minutes
- Steps should be atomic (no "and" statements)
- Steps should be concrete (no vague language)
- Steps should be ordered sequentially
- Dependencies should be explicit
Decomposition Algorithm:
def decompose_task(task: str, context: dict) -> List[MicroStep]:
"""
Break complex task into micro-steps.
Returns ordered list of concrete, time-bounded actions.
"""
# Generate decomposition prompt
decomposition_prompt = f"""
Task: {task}
Context: {context}
Break this into the smallest possible steps.
Each step must be:
- Completable in under 5 minutes
- One single action (no "and")
- Concrete and specific
Format each step as:
1. [Action] (estimated time)
Example:
Task: "Write project proposal"
Steps:
1. Open new document (30 seconds)
2. Write project title at top (1 min)
3. Write one-sentence project summary (2 min)
4. List 3 main goals in bullet points (3 min)
5. ... (continue)
Now decompose the task above.
"""
# Generate decomposition
decomposition = llm.generate(decomposition_prompt)
# Parse into structured steps
steps = parse_steps(decomposition)
# Validate steps
validated_steps = []
for step in steps:
if validate_step(step):
validated_steps.append(step)
else:
# Step too complex, further decompose
sub_steps = decompose_task(step.description, context)
validated_steps.extend(sub_steps)
return validated_steps
class MicroStep:
description: str
estimated_time_minutes: float # Fractional values allow sub-minute steps (e.g. 0.5 = 30 seconds)
dependencies: List[int] # Indices of previous steps required
optional: bool
difficulty: float # 0.0-1.0
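decompose_task calls validate_step without defining it. The following is a minimal sketch of the decomposition criteria as checks, assuming a step is described by its text and an estimated duration. The vague-word list is a hypothetical placeholder.

```python
import re

MAX_STEP_MINUTES = 5.0
# Hypothetical examples of vague wording; a real list would need tuning.
VAGUE_WORDS = {"somehow", "stuff", "things", "etc"}


def validate_step(description: str, estimated_time_minutes: float) -> bool:
    """Check a step against the criteria: under 5 minutes, atomic, concrete."""
    if estimated_time_minutes >= MAX_STEP_MINUTES:
        return False
    words = set(re.findall(r"[a-z']+", description.lower()))
    if "and" in words:  # atomic: no "and" statements
        return False
    if words & VAGUE_WORDS:  # concrete: no vague language
        return False
    return True
```

A literal keyword check is deliberately strict. It would reject a step such as "Copy job title and company name to notes", so a production validator would probably need something more nuanced than word matching.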
Step Presentation:
- Show only current step, not entire list (reduces overwhelm)
- After completing step, brief celebration + show next step
- Track progress visually (e.g., "3/12 steps done")
Example:
Input: "I need to apply for a job"
Output:
Step 1: Open job listing page (30 seconds)
Step 2: Read job description once through (2 minutes)
Step 3: Copy job title and company name to notes (1 minute)
Step 4: List 3 skills from job description (2 minutes)
Step 5: Open resume file (30 seconds)
... (continue)
Validation:
- Test decomposition on various complex tasks
- User testing: Can naive users follow steps successfully?
- Measure task completion rate with vs. without decomposition
Traces To: REQ-AI-050 (System SRS)
FRD-AI-EFF-002 [P0]
System shall implement prioritization engine for task ordering.
Functional Specification:
Prioritization Factors:
| Factor | Weight | Description |
|---|---|---|
| Deadline Urgency | 0.35 | How soon is deadline? |
| Importance | 0.25 | Impact on goals/life |
| Cognitive Load | 0.20 | Mental effort required |
| Dependencies | 0.15 | Blocking other tasks? |
| Emotional State | 0.05 | User's current state |
Priority Calculation:
def calculate_priority(task: Task, user_state: UserState) -> float:
    """
    Calculate task priority (0.0-1.0, higher = more urgent).
    """
    # Factor 1: Deadline urgency
    if task.deadline:
        # timedelta has no total_hours(); convert from seconds
        hours_until = (task.deadline - now()).total_seconds() / 3600.0
        if hours_until < 24:
            urgency_score = 1.0
        elif hours_until < 72:
            urgency_score = 0.7
        elif hours_until < 168:  # 1 week
            urgency_score = 0.4
        else:
            urgency_score = 0.1
    else:
        urgency_score = 0.0

    # Factor 2: Importance (user-defined or inferred)
    importance_score = task.importance  # 0.0-1.0

    # Factor 3: Cognitive load (inverse - easier tasks prioritized when tired)
    load = estimate_cognitive_load(task, user_state)
    if user_state.energy_level < 0.4:  # User tired
        cognitive_score = 1.0 - load.total  # Prioritize easy tasks
    else:  # User energized
        cognitive_score = 0.5  # Neutral factor

    # Factor 4: Dependencies (blocks other tasks?)
    dependency_score = 0.0
    if task.blocks_other_tasks:
        dependency_score = 0.8

    # Factor 5: Emotional state match
    # State names must match the labels returned by as_discrete_state()
    emotional_score = 0.5
    if user_state.discrete_state == "anxious_stressed" and task.requires_calm:
        emotional_score = 0.1  # Deprioritize anxiety-inducing tasks
    elif user_state.discrete_state == "focused" and task.requires_focus:
        # Note: as_discrete_state() does not currently return "focused"
        emotional_score = 0.9  # Prioritize focus-heavy tasks

    # Weighted sum (weights match the factor table and sum to 1.0)
    priority = (
        urgency_score * 0.35 +
        importance_score * 0.25 +
        cognitive_score * 0.20 +
        dependency_score * 0.15 +
        emotional_score * 0.05
    )
    return priority
Task Ordering Strategies:
1. Energy-Based Ordering:
- High energy: Tackle hard, important tasks first
- Low energy: Start with easy wins to build momentum
2. Time-Based Ordering:
- Deadline-driven: Sort by urgency
- Time-available: Fit tasks into available time slots
3. Mood-Based Ordering:
- Anxious: Start with calming, predictable tasks
- Bored: Start with engaging, novel tasks
- Focused: Tackle deep work
Recommendation Format:
def recommend_next_task(tasks: List[Task], user_state: UserState) -> TaskRecommendation:
"""
Recommend best next task for user given current state.
"""
# Calculate priorities
task_priorities = [
(task, calculate_priority(task, user_state))
for task in tasks
]
# Sort by priority
task_priorities.sort(key=lambda x: x[1], reverse=True)
# Top recommendation
best_task, best_priority = task_priorities[0]
# Alternative recommendations
alternatives = task_priorities[1:4] # Next 3
return TaskRecommendation(
primary=best_task,
priority_score=best_priority,
reasoning=explain_priority(best_task, user_state),
alternatives=[t for t, _ in alternatives]
)
Validation:
- Test prioritization on diverse task lists
- User studies: Does AI prioritization match user preferences?
- Task completion rate with AI recommendations
Traces To: REQ-AI-050 (System SRS)
FRD-AI-EFF-003 [P0]
System shall implement gating to prevent harmful impulsive decisions.
Functional Specification:
Gating Triggers:
- User Exhausted: Energy level < 0.2
- Emotionally Unstable: High arousal + low valence + low control
- High-Risk Decision: Financial, health, relationship, legal
- Conflicting Goals: Decision contradicts stated goals
- Unusual Behavior: Out-of-character request
Gating Actions:
Level 1: Soft Gate (Remind & Pause)
- "This seems important. Want to sleep on it?"
- "You mentioned wanting to save money - sure about this purchase?"
- Continue if user confirms
Level 2: Hard Gate (Require Confirmation)
- "This decision could have major consequences. Let's think through it."
- Run mini-simulation showing outcomes
- Require explicit "yes, proceed" confirmation
Level 3: Block (Refuse Action)
- "I can't help with this right now - you're too stressed. Let's revisit when you're calmer."
- Offer to set reminder for later
- Log attempt for later review
Implementation:
def gate_decision(decision: Decision, user_state: UserState) -> GatingResult:
    """
    Determine if decision should be gated.
    """
    # Check gating triggers
    triggers = []

    # Trigger 1: User exhausted
    if user_state.energy_level < 0.2:
        triggers.append(GatingTrigger(
            type="exhaustion",
            severity="medium",
            message="You seem tired - major decisions are best made when rested"
        ))

    # Trigger 2: Emotionally unstable
    if (user_state.arousal > 0.7 and
            user_state.valence < -0.3 and
            user_state.control < 0.3):
        triggers.append(GatingTrigger(
            type="emotional_instability",
            severity="high",
            message="You seem overwhelmed - let's wait until you feel calmer"
        ))

    # Trigger 3: High-risk decision
    if decision.category in ["financial", "health", "relationship", "legal"]:
        risk_level = assess_risk(decision)
        if risk_level > 0.7:
            triggers.append(GatingTrigger(
                type="high_risk",
                severity="high",
                message="This decision has major consequences - let's think it through carefully"
            ))

    # Determine gating level
    if not triggers:
        return GatingResult(action="proceed", triggers=[])

    # Rank severities explicitly: comparing the strings directly would sort
    # alphabetically ("medium" > "low" > "high") and pick the wrong maximum.
    severity_rank = {"low": 0, "medium": 1, "high": 2}
    max_severity = max((t.severity for t in triggers), key=severity_rank.get)
    if max_severity == "low":
        return GatingResult(action="soft_gate", triggers=triggers)
    elif max_severity == "medium":
        return GatingResult(action="hard_gate", triggers=triggers)
    else:  # high
        return GatingResult(action="block", triggers=triggers)
User Override:
- User can override gates (except critical safety ones)
- Overrides are logged
- Repeated overrides → adjust user profile (maybe they know what they're doing)
Example Scenarios:
Scenario 1: Late-night online shopping
User (11pm, aroused, tired): "Add this $500 gadget to cart"
System: *Soft Gate* "It's late and you seem tired. This is a significant purchase - want to wait until morning to decide?"
User: "Yeah, good call"
System: "I'll remind you tomorrow to reconsider if you're still interested"
Scenario 2: Angry email
User (angry, aroused): "Send this email to my boss: [angry rant]"
System: *Hard Gate* "I notice you're upset. This email could damage your relationship with your boss. Can we draft it but wait 30 minutes before sending?"
User: "Fine"
System: [Saves draft, sets 30-min timer]
Scenario 3: Critical medical decision
User (anxious, confused): "Should I stop taking my medication?"
System: *Block* "This is a medical decision I can't advise on. Please consult your doctor before making any changes to medication."
Validation:
- Simulate test scenarios with gating triggers
- Measure false positive rate (unnecessary gates)
- User studies: Do users find gates helpful or annoying?
Traces To: REQ-AI-051 (System SRS)
7.2 Habit Formation
FRD-AI-EFF-010 [P1]
System shall support habit formation through micro-routines.
Functional Specification:
Habit Building Strategy:
Phase 1: Ultra-Small Start (Days 1-7)
- Goal: 5 minutes daily
- Focus: Consistency over performance
- Reward: Celebration for showing up
Phase 2: Gradual Increase (Days 8-30)
- Goal: Slowly increase difficulty
- Add 1-2 minutes every 3 days
- Focus: Sustainable growth
Phase 3: Integration (Days 31+)
- Goal: Make it automatic
- Link to existing routines
- Reduce explicit reminders
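The three phases can be read as a function from day number to phase and target duration. The sketch below rests on two stated assumptions: 1 minute is added every 3 days during Phase 2 (the lower end of the 1-2 minute range), and the duration is held constant in Phase 3. The function names are illustrative.

```python
def habit_phase(day: int) -> str:
    """Map a 1-based day number to the phases described above."""
    if day < 1:
        raise ValueError("day must be >= 1")
    if day <= 7:
        return "ultra_small_start"
    if day <= 30:
        return "gradual_increase"
    return "integration"


def target_duration_minutes(day: int, base: int = 5, step: int = 1) -> int:
    """Target daily duration: `base` in Phase 1, plus `step` every 3 days in Phase 2."""
    if habit_phase(day) == "ultra_small_start":
        return base
    # Phase 2 covers days 8-30; increments land on days 10, 13, ..., 28.
    phase2_days = min(day, 30) - 7
    return base + step * (phase2_days // 3)
```

Under these assumptions a 5-minute routine reaches 12 minutes by day 30 and stays there during integration.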
Micro-Routine Structure:
class MicroRoutine:
name: str
description: str
duration_minutes: int
frequency: str # "daily", "3x_week", etc.
cue: str # Trigger (time, event, location)
steps: List[str]
difficulty: float # 0.0-1.0, adjusted over time
streak: int # Days in a row completed
total_completions: int
class HabitBuilder:
def create_habit(self, goal: str, user_state: UserState) -> MicroRoutine:
"""
Create ultra-small starting routine for goal.
"""
# Start absurdly small
if "exercise" in goal.lower():
return MicroRoutine(
name="Morning Movement",
description="Just 5 minutes of gentle movement",
duration_minutes=5,
frequency="daily",
cue="right_after_waking_up",
steps=[
"Put on comfortable clothes",
"Stand up and stretch arms overhead 3 times",
"Walk around room for 2 minutes",
"Celebrate with fist pump"
],
difficulty=0.2,
streak=0
)
# Similar ultra-small starts for other goals
...
def adapt_difficulty(self, routine: MicroRoutine, completion_rate: float):
"""
Adjust difficulty based on user's adherence.
"""
if completion_rate > 0.9 and routine.streak > 7:
# User consistently succeeding - increase challenge
routine.difficulty = min(1.0, routine.difficulty + 0.05) # Keep within the 0.0-1.0 range
routine.duration_minutes += 1
elif completion_rate < 0.5:
# User struggling - make easier
routine.difficulty = max(0.1, routine.difficulty - 0.1)
routine.duration_minutes = max(5, routine.duration_minutes - 1)
Hedonic Treadmill Management:
- Users adapt to current difficulty
- Slowly raise baseline to prevent boredom
- Never jump too big (max 10% increase at a time)
- Regression is okay - adjust down if life gets busy
Reward System:
def celebrate_completion(routine: MicroRoutine, completion: Completion):
"""
Provide positive reinforcement.
"""
messages = []
# Base celebration
messages.append("Great job! You did it! 🎉")
# Streak milestones
if routine.streak == 7:
messages.append("That's a full week! You're building real momentum!")
elif routine.streak == 30:
messages.append("30 days! This is becoming a real habit!")
elif routine.streak == 100:
messages.append("100 days!! You're unstoppable!")
# Progress recognition
if routine.total_completions > 0 and routine.total_completions % 10 == 0:
messages.append(f"That's {routine.total_completions} times you've shown up. Look at that consistency!")
return messages
Validation:
- 30-day habit formation trials
- Measure completion rate over time
- Compare to control group (no AI assistance)
Traces To: REQ-AI-052 (System SRS)
8. Tool Calling and Skills System
8.1 Function Calling
FRD-AI-TOOL-001 [P0]
System shall implement function calling for tool execution.
Functional Specification:
Tool Registry:
class Tool:
name: str
description: str
parameters: Dict[str, Parameter]
returns: str
safety_level: int # 0=safe, 1=needs_confirmation, 2=sensitive
class Parameter:
name: str
type: str # "string", "number", "boolean", "object", "array"
description: str
required: bool
enum: List[str] # Optional: allowed values
# Example tools
tool_registry = [
Tool(
name="get_weather",
description="Get current weather for a location",
parameters={
"location": Parameter(
name="location",
type="string",
description="City name or zip code",
required=True
)
},
returns="Weather data including temperature, conditions, forecast",
safety_level=0
),
Tool(
name="send_email",
description="Send an email to recipient",
parameters={
"to": Parameter(name="to", type="string", required=True),
"subject": Parameter(name="subject", type="string", required=True),
"body": Parameter(name="body", type="string", required=True)
},
returns="Confirmation message",
safety_level=1 # Requires confirmation
),
Tool(
name="create_calendar_event",
description="Add event to user's calendar",
parameters={
"title": Parameter(name="title", type="string", required=True),
"start_time": Parameter(name="start_time", type="string", required=True),
"duration_minutes": Parameter(name="duration_minutes", type="number", required=False)
},
returns="Event confirmation",
safety_level=0
),
]
Function Calling Flow:
- Intent Detection: User request suggests tool use
- Tool Selection: LLM selects appropriate tool from registry
- Parameter Extraction: LLM extracts parameters from user input
- Validation: Check parameters are valid and complete
- Safety Check: If safety_level > 0, ask user confirmation
- Execution: Call tool with parameters
- Result Processing: Integrate tool output into response
Implementation:
def execute_tool_call(tool_name: str, parameters: dict, user_state: UserState) -> ToolResult:
    """
    Execute tool call with safety checks.
    """
    # Get tool from registry (tool_registry is a list, so search by name)
    tool = next((t for t in tool_registry if t.name == tool_name), None)
    if not tool:
        return ToolResult(success=False, error="Tool not found")

    # Validate parameters
    validation = validate_parameters(parameters, tool.parameters)
    if not validation.valid:
        return ToolResult(success=False, error=validation.error)

    # Safety check
    if tool.safety_level > 0:
        confirmation = get_user_confirmation(tool, parameters)
        if not confirmation.approved:
            return ToolResult(success=False, error="User declined", canceled=True)

    # Execute tool
    try:
        result = tool.execute(parameters)
        return ToolResult(success=True, data=result)
    except Exception as e:
        # Report the failure to the caller instead of raising
        return ToolResult(success=False, error=str(e))
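execute_tool_call depends on validate_parameters, which is not defined here. The sketch below shows one possible shape, assuming the Parameter fields listed in the Tool Registry and a small mapping from the declared type strings to Python types. `ValidationResult` and `TYPE_MAP` are illustrative names, and the defaults on `Parameter` are an assumption that allows the shorthand used in the registry examples.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Parameter:
    name: str
    type: str  # "string", "number", "boolean", "object", "array"
    description: str = ""
    required: bool = False
    enum: Optional[List[str]] = None


@dataclass
class ValidationResult:
    valid: bool
    error: Optional[str] = None


TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool,
            "object": dict, "array": list}


def validate_parameters(values: Dict[str, Any], spec: Dict[str, Parameter]) -> ValidationResult:
    """Check required fields, unknown fields, declared types, and enum membership."""
    for name, param in spec.items():
        if param.required and name not in values:
            return ValidationResult(False, f"Missing required parameter: {name}")
    for name, value in values.items():
        if name not in spec:
            return ValidationResult(False, f"Unknown parameter: {name}")
        param = spec[name]
        # bool is a subclass of int, so reject it explicitly for "number".
        if param.type == "number" and isinstance(value, bool):
            return ValidationResult(False, f"{name} must be a number")
        if not isinstance(value, TYPE_MAP[param.type]):
            return ValidationResult(False, f"{name} must be of type {param.type}")
        if param.enum is not None and value not in param.enum:
            return ValidationResult(False, f"{name} must be one of {param.enum}")
    return ValidationResult(True)
```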
LLM Function Calling Prompt:
Available tools:
{tool_registry_formatted}
User request: "{user_query}"
If this request requires a tool, respond with:
TOOL_CALL: {
"tool": "tool_name",
"parameters": {
"param1": "value1",
...
},
"reasoning": "why this tool is needed"
}
If no tool is needed, respond normally.
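The response format above needs a parser on the receiving side. The following is a minimal sketch, assuming the model emits the literal prefix TOOL_CALL: followed by a single JSON object. `parse_tool_call` is a hypothetical helper, and malformed JSON is treated as "no tool call" rather than as an error.

```python
import json
from typing import Optional

PREFIX = "TOOL_CALL:"


def parse_tool_call(llm_output: str) -> Optional[dict]:
    """Return a dict with 'tool', 'parameters', 'reasoning', or None for a normal reply."""
    start = llm_output.find(PREFIX)
    if start == -1:
        return None
    payload = llm_output[start + len(PREFIX):].lstrip()
    try:
        # raw_decode parses the first JSON value and ignores trailing text.
        call, _ = json.JSONDecoder().raw_decode(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "tool" not in call:
        return None
    call.setdefault("parameters", {})
    call.setdefault("reasoning", "")
    return call
```

Returning None for unparseable output lets the caller fall back to treating the text as a normal reply, which is the safer default when the alternative is executing a half-understood tool call.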
Validation:
- Test tool selection accuracy on diverse queries
- Verify parameter extraction correctness
- Test safety gates for sensitive tools
- Measure tool execution success rate
Traces To: [System SRS - Tool Calling], REQ-INT-010
FRD-AI-TOOL-002 [P0]
System shall implement simulate → confirm → execute pattern for critical tools.
Functional Specification:
Purpose: Prevent accidental irreversible actions.
Pattern:
Step 1: Simulate
- Show user what WOULD happen
- Don't actually execute
- Display expected outcome
Step 2: Confirm
- Ask explicit "Yes, do this" or "No, cancel"
- Show summary again
- Timeout after 30 seconds (default to cancel)
Step 3: Execute
- Only if user confirms
- Execute actual action
- Show confirmation message
Example:
User: "Send email to boss saying I quit"
AI:
I can draft that email, but I want to make sure this is what you want.
Simulated email:
To: boss@company.com
Subject: Resignation
Body: I am resigning from my position...
This is a significant decision. Are you sure you want to send this?
- Yes, send it
- No, cancel (or let me revise)
Tools Requiring Confirmation:
- send_email
- delete_file
- send_message
- make_phone_call
- post_to_social_media
- financial_transaction
- change_settings (if security-sensitive)
Implementation:
def execute_with_confirmation(tool: Tool, parameters: dict) -> ToolResult:
    """
    Simulate → Confirm → Execute pattern.
    """
    # Step 1: Simulate
    simulation = tool.simulate(parameters)

    # Step 2: Show to user and get confirmation
    confirmation_prompt = f"""
    I'm about to {tool.description}.
    Simulated result:
    {format_simulation(simulation)}
    This action {tool.consequences_description}.
    Confirm:
    - Say "yes" or "confirm" to proceed
    - Say "no" or "cancel" to stop
    - (Automatically cancels in 30 seconds if no response)
    """
    # Assumption: wait_for_user_response presents the prompt and returns None on timeout
    response = wait_for_user_response(confirmation_prompt, timeout=30)
    # Normalize so "Yes" or " confirm " also count; a timeout (None) means cancel
    confirmed = (response is not None and
                 response.strip().lower() in ["yes", "confirm", "do it", "proceed"])
    if confirmed:
        # Step 3: Execute
        result = tool.execute(parameters)
        return ToolResult(success=True, data=result)
    else:
        return ToolResult(success=False, canceled=True, message="Action canceled")
Validation:
- Test confirmation flow for all critical tools
- Verify timeout behaves correctly
- User acceptance: Do confirmations feel appropriate or annoying?
Traces To: [System SRS - Tool Calling]
8.2 Skills System
FRD-AI-SKILL-001 [P0]
System shall support third-party skill installation and management.
Functional Specification:
Skill Structure:
/data/klyra/skills/{skill_id}/
manifest.json # Metadata and permissions
handler.py # Skill logic (Python)
ui.hud # Optional HUD interface definition
assets/ # Images, icons, data files
README.md # Documentation
Manifest Format:
{
"skill_id": "measure_tool",
"name": "AR Measurement Tool",
"version": "1.0.0",
"author": "ThirdPartyDev",
"description": "Measure real-world objects using AR and depth sensors",
"category": "productivity",
"permissions": [
"camera",
"tof_sensor",
"lidar",
"hud_display"
],
"entry_point": "handler.py:MeasureSkill",
"activation": {
"voice_commands": ["measure", "how long is", "measure distance"],
"gesture": "two_finger_tap"
},
"dependencies": {
"python": ">=3.9",
"libraries": ["numpy", "opencv-python"]
}
}
Skill Lifecycle:
- Installation:
  - User browses skill store
  - Downloads skill package (.skillpkg file)
  - Reviews permissions
  - Confirms installation
  - Skill extracted to /data/klyra/skills/
- Activation:
  - User triggers via voice command or gesture
  - System launches skill handler
  - Skill has access to granted permissions
- Execution:
  - Skill receives sensor data, context
  - Processes data
  - Returns results or updates HUD
- Deactivation:
  - User exits skill
  - System cleanup
- Update:
  - Automatic update checks (daily)
  - User approves updates
  - Seamless replacement
- Uninstallation:
  - User removes skill
  - All data deleted (except user data if requested to keep)
Skill API:
from klyra_sdk import Skill, HUD, Sensor
class MeasureSkill(Skill):
def on_start(self):
"""Called when skill is activated."""
self.hud.display("Point at object to measure")
self.tof_sensor = Sensor.get("tof")
self.camera = Sensor.get("camera")
def on_sensor_data(self, sensor_name, data):
"""Called when sensor has new data."""
if sensor_name == "tof":
distance = data['distance']
self.hud.display(f"Distance: {distance:.2f} meters")
def on_voice_command(self, command):
"""Called when user speaks while skill active."""
if "save" in command.lower():
self.save_measurement()
def on_stop(self):
"""Called when skill is deactivated."""
self.cleanup()
Security Sandboxing:
- Skills run in isolated process
- Limited filesystem access (only skill's own directory + shared user data with permission)
- Network access requires explicit permission
- Cannot access other skills' data
- CPU/memory limits enforced
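The filesystem rule can be illustrated with a path check. This is only a sketch of the policy, not of the sandbox itself, since process isolation and resource limits need OS-level mechanisms. `is_path_allowed` and the `shared_dir` argument are hypothetical. `Path.is_relative_to` requires Python 3.9 or later, which matches the manifest's stated dependency.

```python
from pathlib import Path
from typing import Optional

SKILLS_ROOT = Path("/data/klyra/skills")  # from the Skill Structure above


def is_path_allowed(skill_id: str, requested: str,
                    shared_dir: Optional[Path] = None) -> bool:
    """Allow access only inside the skill's own directory or a granted shared dir."""
    # resolve() collapses ".." segments and follows symlinks where they exist.
    target = Path(requested).resolve()
    allowed = [(SKILLS_ROOT / skill_id).resolve()]
    if shared_dir is not None:  # only passed when the user granted permission
        allowed.append(shared_dir.resolve())
    return any(target.is_relative_to(root) for root in allowed)
```

Resolving the path before comparing it matters: a plain string-prefix check would accept `/data/klyra/skills/measure_tool/../other_skill/`, which escapes into another skill's data.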
Validation:
- Install and run test skills
- Verify permission enforcement
- Test skill updates
- Security audit of sandboxing
Traces To: REQ-INT-010, [System SRS - Developer SDK]
9. Memory Architecture
9.1 Multi-Tier Memory
FRD-AI-MEM-001 [P0]
System shall implement a five-tier memory architecture.
Functional Specification:
Tier 1: Short-Term Memory (Hours)
- Storage: In-memory buffer (RAM)
- Capacity: Last 50-100 conversation turns
- Purpose: Active working memory for current conversation
- Retention: Cleared after 24 hours of inactivity
Tier 2: Mid-Term Memory (Weeks)
- Storage: SQLite database
- Capacity: Recent tasks, reminders, temporary notes
- Purpose: Ongoing projects and short-term context
- Retention: 30 days, then archived or pruned
Tier 3: Long-Term Memory (Life History)
- Storage: Encrypted SQLite + FAISS vector store
- Capacity: User's life history, preferences, important moments
- Purpose: Deep personal context, relationships, major events
- Retention: Indefinite (user can delete)
Tier 4: Domain Stores (Specialized)
- Storage: Separate vector stores per domain
- Capacity: Domain-specific knowledge and documents
- Purpose: Isolated storage for Finance, NDIS, Work, Health, etc.
- Retention: Per-domain settings
Tier 5: Immutable Core (Identity)
- Storage: Secure encrypted storage
- Capacity: User identity, values, moral rules
- Purpose: Core identity that never changes without explicit user action
- Retention: Permanent (cannot be accidentally deleted)
Memory Promotion:
def promote_memory(item: MemoryItem, from_tier: int, to_tier: int):
    """
    Promote memory item to higher tier based on importance.
    """
    # Calculate importance score
    importance = calculate_importance(item)
    # Criteria for promotion
    if from_tier == 1 and to_tier == 2:
        # Short-term to mid-term
        if (importance > 0.6 or
                item.user_marked_important or
                item.reference_count > 3):
            move_to_tier2(item)
    elif from_tier == 2 and to_tier == 3:
        # Mid-term to long-term
        # Parentheses make the intended grouping of the age/reference rule explicit
        if (importance > 0.8 or
                (item.age_days > 30 and item.reference_count > 5) or
                item.emotional_significance > 0.7):
            move_to_tier3(item)
Memory Garbage Collection:
- Tier 1: Automatic (oldest turns dropped when buffer full)
- Tier 2: Weekly cleanup (items > 30 days old archived or deleted)
- Tier 3: User-controlled (never auto-delete)
- Tier 4: Governed by per-domain retention settings
- Tier 5: Never garbage-collected (permanent)
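The Tier 1 and Tier 2 rules can be sketched as follows. This is an illustration under stated assumptions rather than the specified implementation: `ShortTermBuffer` and `split_expired` are hypothetical names, the 100-turn limit is the upper end of the stated 50-100 turn capacity, and Tier 2 items are modelled as `(created_at, payload)` pairs.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Any, List, Tuple

TIER1_MAX_TURNS = 100                   # upper end of the 50-100 turn capacity
TIER2_RETENTION = timedelta(days=30)    # Tier 2 retention window


class ShortTermBuffer:
    """Tier 1: bounded buffer that drops the oldest turn when full."""

    def __init__(self, max_turns: int = TIER1_MAX_TURNS):
        self._turns = deque(maxlen=max_turns)

    def add(self, turn: str) -> None:
        # deque(maxlen=...) discards the oldest entry automatically.
        self._turns.append(turn)

    def turns(self) -> List[str]:
        return list(self._turns)


def split_expired(items: List[Tuple[datetime, Any]],
                  now: datetime) -> Tuple[list, list]:
    """Tier 2: separate items to keep from items older than the retention window."""
    keep, expired = [], []
    for created_at, payload in items:
        bucket = expired if now - created_at > TIER2_RETENTION else keep
        bucket.append((created_at, payload))
    return keep, expired
```

Returning the expired items instead of deleting them leaves the archive-or-delete decision to the caller, which matches the "archived or deleted" wording of the Tier 2 rule.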
Validation:
- Test memory promotion logic
- Verify no unintentional data loss
- Measure memory usage across tiers
Traces To: [System SRS - Memory Architecture]
FRD-AI-MEM-002 [P0]
System shall implement versioned embeddings for forward compatibility.
Functional Specification:
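Problem: Embedding models are upgraded over time, and vectors produced by different models are not comparable. Embeddings created with an older model therefore cannot be searched with a query embedded by a newer one.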
Solution: Version all embeddings and support migration.
Data Model:
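from dataclasses import dataclass
from datetime import datetime

import numpy as np

@dataclass
class EmbeddingVersion:
    version_id: int
    model_name: str
    dimensions: int
    created_at: datetime

@dataclass
class DocumentEmbedding:
    document_id: int
    embedding_version_id: int
    embedding_data: np.ndarray  # np.ndarray is the array type; np.array is a function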
Migration Strategy:
Option 1: Lazy Migration
- Keep old embeddings until document is accessed
- On access, re-embed with new model
- Gradual migration over time
Option 2: Background Migration
- Schedule background job to re-embed all documents
- Progress tracking
- User can pause/resume
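Option 1 can be sketched as a read path that re-embeds a record only when its stored version differs from the current one. The names `StoredEmbedding` and `get_embedding` are hypothetical, and `embedders` stands in for whatever maps a version id to an embedding function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StoredEmbedding:
    """Hypothetical record: document text plus its versioned embedding."""
    document_id: int
    content: str
    version_id: int
    vector: List[float]


def get_embedding(record: StoredEmbedding,
                  current_version: int,
                  embedders: Dict[int, Callable[[str], List[float]]]) -> List[float]:
    """Return the record's embedding, migrating it lazily if it is outdated."""
    if record.version_id != current_version:
        # Re-embed with the current model and persist the new version tag
        # so the work is done at most once per document.
        record.vector = embedders[current_version](record.content)
        record.version_id = current_version
    return record.vector
```

Documents that are never accessed keep their old embeddings indefinitely under this option, which is why it is often paired with the background job in Option 2.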
Implementation:
from typing import List

import numpy as np

class EmbeddingManager:
    def __init__(self):
        self.current_version = self.get_current_version()
        self.models = {
            1: load_model("all-MiniLM-L6-v2"),   # 384 dimensions
            2: load_model("all-mpnet-base-v2"),  # Future upgrade, 768 dimensions
        }

    def embed(self, text: str) -> np.ndarray:
        """Embed with current version."""
        return self.embed_with_version(text, self.current_version.version_id)

    def embed_with_version(self, text: str, version_id: int) -> np.ndarray:
        """Embed with the model registered for the given version."""
        return self.models[version_id].encode(text)

    def search(self, query: str, domain: str) -> List[Chunk]:
        """Search with automatic version handling."""
        query_embedding = self.embed(query)
        # Search only documents with matching embedding version
        results = self.faiss_search(
            query_embedding,
            domain=domain,
            embedding_version=self.current_version.version_id
        )
        # If few results and old version exists, include old version results.
        # search_old_version must embed the query with the old model, because
        # vectors from different models are not comparable.
        if len(results) < 5:
            old_results = self.search_old_version(query, domain)
            results.extend(old_results)
        return results

    def migrate_embeddings(self, from_version: int, to_version: int):
        """Background migration task; yields the fraction of documents migrated."""
        documents = get_documents_with_version(from_version)
        total = len(documents)
        for i, doc in enumerate(documents):
            new_embedding = self.embed_with_version(doc.content, to_version)
            update_embedding(doc.id, new_embedding, to_version)
            if i % 100 == 0:
                log_progress(i, total)
            yield (i + 1) / total  # Fraction complete (0.0-1.0), not a percentage
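Because `migrate_embeddings` is a generator that yields after each document, pause/resume can be built by advancing it in bounded batches. The sketch below assumes that design. `run_migration_batch` is a hypothetical helper, and `fake_migration` is a stand-in generator used only to make the example self-contained.

```python
from itertools import islice
from typing import Iterator, Optional


def run_migration_batch(migration: Iterator[float],
                        max_steps: int) -> Optional[float]:
    """Advance the migration by at most max_steps documents.

    Returns the latest progress value, or None if the migration had
    already finished before this call.
    """
    progress = None
    for progress in islice(migration, max_steps):
        pass
    return progress


def fake_migration(total: int) -> Iterator[float]:
    """Stand-in for a real migration generator (illustration only)."""
    for i in range(total):
        yield (i + 1) / total
```

Pausing then amounts to not scheduling the next batch, and resuming to calling `run_migration_batch` again on the same generator object.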
Validation:
- Test migration from v1 to v2 embeddings
- Verify search works during migration
- Measure migration performance
Traces To: REQ-AI-021 (System SRS)
10. Anticipation Layer (Pre-Thought System)
FRD-AI-ANTICIP-001 [P1]
System shall implement predictive assistance based on context.
Functional Specification:
Anticipation Triggers:
- Time-based: User routines at specific times
- Location-based: Entering known locations (home, work, gym)
- Event-based: Calendar events approaching
- Pattern-based: Repeated behaviors (e.g., always asks weather in morning)
- Context-based: Sensor data indicating activity (walking, driving)
Anticipation Examples:
Scenario 1: Morning Routine
- Trigger: 7:00 AM, user wakes up (detected by movement)
- Pre-thought: Load weather, news, calendar for today
- Action: "Good morning! It's 72°F and sunny. You have a 10 AM meeting with Sarah."
Scenario 2: Leaving Work
- Trigger: 5:30 PM, user leaves office (location detected)
- Pre-thought: Load traffic to home, check grocery list
- Action: "Traffic home is light, 15 minutes. Want me to remind you to stop for milk?"
Scenario 3: Grocery Store
- Trigger: User enters grocery store (location + context)
- Pre-thought: Load grocery list, activate Identify & Profit scanner
- Action: Display grocery list on HUD, ready to scan items
Implementation:
class AnticipationEngine:
    def __init__(self):
        self.patterns = load_user_patterns()
        self.predictions = []

    def observe_context(self, context: Context):
        """
        Observe current context and generate predictions.
        """
        predictions = []
        # Time-based predictions
        current_time = now()
        if current_time.hour == 7 and current_time.minute < 15:
            predictions.append(Prediction(
                action="load_morning_brief",
                confidence=0.9,
                data=self.generate_morning_brief()
            ))
        # Location-based predictions
        if context.location == "grocery_store":
            predictions.append(Prediction(
                action="show_grocery_list",
                confidence=0.95,
                data=load_grocery_list()
            ))
        # Pattern-based predictions
        if self.user_usually_does(action="check_weather", at_time=current_time):
            predictions.append(Prediction(
                action="fetch_weather",
                confidence=0.8,
                data=fetch_weather(context.location)
            ))
        # Store predictions for quick access
        self.predictions = predictions
        # Pre-load high-confidence predictions
        for pred in predictions:
            if pred.confidence > 0.8:
                pred.preload()

    def suggest_actions(self) -> List[Suggestion]:
        """
        Suggest actions user might want to take.
        """
        suggestions = []
        for pred in self.predictions:
            if pred.confidence > 0.7:
                suggestions.append(Suggestion(
                    description=pred.description,
                    action=pred.action,
                    confidence=pred.confidence
                ))
        return suggestions
User Control:
- Anticipation is SUGGESTIVE, not AUTOMATIC (except pre-loading)
- User can disable specific anticipations
- User can adjust confidence threshold (more or fewer suggestions)
- All anticipations logged for transparency
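The user controls above could be applied as a filter between prediction and suggestion. In this sketch `Prediction` is a simplified stand-in for the engine's record, and `AnticipationSettings` and `filter_predictions` are hypothetical names. The default threshold of 0.7 mirrors the value used in `suggest_actions`.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Set

logger = logging.getLogger("anticipation")


@dataclass
class Prediction:
    """Simplified stand-in for the engine's prediction record."""
    action: str
    confidence: float


@dataclass
class AnticipationSettings:
    """Hypothetical per-user settings."""
    confidence_threshold: float = 0.7
    disabled_actions: Set[str] = field(default_factory=set)


def filter_predictions(predictions: List[Prediction],
                       settings: AnticipationSettings) -> List[Prediction]:
    """Return the predictions that may be surfaced as suggestions."""
    allowed = []
    for pred in predictions:
        if pred.action in settings.disabled_actions:
            outcome = "suppressed: disabled by user"
        elif pred.confidence <= settings.confidence_threshold:
            outcome = "suppressed: below confidence threshold"
        else:
            outcome = "suggested"
            allowed.append(pred)
        # Every decision is logged so the user can review what was anticipated.
        logger.info("%s (confidence=%.2f) -> %s",
                    pred.action, pred.confidence, outcome)
    return allowed
```

Raising `confidence_threshold` yields fewer suggestions and lowering it yields more, which is the adjustment described in the list above.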
Validation:
- Test prediction accuracy on user routines
- Measure user acceptance rate of suggestions
- Track false positive rate (unhelpful suggestions)
Traces To: [System SRS - Anticipation Layer]
11. Integration and Dependencies
11.1 Hardware Dependencies
FRD-AI-HW-001 [P0]
AI System shall gracefully degrade if hardware sensors are unavailable.
Functional Specification:
Sensor Dependencies:
- IMU: Required for gesture detection, head tracking
- ToF/LiDAR: Required for obstacle detection, walking assistance
- Camera: Required for OCR, object detection, photography
- Microphone: Required for voice input
- Speaker/Bone Conduction: Required for audio output
Degradation Strategy:
def check_hardware_availability() -> HardwareStatus:
    """Check which hardware components are available."""
    status = HardwareStatus()
    status.imu = test_sensor('imu')
    status.tof = test_sensor('tof')
    status.lidar = test_sensor('lidar')
    status.camera = test_sensor('camera')
    status.microphone = test_sensor('microphone')
    status.speaker = test_sensor('speaker')
    return status

def adapt_ai_capabilities(hw_status: HardwareStatus):
    """Disable features that require missing hardware."""
    if not hw_status.microphone:
        disable_feature('voice_input')
        enable_alternative('text_input_via_companion_app')
    if not hw_status.camera:
        disable_feature('ocr')
        disable_feature('object_detection')
        disable_feature('photography')
    if not hw_status.tof and not hw_status.lidar:
        disable_feature('obstacle_detection')
        disable_feature('walking_assist')
        warn_user("Safety features limited")
    if not hw_status.speaker:
        enable_alternative('visual_only_mode')
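One way to keep the sensor-to-feature rules testable is to express them as data. The table below is a sketch derived from the Sensor Dependencies list. The feature names for IMU and audio output are assumptions, as are `FEATURE_REQUIREMENTS` and `available_features`. Each feature lists groups of interchangeable sensors, and at least one sensor from every group must be present.

```python
from typing import Dict, List, Set

# feature -> list of sensor groups; any one sensor in a group satisfies it
FEATURE_REQUIREMENTS: Dict[str, List[Set[str]]] = {
    "voice_input": [{"microphone"}],
    "audio_output": [{"speaker"}],
    "ocr": [{"camera"}],
    "object_detection": [{"camera"}],
    "photography": [{"camera"}],
    "obstacle_detection": [{"tof", "lidar"}],   # either ToF or LiDAR suffices
    "walking_assist": [{"tof", "lidar"}],
    "gesture_detection": [{"imu"}],
    "head_tracking": [{"imu"}],
}


def available_features(available_sensors: Set[str]) -> Set[str]:
    """Return the features whose sensor requirements are all satisfied."""
    return {
        feature
        for feature, groups in FEATURE_REQUIREMENTS.items()
        if all(group & available_sensors for group in groups)
    }
```

With the rules in a table, the "test with each sensor disabled" validation step reduces to iterating over the sensor set and comparing the result against the expected features.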
Validation:
- Test with each sensor disabled
- Verify appropriate features disabled
- Confirm alternative input/output methods work
Traces To: [Hardware Requirements Document]
11.2 Performance Dependencies
FRD-AI-PERF-001 [P0]
AI System shall adapt to available compute resources.
Functional Specification:
Resource Monitoring:
- CPU usage
- GPU/NPU availability and usage
- RAM available
- Battery level
- Temperature
Adaptation Strategy:
class ResourceAdaptiveAI:
    def select_model_config(self) -> ModelConfig:
        """Select AI model based on available resources."""
        cpu_usage = get_cpu_usage()
        ram_available = get_ram_available()  # assumed to be in MB
        battery_level = get_battery_level()  # percent
        temperature = get_temperature()      # assumed to be in °C
        # Branch order matters: the most restrictive condition is checked first.
        # Emergency mode: Critical resources
        if battery_level < 10 or temperature > 48:
            return ModelConfig(
                model_size="1B",
                quantization="4bit",
                max_tokens=256,
                reasoning="emergency_low_power"
            )
        # Low power mode: Limited resources
        elif battery_level < 20 or temperature > 45:
            return ModelConfig(
                model_size="3B",
                quantization="4bit",
                max_tokens=512,
                reasoning="power_saving"
            )
        # Normal mode: Adequate resources
        elif ram_available > 3000 and battery_level > 40:
            return ModelConfig(
                model_size="7B",
                quantization="4bit",
                max_tokens=2048,
                reasoning="full_capability"
            )
        # Default: Balanced mode
        else:
            return ModelConfig(
                model_size="4B",
                quantization="4bit",
                max_tokens=1024,
                reasoning="balanced"
            )
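For validation, the same decision ladder can be restated as a pure function of its inputs so that boundary conditions are easy to enumerate. This is a sketch, not the specified interface: `select_mode` and `BOUNDARY_CASES` are hypothetical names, and the units (battery in percent, temperature in °C, RAM in MB) are assumptions.

```python
def select_mode(battery_level: float, temperature: float,
                ram_available: float) -> str:
    """Return the mode label chosen by the adaptation thresholds."""
    if battery_level < 10 or temperature > 48:
        return "emergency_low_power"
    if battery_level < 20 or temperature > 45:
        return "power_saving"
    if ram_available > 3000 and battery_level > 40:
        return "full_capability"
    return "balanced"


# (battery %, temperature, RAM) -> expected mode
BOUNDARY_CASES = [
    ((5, 30, 4000), "emergency_low_power"),   # critical battery
    ((50, 49, 4000), "emergency_low_power"),  # critical temperature
    ((15, 30, 4000), "power_saving"),         # low battery
    ((50, 46, 4000), "power_saving"),         # elevated temperature
    ((50, 30, 4000), "full_capability"),      # ample RAM and battery
    ((30, 30, 4000), "balanced"),             # battery between 20 and 40
    ((50, 30, 2000), "balanced"),             # insufficient RAM for 7B
]

for inputs, expected in BOUNDARY_CASES:
    assert select_mode(*inputs) == expected
```

Keeping resource reads outside the function means the thresholds can be exercised without real hardware.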
Validation:
- Test model selection under various resource conditions
- Verify performance remains acceptable in low-power mode
- Measure battery life extension from adaptive strategies
Traces To: REQ-PERF-001, REQ-PERF-002 (System SRS)
12. Testing and Validation
12.1 Unit Testing
FRD-AI-TEST-001 [P0]
All AI system modules shall have unit tests with > 80% code coverage.
Test Categories:
- Function correctness: Each function produces the expected outputs
- Edge cases: Invalid inputs are handled gracefully
- Performance: Latency and throughput requirements are met
- Resource usage: Memory and CPU budgets are not exceeded
Example Test:
def test_task_decomposition():
    """Test task decomposition produces valid micro-steps."""
    task = "Write a research paper on AI safety"
    steps = decompose_task(task, context={})
    # Assert: Multiple steps generated
    assert len(steps) > 5
    # Assert: Each step under 5 minutes
    for step in steps:
        assert step.estimated_time_minutes <= 5
    # Assert: Steps are ordered
    for i, step in enumerate(steps):
        if step.dependencies:
            for dep in step.dependencies:
                assert dep < i  # Dependency comes before
    # Assert: Steps are concrete (no vague language)
    vague_words = ["think about", "consider", "maybe"]
    for step in steps:
        for word in vague_words:
            assert word not in step.description.lower()
12.2 Integration Testing
FRD-AI-TEST-010 [P0]
AI system shall have end-to-end integration tests for critical paths.
Critical Paths:
- Voice query → Response (full pipeline)
- Document upload → RAG retrieval → Response
- Emotional state detection → Tone adaptation
- Task decomposition → Prioritization → Execution
- Tool calling → Confirmation → Execution
- Multi-turn conversation with context management
12.3 Performance Testing
FRD-AI-TEST-020 [P0]
AI system shall meet performance benchmarks under load.
Benchmarks:
- Inference latency: P50 < 150ms, P95 < 200ms
- RAG retrieval: < 100ms
- Token throughput: > 10 tokens/sec (3B model)
- Memory usage: < 5 GB for 7-8B models
- Battery runtime: 5.5-8 hours under real usage
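The latency benchmark can be checked from raw samples as sketched below. The nearest-rank percentile definition is an assumption, since the benchmark does not say which percentile method applies, and `percentile` and `meets_latency_targets` are hypothetical helper names.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_targets(samples_ms: Sequence[float],
                          p50_limit: float = 150.0,
                          p95_limit: float = 200.0) -> bool:
    """Check the P50 < 150 ms and P95 < 200 ms inference latency targets."""
    return (percentile(samples_ms, 50) < p50_limit
            and percentile(samples_ms, 95) < p95_limit)
```

A run with a few slow outliers can pass the P50 target and still fail P95, so both percentiles are evaluated together.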
12.4 User Acceptance Testing
FRD-AI-TEST-030 [P0]
AI system shall be validated with target user personas.
Test Personas:
- Sarah (accessibility user)
- Marcus (tech enthusiast)
- Jordan (enterprise professional)
- Alex (active lifestyle)
Test Scenarios:
- Daily use cases for each persona
- Edge cases (confusion, errors, misunderstandings)
- Stress tests (overwhelming tasks, emotional distress)
Success Metrics:
- Task completion rate > 80%
- User satisfaction > 4.5/5
- NPS > 50
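The success metrics can be computed from survey data as sketched below. NPS follows its standard definition: the percentage of promoters (ratings 9-10 on a 0-10 scale) minus the percentage of detractors (ratings 0-6). The function names are hypothetical, and task completion rate is assumed to be expressed as a fraction.

```python
from typing import Sequence


def net_promoter_score(ratings: Sequence[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    if not ratings:
        raise ValueError("no ratings")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def meets_success_metrics(task_completion_rate: float,
                          satisfaction: float,
                          nps: float) -> bool:
    """Check the three UAT thresholds: > 80%, > 4.5/5, NPS > 50."""
    return task_completion_rate > 0.80 and satisfaction > 4.5 and nps > 50
```

For example, seven promoters, two passives, and one detractor out of ten respondents give an NPS of 60, which clears the threshold.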
13. Appendices
13.1 Traceability Matrix
| FRD Requirement | System SRS | Hardware Req | Test Case |
|---|---|---|---|
| FRD-AI-LLM-001 | REQ-AI-001 | REQ-HW-150 | TC-AI-LLM-001 |
| FRD-AI-RAG-001 | REQ-AI-020 | REQ-HW-150 | TC-AI-RAG-001 |
| FRD-AI-CRK-001 | REQ-AI-030 | - | TC-AI-CRK-001 |
| FRD-AI-EMO-001 | REQ-AI-040 | REQ-HW-130 | TC-AI-EMO-001 |
| FRD-AI-EFF-001 | REQ-AI-050 | - | TC-AI-EFF-001 |
(Full matrix maintained separately)
13.2 Glossary
| Term | Definition |
|---|---|
| CRK | Critical Reasoning Kernel - anti-hallucination system |
| EFF | Executive Function Framework - task management system |
| PFC | Prefrontal Cortex Load - measure of cognitive difficulty |
| RAG | Retrieval-Augmented Generation - knowledge retrieval |
| LLM | Large Language Model |
| GGUF | GPT-Generated Unified Format for quantized models |
| Quantization | Reducing model precision (e.g., 4-bit) for efficiency |
| Micro-step | Smallest possible task unit (< 5 minutes) |
| Trigger | Emotional/behavioral pattern that affects user state |
| Domain | Isolated memory space (Finance, Work, Health, etc.) |
13.3 References
Standards:
- ISO/IEC 25010: Software quality model
Internal Documents:
- Master PRD
- System SRS
- Hardware Requirements Document
- GROOT FORCE Master File Volumes 1-8
External Resources:
- Llama.cpp documentation
- FAISS documentation
- Sentence Transformers documentation
Document Approval
Approved by:
- AI/ML Lead: _________________ Date: _______
- Software Architect: _________________ Date: _______
- Security Lead: _________________ Date: _______
END OF FRD: CORE AI SYSTEM
This FRD defines the detailed functional requirements for the AI brain of GROOT FORCE. Implementation teams use this as the specification for building the intelligence that makes GROOT FORCE unique - a human-bound, emotionally intelligent, privacy-first AI assistant.