Pranab Sarkar
AI Researcher & Senior Software Engineer
Building intelligent systems at the intersection of industry-scale engineering and AI research.
About
I'm a Senior Software Developer at Walmart Global Tech with 15+ years of experience building scalable, production-grade systems.
Beyond production systems, I research and publish on AI infrastructure — exploring ideas like persistent KV caching for tool-augmented LLMs, schema compression for tool use, and cognitive memory architectures for AI agents. I also build open-source MCP servers that enable multi-agent AI workflows.
Educated at Techno India (2006–2010). Recognized as a GitHub Copilot Champion. Provisional patent holder.
Research & Publications
Preprints on LLM optimization, tool use, and AI infrastructure
ContextCache
Persistent KV Cache with Content-Hash Addressing for Zero-Degradation Tool Schema Caching
A persistent KV cache system that accelerates tool-augmented LLM inference by caching prefilled key-value states of tool schema prefixes with SHA-256 content-hash addressing. On cache hits, only the user query requires prefilling, reducing time-to-first-token by 6.9x (787ms to 114ms) with zero quality degradation.
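The content-hash addressing idea can be pictured with a small sketch. This is an illustration under stated assumptions, not the ContextCache implementation: `schema_cache_key`, `PrefixCache`, and `prefill_fn` are hypothetical names, the store is an in-memory dict rather than a persistent one, and the "KV state" is whatever opaque object the prefill callback returns.

```python
import hashlib
import json


def schema_cache_key(tool_schemas):
    """SHA-256 over a canonical JSON encoding of the tool schemas.

    sort_keys makes the key independent of dict key order; the order of
    tools in the list still matters, because it changes the prompt prefix.
    """
    canonical = json.dumps(tool_schemas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrefixCache:
    """Hypothetical in-memory stand-in for a persistent KV-state store."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_prefill(self, tool_schemas, prefill_fn):
        """Return (key, state); call prefill_fn only on a cache miss."""
        key = schema_cache_key(tool_schemas)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = prefill_fn(tool_schemas)
        return key, self._store[key]
```

On a hit, only the user query would still need prefilling, which is where the time-to-first-token saving described above comes from. Any change to a schema produces a different hash, so a stale state is never returned for a modified tool set.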
ToolFormerMicro
Composable Tool Schema Compression via Gated Cross-Attention
A ~428M parameter encoder-decoder model that compresses verbose tool schemas into compact 8-token gist vectors via gated cross-attention. Achieves 0.818 Tool Selection Accuracy with zero false positives across seen, held-out, and unseen tool splits.
YantrikDB
A Cognitive Memory Engine for Persistent AI Systems
A cognitive memory engine that unifies five index types within a single embedded database: vector (HNSW), knowledge graph, temporal, decay heap, and key-value. Implements multi-signal retrieval scoring with relevance-gated importance amplification. Written in 16,000 lines of Rust.
SDF
Convert Once, Consume Many: Cacheable, Typed Semantic Extraction from Web Pages
An open, schema-validated JSON protocol for publishing pre-extracted, agent-oriented semantic representations of web content. A fine-tuned 1.5B + 3B pipeline achieves 4.1x latency reduction versus a 14B baseline with 90% exact extraction accuracy.
Skill as Memory
Skill as Memory, Not Document: A Database-Native Substrate for Agent Skill Catalogs
Current LLM agent skill systems store skills as documents (SKILL.md files, Python source) optimized for human editorial workflow, collapsing authoring format, retrieval metadata, and runtime body into one artifact. This work proposes treating a skill as memory, not as a document: a database-native substrate where skills are written by agents at runtime, retrieved by relevance at inference, and scored over agent-emitted outcome events.
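A rough sketch of what a database-native skill catalog could look like, using SQLite from the Python standard library. Everything here is an assumption made for illustration: the table layout, the function names, and the use of a `LIKE` match as a stand-in for relevance retrieval are not taken from the paper.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with skills and outcome events (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE skills (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE,
            summary TEXT,   -- retrieval metadata
            body TEXT       -- runtime body, kept separate from the summary
        );
        CREATE TABLE outcomes (
            skill_id INTEGER REFERENCES skills(id),
            success INTEGER
        );
        """
    )
    return db


def write_skill(db, name, summary, body):
    """An agent writes a skill at runtime; returns the new row id."""
    cur = db.execute(
        "INSERT INTO skills (name, summary, body) VALUES (?, ?, ?)",
        (name, summary, body),
    )
    return cur.lastrowid


def record_outcome(db, skill_id, success):
    """Store one agent-emitted outcome event for a skill."""
    db.execute("INSERT INTO outcomes VALUES (?, ?)", (skill_id, int(success)))


def success_rate(db, skill_id):
    """Mean success over recorded outcomes, or None if there are none."""
    row = db.execute(
        "SELECT AVG(success) FROM outcomes WHERE skill_id = ?", (skill_id,)
    ).fetchone()
    return row[0]


def find_skills(db, keyword):
    """Keyword match on the summary, standing in for relevance retrieval."""
    rows = db.execute(
        "SELECT name FROM skills WHERE summary LIKE ? ORDER BY name",
        (f"%{keyword}%",),
    )
    return [r[0] for r in rows]
```

The point of the separation is that the summary, the body, and the outcome history live in different columns and tables, so each can be queried or updated without touching the others.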
Tier-Based Tool Routing
Adapt the Interface, Not the Model: Tier-Based Tool Routing for AI Agents
Tool selection accuracy depends more on how tools are presented than on model size. A 1.5B model achieves 89% within-family accuracy — the bottleneck is finding the right tool neighborhood, not picking from it. Evaluated across 1,000+ native tool calls with 80 tools and 4 model sizes (1.5B–35B).
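The "neighborhood first, then tool" idea can be sketched as a two-stage router. This is only an illustration: the tool families, descriptions, and word-overlap scoring below are hypothetical stand-ins for whatever the model and interface actually do in the paper.

```python
import re

# Hypothetical tool catalog grouped into families ("neighborhoods").
TOOL_FAMILIES = {
    "files": {
        "read_file": "read the contents of a file",
        "write_file": "write text to a file",
    },
    "weather": {
        "get_forecast": "get the weather forecast for a city",
        "get_alerts": "list severe weather alerts for a region",
    },
}


def tokens(text):
    """Lowercased set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query, text):
    """Number of distinct words shared by the query and the text."""
    return len(tokens(query) & tokens(text))


def pick_family(query):
    """Stage 1: choose the family whose name and descriptions best match."""
    def family_score(name):
        descriptions = " ".join(TOOL_FAMILIES[name].values())
        return overlap(query, name + " " + descriptions)
    return max(TOOL_FAMILIES, key=family_score)


def pick_tool(query, family):
    """Stage 2: choose a tool from within the selected family only."""
    tools = TOOL_FAMILIES[family]
    return max(
        tools,
        key=lambda t: overlap(query, t.replace("_", " ") + " " + tools[t]),
    )


def route(query):
    """Return (family, tool) for a query using the two stages above."""
    family = pick_family(query)
    return family, pick_tool(query, family)
```

Splitting the decision this way means the second stage only ever sees a handful of candidates, which matches the observation that picking within the right neighborhood is the easier part.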
Memory-Continuous Architecture
Decoupling Deployment Boundaries from Memory Boundaries in Function Composition
Introduces Memory-Continuous Architecture (MCA), a runtime model that decouples deployment boundaries from memory boundaries in function composition. KubeFn implements MCA across JVM, Python, and Node.js runtimes, demonstrating 4–100x latency reduction in full HTTP benchmarks.
Open Source
MCP servers and developer tools for AI agent workflows
YantrikDB
Cognitive memory database for AI agents — consolidates duplicate memories, detects contradictions, and fades stale ones via temporal decay. A Rust engine that ships as a library, MCP server, and HTTP cluster.
- Temporal decay + contradiction detection
- Library, MCP server, and HTTP cluster
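Temporal decay of this kind is often modeled with an exponential half-life. The sketch below assumes that model purely for illustration; YantrikDB's actual decay function and field names are not stated here, and the engine itself is written in Rust, so `decayed_score`, `drop_stale`, and the memory dict layout are hypothetical.

```python
def decayed_score(importance, age_seconds, half_life_seconds):
    """Halve the importance for every half-life that has elapsed (assumed model)."""
    if half_life_seconds <= 0:
        raise ValueError("half_life_seconds must be positive")
    age = max(age_seconds, 0.0)
    return importance * 0.5 ** (age / half_life_seconds)


def drop_stale(memories, now, half_life_seconds, threshold):
    """Keep memories whose decayed score is at least threshold, strongest first.

    Each memory is a dict with hypothetical keys: text, importance, created_at.
    """
    kept = []
    for m in memories:
        score = decayed_score(
            m["importance"], now - m["created_at"], half_life_seconds
        )
        if score >= threshold:
            kept.append((m["text"], score))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

With a half-life of 10 seconds, a memory of importance 1.0 created 10 seconds ago scores 0.5, while one created 100 seconds ago has decayed below 0.001 and would be dropped at any reasonable threshold.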
YantrikDB Hermes Plugin
YantrikDB memory provider for Nous Research's Hermes agent — self-maintaining memory with canonicalization, contradiction tracking, recency ranking, and explainable recall.
- Self-maintaining memory + contradiction tracking
- Explainable recall for Hermes agents
ContextCache
Persistent KV cache with content-hash addressing for tool-augmented LLMs. Caches prefilled tool-schema key-value states for 6.9x faster time-to-first-token with zero quality degradation.
- 6.9x TTFT speedup on cache hits
- SHA-256 content-hash addressing
Patent
Cognitive Memory Engine with Instinct-Driven Proactive Behavior, Unified In-Process Companion Runtime, and Contradiction-Aware Adaptive Retrieval
63/991,357
SARKAR-2026-001
February 26, 2026
Pranab Sarkar
Get in touch
Let's Connect
Interested in research collaboration, open-source contributions, or discussing AI infrastructure? I'd love to hear from you.
mail@pranab.co.in