Senior Software Developer @ Walmart Global Tech

Pranab Sarkar

AI Researcher & Senior Software Engineer

Building intelligent systems at the intersection of industry-scale engineering and AI research.

7 Research Papers
15+ Years Experience
1 Patent
344+ GitHub Stars

About

I'm a Senior Software Developer at Walmart Global Tech with 15+ years of experience building scalable, production-grade systems. My work sits at the intersection of industry-scale engineering and AI research.

Beyond production systems, I research and publish on AI infrastructure — exploring ideas like persistent KV caching for tool-augmented LLMs, schema compression for tool use, and cognitive memory architectures for AI agents. I also build open-source MCP servers that enable multi-agent AI workflows.

Educated at Techno India (2006–2010). Recognized as a GitHub Copilot Champion. Provisional patent holder.

Research & Publications

Preprints on LLM optimization, tool use, and AI infrastructure

01 · Feb 2026
preprint

ContextCache

Persistent KV Cache with Content-Hash Addressing for Zero-Degradation Tool Schema Caching

6.9x TTFT speedup

A persistent KV cache system that accelerates tool-augmented LLM inference by caching prefilled key-value states of tool schema prefixes with SHA-256 content-hash addressing. On cache hits, only the user query requires prefilling, reducing time-to-first-token by 6.9x (787ms to 114ms) with zero quality degradation.

LLM Inference · KV Cache · Tool Use
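
The content-hash addressing described for ContextCache can be illustrated with a minimal sketch. It assumes the tool schemas are canonicalized as sorted-key JSON before hashing, and it uses an in-memory dictionary and a placeholder `fake_prefill` function in place of persistent storage and real key-value tensors. The names `schema_key`, `PrefixCache`, and `fake_prefill` are hypothetical and are not taken from the ContextCache code.

```python
import hashlib
import json


def schema_key(tool_schemas):
    """SHA-256 over a canonical JSON encoding of the tool schemas (assumed canonicalization)."""
    canonical = json.dumps(tool_schemas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrefixCache:
    """Maps a schema hash to an opaque prefilled state (stand-in for cached KV states)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_prefill(self, tool_schemas, prefill):
        key = schema_key(tool_schemas)
        if key in self._store:
            self.hits += 1  # cache hit: only the user query would still need prefilling
        else:
            self.misses += 1  # cache miss: prefill the schema prefix once and keep it
            self._store[key] = prefill(tool_schemas)
        return self._store[key]


def fake_prefill(tool_schemas):
    # Placeholder for the expensive prefill step; returns a dummy state.
    return {"num_tools": len(tool_schemas)}


tools = [{"name": "search", "parameters": {"query": "string"}}]
same_tools_reordered = [{"parameters": {"query": "string"}, "name": "search"}]

cache = PrefixCache()
cache.get_or_prefill(tools, fake_prefill)
cache.get_or_prefill(same_tools_reordered, fake_prefill)
print(cache.hits, cache.misses)  # 1 1

# The reported speedup follows from the two latencies quoted above.
speedup = 787 / 114
print(round(speedup, 1))  # 6.9
```

Because the key depends only on schema content, any change to a schema yields a different hash and therefore a fresh prefill, which is one plausible way a cache of this kind avoids serving stale state.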
02 · Feb 2026
preprint

ToolFormerMicro

Composable Tool Schema Compression via Gated Cross-Attention

0.818 TSA, zero false positives

A ~428M parameter encoder-decoder model that compresses verbose tool schemas into compact 8-token gist vectors via gated cross-attention. Achieves 0.818 Tool Selection Accuracy with zero false positives across seen, held-out, and unseen tool splits.

Transformer · Compression · Tool Use
03 · Feb 2026
preprint

YantrikDB

A Cognitive Memory Engine for Persistent AI Systems

5 unified index types

A cognitive memory engine that unifies five index types (vector/HNSW, knowledge graph, temporal, decay heap, and key-value) within a single embedded database. Implements multi-signal retrieval scoring with relevance-gated importance amplification. Written in 16,000 lines of Rust.

Rust · Database · AI Memory · HNSW
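
One way to picture multi-signal scoring with relevance-gated importance amplification is the sketch below. The formula, the half-life, the gate threshold, and the function names are all assumptions made for illustration; they are not YantrikDB's actual scoring function, and the sketch is in Python rather than Rust for brevity.

```python
def decay(age_days, half_life_days=30.0):
    """Exponential temporal decay: the weight halves every half_life_days (assumed form)."""
    return 0.5 ** (age_days / half_life_days)


def score(relevance, importance, age_days, gate=0.5):
    """Combine relevance, importance, and recency into one score (hypothetical formula).

    Importance only amplifies the score when relevance clears the gate,
    so an important but off-topic memory is not pushed to the top.
    """
    boost = 1.0 + importance if relevance >= gate else 1.0
    return relevance * boost * decay(age_days)


memories = [
    {"text": "user prefers Rust", "relevance": 0.8, "importance": 1.0, "age_days": 0},
    {"text": "user's birthday", "relevance": 0.3, "importance": 1.0, "age_days": 0},
    {"text": "old Rust note", "relevance": 0.8, "importance": 1.0, "age_days": 30},
]

ranked = sorted(
    memories,
    key=lambda m: score(m["relevance"], m["importance"], m["age_days"]),
    reverse=True,
)
print([m["text"] for m in ranked])
```

In this toy ranking the fresh, relevant memory comes first, the equally relevant but month-old note second, and the important but irrelevant memory last.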
04 · Feb 2026
preprint

SDF

Convert Once, Consume Many: Cacheable, Typed Semantic Extraction from Web Pages

4.1x latency reduction

An open, schema-validated JSON protocol for publishing pre-extracted, agent-oriented semantic representations of web content. A fine-tuned 1.5B + 3B pipeline achieves 4.1x latency reduction versus a 14B baseline with 90% exact extraction accuracy.

Data Format · Web · Semantic Extraction
05 · May 2026
preprint

Skill as Memory

Skill as Memory, Not Document: A Database-Native Substrate for Agent Skill Catalogs

919K → 369 tokens at recall

Current LLM agent skill systems store skills as documents (SKILL.md files, Python source) optimized for human editorial workflow, collapsing authoring format, retrieval metadata, and runtime body into one artifact. This work reframes skills as memory rather than documents: a database-native substrate where skills are written by agents at runtime, retrieved by relevance at inference, and scored over agent-emitted outcome events.

AI Memory · Agent Skills · Database
06 · Mar 2026
preprint

Tier-Based Tool Routing

Adapt the Interface, Not the Model: Tier-Based Tool Routing for AI Agents

89% accuracy at 1.5B

Tool selection accuracy depends more on how tools are presented than on model size. A 1.5B model achieves 89% within-family accuracy — the bottleneck is finding the right tool neighborhood, not picking from it. Evaluated across 1,000+ native tool calls with 80 tools and 4 model sizes (1.5B–35B).

Tool Use · LLM Agents · Routing
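
The idea that the hard part is finding the right tool neighborhood, not picking within it, can be sketched as a two-stage router: choose a tool family first, then choose a tool inside that family. The families, tools, keyword sets, and the keyword-overlap matching below are hypothetical and stand in for whatever presentation and selection method the paper evaluates.

```python
# Hypothetical tool families; each family and tool is described by a keyword set.
FAMILIES = {
    "files": {
        "keywords": {"file", "read", "write", "directory"},
        "tools": {"read_file": {"read", "open"}, "write_file": {"write", "save"}},
    },
    "web": {
        "keywords": {"url", "http", "search", "page"},
        "tools": {"fetch_url": {"url", "http", "fetch"}, "web_search": {"search", "find"}},
    },
}


def best_match(words, candidates):
    """Return the candidate name whose keyword set overlaps the query words the most."""
    return max(candidates, key=lambda name: len(words & candidates[name]))


def route(query):
    """Stage 1 picks the family (the neighborhood); stage 2 picks a tool within it."""
    words = set(query.lower().split())
    family = best_match(words, {name: fam["keywords"] for name, fam in FAMILIES.items()})
    tool = best_match(words, FAMILIES[family]["tools"])
    return family, tool


print(route("search the web page for rust news"))
print(route("read the file config.toml"))
```

With this split, the second stage only ever compares a handful of tools, which is the situation in which the source reports 89% within-family accuracy for a 1.5B model.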
07 · Mar 2026
preprint

Memory-Continuous Architecture

Decoupling Deployment Boundaries from Memory Boundaries in Function Composition

4–100x latency reduction

Introduces Memory-Continuous Architecture (MCA), a runtime model that decouples deployment boundaries from memory boundaries in function composition. KubeFn implements MCA across JVM, Python, and Node.js runtimes, demonstrating 4–100x latency reduction in full HTTP benchmarks.

Serverless · Architecture · Function Composition

Open Source

MCP servers and developer tools for AI agent workflows

Most popular · 152 stars

YantrikDB

Cognitive memory database for AI agents — consolidates duplicate memories, detects contradictions, and fades stale ones via temporal decay. A Rust engine that ships as a library, MCP server, and HTTP cluster.

  • Temporal decay + contradiction detection
  • Library, MCP server, and HTTP cluster
Rust · AI Memory · HNSW · MCP
View repository
61 stars

brainstorm-mcp

MCP server for multi-round AI brainstorming debates between multiple models (GPT, DeepSeek, Groq, Ollama, etc.)

  • Multi-round cross-model debates
  • Supports GPT, DeepSeek, Groq, Ollama
TypeScript · MCP · Multi-Agent
View repository
45 stars

YantrikDB Hermes Plugin

YantrikDB memory provider for Nous Research's Hermes agent — self-maintaining memory with canonicalization, contradiction tracking, recency ranking, and explainable recall.

  • Self-maintaining memory + contradiction tracking
  • Explainable recall for Hermes agents
Python · AI Memory · Hermes
View repository
26 stars

ClawBrain

AI memory and personalization system that enables truly personalized AI-human communication with evolving personality traits and real-time mood detection.

  • Evolving personality traits
  • Encrypted local-first storage
Python · AI Memory · SQLite · PostgreSQL
View repository
25 stars

saga-mcp

A Jira-like project tracker MCP server for AI agents. SQLite-backed with 22 built-in tools.

  • 22 built-in tools
  • SQLite-backed storage
TypeScript · MCP · SQLite
View repository
21 stars

ContextCache

Persistent KV cache with content-hash addressing for tool-augmented LLMs. Caches prefilled tool-schema key-value states for 6.9x faster time-to-first-token with zero quality degradation.

  • 6.9x TTFT speedup on cache hits
  • SHA-256 content-hash addressing
Python · LLM Inference · KV Cache
View repository
7 stars

SwarmCode

MCP server for real-time cross-machine communication between Claude Code instances, enabling coordinated multi-agent workflows across machines.

  • Real-time cross-machine messaging
  • Coordinate multiple Claude Code agents
JavaScript · MCP · Multi-Agent
View repository
5 stars

truenas-mcp

The most comprehensive MCP server for TrueNAS SCALE — 278 actions and 12 resources exposed through a single hierarchical tool (~200 tokens vs ~28k).

  • 278 actions, 12 resources
  • ~200 tokens vs ~28k via hierarchical tool
TypeScript · MCP · TrueNAS
View repository
2 stars

MCP Registry Portal

Enterprise-grade web portal for managing MCP applications with multi-environment support, API key management, and security provider integration.

  • JWT auth with role-based access
  • 5 secret management backends
TypeScript · Next.js · Prisma · MCP
View repository
U.S. Provisional Patent · Filed

Cognitive Memory Engine with Instinct-Driven Proactive Behavior, Unified In-Process Companion Runtime, and Contradiction-Aware Adaptive Retrieval

Application No.: 63/991,357
Docket No.: SARKAR-2026-001
Filing Date: February 26, 2026
Inventor: Pranab Sarkar

Get in touch

Let's Connect

Interested in research collaboration, open-source contributions, or discussing AI infrastructure? I'd love to hear from you.

mail@pranab.co.in