Senior Software Developer @ Walmart Global Tech

Pranab Sarkar

AI Researcher & Senior Software Engineer

Building intelligent systems at the intersection of industry-scale engineering and AI research.

7 Research Papers

15+ Years Experience

1 Patent

344+ GitHub Stars



About

I'm a Senior Software Developer at Walmart Global Tech with 15+ years of experience building scalable, production-grade systems. My work sits at the intersection of industry-scale engineering and AI research.

Beyond production systems, I research and publish on AI infrastructure — exploring ideas like persistent KV caching for tool-augmented LLMs, schema compression for tool use, and cognitive memory architectures for AI agents. I also build open-source MCP servers that enable multi-agent AI workflows.

I studied at Techno India (2006–2010), have been recognized as a GitHub Copilot Champion, and have filed a U.S. provisional patent application.


Research & Publications

Preprints on LLM optimization, tool use, and AI infrastructure

ORCID: 0009-0009-8683-1481

01 · Feb 2026

preprint

ContextCache

Persistent KV Cache with Content-Hash Addressing for Zero-Degradation Tool Schema Caching

6.9x TTFT speedup

A persistent KV cache system that accelerates tool-augmented LLM inference by caching prefilled key-value states of tool schema prefixes with SHA-256 content-hash addressing. On cache hits, only the user query requires prefilling, reducing time-to-first-token by 6.9x (787ms to 114ms) with zero quality degradation.

LLM Inference, KV Cache, Tool Use

Read paper
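
As a rough illustration of the content-hash addressing described for ContextCache above, here is a minimal Python sketch. It is not the project's code: `schema_cache_key`, `PrefixCache`, and `fake_prefill` are hypothetical names, the "KV state" is a placeholder object rather than real model tensors, and the store is an in-memory dict standing in for a persistent one. The only details taken from the abstract are the SHA-256 addressing and the 787 ms and 114 ms timings.

```python
import hashlib
import json


def schema_cache_key(tool_schemas):
    """SHA-256 hex digest of a canonical JSON encoding of the tool schemas."""
    # Sorting keys and fixing separators makes the key independent of dict
    # ordering and whitespace (an assumption about what "content" means here).
    canonical = json.dumps(tool_schemas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrefixCache:
    """Toy in-memory stand-in for a persistent store of prefilled KV states."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_prefill(self, tool_schemas, prefill_fn):
        key = schema_cache_key(tool_schemas)
        if key in self._store:
            self.hits += 1          # cache hit: only the user query needs prefilling
        else:
            self.misses += 1        # cache miss: pay the schema prefill cost once
            self._store[key] = prefill_fn(tool_schemas)
        return self._store[key]


def fake_prefill(tool_schemas):
    # Placeholder for the expensive model prefill over the schema prefix.
    return {"kv_state_for_n_tools": len(tool_schemas)}


tools = [{"name": "search", "parameters": {"q": "string"}}]
cache = PrefixCache()
cache.get_or_prefill(tools, fake_prefill)  # first call: miss
cache.get_or_prefill(tools, fake_prefill)  # same content: hit

# Speedup implied by the timings quoted in the abstract.
speedup = 787 / 114
```

Because the key depends only on content, any change to a tool schema produces a different digest and therefore a fresh prefill, so a stale state is never reused under this scheme.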

02 · Feb 2026

preprint

ToolFormerMicro

Composable Tool Schema Compression via Gated Cross-Attention

0.818 TSA, zero false positives

A ~428M parameter encoder-decoder model that compresses verbose tool schemas into compact 8-token gist vectors via gated cross-attention. Achieves 0.818 Tool Selection Accuracy with zero false positives across seen, held-out, and unseen tool splits.

Transformer, Compression, Tool Use

Read paper

03 · Feb 2026

preprint

YantrikDB

A Cognitive Memory Engine for Persistent AI Systems

5 unified index types

An embedded cognitive memory engine that unifies five index types — vector (HNSW), knowledge graph, temporal, decay heap, and key-value — within a single embedded database. Implements multi-signal retrieval scoring with relevance-gated importance amplification. 16,000 lines of Rust.

Rust, Database, AI Memory, HNSW

Read paper
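
The YantrikDB entry above mentions multi-signal retrieval scoring with relevance-gated importance amplification and a decay component. The page does not give the actual formula, so the Python sketch below is only one hypothetical way such signals could be combined: the function names, the half-life, the gate threshold, and the multiplicative form are all assumptions made for illustration, not the engine's implementation.

```python
def decay(age_seconds, half_life_seconds):
    """Exponential decay: 1.0 for a fresh memory, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def score(relevance, importance, age_seconds,
          half_life_seconds=86400.0, gate=0.5):
    """Combine relevance, importance, and recency into one score.

    Importance only amplifies the score when relevance clears the gate,
    so an important but off-topic memory does not outrank a relevant one.
    """
    recency = decay(age_seconds, half_life_seconds)
    boost = importance if relevance >= gate else 0.0
    return relevance * (1.0 + boost) * recency


# Hypothetical memories: (label, relevance, importance, age in seconds).
memories = [
    ("relevant and important", 0.8, 1.0, 0.0),
    ("important but off-topic", 0.2, 1.0, 0.0),
    ("relevant but one day old", 0.8, 1.0, 86400.0),
]
ranked = sorted(memories, key=lambda m: score(m[1], m[2], m[3]), reverse=True)
```

In this toy ranking the fresh, relevant memory comes first, the day-old copy of it second, and the off-topic one last, because its importance is gated out.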

04 · Feb 2026

preprint

SDF

Convert Once, Consume Many: Cacheable, Typed Semantic Extraction from Web Pages

4.1x latency reduction

An open, schema-validated JSON protocol for publishing pre-extracted, agent-oriented semantic representations of web content. A fine-tuned 1.5B + 3B pipeline achieves 4.1x latency reduction versus a 14B baseline with 90% exact extraction accuracy.

Data Format, Web, Semantic Extraction

Read paper

05 · May 2026

preprint

Skill as Memory

Skill as Memory, Not Document: A Database-Native Substrate for Agent Skill Catalogs

919K → 369 tokens at recall

Current LLM agent skill systems store skills as documents (SKILL.md files, Python source) optimized for human editorial workflow, collapsing authoring format, retrieval metadata, and runtime body into one artifact. This work proposes the opposite framing, 'skill as memory, not document': a database-native substrate where skills are written by agents at runtime, retrieved by relevance at inference, and scored over agent-emitted outcome events.

AI Memory, Agent Skills, Database

Read paper

06 · Mar 2026

preprint

Tier-Based Tool Routing

Adapt the Interface, Not the Model: Tier-Based Tool Routing for AI Agents

89% accuracy at 1.5B

Tool selection accuracy depends more on how tools are presented than on model size. A 1.5B model achieves 89% within-family accuracy — the bottleneck is finding the right tool neighborhood, not picking from it. Evaluated across 1,000+ native tool calls with 80 tools and 4 model sizes (1.5B–35B).

Tool Use, LLM Agents, Routing

Read paper
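
The tool-routing entry above separates finding the right tool neighborhood from picking a tool within it. The page does not describe the routing mechanism itself, so the following Python sketch only pictures that two-stage split under loud assumptions: the tool families, the tool names, and the word-overlap scoring are all hypothetical and stand in for whatever the paper actually uses.

```python
# Hypothetical tool catalog grouped into families ("neighborhoods").
TOOL_FAMILIES = {
    "files": {
        "read_file": "read the contents of a file",
        "write_file": "write text to a file",
    },
    "web": {
        "fetch_url": "download a web page by url",
        "search_web": "search the web for a query",
    },
}


def overlap(query, text):
    """Number of distinct words shared by the query and a description."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def route(query):
    """Stage 1 picks a family; stage 2 picks a tool inside that family."""
    family = max(
        TOOL_FAMILIES,
        key=lambda f: sum(overlap(query, d) for d in TOOL_FAMILIES[f].values()),
    )
    tools = TOOL_FAMILIES[family]
    tool = max(tools, key=lambda t: overlap(query, tools[t]))
    return family, tool


web_choice = route("search the web for rust tutorials")
file_choice = route("read the file config.yaml")
```

Once the family is fixed, the second stage chooses among two tools here rather than the whole catalog, which is the sense in which the hard part is the neighborhood and not the final pick.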

07 · Mar 2026

preprint

Memory-Continuous Architecture

Decoupling Deployment Boundaries from Memory Boundaries in Function Composition

4–100x latency reduction

Introduces Memory-Continuous Architecture (MCA), a runtime model that decouples deployment boundaries from memory boundaries in function composition. KubeFn implements MCA across JVM, Python, and Node.js runtimes, demonstrating 4–100x latency reduction in full HTTP benchmarks.

Serverless, Architecture, Function Composition

Read paper

Open Source

MCP servers and developer tools for AI agent workflows

Most popular · 152

YantrikDB

Cognitive memory database for AI agents — consolidates duplicate memories, detects contradictions, and fades stale ones via temporal decay. A Rust engine that ships as a library, MCP server, and HTTP cluster.

Temporal decay + contradiction detection
Library, MCP server, and HTTP cluster

Rust, AI Memory, HNSW, MCP

View repository

brainstorm-mcp

MCP server for multi-round AI brainstorming debates between multiple models (GPT, DeepSeek, Groq, Ollama, etc.)

Multi-round cross-model debates
Supports GPT, DeepSeek, Groq, Ollama

TypeScript, MCP, Multi-Agent

View repository

YantrikDB Hermes Plugin

YantrikDB memory provider for Nous Research's Hermes agent — self-maintaining memory with canonicalization, contradiction tracking, recency ranking, and explainable recall.

Self-maintaining memory + contradiction tracking
Explainable recall for Hermes agents

Python, AI Memory, Hermes

View repository

ClawBrain

AI memory and personalization system that enables truly personalized AI-human communication with evolving personality traits and real-time mood detection.

Evolving personality traits
Encrypted local-first storage

Python, AI Memory, SQLite, PostgreSQL

View repository

saga-mcp

A Jira-like project tracker MCP server for AI agents. SQLite-backed with 22 built-in tools.

22 built-in tools
SQLite-backed storage

TypeScript, MCP, SQLite

View repository

ContextCache

Persistent KV cache with content-hash addressing for tool-augmented LLMs. Caches prefilled tool-schema key-value states for 6.9x faster time-to-first-token with zero quality degradation.

6.9x TTFT speedup on cache hits
SHA-256 content-hash addressing

Python, LLM Inference, KV Cache

View repository

SwarmCode

MCP server for real-time cross-machine communication between Claude Code instances, enabling coordinated multi-agent workflows across machines.

Real-time cross-machine messaging
Coordinate multiple Claude Code agents

JavaScript, MCP, Multi-Agent

View repository

truenas-mcp

The most comprehensive MCP server for TrueNAS SCALE — 278 actions and 12 resources exposed through a single hierarchical tool (~200 tokens vs ~28k).

278 actions, 12 resources
~200 tokens vs ~28k via hierarchical tool

TypeScript, MCP, TrueNAS

View repository
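
The single hierarchical tool mentioned for truenas-mcp above can be pictured with a small dispatch tree. This Python sketch is an illustration only: the action names, the stubbed handlers, and the dotted-path convention are hypothetical, nothing here calls TrueNAS, and the real server is written in TypeScript. The idea shown is that a client sees one small tool and discovers actions level by level, instead of loading a schema for every action up front.

```python
def list_pools(args):
    return ["tank"]  # stubbed data, not a real TrueNAS call


def create_snapshot(args):
    return f"{args['dataset']}@{args['name']}"  # stubbed result string


# Hypothetical action tree: inner dicts are namespaces, leaves are handlers.
ACTION_TREE = {
    "pool": {"list": list_pools},
    "snapshot": {"create": create_snapshot},
}


def hierarchical_tool(action="", args=None):
    """Single entry point: browse a namespace or run a leaf action.

    An empty or partial path returns the names available at that level;
    a full path runs the handler with the given arguments.
    """
    node = ACTION_TREE
    for part in filter(None, action.split(".")):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown action: {action}")
        node = node[part]
    if isinstance(node, dict):
        return sorted(node)      # discovery: list children of this namespace
    return node(args or {})      # execution: call the leaf handler


top_level = hierarchical_tool()
pool_actions = hierarchical_tool("pool")
pools = hierarchical_tool("pool.list")
snapshot = hierarchical_tool(
    "snapshot.create", {"dataset": "tank/data", "name": "nightly"}
)
```

Only the entry point's own description needs to sit in the model's context; the per-action detail is fetched on demand by walking the tree.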

MCP Registry Portal

Enterprise-grade web portal for managing MCP applications with multi-environment support, API key management, and security provider integration.

JWT auth with role-based access
5 secret management backends

TypeScript, Next.js, Prisma, MCP

View repository

View all projects on GitHub

U.S. Provisional Patent (Filed)

Cognitive Memory Engine with Instinct-Driven Proactive Behavior, Unified In-Process Companion Runtime, and Contradiction-Aware Adaptive Retrieval

Application No.: 63/991,357
Docket No.: SARKAR-2026-001
Filing Date: February 26, 2026
Inventor: Pranab Sarkar

Get in touch

Let's Connect

Interested in research collaboration, open-source contributions, or discussing AI infrastructure? I'd love to hear from you.

mail@pranab.co.in
