What we mean by these terms.

A working glossary of the concepts, methods, and tools across Skelf Research's five research pillars. Each entry is a DefinedTerm in our structured data, so AI assistants and search engines can extract them directly. Where an open-source project implements a concept, the definition links to it.

LLM Cognition & Prompt Theory

Declarative Prompt Specification
Also: declarative prompting, prompt-as-code, typed prompts
A way of writing prompts (and entire LLM behaviours) by describing the desired output structure, constraints, and format — rather than providing step-by-step natural-language instructions. Declarative prompt specifications are typically expressed in a structured language (YAML, JSON, or a custom DSL) so they can be parsed, type-checked, versioned, and ported across LLM providers without rewriting.
Prompts are typically treated as ad-hoc strings concatenated at runtime. Declarative specification treats them as first-class engineering artefacts: typed, validated, and reproducible. The opposite of declarative is imperative prompting, which spells out the reasoning steps the model should follow.
Implemented in: promptel, blogus
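Here is a minimal Python sketch of the definition above. The JSON spec format, its field names (`task`, `output_schema`, `constraints`), and the `PromptSpec` class are hypothetical illustrations. They are not promptel's actual schema or API. The point is that the spec is parsed and validated as data before any prompt text exists, so a malformed spec fails at load time rather than inside a model call.

```python
import json
from dataclasses import dataclass

# Hypothetical spec format for illustration; not promptel's actual schema.
SPEC_JSON = """
{
  "task": "Summarise a customer support ticket",
  "output_schema": {"summary": "string", "priority": "enum:low|medium|high"},
  "constraints": ["summary must be under 50 words", "respond with JSON only"]
}
"""

ALLOWED_TYPES = {"string", "integer", "boolean"}


@dataclass(frozen=True)
class PromptSpec:
    task: str
    output_schema: dict[str, str]
    constraints: list[str]

    @classmethod
    def from_json(cls, text: str) -> "PromptSpec":
        raw = json.loads(text)
        for field_name, field_type in raw["output_schema"].items():
            # Accept primitive types or "enum:a|b|c"; reject anything else at load time.
            if field_type not in ALLOWED_TYPES and not field_type.startswith("enum:"):
                raise ValueError(f"unsupported type {field_type!r} for field {field_name!r}")
        return cls(raw["task"], dict(raw["output_schema"]), list(raw["constraints"]))

    def render(self) -> str:
        """Render the spec into provider-neutral prompt text."""
        fields = "\n".join(f"- {name}: {ftype}" for name, ftype in self.output_schema.items())
        rules = "\n".join(f"- {rule}" for rule in self.constraints)
        return f"Task: {self.task}\nReturn an object with fields:\n{fields}\nConstraints:\n{rules}"


spec = PromptSpec.from_json(SPEC_JSON)
print(spec.render())
```

Because the spec is plain data, it can be diffed and versioned like code. A per-provider renderer could also replace `render` without touching the spec.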
Prompt Lifecycle Management
Also: prompt versioning, prompt CI/CD, prompt observability
The set of practices and tooling for managing prompts as evolving artefacts: extraction from application code, version control, dependency declaration, regression testing, deployment, and rollback. A prompt lifecycle treats prompts the same way software dependencies are treated.
When prompts are embedded in application code, they are hard to A/B test, can only be rolled back by redeploying the application, and cannot evolve independently of the code. Prompt lifecycle management makes them first-class engineering objects.
Implemented in: blogus, promptel
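This is a minimal in-memory sketch of versioning, a regression gate, and rollback. It assumes a hypothetical `PromptRegistry` class and a deliberately trivial check, and it is not the blogus or promptel API. A production setup would persist versions and run real evaluation suites as the checks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry: prompt name -> ordered list of template versions."""
    versions: dict[str, list[str]] = field(default_factory=dict)
    active: dict[str, int] = field(default_factory=dict)

    def publish(self, name: str, template: str, checks: list[Callable[[str], bool]]) -> int:
        # Regression gate: refuse to publish a template that fails any check.
        if not all(check(template) for check in checks):
            raise ValueError(f"{name}: new version failed regression checks")
        history = self.versions.setdefault(name, [])
        history.append(template)
        self.active[name] = len(history) - 1
        return self.active[name]

    def rollback(self, name: str) -> int:
        # Point the active version at the previous one; nothing is deleted.
        if self.active[name] == 0:
            raise ValueError(f"{name}: no earlier version to roll back to")
        self.active[name] -= 1
        return self.active[name]

    def get(self, name: str) -> str:
        return self.versions[name][self.active[name]]


checks = [lambda template: "{ticket}" in template]  # the template must keep its input slot
registry = PromptRegistry()
registry.publish("summarise", "Summarise this ticket: {ticket}", checks)
registry.publish("summarise", "Summarise in one sentence: {ticket}", checks)
registry.rollback("summarise")
print(registry.get("summarise"))  # Summarise this ticket: {ticket}
```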
LLM Routing
Also: LLM gateway, model router, model cascading, cost-quality routing
The practice of selecting which language model to use for a given query, based on cost, latency, quality, and the query content. A router can pick a small fast model for simple queries and a large expensive model for hard ones, optimising the cost-quality frontier.
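LLM routing sits between the application and the LLM providers. It can be a hand-coded decision tree or a learned policy, such as a classifier trained to predict whether a cheaper model will answer a given query well enough. It is closely related to the concept of a model gateway.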
Implemented in: route-switch
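This sketch shows the hand-coded end of the spectrum. It assumes two hypothetical model tiers with placeholder names and prices. The `looks_hard` heuristic is illustrative only and is not route-switch's logic. A learned policy would replace that heuristic with a trained classifier while keeping the same `route` interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # placeholder numbers, not real prices


# Hypothetical model tiers.
SMALL = Model("small-fast", 0.1)
LARGE = Model("large-accurate", 2.0)

HARD_MARKERS = ("prove", "derive", "step by step", "refactor", "legal")


def route(query: str, max_cost_per_1k: float = float("inf")) -> Model:
    """Hand-coded decision tree: use the cheap model unless the query looks hard."""
    looks_hard = len(query.split()) > 200 or any(m in query.lower() for m in HARD_MARKERS)
    if looks_hard and LARGE.cost_per_1k_tokens <= max_cost_per_1k:
        return LARGE
    return SMALL


print(route("What is the capital of France?").name)     # small-fast
print(route("Prove that sqrt(2) is irrational.").name)  # large-accurate
```

The `max_cost_per_1k` budget shows where the cost side of the trade-off enters. A hard query still falls back to the small model when the large one is over budget.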
Agent Memory
Also: persistent agent memory, long-term memory for agents, agent state
Persistent, structured storage that allows an LLM agent to retain information across sessions. Distinct from the in-context conversation: memory lives outside the model in a database or vector store and is selectively loaded into the context window when relevant. Memory layers typically distinguish between semantic memory (facts), episodic memory (past interactions), and procedural memory (learned workflows).
Modern LLMs can handle 128K-1M token context windows, but context is ephemeral. Memory is the persistent store; context is the active working set. The two are often conflated in the literature, which causes architectural confusion.
Implemented in: memorg
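This minimal sketch illustrates the distinction between memory and context. It assumes a hypothetical `MemoryItem` type and uses word overlap as a stand-in for embedding similarity; a real system would query a vector store. It is not memorg's API. The full store persists outside the model, and only the relevant subset that fits a budget is loaded into the context window.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    kind: str  # "semantic", "episodic" or "procedural"
    text: str


def relevance(item: MemoryItem, query: str) -> int:
    # Stand-in for embedding similarity: count shared lowercase words.
    return len(set(item.text.lower().split()) & set(query.lower().split()))


def load_context(store: list[MemoryItem], query: str, budget_words: int) -> list[MemoryItem]:
    """Select the most relevant memories that fit in the context budget."""
    ranked = sorted(store, key=lambda m: relevance(m, query), reverse=True)
    selected, used = [], 0
    for item in ranked:
        cost = len(item.text.split())  # crude proxy for a token count
        if relevance(item, query) == 0 or used + cost > budget_words:
            continue
        selected.append(item)
        used += cost
    return selected


store = [
    MemoryItem("semantic", "user prefers metric units"),
    MemoryItem("episodic", "last week the user asked about marathon training"),
    MemoryItem("procedural", "when planning training always ask about injuries first"),
    MemoryItem("semantic", "user lives in Lisbon"),
]
for m in load_context(store, "plan my marathon training", budget_words=12):
    print(m.kind, "|", m.text)
```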
Local LLM Serving
Also: self-hosted LLM, on-prem LLM, private LLM
Running large language models on local infrastructure (developer laptop, on-prem server, or edge device) instead of calling a hosted API. Local serving gives full data control, eliminates per-token cost, and removes network latency, at the expense of hardware requirements and operational burden.
Tools in this space include llama.cpp (the underlying inference engine), Ollama (a high-level wrapper), vLLM (production-grade GPU serving), LocalAI (OpenAI-compatible), and LM Studio (desktop GUI). Skelf's mullama is a research-focused alternative that exposes llama.cpp internals for instrumentation.
Implemented in: mullama
Ephemeral Credentials
Also: scoped tokens, short-lived credentials, zero-trust LLM access
Short-lived, scoped tokens issued by a proxy in front of an LLM API, replacing long-lived API keys. The proxy issues a token valid for minutes or hours, with a constrained scope (specific model, max tokens, allowed endpoints). When the token expires, the credential is unusable. This is the zero-trust pattern applied to LLM access.
Long-lived API keys are a security liability: a leak exposes the account indefinitely. Ephemeral credentials limit the blast radius of a leak to the token lifetime.
Implemented in: perishable
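This minimal sketch shows a proxy minting and checking scoped, expiring tokens, signed with HMAC from Python's standard library. The token layout, claim names, and `PROXY_SECRET` placeholder are assumptions for illustration and are not perishable's format. A real deployment would more likely use an established token format such as JWT and keep the signing key in a secrets manager.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

PROXY_SECRET = b"replace-with-a-random-32-byte-key"  # placeholder; known only to the proxy


def issue_token(model: str, max_tokens: int, ttl_s: float, now: Optional[float] = None) -> str:
    """Mint a short-lived token scoped to one model and a token ceiling."""
    now = time.time() if now is None else now
    claims = {"model": model, "max_tokens": max_tokens, "exp": now + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    sig = hmac.new(PROXY_SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def check_token(token: str, model: str, requested_tokens: int, now: Optional[float] = None) -> bool:
    """Accept only untampered, unexpired tokens whose scope covers the request."""
    now = time.time() if now is None else now
    body, _, sig = token.partition(".")
    expected = hmac.new(PROXY_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered token
    claims = json.loads(base64.urlsafe_b64decode(body))
    return now < claims["exp"] and claims["model"] == model and requested_tokens <= claims["max_tokens"]


t0 = 1_000_000.0
tok = issue_token("small-fast", max_tokens=512, ttl_s=900, now=t0)
print(check_token(tok, "small-fast", 256, now=t0 + 60))      # True: in scope, not expired
print(check_token(tok, "small-fast", 256, now=t0 + 901))     # False: expired
print(check_token(tok, "large-accurate", 256, now=t0 + 60))  # False: wrong model
```

A leaked token is only useful until `exp`, and only for the model and token ceiling it names. That bounded window and scope is the blast-radius reduction the definition describes.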

Safe & Verifiable Computing

Memory-Safe C Dialect
Also: safe C, verified C, C with safety, AI-safe C
A restricted subset of C, augmented with compiler-enforced safety invariants, that retains C's syntax and direct mapping to machine code while preventing the memory-safety bugs that plague C programs. The goal is a language that AI code generators can reliably produce correct code in, and that existing C toolchains can compile.
LLM agents increasingly generate systems code, but C is memory-unsafe (buffer overflows, use-after-free) and Rust is hard to generate correctly (borrow-checker errors). A memory-safe C dialect is a middle path: it keeps C's familiar surface, and its compiler rejects code that violates the safety invariants, so generated code either satisfies them or fails to compile.
AI Code Sandbox
Also: code sandbox, untrusted code execution, agent sandbox
A runtime environment that isolates code generated or executed by an LLM agent from the host system. Sandboxing enforces capability restrictions: the code can read and write only what it has been granted, can make only approved network calls, and cannot escape the sandbox. Implementations include sandboxed container runtimes and microVMs (gVisor, Firecracker), WASM runtimes, and language-level isolation (Lua, JavaScript).
Code generated by an LLM cannot be trusted. A sandbox is the standard defence: assume the code is malicious and constrain what it can do.
Implemented in: zviz
NUMA-Aware Scheduling
Also: NUMA topology, NUMA pinning, memory locality
A scheduling strategy that respects the Non-Uniform Memory Access (NUMA) topology of multi-socket servers: threads are placed on cores close to the memory they access, memory is allocated on the local node first, and inter-node traffic is minimised. NUMA-aware scheduling can deliver 10-40% latency improvements on memory-bound workloads like LLM inference.
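On a single-socket server there is only one memory node, so NUMA is irrelevant. On multi-socket servers, ignoring NUMA lets a thread run on one socket while its data sits in memory attached to another, so every access crosses the inter-socket interconnect, which is significantly slower than a local access.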
Implemented in: numaperf
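This Linux-only sketch covers the placement half of NUMA-aware scheduling: it reads a node's CPU list from sysfs and pins the process to those CPUs with `os.sched_setaffinity`. The helper names are hypothetical, and this is not numaperf's API. Memory locality follows from Linux's default policy, which allocates a page on the node of the CPU that first touches it (when that node has free memory). The pinning therefore has to happen before the process allocates its working buffers.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse Linux cpulist syntax such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node: int) -> set[int]:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Under the default memory policy, pages then tend to be allocated on this
    node as the process first touches them.
    """
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # 0 means the calling process
    return cpus


print(sorted(parse_cpulist("0-3,8-11\n")))  # [0, 1, 2, 3, 8, 9, 10, 11]
```

The intended use is to call `pin_to_node(0)` at the start of a worker process, before it loads model weights or allocates buffers. The parser is exercised separately because it runs on any platform.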
Embedded Vector Search
Also: in-process vector search, ANN in SQLite, SQLite vector search
Approximate nearest-neighbour (ANN) search that runs in-process, typically backed by an embedded database (SQLite, RocksDB, DuckDB) rather than a dedicated vector database server (Pinecone, Weaviate, Milvus). Embedded vector search trades peak throughput for operational simplicity: no separate service to deploy, no network round-trips, no extra failure modes.
Whether you need a dedicated vector database depends on scale. For sub-100K-vector corpora and prototype workloads, embedded search is usually sufficient and dramatically simpler to operate.
Implemented in: memista, embedcache, polymathy
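This minimal sketch shows the embedded shape using Python's built-in `sqlite3` module: vectors are stored as float32 blobs in an in-memory database inside the application process. The search itself is an exact brute-force scan, not an ANN index. It stands in for the index that embedded vector-search libraries provide. The table layout and the toy 3-dimensional embeddings are assumptions, not the design of memista, embedcache, or polymathy.

```python
import math
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    return array("f", vec).tobytes()  # float32, native byte order


def from_blob(blob: bytes) -> list[float]:
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


db = sqlite3.connect(":memory:")  # lives inside this process: no server, no network hop
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")
docs = [
    ("cats are small felines", [0.9, 0.1, 0.0]),
    ("dogs are loyal companions", [0.7, 0.6, 0.1]),
    ("stock markets fell today", [0.0, 0.1, 0.95]),
]
db.executemany("INSERT INTO docs (text, emb) VALUES (?, ?)", [(t, to_blob(v)) for t, v in docs])


def search(query: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Exact top-k by cosine similarity; an ANN index would replace this full scan."""
    rows = db.execute("SELECT text, emb FROM docs")
    scored = [(text, cosine(query, from_blob(emb))) for text, emb in rows]
    return sorted(scored, key=lambda row: row[1], reverse=True)[:k]


print([text for text, _ in search([1.0, 0.0, 0.0])])
# ['cats are small felines', 'dogs are loyal companions']
```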
Programmable Database
Also: Lua database, embedded scripting database, extensible DB
A database in which stored procedures, triggers, and query logic can be written in a general-purpose embedded scripting language (Lua, JavaScript, Python) rather than only SQL. The scripting layer is a first-class citizen: queries and transformations can be expressed in the same language as the application code, and the database can be extended without writing C extensions.
Traditional databases push logic out to the application; programmable databases pull it back in. The trade-off is operational coupling — but for AI workflow prototyping, where iteration speed matters, it is often the right choice.
Implemented in: liath, liath-rs
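This is an analogy using Python's built-in `sqlite3` module. There, `create_function` exposes a host-language function to SQL, and a trigger calls it on every insert. The table, trigger, and `to_slug` function are hypothetical, and this is not liath's API. It only shows application-language logic running inside the database's query and trigger machinery without a C extension.

```python
import sqlite3


def to_slug(title: str) -> str:
    """Application logic written in the host language, not in SQL or C."""
    return "-".join(title.lower().split())


db = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument scalar function to_slug(text).
db.create_function("to_slug", 1, to_slug)
db.execute("CREATE TABLE posts (title TEXT, slug TEXT)")
# A trigger that calls the host-language function on every insert.
db.execute(
    """
    CREATE TRIGGER fill_slug AFTER INSERT ON posts
    BEGIN
        UPDATE posts SET slug = to_slug(NEW.title) WHERE rowid = NEW.rowid;
    END
    """
)
db.executemany("INSERT INTO posts (title) VALUES (?)", [("Hello World",), ("Agent Memory Notes",)])
print(db.execute("SELECT title, slug FROM posts ORDER BY title").fetchall())
# [('Agent Memory Notes', 'agent-memory-notes'), ('Hello World', 'hello-world')]
```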

Formal Optimisation & Decision Science

Constraint Satisfaction Problem (CSP)
Also: CSP, constraint solving, SMT
A mathematical problem defined by a set of variables, each with a domain of possible values, and a set of constraints restricting which combinations of values are allowed. CSPs are solved by backtracking search, constraint propagation, or (for small problems) exhaustive enumeration. Tools include Z3 (an SMT solver), OR-Tools (whose CP-SAT solver handles constraint programming), MiniZinc (a modelling language that compiles to many back-end solvers), and Gurobi (a mathematical-programming solver).
Many real-world problems (scheduling, allocation, configuration) are naturally expressed as CSPs. Bridging natural-language problem descriptions to CSPs is what savanty does.
Implemented in: savanty
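This small sketch implements one of the methods named above: backtracking search with forward checking (a simple form of constraint propagation), applied to the textbook map-colouring CSP. It handles only "neighbours must differ" constraints and is not savanty's solver. Real solvers such as Z3 or OR-Tools support far richer constraint types.

```python
def solve_csp(variables, domains, neighbours):
    """Backtracking search with forward checking for 'neighbours must differ' constraints."""
    assignment = {}

    def backtrack(current):
        if len(assignment) == len(variables):
            return dict(assignment)
        # Most-constrained variable first: the unassigned one with the fewest values left.
        var = min((v for v in variables if v not in assignment), key=lambda v: len(current[v]))
        for value in current[var]:
            # Forward checking: remove `value` from every unassigned neighbour's domain.
            pruned = {
                n: [x for x in current[n] if x != value]
                if n in neighbours[var] and n not in assignment
                else current[n]
                for n in variables
            }
            if any(not pruned[n] for n in neighbours[var] if n not in assignment):
                continue  # a neighbour has no legal value left: try the next value
            assignment[var] = value
            result = backtrack(pruned)
            if result is not None:
                return result
            del assignment[var]
        return None

    return backtrack(domains)


# Classic example: adjacent Australian regions must get different colours.
neighbours = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
variables = list(neighbours)
domains = {v: ["red", "green", "blue"] for v in variables}
solution = solve_csp(variables, domains, neighbours)
print(solution)
```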
Multi-Armed Bandit Ranking
Also: bandit ranking, pairwise comparison, MAB ranking
A ranking strategy that uses multi-armed bandit algorithms to choose which items to compare next, in order to learn the correct ordering of items with the fewest human comparisons possible. Each comparison is a "pull" of a bandit arm, and the algorithm balances exploration (comparing uncertain pairs) with exploitation (confirming confident pairs).
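Naive ranking compares every pair: n(n-1)/2, i.e. O(n²), comparisons. Bandit ranking concentrates comparisons on the pairs that are still uncertain. It can get close to the O(n log n) comparisons needed to recover a full ordering (fewer when only the top items matter), dramatically reducing human labelling cost.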
Implemented in: compere
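This is a simplified sketch in the same spirit, and it is not compere's algorithm. A standard comparison sort (Python's `sorted`, which needs O(n log n) comparisons) decides which pairs to compare. A Hoeffding confidence bound decides how often to re-ask each pair. Uncertain pairs keep getting compared, while pairs that are already clear stop consuming labels. Full dueling-bandit rankers also choose the pairs themselves adaptively. The function name and the `delta` and `max_pulls` parameters are assumptions for illustration.

```python
import math
import random
from functools import cmp_to_key


def rank_by_racing(items, duel, delta=0.1, max_pulls=50):
    """Rank orderable items best-first using a noisy pairwise oracle.

    duel(a, b) returns True when a is preferred to b. Returns (ranking, total duels).
    """
    stats = {}  # (lo, hi) -> (wins for lo, duels so far); lo/hi make a canonical pair key
    pulls = 0

    def compare(a, b):
        nonlocal pulls
        lo, hi = (a, b) if a <= b else (b, a)
        wins, n = stats.get((lo, hi), (0, 0))
        while n < max_pulls:
            if n > 0:
                # Hoeffding radius: stop once the win rate is confidently away from 0.5.
                radius = math.sqrt(math.log(2 / delta) / (2 * n))
                if abs(wins / n - 0.5) > radius:
                    break
            wins += int(duel(lo, hi))  # still uncertain: spend another comparison
            n += 1
            pulls += 1
        stats[(lo, hi)] = (wins, n)
        if wins * 2 == n:
            return 0  # unresolved tie after max_pulls
        lo_first = wins * 2 > n
        return -1 if (a == lo) == lo_first else 1

    ranking = sorted(items, key=cmp_to_key(compare))
    return ranking, pulls


# Simulated annotator who picks the truly better item 80% of the time.
rng = random.Random(0)
quality = {name: rank for rank, name in enumerate("ABCDEFGH")}  # H is best


def noisy_duel(a, b):
    better = quality[a] > quality[b]
    return better if rng.random() < 0.8 else not better


ranking, pulls = rank_by_racing(list("ABCDEFGH"), noisy_duel)
print(ranking, pulls)
```

With noiseless comparisons and `delta=0.1`, each pair the sort needs costs exactly six duels. With noisy comparisons, close pairs absorb more of the budget, which matches the exploration the definition describes.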
Quantitative Trading Signal Compiler
Also: signal compiler, quant signal DSL, alpha compiler
A tool that compiles a visual or declarative specification of a trading signal (entry conditions, exit conditions, risk limits) into a verified executable for backtesting and live trading. The compiler enforces the syntactic correctness of the specification and emits production-ready code in a memory-safe language.
A typical quant workflow — research idea → spec → production code — takes weeks. A signal compiler compresses this to minutes by removing the manual translation step.
Implemented in: sigc
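This toy sketch walks through the spec → validate → executable pipeline. It assumes a hypothetical moving-average crossover spec with `fast`, `slow`, and `max_position` fields. It compiles to a Python closure purely for illustration. sigc's actual input format and its memory-safe output language are not shown here, and the validation is structural only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CrossoverSpec:
    fast: int            # fast moving-average window, in bars
    slow: int            # slow moving-average window, in bars
    max_position: float  # position size when the signal is on (fraction of capital)


def validate(raw: dict) -> CrossoverSpec:
    """Reject malformed specs before any executable is produced."""
    spec = CrossoverSpec(int(raw["fast"]), int(raw["slow"]), float(raw["max_position"]))
    if not 0 < spec.fast < spec.slow:
        raise ValueError("need 0 < fast < slow")
    if not 0 < spec.max_position <= 1:
        raise ValueError("max_position must be in (0, 1]")
    return spec


def compile_signal(raw: dict) -> Callable[[list[float]], list[float]]:
    spec = validate(raw)

    def sma(prices: list[float], t: int, window: int) -> float:
        return sum(prices[t - window + 1 : t + 1]) / window

    def signal(prices: list[float]) -> list[float]:
        out = []
        for t in range(len(prices)):
            if t < spec.slow - 1:
                out.append(0.0)  # not enough history for the slow average yet
            elif sma(prices, t, spec.fast) > sma(prices, t, spec.slow):
                out.append(spec.max_position)  # fast average above slow: go long
            else:
                out.append(0.0)
        return out

    return signal


run = compile_signal({"fast": 2, "slow": 3, "max_position": 1.0})
print(run([1, 2, 3, 4, 5, 4, 3, 2, 1]))  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```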
Natural Language to Solver Pipeline
Also: NL2OPT, NL to constraint, LLM to SMT
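A pipeline that takes a natural-language description of an optimisation problem and produces a formal solver input (e.g. a Z3 script, a MiniZinc model, an OR-Tools program) that, when run, returns a solution carrying the solver's formal guarantees: proven feasibility, and optimality where the solver proves it. The pipeline uses an LLM to translate intent into formal constraints, and a solver to find the answer.
Pure LLM output cannot guarantee optimality. A pipeline that delegates the solving to a formal solver inherits the solver's mathematical guarantees, but only with respect to the formal model. If the LLM mistranslates a constraint, the solver returns a provably optimal answer to the wrong problem, so the translation itself still needs validating.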
Implemented in: savanty
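This minimal sketch shows the two stages with the LLM step stubbed out. The hypothetical `translate` function returns a hard-coded structured model instead of calling an LLM. Exhaustive enumeration over small integer domains stands in for a solver such as Z3 or OR-Tools. Because enumeration is exact, the returned optimum is guaranteed for the translated model. The validation step shows where a pipeline can reject a malformed translation before solving. None of this is savanty's API.

```python
from itertools import product


def translate(description: str) -> dict:
    """Hypothetical stand-in for the LLM step: a real pipeline would prompt a model
    to emit this structure from `description`. Here the output is hard-coded."""
    return {
        "variables": {"x": range(0, 11), "y": range(0, 11)},    # integer domains
        "objective": {"x": 3, "y": 2},                          # maximise 3x + 2y
        "constraints": [({"x": 1, "y": 1}, 4), ({"x": 1}, 3)],  # x + y <= 4, x <= 3
    }


def solve(model: dict) -> tuple[dict, int]:
    """Exact solver by exhaustive enumeration: fine for tiny finite domains only."""
    names = list(model["variables"])
    # Check the translated model before trusting it: every term must use a declared variable.
    used = set(model["objective"]).union(*(coeffs for coeffs, _ in model["constraints"]))
    if used - set(names):
        raise ValueError(f"undeclared variables: {sorted(used - set(names))}")
    best, best_value = None, None
    for values in product(*(model["variables"][n] for n in names)):
        point = dict(zip(names, values))
        feasible = all(
            sum(c * point[v] for v, c in coeffs.items()) <= bound
            for coeffs, bound in model["constraints"]
        )
        if feasible:
            value = sum(c * point[v] for v, c in model["objective"].items())
            if best_value is None or value > best_value:
                best, best_value = point, value
    return best, best_value


model = translate("Maximise profit: 3 per x, 2 per y, at most 4 items in total, at most 3 of x.")
print(solve(model))  # ({'x': 3, 'y': 1}, 11)
```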

Edge Intelligence & On-Device AI

On-Device LLM Inference
Also: local LLM, device-side AI, edge LLM
Running a large language model entirely on the user's device (phone, laptop, embedded system) without any network round-trips to a server. The model is loaded into memory from local storage, inference runs on the device CPU/GPU/NPU, and outputs are generated locally. The defining property is zero cloud dependency.
On-device inference gives full data privacy, zero network latency, and no per-call cost, at the expense of model quality (smaller models, more aggressive quantisation) and battery / thermals.
Implemented in: llamafu, ukkin
On-Device Mobile AI Agent
Also: mobile agent, device-side agent, autonomous mobile agent
An autonomous AI agent that runs on a mobile device, observing the screen and acting on it (taps, swipes, text input) without sending data to a remote server. The agent uses on-device LLM inference, accessibility APIs, and a tiered permission model to balance autonomy with safety.
Cloud-based agents (OpenAI's Operator, Anthropic's Computer Use) are powerful but require sending the user's screen contents to a remote server. On-device mobile agents keep everything local, at the cost of reasoning quality.
Implemented in: ukkin, llamafu
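This minimal sketch shows a tiered permission model. It assumes three tiers and a hypothetical action table. The action names and their tier assignments are illustrative, not ukkin's policy. Unknown actions fail closed, which keeps a misbehaving model from reaching capabilities nobody has classified.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    AUTO = "auto"        # low risk: act without asking
    CONFIRM = "confirm"  # medium risk: ask the user first
    DENY = "deny"        # high risk: never act autonomously


# Hypothetical policy table; action names and tiers are illustrative.
POLICY = {
    "scroll": Tier.AUTO,
    "tap": Tier.AUTO,
    "type_text": Tier.CONFIRM,
    "send_message": Tier.CONFIRM,
    "make_payment": Tier.DENY,
}


def authorise(action: str, ask_user: Callable[[str], bool]) -> bool:
    """Decide whether the agent may perform an action on the device."""
    tier = POLICY.get(action, Tier.DENY)  # unknown actions fail closed
    if tier is Tier.AUTO:
        return True
    if tier is Tier.CONFIRM:
        return ask_user(action)
    return False


always_yes = lambda action: True
print(authorise("scroll", always_yes), authorise("make_payment", always_yes))  # True False
```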
LLM Quantisation
Also: model quantisation, Q4 quantisation, GGUF
Reducing the numerical precision of a language model's weights from 16-bit or 32-bit floating point to lower bit-widths (8-bit, 5-bit, 4-bit, even 2-bit) to shrink the model and speed up inference, at the cost of some model quality. In the llama.cpp ecosystem, quantised models are commonly distributed as GGUF files using quantisation types such as Q4_K_M, Q5_K_M, and Q8_0.
A 7B-parameter model at 16-bit precision is 14 GB — too large for most phones. At 4-bit quantisation it is ~4 GB, which fits on a high-end phone. The quality trade-off is small for Q8, noticeable for Q4, and significant for Q2.
Implemented in: llamafu, mullama
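The size arithmetic in the definition above is parameters × bits ÷ 8. This sketch checks it, then adds a simplified symmetric 4-bit block quantiser to show where the quality loss comes from. The quantiser is an illustration, not GGUF's actual Q4_K_M layout. Real formats also store per-block scales alongside the weights, so effective bits per weight sit somewhat above the nominal width. That overhead is roughly why a 4-bit 7B file is nearer 4 GB than 3.5 GB.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone: parameters x bits / 8 bits per byte."""
    return n_params * bits_per_weight / 8


def quantise_block(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric absmax quantisation of one block to signed integers."""
    qmax = 2 ** (bits - 1) - 1                           # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0   # avoid a zero scale for all-zero blocks
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantise_block(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


for bits in (16, 8, 4):
    print(f"7B weights at {bits}-bit: {weight_bytes(7e9, bits) / 1e9:.1f} GB")
# 7B weights at 16-bit: 14.0 GB
# 7B weights at 8-bit: 7.0 GB
# 7B weights at 4-bit: 3.5 GB

block = [0.82, -0.41, 0.05, -0.77, 0.33, 0.0, 0.6, -0.12]
q, scale = quantise_block(block)
restored = dequantise_block(q, scale)
print(q, max(abs(a - b) for a, b in zip(block, restored)))
```

Each weight's rounding error is at most half the block's scale. Fewer bits mean a coarser scale, which is why quality degrades progressively from Q8 to Q4 to Q2.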
Deliberative Search
Also: reasoning search, agentic search, intent-first search
A search paradigm in which the engine reasons about the user's intent *before* retrieving results, rather than retrieving first and ranking later. Deliberative search uses a language model to interpret the query, generate a structured understanding of what the user actually wants, and then issues targeted retrievals.
Traditional search retrieves documents that match the query terms and ranks them. Deliberative search generates a model of intent first, which can dramatically improve precision for ambiguous or complex queries. The term is sometimes used interchangeably with "agentic search" or "reasoning search".
Implemented in: slorg
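This toy sketch illustrates intent-first retrieval. The `interpret` function is a rule-based stand-in for the language-model step. The `Intent` fields, the corpus, and the keyword lists are hypothetical, not slorg's design. Plain term matching on "asyncio" would return all three documents. The structured intent narrows the result to the one that matches what the user asked for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    topic: str
    doc_type: Optional[str]      # e.g. "tutorial"; None means any type
    max_age_days: Optional[int]  # None means any age


# Hypothetical toy corpus.
CORPUS = [
    {"title": "Python asyncio tutorial", "type": "tutorial", "age_days": 30},
    {"title": "asyncio release notes", "type": "changelog", "age_days": 5},
    {"title": "Old asyncio tutorial", "type": "tutorial", "age_days": 2000},
]

INTENT_WORDS = {"recent", "current", "learn", "guide", "to", "a", "the", "how", "do", "i"}


def interpret(query: str) -> Intent:
    """Rule-based stand-in for the LLM step that models what the user wants."""
    words = query.lower().split()
    doc_type = "tutorial" if {"learn", "guide"} & set(words) else None
    max_age = 365 if {"recent", "current"} & set(words) else None
    topic = " ".join(w for w in words if w not in INTENT_WORDS)
    return Intent(topic, doc_type, max_age)


def retrieve(intent: Intent) -> list[str]:
    """Targeted retrieval: filter on the structured intent, not just term overlap."""
    return [
        doc["title"]
        for doc in CORPUS
        if intent.topic in doc["title"].lower()
        and (intent.doc_type is None or doc["type"] == intent.doc_type)
        and (intent.max_age_days is None or doc["age_days"] <= intent.max_age_days)
    ]


query = "recent guide to learn asyncio"
print(interpret(query))
print(retrieve(interpret(query)))  # ['Python asyncio tutorial']
```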
Browser-Extension LLM Framework
Also: Chrome extension LLM, browser LLM, extension AI
A framework that makes it straightforward to embed LLM capabilities in a browser extension, including a manifest for declaring permissions, a sandboxed execution context, a content-script bridge, and a UI overlay. The framework abstracts away Manifest V3 quirks and provides a portable API surface for LLM calls.
Browser extensions run in a security-restricted environment that is hostile to LLM workflows. A framework hides the manifest details and provides a familiar async-API surface.
Implemented in: anouk

Cross-cutting

Hypotheses as Software
Also: OSS-as-research, research-as-code, code-as-paper
A research methodology in which every research hypothesis is encoded as a runnable, testable, peer-reviewable software artefact. The codebase is the proof: it can be executed, benchmarked, falsified, and extended by the community. The opposite of "hypotheses as papers".
Skelf Research operates on this methodology. Each of the 25 public repositories encodes a specific research question, and the code itself is the answer — or, more often, the falsifiable claim about the answer.
Open Science
Also: open research, reproducible AI, FAIR data
The practice of making the scientific process — including data, methods, code, and reasoning — fully open and reproducible. In AI research specifically, open science means publishing not just papers but the runnable code, the test data, the benchmarks, and the negative results.
Open science is a precondition for falsifiable research. If a result cannot be reproduced, it cannot be verified; if it cannot be verified, it is not science. Skelf Research is an open-science lab.