RAG-only route
Use approved sources, retrieval and context window rules when the model should cite evidence rather than learn new behaviour.
Tesrex designs private, local and open-weight LLM routes where model behaviour, data boundaries and reviewer control have to be engineered together.
The route starts with approved sources and decision criteria: data volume, behaviour gap, latency, risk and cost. It then selects RAG, LoRA, QLoRA, a full fine-tune, RL/preference tuning or a hybrid path.
The model is only one component. The repeatable system is the data boundary, adapter, inference stack, evals, reviewer queue and feedback loop.
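The selection step can be sketched as a small decision function. This is a minimal illustration, not Tesrex's actual procedure: the `Workload` fields, the rule order and both thresholds are hypothetical, and latency, risk and cost are left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune them per project.
FULL_FINE_TUNE_MIN_EXAMPLES = 50_000
QLORA_BELOW_GPU_GB = 24.0


@dataclass(frozen=True)
class Workload:
    behaviour_gap: bool             # does the model need to behave differently?
    needs_citations: bool           # must answers cite approved evidence?
    has_reviewer_preferences: bool  # are ranked reviewer choices available?
    training_examples: int          # data volume (clean, labelled examples)
    requirements_stable: bool       # are the task requirements settled?
    gpu_memory_gb: float            # memory available for training


def choose_route(w: Workload) -> str:
    """Return one route name; the rule order is an assumption, not a standard."""
    if not w.behaviour_gap:
        return "rag"                # evidence problem, not a behaviour problem
    if w.needs_citations:
        return "hybrid"             # needs both retrieval and changed behaviour
    if w.has_reviewer_preferences:
        return "rl_preference"
    if w.requirements_stable and w.training_examples >= FULL_FINE_TUNE_MIN_EXAMPLES:
        return "full_fine_tune"
    if w.gpu_memory_gb < QLORA_BELOW_GPU_GB:
        return "qlora"              # quantized adapter when memory is tight
    return "lora"


faq_bot = Workload(False, True, False, 0, True, 16.0)
print(choose_route(faq_bot))  # prints "rag"
```

Writing the rules down this way makes the decision reviewable: a change of route becomes a visible change to a criterion or a threshold.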
Some workflows need retrieval and citations. Some need model behaviour changed. Some need a local run path first, then a private hosted or isolated route once the evidence is strong.
That decision is what separates private AI engineering from a local chatbot experiment.
RAG-only route: Use approved sources, retrieval and context window rules when the model should cite evidence rather than learn new behaviour.
LoRA route: Adapt style, terminology or task behaviour with a lightweight adapter while keeping the base model route controlled.
QLoRA route: Use quantized adapter training when GPU memory, cost or local experimentation makes full-precision tuning impractical.
Full fine-tune route: Reserve this for strong datasets, stable requirements and enough evaluation evidence to justify changing more of the model behaviour.
RL/preference route: Apply preference or reward-based tuning when reviewer choices and policy trade-offs need to shape the output pattern.
Hybrid route: Split retrieval, drafting, validation and evaluation across different hosted, private, local or sidecar models.
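For the RAG route, "approved sources" and "context window rules" can be made concrete with a small context builder. The sketch below is an assumption-laden illustration: the source IDs, the `Chunk` type and the word-count budget (a stand-in for a real token count) are all hypothetical, and retrieval scores are taken as given.

```python
from dataclasses import dataclass

# Hypothetical allow-list and budget; a real route would count tokens, not words.
APPROVED_SOURCES = frozenset({"policy-handbook", "product-manual"})
MAX_CONTEXT_WORDS = 50


@dataclass(frozen=True)
class Chunk:
    source_id: str
    text: str
    score: float  # retrieval relevance, higher is better


def build_context(chunks, approved=APPROVED_SOURCES, budget=MAX_CONTEXT_WORDS):
    """Keep approved chunks only, best first, until the word budget is used."""
    allowed = sorted(
        (c for c in chunks if c.source_id in approved),
        key=lambda c: c.score,
        reverse=True,
    )
    picked, used = [], 0
    for chunk in allowed:
        cost = len(chunk.text.split())
        if used + cost > budget:
            continue  # skip chunks that would overflow the context window
        picked.append(chunk)
        used += cost
    return picked


def format_prompt(question, picked):
    """Number each evidence chunk so the model can cite it as [n]."""
    evidence = [f"[{i}] ({c.source_id}) {c.text}" for i, c in enumerate(picked, 1)]
    header = "Answer using only the numbered evidence and cite it like [1]."
    return "\n".join([header, *evidence, f"Question: {question}"])
```

Filtering before ranking means an unapproved source can never reach the prompt, however well it scores.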
For bounded, repeatable work, a tuned open-weight route can be more controllable and reproducible because the team can fix the weights, adapter, inference stack, decoding settings, eval set and release version.
This is not a universal claim that open weights are always more deterministic than closed cloud models. It is a design argument for narrow workloads where the route can be owned and tested.
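One way to make "fix the weights, adapter, inference stack, decoding settings, eval set and release version" checkable is to pin them in a single release record and hash it. The sketch below assumes hypothetical field names and placeholder values; it records what was pinned and does not by itself guarantee identical outputs across hardware.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RouteRelease:
    """Everything pinned for one release (hypothetical field names)."""
    release_version: str
    base_weights_sha256: str
    adapter_sha256: str
    inference_stack: str   # e.g. runtime name and version
    temperature: float
    top_p: float
    seed: int
    eval_set_sha256: str

    def fingerprint(self) -> str:
        # Sorted keys make the JSON stable, so equal settings hash equally.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


release = RouteRelease(
    release_version="1.0.0",
    base_weights_sha256="<hash of base weights>",  # placeholder
    adapter_sha256="<hash of adapter>",            # placeholder
    inference_stack="<runtime and version>",       # placeholder
    temperature=0.0,
    top_p=1.0,
    seed=0,
    eval_set_sha256="<hash of eval set>",          # placeholder
)
print(release.fingerprint())
```

If any pinned component changes, the fingerprint changes, which gives reviewers a simple signal that the eval set needs to be run again.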
Private AI work needs practical tool choices: beginner local runs, adapter training, deeper fine-tune or RL routes, evaluation harnesses, private serving and handover.
The point is not tool worship. It is choosing the smallest route that makes the workflow dependable.
LM Studio and local API runners help teams test open-weight models, prompts and GGUF routes before platform engineering begins.
Unsloth Studio, LoRA and QLoRA paths help turn small, clean datasets into controlled behaviour changes.
Axolotl-style YAML pipelines support full fine-tunes, preference tuning, RL routes, dataset prep and evaluation stages.
Fixed eval sets, reviewer rubrics, sidecar checks and release comparisons show whether the tuned route is improving.
Local, private hosted, isolated and hybrid serving routes keep deployment aligned to source policy and support reality.
Runbooks, model cards, release versions, rollback paths and handover training keep the workflow usable after the prototype.
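The release comparison on a fixed eval set can be sketched in a few lines. This is a minimal illustration that assumes each eval case has already been scored pass or fail (by a rubric or a sidecar check); the function names and case IDs are hypothetical.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of eval cases that passed; 0.0 for an empty set."""
    return sum(results.values()) / len(results) if results else 0.0


def compare_releases(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Compare two releases scored on the same fixed eval set."""
    if baseline.keys() != candidate.keys():
        raise ValueError("both releases must be scored on the same eval cases")
    return {
        "baseline_pass_rate": pass_rate(baseline),
        "candidate_pass_rate": pass_rate(candidate),
        # Cases that used to pass and now fail deserve review before release.
        "regressions": sorted(k for k in baseline if baseline[k] and not candidate[k]),
        "fixes": sorted(k for k in baseline if not baseline[k] and candidate[k]),
    }


baseline = {"case-1": True, "case-2": False, "case-3": True, "case-4": False}
candidate = {"case-1": True, "case-2": True, "case-3": False, "case-4": True}
report = compare_releases(baseline, candidate)
print(report["regressions"])  # prints ['case-3']
```

Listing regressions separately from the headline pass rate matters because a candidate can improve on average while breaking a case that reviewers care about.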
A practical engineering workpack that helps leadership decide whether to use RAG, tune an adapter, run a fine-tune, apply preference/RL tuning, build a hybrid path or stop.
We will separate source governance, customer data boundaries, model routing, context window design and reviewer workflow so the architecture fits the work.