> ## Documentation Index
> Fetch the complete documentation index at: https://docs.antryk.com/llms.txt
> Use this file to discover all available pages before exploring further.

# LLM

> Deploy and manage Large Language Models (LLMs), AI inference APIs, generative AI systems, and production-ready model serving infrastructure using Antryk LLM Deployment Services. Configure repositories, model providers, runtime environments, GPU infrastructure, environment variables, and scalable inference deployments directly from the dashboard.

# LLM Deployment Services

Deploy production-ready Large Language Models (LLMs), AI inference APIs, generative AI applications, embeddings services, and scalable AI workloads directly inside Antryk using the built-in LLM Deployment platform.

Antryk LLM Deployment Services provide scalable GPU-powered infrastructure for hosting open-source and custom AI models with automated deployment workflows, runtime configuration, inference optimization, and centralized AI infrastructure management.

The deployment platform allows organizations to connect repositories, configure model-serving environments, select AI providers, manage runtime settings, configure environment variables, and deploy production-ready inference services with simplified infrastructure provisioning.

Using Antryk LLM Deployment Services, teams can:

* Deploy open-source LLMs
* Host AI inference APIs
* Launch chat applications
* Deploy embeddings services
* Build RAG pipelines
* Deploy fine-tuned AI models
* Configure GPU-powered inference infrastructure
* Manage AI deployments centrally
* Deploy directly from Git repositories
* Configure runtime environments
* Scale production AI workloads
* Manage inference infrastructure efficiently

This provides organizations with centralized AI infrastructure management and production-ready model deployment workflows.

***

# What is LLM Deployment Service?

LLM Deployment Service is Antryk’s managed AI infrastructure platform designed for hosting, scaling, and managing Large Language Models and generative AI systems.

The platform simplifies AI deployment operations by allowing users to configure repositories, model providers, runtime settings, GPU infrastructure, environment variables, and deployment behavior directly from a centralized dashboard.

Antryk automatically provisions infrastructure, installs dependencies, configures inference runtimes, initializes model-serving systems, and launches scalable AI deployments.

The platform supports:

* AI inference APIs
* Chatbot deployments
* Open-source LLM hosting
* Fine-tuned model serving
* Embeddings infrastructure
* RAG applications
* CUDA-based AI runtimes
* Python AI workloads
* Multi-model deployments
* API-based inference systems
* GPU-accelerated AI infrastructure

***

# Supported AI Models & Providers

Antryk supports deployment and hosting for multiple open-source and production-ready LLM ecosystems.

Supported AI providers include:

| Provider | Description                               |
| -------- | ----------------------------------------- |
| DeepSeek | Advanced reasoning and coding models      |
| Gemma    | Lightweight open AI models                |
| Minimax  | High-performance conversational AI models |
| Qwen     | Alibaba open-source LLM ecosystem         |
| Mistral  | Efficient open-weight AI models           |
| Ollama   | Local model serving runtime               |

These providers enable organizations to deploy modern AI systems optimized for inference, reasoning, coding assistance, chat applications, and generative AI workloads.

***

# Creating a New LLM Deployment

Antryk allows users to deploy production-ready AI services directly from connected Git repositories using a guided deployment workflow.

The deployment system simplifies AI infrastructure provisioning and inference deployment operations.

***

# Create LLM Deployment Form

The Create LLM Deployment form allows users to configure repositories, runtime environments, AI model providers, environment variables, GPU infrastructure, and deployment behavior.

The deployment workflow includes:

* Basic deployment information
* Git provider connection
* Repository selection
* Branch configuration
* Build configuration
* AI model configuration
* Environment variable management
* GPU infrastructure selection
* Deployment execution

***

# Step 1 — Deployment Information

The Deployment Information section defines the AI deployment identity.

## Deployment Name

Enter a descriptive name for the deployment.

Examples:

* support-chatbot
* rag-inference-api
* ai-assistant
* coding-llm
* embeddings-service

Using descriptive deployment names helps teams identify AI services quickly across environments.

***

# Step 2 — Source Code Repository

The Source Code Repository section connects the deployment to a Git provider.

## Connect Git Provider

Users can connect supported Git providers for AI deployment integration.

Supported providers include:

| Provider  | Status      |
| --------- | ----------- |
| GitHub    | Available   |
| GitLab    | Coming Soon |
| Bitbucket | Coming Soon |

After connecting the provider, users can select repositories directly from their account.

## Repository Selection

Choose the repository containing the AI application or inference service.

Examples:

```txt theme={null}
llm-chat-service
rag-pipeline
ollama-server
deepseek-api
```

## Branch Selection

Select the Git branch to deploy.

Examples:

```txt id="5e5w7o" theme={null}
main
production
staging
develop
```

This enables repository-driven AI deployment workflows.

***

# Step 3 — Build Configuration

The Build Configuration section defines how the AI application should be installed, built, and executed.

Users can configure:

* Install command
* Build command
* Start command
* Output directory
* Root directory

## Install Command

Defines dependency installation behavior.

Examples:

```bash id="tr2v55" theme={null}
pip install -r requirements.txt
```

```bash id="st4wyf" theme={null}
npm install
```

## Build Command

Defines the application build process.

Examples:

```bash id="pn3jjw" theme={null}
npm run build
```

```bash id="v4itq3" theme={null}
python setup.py install
```

## Start Command

Defines the inference runtime execution command.

Examples:

```bash id="fw0lq6" theme={null}
python app.py
```

```bash id="o18n3x" theme={null}
ollama serve
```

```bash id="wfw0t8" theme={null}
python -m vllm.entrypoints.openai.api_server
```

## Output Directory

Defines the generated build output directory.

Examples:

```txt id="bk2dc3" theme={null}
dist
build
.next
```

## Root Directory

Defines the application root path inside the repository.

Examples:

```txt id="0cv4cl" theme={null}
/
services/llm
apps/api
```

The build configuration system supports multiple AI frameworks and deployment architectures.

***

# Step 4 — AI Model Configuration

The AI Model Configuration section allows users to configure the AI provider, model selection, runtime engine, and inference settings.

Users can configure:

* AI provider
* Model name
* Runtime engine
* Context window
* Quantization settings
* API configuration
* Token limits
* Runtime optimization

## AI Provider Selection

Select the AI provider for deployment.

Supported providers include:

| Provider | Use Cases                               |
| -------- | --------------------------------------- |
| DeepSeek | Coding, reasoning, developer assistants |
| Gemma    | Lightweight AI applications             |
| Minimax  | Conversational AI systems               |
| Qwen     | Enterprise AI workloads                 |
| Mistral  | Efficient open-weight inference         |
| Ollama   | Local AI runtime deployments            |

## Model Name

Specify the AI model to deploy.

Examples:

```txt id="p5gmlr" theme={null}
deepseek-coder
gemma-7b
qwen2-72b
mistral-7b
minimax-chat
llama3
```

## Runtime Engine

Select the inference runtime engine.

Supported engines include:

| Engine         | Description                     |
| -------------- | ------------------------------- |
| Ollama Runtime | Lightweight local model serving |
| vLLM           | High-performance LLM inference  |
| Transformers   | Hugging Face runtime            |
| TensorRT-LLM   | NVIDIA optimized inference      |
| Custom Runtime | Custom AI serving environment   |

## Context Window

Configure the maximum context size supported by the deployment.

Examples:

```txt id="4e9m0u" theme={null}
4096
8192
32768
128000
```

This allows organizations to optimize inference behavior and runtime performance.

***

# Step 5 — Environment Variables

The Environment Variables section allows users to securely configure runtime secrets and infrastructure settings.

Users can:

* Add environment variables
* Import configuration values
* Copy variables
* Remove variables
* Manage secrets securely

Examples:

```env id="4lc9h7" theme={null}
HF_TOKEN=xxxx
OPENAI_API_KEY=xxxx
REDIS_URL=redis://localhost:6379
MODEL_CACHE=/models
CUDA_VISIBLE_DEVICES=0
```

This enables secure integration with external infrastructure and AI services.

***

# Step 6 — Select GPU Plan

The Select Plan section allows users to choose GPU infrastructure optimized for AI inference and model-serving workloads.

Available GPU plans include:

| GPU Plan     | GPU Memory | Recommended Use Cases           |
| ------------ | ---------- | ------------------------------- |
| A4000        | 16 GB      | Lightweight inference workloads |
| A4500        | 16 GB      | Mid-scale AI applications       |
| RTX 4000     | 16 GB      | AI APIs and inference services  |
| RTX 2000     | 16 GB      | Entry-level AI workloads        |
| L4           | 24 GB      | Optimized inference workloads   |
| A5000        | 24 GB      | AI training and embeddings      |
| RTX 3090     | 24 GB      | High-performance inference      |
| RTX 4090 PRO | 24 GB      | Advanced AI serving             |
| A6000        | 48 GB      | Large-scale model hosting       |
| A40          | 48 GB      | Enterprise AI infrastructure    |
| L40          | 48 GB      | Generative AI workloads         |
| L40s         | 48 GB      | Optimized LLM inference         |
| RTX 6000 Ada | 48 GB      | Professional AI acceleration    |
| A100         | 80 GB      | Enterprise AI training          |
| H100 Pro     | 80 GB      | Advanced AI inference           |
| H200 Pro     | 141 GB     | Ultra-large AI workloads        |

This enables organizations to optimize AI infrastructure performance and scalability.

***

# Step 7 — Deploy LLM Service

After completing the deployment configuration, users can launch the AI service directly from the dashboard.

## Deployment Workflow

The deployment system automatically:

* Provisions GPU infrastructure
* Pulls repository source code
* Installs dependencies
* Configures inference runtimes
* Injects environment variables
* Initializes AI model runtimes
* Starts inference APIs
* Launches deployment services

Users can deploy the AI service using the **Deploy Service** button.

***

# Deployment Features

Antryk LLM Deployment Services provide:

* One-click AI deployments
* GPU-powered infrastructure
* Multi-model deployment workflows
* Repository-based deployments
* Runtime configuration management
* Environment variable security
* Production inference infrastructure
* Runtime optimization
* AI provider flexibility
* Centralized AI operations

***

# Infrastructure Scalability

Antryk allows organizations to scale AI infrastructure dynamically based on workload requirements.

Teams can:

* Upgrade GPU plans
* Modify runtime settings
* Scale inference systems
* Optimize deployment performance
* Redeploy AI services
* Manage production AI infrastructure centrally

This enables organizations to build scalable AI systems efficiently using managed LLM infrastructure.