

LLM Deployment Services

Deploy production-ready Large Language Models (LLMs), AI inference APIs, generative AI applications, embeddings services, and other scalable AI workloads directly inside Antryk using the built-in LLM Deployment platform. Antryk LLM Deployment Services provide GPU-powered infrastructure for hosting open-source and custom AI models, with automated deployment workflows, runtime configuration, inference optimization, and centralized AI infrastructure management. The platform lets organizations connect repositories, configure model-serving environments, select AI providers, manage runtime settings and environment variables, and launch production-ready inference services without manual infrastructure provisioning. Using Antryk LLM Deployment Services, teams can:
  • Deploy open-source LLMs
  • Host AI inference APIs
  • Launch chat applications
  • Deploy embeddings services
  • Build RAG pipelines
  • Deploy fine-tuned AI models
  • Configure GPU-powered inference infrastructure
  • Manage AI deployments centrally
  • Deploy directly from Git repositories
  • Configure runtime environments
  • Scale production AI workloads
  • Manage inference infrastructure efficiently
This provides organizations with centralized AI infrastructure management and production-ready model deployment workflows.

What is the LLM Deployment Service?

The LLM Deployment Service is Antryk’s managed AI infrastructure platform for hosting, scaling, and managing Large Language Models and generative AI systems. It simplifies AI deployment operations: users configure repositories, model providers, runtime settings, GPU infrastructure, environment variables, and deployment behavior from a centralized dashboard, and Antryk automatically provisions infrastructure, installs dependencies, configures inference runtimes, initializes model-serving systems, and launches the scalable AI deployment. The platform supports:
  • AI inference APIs
  • Chatbot deployments
  • Open-source LLM hosting
  • Fine-tuned model serving
  • Embeddings infrastructure
  • RAG applications
  • CUDA-based AI runtimes
  • Python AI workloads
  • Multi-model deployments
  • API-based inference systems
  • GPU-accelerated AI infrastructure

Supported AI Models & Providers

Antryk supports deployment and hosting for multiple open-source and production-ready LLM ecosystems. Supported AI providers include:
| Provider | Description |
| --- | --- |
| DeepSeek | Advanced reasoning and coding models |
| Gemma | Lightweight open AI models |
| Minimax | High-performance conversational AI models |
| Qwen | Alibaba open-source LLM ecosystem |
| Mistral | Efficient open-weight AI models |
| Ollama | Local model serving runtime |
These providers enable organizations to deploy modern AI systems optimized for inference, reasoning, coding assistance, chat applications, and generative AI workloads.

Creating a New LLM Deployment

Antryk allows users to deploy production-ready AI services directly from connected Git repositories using a guided deployment workflow. The deployment system simplifies AI infrastructure provisioning and inference deployment operations.

Create LLM Deployment Form

The Create LLM Deployment form allows users to configure repositories, runtime environments, AI model providers, environment variables, GPU infrastructure, and deployment behavior. The deployment workflow includes:
  • Basic deployment information
  • Git provider connection
  • Repository selection
  • Branch configuration
  • Build configuration
  • AI model configuration
  • Environment variable management
  • GPU infrastructure selection
  • Deployment execution

Step 1 — Deployment Information

The Deployment Information section defines the AI deployment identity.

Deployment Name

Enter a descriptive name for the deployment. Examples:
  • support-chatbot
  • rag-inference-api
  • ai-assistant
  • coding-llm
  • embeddings-service
Using descriptive deployment names helps teams identify AI services quickly across environments.

Step 2 — Source Code Repository

The Source Code Repository section connects the deployment to a Git provider.

Connect Git Provider

Users can connect supported Git providers for AI deployment integration. Supported providers include:
| Provider | Status |
| --- | --- |
| GitHub | Available |
| GitLab | Coming Soon |
| Bitbucket | Coming Soon |
After connecting the provider, users can select repositories directly from their account.

Repository Selection

Choose the repository containing the AI application or inference service. Examples:
llm-chat-service
rag-pipeline
ollama-server
deepseek-api

Branch Selection

Select the Git branch to deploy. Examples:
main
production
staging
develop
This enables repository-driven AI deployment workflows.

Step 3 — Build Configuration

The Build Configuration section defines how the AI application should be installed, built, and executed. Users can configure:
  • Install command
  • Build command
  • Start command
  • Output directory
  • Root directory

Install Command

Defines dependency installation behavior. Examples:
pip install -r requirements.txt
npm install

Build Command

Defines the application build process. Examples:
npm run build
python setup.py install

Start Command

Defines the inference runtime execution command. Examples:
python app.py
ollama serve
python -m vllm.entrypoints.openai.api_server
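
For context, the sketch below shows one way an app.py launched by the `python app.py` start command might look. It is an illustration only, not an Antryk requirement: FastAPI, uvicorn, and transformers are assumed dependencies, and the `MODEL_NAME` and `PORT` environment variables are hypothetical.

```python
# app.py -- minimal sketch of an inference service started with "python app.py".
# Assumes fastapi, uvicorn, pydantic, and transformers are in requirements.txt.
import os

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Hypothetical variable; the default is a small placeholder model for illustration.
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt2")

app = FastAPI()
generator = pipeline("text-generation", model=MODEL_NAME)


class Prompt(BaseModel):
    prompt: str
    max_new_tokens: int = 128


@app.post("/generate")
def generate(req: Prompt):
    # Run a single generation request and return the generated text.
    output = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": output[0]["generated_text"]}


if __name__ == "__main__":
    # Bind to 0.0.0.0 so the platform's routing layer can reach the service.
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))
```

With a layout like this, the install command would be `pip install -r requirements.txt` and the start command `python app.py`.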

Output Directory

Defines the generated build output directory. Examples:
dist
build
.next

Root Directory

Defines the application root path inside the repository. Examples:
/
services/llm
apps/api
The build configuration system supports multiple AI frameworks and deployment architectures.

Step 4 — AI Model Configuration

The AI Model Configuration section allows users to configure the AI provider, model selection, runtime engine, and inference settings. Users can configure:
  • AI provider
  • Model name
  • Runtime engine
  • Context window
  • Quantization settings
  • API configuration
  • Token limits
  • Runtime optimization

AI Provider Selection

Select the AI provider for deployment. Supported providers include:
| Provider | Use Cases |
| --- | --- |
| DeepSeek | Coding, reasoning, developer assistants |
| Gemma | Lightweight AI applications |
| Minimax | Conversational AI systems |
| Qwen | Enterprise AI workloads |
| Mistral | Efficient open-weight inference |
| Ollama | Local AI runtime deployments |

Model Name

Specify the AI model to deploy. Examples:
deepseek-coder
gemma-7b
qwen2-72b
mistral-7b
minimax-chat
llama3

Runtime Engine

Select the inference runtime engine. Supported engines include:
| Engine | Description |
| --- | --- |
| Ollama Runtime | Lightweight local model serving |
| vLLM | High-performance LLM inference |
| Transformers | Hugging Face runtime |
| TensorRT-LLM | NVIDIA optimized inference |
| Custom Runtime | Custom AI serving environment |

Context Window

Configure the maximum context size supported by the deployment. Examples:
4096
8192
32768
128000
This allows organizations to optimize inference behavior and runtime performance.
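
The configured context window caps the combined prompt and completion tokens a single request can use. Assuming the deployment exposes an OpenAI-compatible endpoint (as the vLLM runtime shown in the Build Configuration step does) at a hypothetical URL, a client might budget tokens like this:

```python
from openai import OpenAI

# Assumptions: the deployment runs an OpenAI-compatible server and is reachable
# at this placeholder URL; the model name below is illustrative.
client = OpenAI(
    base_url="https://my-llm-deployment.example.com/v1",
    api_key="not-needed-for-this-sketch",
)

CONTEXT_WINDOW = 8192   # must match the value configured in the deployment form
MAX_COMPLETION = 512    # leave the rest of the window for the prompt

response = client.chat.completions.create(
    model="mistral-7b",
    messages=[{"role": "user", "content": "Summarize the deployment steps."}],
    max_tokens=MAX_COMPLETION,
)
print(response.choices[0].message.content)
```

Keeping the completion budget well below the configured window leaves room for the prompt and avoids truncation at inference time.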

Step 5 — Environment Variables

The Environment Variables section allows users to securely configure runtime secrets and infrastructure settings. Users can:
  • Add environment variables
  • Import configuration values
  • Copy variables
  • Remove variables
  • Manage secrets securely
Examples:
HF_TOKEN=xxxx
OPENAI_API_KEY=xxxx
REDIS_URL=redis://localhost:6379
MODEL_CACHE=/models
CUDA_VISIBLE_DEVICES=0
This enables secure integration with external infrastructure and AI services.
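
At runtime, the deployed application reads these values from its process environment. A minimal sketch, using the example variable names above:

```python
import os

# Read the variables configured in the Environment Variables step.
hf_token = os.environ.get("HF_TOKEN")            # e.g. for pulling gated model weights
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379")
model_cache = os.environ.get("MODEL_CACHE", "/models")

if hf_token is None:
    raise RuntimeError("HF_TOKEN is not set; add it in the Environment Variables step.")

print(f"Caching model weights under {model_cache}, using Redis at {redis_url}")
```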

Step 6 — Select GPU Plan

The Select Plan section allows users to choose GPU infrastructure optimized for AI inference and model-serving workloads. Available GPU plans include:
| GPU Plan | GPU Memory | Recommended Use Cases |
| --- | --- | --- |
| A4000 | 16 GB | Lightweight inference workloads |
| A4500 | 16 GB | Mid-scale AI applications |
| RTX 4000 | 16 GB | AI APIs and inference services |
| RTX 2000 | 16 GB | Entry-level AI workloads |
| L4 | 24 GB | Optimized inference workloads |
| A5000 | 24 GB | AI training and embeddings |
| RTX 3090 | 24 GB | High-performance inference |
| RTX 4090 PRO | 24 GB | Advanced AI serving |
| A6000 | 48 GB | Large-scale model hosting |
| A40 | 48 GB | Enterprise AI infrastructure |
| L40 | 48 GB | Generative AI workloads |
| L40s | 48 GB | Optimized LLM inference |
| RTX 6000 Ada | 48 GB | Professional AI acceleration |
| A100 | 80 GB | Enterprise AI training |
| H100 Pro | 80 GB | Advanced AI inference |
| H200 Pro | 141 GB | Ultra-large AI workloads |
This enables organizations to optimize AI infrastructure performance and scalability.
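
As a rough, unofficial rule of thumb for matching a model to a plan, the weights alone need approximately the parameter count multiplied by the bytes per parameter, plus headroom for the KV cache and activations. The sketch below illustrates the arithmetic; it is a planning approximation, not Antryk sizing guidance.

```python
# Back-of-the-envelope GPU memory estimate for model weights only.
def estimated_weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    # 1e9 parameters at N bytes each is roughly N GB.
    return params_billion * bytes_per_param


for name, params_b in [("7B", 7), ("13B", 13), ("72B", 72)]:
    fp16 = estimated_weight_memory_gb(params_b, 2.0)   # 16-bit weights
    int4 = estimated_weight_memory_gb(params_b, 0.5)   # 4-bit quantized weights
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{int4:.1f} GB at 4-bit (weights only)")
```

For example, a 7B model at fp16 needs roughly 14 GB for the weights alone, so it fits more comfortably on the 24 GB plans than on the 16 GB ones.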

Step 7 — Deploy LLM Service

After completing the deployment configuration, users can launch the AI service directly from the dashboard.

Deployment Workflow

The deployment system automatically:
  • Provisions GPU infrastructure
  • Pulls repository source code
  • Installs dependencies
  • Configures inference runtimes
  • Injects environment variables
  • Initializes AI model runtimes
  • Starts inference APIs
  • Launches deployment services
Users can deploy the AI service using the Deploy Service button.
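
Once the service launches, a quick smoke test can confirm the endpoint is reachable. The URL and /health path below are placeholders; substitute the endpoint your deployment actually exposes in the dashboard.

```python
import requests

# Hypothetical deployment URL and health-check path for illustration only.
DEPLOYMENT_URL = "https://my-llm-deployment.example.com"

resp = requests.get(f"{DEPLOYMENT_URL}/health", timeout=10)
resp.raise_for_status()
print("Deployment is up:", resp.status_code)
```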

Deployment Features

Antryk LLM Deployment Services provide:
  • One-click AI deployments
  • GPU-powered infrastructure
  • Multi-model deployment workflows
  • Repository-based deployments
  • Runtime configuration management
  • Environment variable security
  • Production inference infrastructure
  • Runtime optimization
  • AI provider flexibility
  • Centralized AI operations

Infrastructure Scalability

Antryk allows organizations to scale AI infrastructure dynamically based on workload requirements. Teams can:
  • Upgrade GPU plans
  • Modify runtime settings
  • Scale inference systems
  • Optimize deployment performance
  • Redeploy AI services
  • Manage production AI infrastructure centrally
This enables organizations to build scalable AI systems efficiently using managed LLM infrastructure.