Documentation Index
Fetch the complete documentation index at: https://docs.antryk.com/llms.txt
Use this file to discover all available pages before exploring further.
LLM Deployment Services
Deploy production-ready Large Language Models (LLMs), AI inference APIs, generative AI applications, embeddings services, and scalable AI workloads directly inside Antryk using the built-in LLM Deployment platform.
Antryk LLM Deployment Services provide scalable GPU-powered infrastructure for hosting open-source and custom AI models with automated deployment workflows, runtime configuration, inference optimization, and centralized AI infrastructure management.
The deployment platform allows organizations to connect repositories, configure model-serving environments, select AI providers, manage runtime settings, configure environment variables, and deploy production-ready inference services with simplified infrastructure provisioning.
Using Antryk LLM Deployment Services, teams can:
- Deploy open-source LLMs
- Host AI inference APIs
- Launch chat applications
- Deploy embeddings services
- Build RAG pipelines
- Deploy fine-tuned AI models
- Configure GPU-powered inference infrastructure
- Manage AI deployments centrally
- Deploy directly from Git repositories
- Configure runtime environments
- Scale production AI workloads
- Manage inference infrastructure efficiently
This provides organizations with centralized AI infrastructure management and production-ready model deployment workflows.
What is the LLM Deployment Service?
The LLM Deployment Service is Antryk’s managed AI infrastructure platform for hosting, scaling, and managing Large Language Models and generative AI systems.
The platform simplifies AI deployment operations by allowing users to configure repositories, model providers, runtime settings, GPU infrastructure, environment variables, and deployment behavior directly from a centralized dashboard.
Antryk automatically provisions infrastructure, installs dependencies, configures inference runtimes, initializes model-serving systems, and launches scalable AI deployments.
The platform supports:
- AI inference APIs
- Chatbot deployments
- Open-source LLM hosting
- Fine-tuned model serving
- Embeddings infrastructure
- RAG applications
- CUDA-based AI runtimes
- Python AI workloads
- Multi-model deployments
- API-based inference systems
- GPU-accelerated AI infrastructure
Supported AI Models & Providers
Antryk supports deployment and hosting for multiple open-source and production-ready LLM ecosystems.
Supported AI providers include:
| Provider | Description |
|---|---|
| DeepSeek | Advanced reasoning and coding models |
| Gemma | Lightweight open AI models |
| Minimax | High-performance conversational AI models |
| Qwen | Alibaba open-source LLM ecosystem |
| Mistral | Efficient open-weight AI models |
| Ollama | Local model serving runtime |
These providers enable organizations to deploy modern AI systems optimized for inference, reasoning, coding assistance, chat applications, and generative AI workloads.
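As a brief illustration of one of these runtimes in use, the sketch below queries a model served through Ollama from Python. It assumes the `ollama` Python package is installed, an Ollama server is reachable, and the `llama3` model has already been pulled; the model name and prompt are illustrative only.

```python
# Minimal sketch: chatting with a model served by the Ollama runtime.
# Assumes `pip install ollama`, a running Ollama server, and a pulled model.
import ollama

response = ollama.chat(
    model="llama3",  # illustrative model name from the examples in this guide
    messages=[{"role": "user", "content": "Summarize what a RAG pipeline does."}],
)
print(response["message"]["content"])
```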
Creating a New LLM Deployment
Antryk allows users to deploy production-ready AI services directly from connected Git repositories using a guided deployment workflow.
The deployment system simplifies AI infrastructure provisioning and inference deployment operations.
Create LLM Deployment Form
The Create LLM Deployment form allows users to configure repositories, runtime environments, AI model providers, environment variables, GPU infrastructure, and deployment behavior.
The deployment workflow includes:
- Basic deployment information
- Git provider connection
- Repository selection
- Branch configuration
- Build configuration
- AI model configuration
- Environment variable management
- GPU infrastructure selection
- Deployment execution
Step 1 — Deployment Information
The Deployment Information section defines the AI deployment identity.
Deployment Name
Enter a descriptive name for the deployment.
Examples:
- support-chatbot
- rag-inference-api
- ai-assistant
- coding-llm
- embeddings-service
Using descriptive deployment names helps teams identify AI services quickly across environments.
Step 2 — Source Code Repository
The Source Code Repository section connects the deployment to a Git provider.
Connect Git Provider
Users can connect supported Git providers for AI deployment integration.
Supported providers include:
| Provider | Status |
|---|---|
| GitHub | Available |
| GitLab | Coming Soon |
| Bitbucket | Coming Soon |
After connecting the provider, users can select repositories directly from their account.
Repository Selection
Choose the repository containing the AI application or inference service.
Examples:
- llm-chat-service
- rag-pipeline
- ollama-server
- deepseek-api
Branch Selection
Select the Git branch to deploy.
Examples:
- main
- production
- staging
- develop
This enables repository-driven AI deployment workflows.
Step 3 — Build Configuration
The Build Configuration section defines how the AI application should be installed, built, and executed.
Users can configure:
- Install command
- Build command
- Start command
- Output directory
- Root directory
Install Command
Defines dependency installation behavior.
Examples:
- pip install -r requirements.txt
Build Command
Defines the application build process.
Examples:
- python -m build
Many Python inference services have no separate build step.
Start Command
Defines the inference runtime execution command.
Examples:
- python -m vllm.entrypoints.openai.api_server
Output Directory
Defines the generated build output directory.
Examples:
- dist
- build
Root Directory
Defines the application root path inside the repository.
Examples:
- . (repository root)
- apps/inference-service
The build configuration system supports multiple AI frameworks and deployment architectures.
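To make these fields concrete, the sketch below shows one possible repository layout: a minimal FastAPI inference entrypoint in `app.py`. The file name, model, and commands are illustrative assumptions, not a prescribed structure; with this layout the install command would be `pip install -r requirements.txt` and the start command `uvicorn app:app --host 0.0.0.0 --port 8000`.

```python
# app.py - minimal sketch of an inference entrypoint targeted by the build configuration.
# Assumed layout: app.py and requirements.txt (fastapi, uvicorn, transformers, torch)
# at the repository root, so the root directory field can stay at its default.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Small model used purely for illustration; a real deployment would load the configured LLM.
generator = pipeline("text-generation", model="distilgpt2")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(prompt: Prompt):
    # The pipeline returns the prompt plus the newly generated continuation.
    output = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": output[0]["generated_text"]}
```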
Step 4 — AI Model Configuration
The AI Model Configuration section allows users to configure the AI provider, model selection, runtime engine, and inference settings.
Users can configure:
- AI provider
- Model name
- Runtime engine
- Context window
- Quantization settings
- API configuration
- Token limits
- Runtime optimization
AI Provider Selection
Select the AI provider for deployment.
Supported providers include:
| Provider | Use Cases |
|---|---|
| DeepSeek | Coding, reasoning, developer assistants |
| Gemma | Lightweight AI applications |
| Minimax | Conversational AI systems |
| Qwen | Enterprise AI workloads |
| Mistral | Efficient open-weight inference |
| Ollama | Local AI runtime deployments |
Model Name
Specify the AI model to deploy.
Examples:
- deepseek-coder
- gemma-7b
- qwen2-72b
- mistral-7b
- minimax-chat
- llama3
Runtime Engine
Select the inference runtime engine.
Supported engines include:
| Engine | Description |
|---|---|
| Ollama Runtime | Lightweight local model serving |
| vLLM | High-performance LLM inference |
| Transformers | Hugging Face runtime |
| TensorRT-LLM | NVIDIA optimized inference |
| Custom Runtime | Custom AI serving environment |
Context Window
Configure the maximum context size supported by the deployment.
Examples:
- 4096
- 8192
- 32768
- 131072
Context window sizes are measured in tokens.
This allows organizations to optimize inference behavior and runtime performance.
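As an illustration of how these fields relate to an inference runtime, the sketch below maps the model name, context window, and quantization settings onto the vLLM engine listed among the supported runtimes. The values and the mapping are assumptions for illustration, not the platform's internal implementation.

```python
# Minimal sketch: how model configuration fields could translate to a vLLM engine.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # "Model Name" field (illustrative)
    max_model_len=8192,                          # "Context Window" field, in tokens
    # quantization="awq",                        # only for checkpoints published in a quantized format
)

params = SamplingParams(max_tokens=128, temperature=0.7)  # per-request token limits
outputs = llm.generate(["Explain what a context window is."], params)
print(outputs[0].outputs[0].text)
```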
Step 5 — Environment Variables
The Environment Variables section allows users to securely configure runtime secrets and infrastructure settings.
Users can:
- Add environment variables
- Import configuration values
- Copy variables
- Remove variables
- Manage secrets securely
Examples:
- HF_TOKEN=xxxx
- OPENAI_API_KEY=xxxx
- REDIS_URL=redis://localhost:6379
- MODEL_CACHE=/models
- CUDA_VISIBLE_DEVICES=0
This enables secure integration with external infrastructure and AI services.
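At runtime the injected values are available to the application as ordinary environment variables. The sketch below reads the variables from the examples above in Python; the defaults and the error message are illustrative.

```python
# Minimal sketch: reading injected environment variables inside the deployed service.
import os

hf_token = os.environ.get("HF_TOKEN")  # secret: no default on purpose
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379")
model_cache = os.environ.get("MODEL_CACHE", "/models")
cuda_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")

if hf_token is None:
    raise RuntimeError("HF_TOKEN is not set; add it in the Environment Variables step.")

print(f"Caching models under {model_cache}, using GPU(s) {cuda_devices}")
```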
Step 6 — Select GPU Plan
The Select Plan section allows users to choose GPU infrastructure optimized for AI inference and model-serving workloads.
Available GPU plans include:
| GPU Plan | GPU Memory | Recommended Use Cases |
|---|---|---|
| A4000 | 16 GB | Lightweight inference workloads |
| A4500 | 16 GB | Mid-scale AI applications |
| RTX 4000 | 16 GB | AI APIs and inference services |
| RTX 2000 | 16 GB | Entry-level AI workloads |
| L4 | 24 GB | Optimized inference workloads |
| A5000 | 24 GB | AI training and embeddings |
| RTX 3090 | 24 GB | High-performance inference |
| RTX 4090 PRO | 24 GB | Advanced AI serving |
| A6000 | 48 GB | Large-scale model hosting |
| A40 | 48 GB | Enterprise AI infrastructure |
| L40 | 48 GB | Generative AI workloads |
| L40s | 48 GB | Optimized LLM inference |
| RTX 6000 Ada | 48 GB | Professional AI acceleration |
| A100 | 80 GB | Enterprise AI training |
| H100 Pro | 80 GB | Advanced AI inference |
| H200 Pro | 141 GB | Ultra-large AI workloads |
This enables organizations to optimize AI infrastructure performance and scalability.
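When choosing a plan, a common rough estimate is that model weights need about 2 bytes per parameter at fp16, roughly 1 byte at 8-bit and 0.5 bytes at 4-bit quantization, plus headroom for the KV cache and activations. The sketch below applies that rule of thumb; it is an illustrative estimate, not official sizing guidance.

```python
# Rough rule-of-thumb sketch for matching a model size to a GPU plan.
def estimated_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                      overhead_factor: float = 1.3) -> float:
    # Weights in GB (1e9 params * bytes per param), plus headroom for KV cache/activations.
    return params_billions * bytes_per_param * overhead_factor

for name, size_b in [("gemma-7b", 7), ("mistral-7b", 7), ("qwen2-72b", 72)]:
    print(f"{name}: ~{estimated_vram_gb(size_b):.0f} GB at fp16")

# A 7B model at fp16 (~18 GB with headroom) fits a 24 GB plan such as L4 or A5000,
# while a 72B model at fp16 exceeds a single 80 GB GPU and typically needs
# quantization or multi-GPU serving.
```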
Step 7 — Deploy LLM Service
After completing the deployment configuration, users can launch the AI service directly from the dashboard.
Deployment Workflow
The deployment system automatically:
- Provisions GPU infrastructure
- Pulls repository source code
- Installs dependencies
- Configures inference runtimes
- Injects environment variables
- Initializes AI model runtimes
- Starts inference APIs
- Launches deployment services
Users can deploy the AI service using the Deploy Service button.
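Once the service is live, it can be called like any other inference API. The sketch below assumes the deployment exposes an OpenAI-compatible chat completions endpoint (as the vLLM start command shown earlier would); the base URL, API key, and model name are placeholder assumptions, not real endpoints.

```python
# Minimal sketch: calling a deployed, OpenAI-compatible inference endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://support-chatbot.example.antryk.app/v1",  # hypothetical deployment URL
    api_key="changeme",                                        # use the real key if auth is enabled
)

completion = client.chat.completions.create(
    model="mistral-7b",  # illustrative model name
    messages=[{"role": "user", "content": "Hello! Are you online?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```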
Deployment Features
Antryk LLM Deployment Services provide:
- One-click AI deployments
- GPU-powered infrastructure
- Multi-model deployment workflows
- Repository-based deployments
- Runtime configuration management
- Environment variable security
- Production inference infrastructure
- Runtime optimization
- AI provider flexibility
- Centralized AI operations
Infrastructure Scalability
Antryk allows organizations to scale AI infrastructure dynamically based on workload requirements.
Teams can:
- Upgrade GPU plans
- Modify runtime settings
- Scale inference systems
- Optimize deployment performance
- Redeploy AI services
- Manage production AI infrastructure centrally
This enables organizations to build scalable AI systems efficiently using managed LLM infrastructure.