ChatStream®
LLM Serving Solution

LLM Serving Solution for Commercial Services
We develop and provide "ChatStream®", an LLM serving solution for commercial services.
As LLMs evolve, demand is growing for dedicated models tailored to enterprise operations, such as domain-specific and industry-specific LLMs, together with strict information-security requirements.
Our ChatStream is an LLM distributed inference server (ChatStream® Server) that can host such specialized LLMs.
Our proprietary GPU-server load-balancing algorithm enables flexible scale-out, delivering stable performance even under high-volume access to your proprietary LLMs.
Additionally, our chat UI (ChatStream® UI) is developed entirely in-house from scratch, offering assured quality and high customizability.
Utilizing open, high-performance LLMs such as Llama 4 and Mistral, you can build a wide range of commercial LLM applications, including chat, with low-code development and short delivery times.
(We support not only open LLMs but also integration with commercial LLM APIs such as OpenAI's ChatGPT series and Anthropic's Claude)
Furthermore, because it is developed by Japanese engineers, our solution offers strong Japanese-language support (prompt handling, search functions, etc.).
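The GPU load balancing described above can be sketched as a least-loaded dispatcher that routes each request to the GPU server with the fewest in-flight requests. The classes and names below are illustrative assumptions for explanation only, not ChatStream® internals:

```python
import threading

class GpuWorker:
    """Illustrative stand-in for one GPU inference server (not a ChatStream class)."""
    def __init__(self, name: str):
        self.name = name
        self.active_requests = 0

class LeastLoadedBalancer:
    """Route each incoming request to the worker with the fewest in-flight requests."""
    def __init__(self, workers):
        self.workers = workers
        self.lock = threading.Lock()  # protects the counters under concurrent access

    def acquire(self) -> GpuWorker:
        with self.lock:
            worker = min(self.workers, key=lambda w: w.active_requests)
            worker.active_requests += 1
            return worker

    def release(self, worker: GpuWorker) -> None:
        with self.lock:
            worker.active_requests -= 1

balancer = LeastLoadedBalancer([GpuWorker("gpu-0"), GpuWorker("gpu-1")])
w = balancer.acquire()   # routed to the idlest GPU server
balancer.release(w)      # called when the response stream completes
```

A production balancer would also weigh queue depth, model residency, and node health, but the least-connections idea above is the core of data-parallel request distribution.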
Features
- Build full-fledged LLM applications with no-code/low-code approach
- Wide support for proprietary LLMs and open-source LLMs
- Proprietary scale-out technology with high serving capacity flexibly handles multi-user simultaneous access and high-load environments
- Standard chat interface with advanced UX, flexibly customizable for various domain operations
Specifications
Python: Runs on Python 3.11 with PyTorch 2.0 or later
Supported GPU: NVIDIA GPUs with CUDA 11.7 or later
Chat UI: A multi-task web chat UI is included as standard, compatible with desktop and mobile browsers. Various customizations are available. Supports flexible conversation-tree editing operations such as response regeneration and request re-editing.
Multi-GPU Load Balancing: Supported
→Data Parallel: Distributed LLM serving through data parallelism
Multi-Node Load Balancing: Supports clustered load balancing across multiple nodes
→Model Parallel: Tensor parallelism (Megatron-LM style) with PagedAttention memory management; large-model serving through clustering
→Large Model Scale-out: Further scale-out through model-parallel cluster sets (ChatStream® Pool)
Security: ASN filtering, IP filtering, TLS 1.3, CSRF protection, and more
Advanced Features: Multi-task, OAuth user authentication, and more
Operating Environment: Qualiteg GPU Cloud (our GPU environment), AWS, Azure, GCP, On-premises (separate GPU procurement and HPC cluster construction service available), LLM API connection*
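The conversation-tree operations listed in the Chat UI specification (response regeneration, request re-editing) imply that each turn keeps sibling branches rather than overwriting history. A minimal sketch of such a tree, using a hypothetical node structure (not the actual ChatStream® UI schema):

```python
from dataclasses import dataclass, field

@dataclass
class TurnNode:
    """One message in the conversation tree (illustrative model only)."""
    role: str                                    # "user" or "assistant"
    text: str
    children: list = field(default_factory=list)  # alternative continuations

def regenerate(parent: TurnNode, new_text: str) -> TurnNode:
    """Regenerating a response adds a sibling branch instead of overwriting history."""
    node = TurnNode("assistant", new_text)
    parent.children.append(node)
    return node

root = TurnNode("user", "Hello")
first = regenerate(root, "Hi there!")
second = regenerate(root, "Hello! How can I help?")
# Both responses remain as branches; the UI can switch between them,
# and re-editing a request works the same way one level up the tree.
```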
Technical Information
ChatStream® technology and how to build LLM chat using Python
View Technical Info