ChatStream®

LLM Serving Solution

LLM Serving Solution for Commercial Services

We develop and provide "ChatStream®", an LLM serving solution for commercial services.

As LLMs evolve, demand is growing for dedicated models, such as domain-specific and industry-specific LLMs tailored to enterprise internal operations, along with strong information-security requirements. ChatStream is an LLM distributed inference server (ChatStream® Server) that can host such specialized LLMs.

Our proprietary GPU-server load-balancing algorithm enables flexible scale-out, delivering stable performance even under high-volume access to your proprietary LLMs. In addition, our chat UI (ChatStream® UI) is developed fully in-house from scratch, ensuring quality and high customizability.

Using high-performance open LLMs such as Llama 4 and Mistral, you can build a variety of commercial LLM applications, including chat, with low code and short delivery times. (We support not only open LLMs but also integration with commercial LLM APIs such as OpenAI's ChatGPT series and Anthropic's Claude.)
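To illustrate the idea of serving both self-hosted open LLMs and commercial LLM APIs behind one interface, here is a minimal sketch of a model-routing layer. All names (ChatRequest, Router, the backend stubs) are hypothetical and are not ChatStream's actual API; in practice each backend would call a local inference server or a commercial API such as OpenAI's or Anthropic's.

```python
# Hypothetical sketch: route chat requests to different backends by model name.
# Class and function names are illustrative, not ChatStream's real interface.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatRequest:
    model: str
    messages: list  # e.g. [{"role": "user", "content": "..."}]


class Router:
    """Dispatches a request to an open-LLM server or a commercial API by model prefix."""

    def __init__(self):
        self._backends: dict[str, Callable[[ChatRequest], str]] = {}

    def register(self, prefix: str, backend: Callable[[ChatRequest], str]):
        self._backends[prefix] = backend

    def dispatch(self, req: ChatRequest) -> str:
        for prefix, backend in self._backends.items():
            if req.model.startswith(prefix):
                return backend(req)
        raise ValueError(f"no backend for model {req.model!r}")


# Stub backends for the sketch; real ones would perform network calls.
router = Router()
router.register("llama", lambda r: f"[local] {r.messages[-1]['content']}")
router.register("gpt", lambda r: f"[openai] {r.messages[-1]['content']}")

print(router.dispatch(ChatRequest("llama-4", [{"role": "user", "content": "hi"}])))
# → [local] hi
```

The point of the abstraction is that application code stays the same whether a model is self-hosted or reached through a commercial API.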

Furthermore, because our solution is developed by Japanese engineers, it offers strong support for Japanese language processing (prompt handling, search functions, etc.).

Features

  • Build full-fledged LLM applications with a no-code/low-code approach
  • Wide support for proprietary LLMs and open-source LLMs
  • Proprietary scale-out technology with high serving capacity flexibly handles multi-user simultaneous access and high-load environments
  • Standard chat interface with advanced UX, flexibly customizable for various domain operations

Specifications

Python: Runs on Python 3.11 with PyTorch 2.x

Supported GPU: NVIDIA GPUs with CUDA 11.7 or later

Chat UI: A multi-task web chat UI is included as standard. Compatible with web and mobile; various customizations available. Supports flexible conversation-tree operations such as response regeneration and request re-editing.
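Response regeneration and request re-editing imply that the chat history is a tree rather than a flat list: each regeneration or edit adds a sibling branch instead of overwriting the previous turn. The sketch below shows that structure under assumed names (Node, regenerate); it is an illustration of the concept, not ChatStream® UI's internal data model.

```python
# Hypothetical sketch of a conversation tree supporting response regeneration.
# Each edit or regeneration adds a sibling branch; history is never overwritten.
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str  # "system", "user", or "assistant"
    content: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

    def add_child(self, role: str, content: str) -> "Node":
        child = Node(role, content, parent=self)
        self.children.append(child)
        return child


def regenerate(assistant_node: Node, new_content: str) -> Node:
    """Create a sibling assistant response under the same user turn."""
    return assistant_node.parent.add_child("assistant", new_content)


root = Node("system", "You are a helpful assistant.")
q = root.add_child("user", "Hello")
a1 = q.add_child("assistant", "Hi there!")
a2 = regenerate(a1, "Hello! How can I help?")
print(len(q.children))  # → 2 (original response plus the regenerated branch)
```

Re-editing a request works the same way one level up: the edited user message becomes a sibling of the original, so both conversation branches remain navigable.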

Multi-GPU Load Balancing: Supported
 →Data Parallel: Distributed LLM serving through data parallelism

Multi-Node Load Balancing: Supports clustering load balancing with multi-node
 →Model Parallel: Tensor parallelism (Megatron-LM style) with PagedAttention; large-model serving through clustering
 →Large Model Scale-out: Further scale-out through model-parallel cluster sets (ChatStream® Pool)
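As a general illustration of data-parallel load balancing (not ChatStream's proprietary algorithm, which is not public), a common baseline is to send each incoming request to the replica with the fewest in-flight requests. The sketch below implements that least-loaded policy with a heap; all names are assumptions for the example.

```python
# Illustrative least-loaded balancer over data-parallel GPU replicas.
# This is a generic baseline policy, not ChatStream's proprietary algorithm.
import heapq


class LeastLoadedBalancer:
    def __init__(self, replica_ids):
        # Min-heap of (in_flight_count, replica_id): cheapest replica on top.
        self._heap = [(0, rid) for rid in replica_ids]
        heapq.heapify(self._heap)

    def acquire(self) -> str:
        """Pick the least-loaded replica and count one in-flight request on it."""
        load, rid = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, rid))
        return rid

    def release(self, rid: str):
        """Mark one request on `rid` as finished."""
        for i, (load, r) in enumerate(self._heap):
            if r == rid:
                self._heap[i] = (load - 1, r)
                heapq.heapify(self._heap)
                return


lb = LeastLoadedBalancer(["gpu0", "gpu1"])
picks = [lb.acquire() for _ in range(4)]
print(picks)  # → ['gpu0', 'gpu1', 'gpu0', 'gpu1']
```

With equal replicas this degenerates to round-robin; its advantage appears when requests have uneven durations, since slow replicas automatically receive fewer new requests.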

Security: ASN filtering, IP filtering, TLS 1.3, CSRF protection, and more

Advanced Features: Multi-task, OAuth user authentication, and more

Operating Environment: Qualiteg GPU Cloud (our GPU environment), AWS, Azure, GCP, On-premises (separate GPU procurement and HPC cluster construction service available), LLM API connection*

Demo

Try our demo!

Visit ChatStream Demo

Technical Information

ChatStream® technology and how to build LLM chat using Python

View Technical Info

Contact Us

For implementation and details, please contact us:

Contact Us