How to Use DeepSeek AI: A Detailed Guide

DeepSeek-V3 is an advanced Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It is known for efficient inference and cost-effective training. The architecture builds on the successful DeepSeek-V2, adopting Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. DeepSeek-V3 is pre-trained on 14.8 trillion diverse tokens, followed by supervised fine-tuning and reinforcement learning stages. It outperforms other open-source models and competes closely with leading closed-source models.
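
To make the "37 billion activated per token" figure concrete, the sketch below shows the core idea of top-k expert routing in an MoE layer. This is a toy PyTorch illustration, not DeepSeek's implementation (DeepSeekMoE adds shared experts, fine-grained expert segmentation, and the auxiliary-loss-free load balancing described below), and all sizes are example values:

    import torch
    import torch.nn as nn

    class ToyMoELayer(nn.Module):
        """Minimal top-k MoE routing sketch (illustrative only)."""
        def __init__(self, d_model=64, n_experts=8, k=2):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            )
            self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
            self.k = k

        def forward(self, x):  # x: (n_tokens, d_model)
            scores = self.router(x)                     # (n_tokens, n_experts)
            weights, idx = scores.topk(self.k, dim=-1)  # keep only k experts per token
            weights = weights.softmax(dim=-1)
            out = torch.zeros_like(x)
            # Only the selected experts run for each token; the rest stay idle,
            # which keeps per-token compute far below the total parameter count.
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

Because only k of the n_experts feed-forward blocks execute per token, a model with an enormous total parameter count (671B here) keeps its per-token compute close to that of a much smaller dense model (37B activated).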

Key Features of DeepSeek-V3

  • Efficient Training: Achieves strong performance with only 2.788M H800 GPU hours.
  • Innovative Load Balancing: Uses an auxiliary-loss-free load-balancing strategy that avoids the performance degradation conventional auxiliary balancing losses introduce.
  • Multi-Token Prediction (MTP): Improves model quality during training and can be used for speculative decoding to speed up inference.
  • Mixed Precision Framework: Uses FP8 mixed precision for efficient training and performance.

How to Use DeepSeek AI

To get started with DeepSeek AI, you need to set up the environment for inference. Below are the steps to run the model locally or use it via the official API.

1. Running DeepSeek Locally

You can run DeepSeek-V3 locally on Linux systems with Python 3.10 or higher. Follow these steps:

  1. Clone the DeepSeek-V3 repository:
    git clone https://github.com/deepseek-ai/DeepSeek-V3.git
  2. Navigate to the inference folder and install the required dependencies:
    cd DeepSeek-V3/inference && pip install -r requirements.txt
  3. Download the model weights from Hugging Face (one way to do this is shown in the snippet after these steps).
  4. Convert the weights to the required format; the expert count and model-parallel degree below match the official example for the 671B model:
    python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16
  5. Run the model interactively. This example launches across 2 nodes with 8 GPUs each; on a real multi-node setup you will also need torchrun's rendezvous flags (e.g. --node-rank and --master-addr):
    torchrun --nnodes 2 --nproc-per-node 8 generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive
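
For step 3, one convenient way to fetch the weights is the huggingface_hub Python API; the repository ID is the official one, while the target directory is just an example path:

    from huggingface_hub import snapshot_download

    # Download the DeepSeek-V3 weights from Hugging Face.
    # local_dir is an example; point it wherever you keep checkpoints.
    snapshot_download(
        repo_id="deepseek-ai/DeepSeek-V3",
        local_dir="/path/to/DeepSeek-V3",
    )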

2. Using SGLang for Inference

SGLang is a recommended framework for inference as it supports FP8 and BF16 modes, along with multi-node tensor parallelism. To use it:

  1. Follow the SGLang setup guide on GitHub to install and configure the environment.
  2. Run DeepSeek-V3 with optimized latency and throughput.
  3. Use SGLang’s multi-node support to deploy the model across machines (a minimal client-side sketch follows this list).
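
Once an SGLang server is running, it exposes an OpenAI-compatible endpoint, so the standard openai client can query it. The launch command in the comment and the port follow SGLang's documented defaults; check the SGLang guide for the exact flags recommended for DeepSeek-V3:

    from openai import OpenAI

    # Assumes an SGLang server was started on this machine, e.g.:
    #   python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
    #       --tp 8 --trust-remote-code
    # Port 30000 is SGLang's default.
    client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")

    response = client.chat.completions.create(
        model="default",  # SGLang serves the loaded model under a default name
        messages=[{"role": "user", "content": "Explain MoE models in one sentence."}],
    )
    print(response.choices[0].message.content)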

3. Using LMDeploy

LMDeploy offers a flexible framework for running DeepSeek-V3 efficiently. It supports both online and offline deployment:

  1. Install LMDeploy and follow the official guide to integrate DeepSeek-V3.
  2. Use it for batch or interactive inference in PyTorch-based workflows (a minimal pipeline sketch follows this list).
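
As a rough illustration, LMDeploy's Python pipeline API handles offline batch inference as below. The pipeline interface is LMDeploy's documented API, but DeepSeek-V3 support depends on your LMDeploy version, so treat this as a sketch:

    from lmdeploy import pipeline, PytorchEngineConfig

    # Build a PyTorch-backend pipeline; tp is the tensor-parallel degree (example value).
    pipe = pipeline(
        "deepseek-ai/DeepSeek-V3",
        backend_config=PytorchEngineConfig(tp=8),
    )

    # Offline batch inference over a list of prompts.
    responses = pipe(["What is Multi-head Latent Attention?",
                      "Summarize Mixture-of-Experts models."])
    for r in responses:
        print(r.text)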

4. Using TensorRT-LLM

TensorRT-LLM can be used for inference with DeepSeek-V3. It supports BF16 and INT4/INT8 precision:

  1. Clone the TRT-LLM repository and follow setup instructions for DeepSeek-V3 support.
  2. Use TensorRT-LLM for high-performance inference with optimized precision modes (a rough sketch follows this list).
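
Recent TensorRT-LLM releases also expose a high-level Python LLM API; a minimal sketch is below. Whether this path supports DeepSeek-V3 depends on your TensorRT-LLM version and build, so treat it as an outline rather than a verified recipe:

    from tensorrt_llm import LLM, SamplingParams

    # Loading the model triggers engine build/compilation on first use.
    llm = LLM(model="deepseek-ai/DeepSeek-V3")

    params = SamplingParams(max_tokens=128, temperature=0.7)
    for out in llm.generate(["Write a haiku about GPUs."], params):
        print(out.outputs[0].text)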

5. Using vLLM

vLLM supports FP8 and BF16 modes, offering pipeline parallelism for DeepSeek-V3 inference:

  1. Set up vLLM as per the official documentation.
  2. Run DeepSeek-V3 across multiple connected machines for efficient processing (a minimal offline-inference sketch follows this list).
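
For example, vLLM's offline inference API can load the model as follows; tensor_parallel_size is an example single-node value, and multi-node pipeline parallelism requires extra setup per the vLLM docs:

    from vllm import LLM, SamplingParams

    # Load DeepSeek-V3 across 8 GPUs on one node (example configuration).
    llm = LLM(
        model="deepseek-ai/DeepSeek-V3",
        tensor_parallel_size=8,
        trust_remote_code=True,
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Explain FP8 training in two sentences."], params)
    print(outputs[0].outputs[0].text)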

DeepSeek-V3 API

If you prefer not to run DeepSeek locally, you can access DeepSeek-V3 via the official API.

Visit the DeepSeek Platform to create an API key. The API follows the OpenAI-compatible format, making it easy to integrate into applications that already use the OpenAI SDK.
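
Because the API follows the OpenAI format, the standard openai Python client works against DeepSeek's endpoint. The base URL and model name below are the ones in DeepSeek's public API documentation; substitute your own key:

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",  # issued on the DeepSeek Platform
        base_url="https://api.deepseek.com",
    )

    response = client.chat.completions.create(
        model="deepseek-chat",  # DeepSeek-V3 behind the chat endpoint
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, DeepSeek!"},
        ],
    )
    print(response.choices[0].message.content)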

Model Performance Evaluation

DeepSeek-V3 delivers superior performance on various benchmarks, excelling in tasks like math, code, and reasoning. It stands out against both open-source and closed-source models.

Benchmark            DeepSeek-V3   Other Models
MMLU (Accuracy)      87.1%         85.0%
HumanEval (Pass@1)   65.2%         53.0%
MATH (EM)            61.6%         54.4%

For more information, check out the DeepSeek GitHub repository or visit the official documentation for advanced usage.

Advantages of Using DeepSeek AI

DeepSeek AI offers significant performance advantages, especially on complex tasks such as math and coding. With 671 billion total parameters, it consistently outperforms many models while delivering a high level of accuracy and reliability. The model is optimized for both speed and efficiency: Multi-Token Prediction enables fast inference, so tasks complete quickly without sacrificing quality, making it well suited to real-time applications.

  • One of the key strengths of DeepSeek AI is its cost-effective training process: the full training run took only 2.788 million H800 GPU hours, significantly reducing computational expense compared to models of similar capability.
  • Despite this efficiency, training remained remarkably stable, with no irrecoverable loss spikes or rollbacks reported. This stability adds to its reliability in long-term use.
  • The flexibility of DeepSeek AI is another major advantage. It supports deployment across various hardware platforms, including NVIDIA, AMD, and Huawei Ascend, making it versatile for a wide range of environments.
  • It is also compatible with popular frameworks and can be run both locally and in the cloud, providing scalability and adaptability for different use cases.

Finally, DeepSeek AI’s open-source nature fosters a collaborative environment, allowing users to contribute, customize, and improve the model. Its advanced reasoning capabilities, powered by distilled knowledge from DeepSeek-R1, enhance its problem-solving abilities, making it an excellent tool for complex decision-making and critical tasks.

Additionally, it supports multiple languages, including Chinese, further expanding its usability in global applications.