LLM Connection

Mailslide supports using local LLMs for inference (for example, via llama.cpp), letting you keep your email data private and work offline without relying on external cloud APIs. You can also use an OpenAI API key to connect to OpenAI's models or to any OpenAI-API-compatible cloud service.

Warning: The AI will read your emails. Make sure the emails processed by the AI contain no personal private information or business secrets, or use a local model to process them.

Disclaimer: Mailslide provides only a tool and does not guarantee the security of data processed by third-party or local AI models. The author of this project is not responsible for any data leaks, confidentiality breaches, or related damages caused by user input or by model services. Evaluate the risks yourself and follow your organization's policies and regulations before use.

1. Using llama.cpp as a Local LLM

Download the llama-server executable for your operating system from the llama.cpp Releases page. (On Windows machines without a dedicated graphics card, you can download the CPU build; it is only suitable for smaller models, but small models still perform well on simple text-classification tasks.)

2. Start the Server

We recommend small models such as Qwen3.5. Models with 2B or 4B parameters give good results in email-classification scenarios and run comfortably on a CPU; for example, a 2B/4B Q8-quantized model may occupy only about 2-4 GB of RAM. Note, however, that the system prompt usually needs several rounds of tuning to achieve the desired classification quality.

You can download Qwen3.5-4B-Q8_0.gguf, or another model of your choice, from Hugging Face.

# Basic startup command
.\llama-server.exe -m .\Qwen3.5-4B-Q8_0.gguf --port 8080

# Disable thinking mode (recommended)
.\llama-server.exe -m .\Qwen3.5-4B-Q8_0.gguf --port 8080 --chat_template_kwargs '{"enable_thinking":false}'
Parameter description:
- -m: path to the model's GGUF file
- --port: server listening port (default 8080)
- --chat_template_kwargs: additional chat-template parameters, such as disabling the thinking feature to speed up text classification
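Once the server is running, you can sanity-check it with any OpenAI-compatible client. The sketch below builds a minimal chat-completion request for an email-classification prompt and sends it using only Python's standard library. The category names and prompt wording are illustrative assumptions of mine, not Mailslide's actual prompts.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080/v1"  # the llama.cpp server started above

def build_request(subject: str, body: str) -> dict:
    """Build an OpenAI-style chat-completion payload for classifying one email.

    The system prompt and category labels are illustrative; tune them for
    your own mailbox, as noted above.
    """
    return {
        "model": "Qwen3.5-4B-Q8_0.gguf",
        "temperature": 0,  # deterministic output helps classification
        "messages": [
            {"role": "system",
             "content": "Classify the email into exactly one category: "
                        "work, personal, newsletter, or spam. "
                        "Reply with only the category name."},
            {"role": "user",
             "content": f"Subject: {subject}\n\n{body}"},
        ],
    }

def classify(subject: str, body: str) -> str:
    """POST the payload to the server's OpenAI-compatible endpoint."""
    payload = json.dumps(build_request(subject, body)).encode("utf-8")
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer any"},  # local server ignores the key
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"].strip()

# Usage (with the server running):
#   classify("Quarterly report", "Please review the attached figures.")
```

If this call returns a category name, the server and chat template are working and you can move on to configuring Mailslide itself.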

3. Configure LLM Parameters (llm-config.yaml)

Open the TUI's LLM Settings or edit config/llm-config.yaml:

api_base: "http://localhost:8080/v1"
api_key: "any"
model: "Qwen3.5-4B-Q8_0.gguf"

Parameter description:
- api_base: the local address of the llama.cpp server (must end with /v1 to be compatible with the OpenAI API format).
- api_key: for an unauthenticated local llama.cpp server, any string works.
- model: set this to the name of the model loaded by the server.
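As a quick check that the file is well formed, you can parse it and verify the fields described above. This validation snippet is a sketch of my own, not part of Mailslide; it uses a minimal standard-library parser that handles only this flat key-value layout (use a real YAML library for anything more complex).

```python
def parse_flat_yaml(text: str) -> dict:
    """Minimal parser for a flat `key: "value"` file like llm-config.yaml.

    Good enough for this three-key layout only; not a general YAML parser.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        cfg[key.strip()] = value.strip().strip('"')
    return cfg

def validate_llm_config(text: str) -> dict:
    """Check that the three fields Mailslide expects are present and sane."""
    cfg = parse_flat_yaml(text)
    for key in ("api_base", "api_key", "model"):
        if key not in cfg:
            raise ValueError(f"missing key: {key}")
    # The endpoint must end with /v1 to match the OpenAI API format (see above).
    if not cfg["api_base"].endswith("/v1"):
        raise ValueError("api_base should end with /v1")
    return cfg
```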

The api_key is encrypted and stored securely using Windows DPAPI. Do not write the api_key directly into the YAML file; configure it through the LLM Settings page in the TUI instead.