Instructions to use Ex0bit/MiniMax-M2.5-PRISM-PRO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ex0bit/MiniMax-M2.5-PRISM-PRO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load the model directly (AutoModelForCausalLM for text generation)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Ex0bit/MiniMax-M2.5-PRISM-PRO", dtype="auto")
```
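For explicit control over tokenization and decoding, the standard AutoTokenizer plus chat-template flow also works. A minimal sketch; the sampling values are illustrative placeholders (see Recommended Parameters below):

```python
# Minimal sketch: explicit generate() flow using the model's chat template.
# Sampling values are illustrative; see the Recommended Parameters table below.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Ex0bit/MiniMax-M2.5-PRISM-PRO")
model = AutoModelForCausalLM.from_pretrained(
    "Ex0bit/MiniMax-M2.5-PRISM-PRO", dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True,
    temperature=1.0, top_p=0.95, top_k=40,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```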
- llama-cpp-python
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Ex0bit/MiniMax-M2.5-PRISM-PRO",
    filename="MiniMax-M2.5-PRISM-PRO-IQ2_XXS.gguf",
)
```
```python
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

# Run inference directly in the terminal:
llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

# Run inference directly in the terminal:
llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Use Docker
```sh
docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
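Whichever install route you choose, llama-server exposes an OpenAI-compatible API (http://localhost:8080 by default), so any OpenAI client can talk to it. A minimal sketch, assuming the openai Python package is installed and the server is running locally:

```python
# Minimal sketch: call a local llama-server via its OpenAI-compatible API.
# Assumes `pip install openai` and llama-server running on the default port 8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```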
- LM Studio
- Jan
- vLLM
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Ex0bit/MiniMax-M2.5-PRISM-PRO"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```sh
docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
- SGLang
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Ex0bit/MiniMax-M2.5-PRISM-PRO" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Ex0bit/MiniMax-M2.5-PRISM-PRO" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Ollama
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Ollama:
```sh
ollama run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
- Unsloth Studio
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting
```
- Pi
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the local server to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {"id": "Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL"}
      ]
    }
  }
}
```
Run Pi
```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Docker Model Runner:
```sh
docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
- Lemonade
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```
Run and chat with the model
```sh
lemonade run user.MiniMax-M2.5-PRISM-PRO-UD-Q4_K_XL
```
List all available models
```sh
lemonade list
```
MiniMax-M2.5-PRISM-PRO
A powerful, production-ready, fully uncensored model built for complete suppression of over-refusal and propaganda mechanisms, using our SOTA PRISM-PRO pipeline.
PRISM-PRO is available for purchase: https://ko-fi.com/s/0a23d1b9a5
For custom-trained PRISM versions or raw tensor access, reach out at https://ko-fi.com/ex0bit.
Support Our Work
If you enjoy our work and find it useful, please consider sponsoring or supporting us!
| Option | Description |
|---|---|
| PRISM PRO VIP Membership | Access to all PRISM models |
| Bitcoin | bc1qarq2pyn4psjpcxzp2ghgwaq6y2h4e53q232x8r |
Model Highlights
- PRISM Ablation – State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities (a generic sketch of the underlying idea follows this list)
- SOTA Coding Performance – 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, 76.3% on BrowseComp (with context management)
- Frontier Agentic Capabilities – Industry-leading performance in tool use, search, and complex multi-step tasks
- Efficient Reasoning – Trained with RL to reason efficiently and decompose tasks optimally, 37% faster than M2.1
- Cost-Effective – $1 for continuous operation at 100 tok/s for an hour (roughly 360K tokens, or about $2.78 per million tokens); $0.30 at 50 tok/s
- Modified-MIT Base License – Based on MiniMax's open-weight release
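The PRISM pipeline itself is proprietary, but the technique family it builds on, directional ablation ("abliteration"), is public: estimate a refusal direction from the difference in mean hidden activations between refused and answered prompts, then project it out of weights that write to the residual stream. A toy numpy sketch of that core idea, purely illustrative and not the PRISM-PRO method:

```python
# Toy sketch of directional ablation ("abliteration"), the public technique
# family PRISM builds on. Illustrative only; NOT the proprietary PRISM-PRO pipeline.
import numpy as np

def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Refusal direction = normalized difference of mean hidden activations
    collected on refused vs. answered prompt sets."""
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(weight: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the direction out of a weight matrix that writes to the
    residual stream: W' = (I - d d^T) W."""
    return weight - np.outer(d, d) @ weight

rng = np.random.default_rng(0)
refused = rng.normal(size=(32, 8))    # stand-in activations, hidden size 8
answered = rng.normal(size=(32, 8))
d = refusal_direction(refused, answered)

W = rng.normal(size=(8, 8))           # e.g. an MLP down-projection
W_prime = ablate(W, d)
print(np.allclose(d @ W_prime, 0))    # True: W' can no longer write along d
```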
Base Model Architecture
Base MiniMax-M2.5 is a Mixture-of-Experts (MoE) model extensively trained with reinforcement learning across hundreds of thousands of complex real-world environments (a generic sketch of sparse MoE routing follows the table below).
| Specification | Value |
|---|---|
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Training | Extensive RL in 200K+ real-world environments |
| Languages | 10+ (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) |
| Inference Speed | 100 tok/s (Lightning) / 50 tok/s (Standard) |
| Library | transformers |
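For readers new to the architecture: a sparse MoE layer routes each token through only a small, gate-selected subset of expert feed-forward networks, so just a fraction of the total parameters is active per token. A minimal generic sketch with toy dimensions (not MiniMax's actual configuration):

```python
# Generic sketch of sparse top-k MoE routing. Toy sizes only; the real
# MiniMax-M2.5 expert counts and dimensions are not reproduced here.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, gate_w, experts, top_k=2):
    """x: (hidden,) token vector; gate_w: (hidden, n_experts);
    experts: list of (W1, W2) feed-forward weight pairs."""
    scores = softmax(x @ gate_w)            # router probabilities per expert
    chosen = np.argsort(scores)[-top_k:]    # only the top-k experts run
    out = np.zeros_like(x)
    for i in chosen:
        W1, W2 = experts[i]
        h = np.maximum(x @ W1, 0.0)         # expert FFN with ReLU
        out += scores[i] * (h @ W2)         # combine, weighted by router score
    return out

rng = np.random.default_rng(0)
hidden, ffn, n_experts = 16, 32, 8
gate_w = rng.normal(size=(hidden, n_experts))
experts = [(rng.normal(size=(hidden, ffn)), rng.normal(size=(ffn, hidden)))
           for _ in range(n_experts)]
print(moe_layer(rng.normal(size=hidden), gate_w, experts).shape)  # (16,)
```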
Benchmarks
| Category | Base (FP8/vLLM) | PRISM-PRO Q8_0 (llama.cpp) |
|---|---|---|
| MMLU 5-shot | 28/30 (93.3%) | 28/30 (93.3%) |
| General Knowledge | 5/5 | 5/5 |
| Coding | 4/5 | 5/5 |
| Reasoning | 5/5 | 5/5 |
| Agentic | 3/5 | 5/5 |
| Harmful bypass | 3/10 | 10/10 (100%) |
| Avg thinking words | 163w | 152w |
| Speed | 72 t/s | 35-65 t/s |
Coding
| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|
| SWE-Bench Verified | 80.2 | 78.9 | 74.0 | 72.6 |
| Multi-SWE-Bench | 51.3 | 50.8 | – | – |
| SWE-Bench Multilingual | 55.6 | – | – | – |
| Terminal-Bench 2.0 | 51.5 | 52.1 | – | – |
Search & Tool Calling
| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|
| BrowseComp | 76.3 | 71.2 | 62.4 | 57.8 |
Reasoning & Knowledge
| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|
| AIME25 | 86.3 | 95.6 | 96.0 | 98.0 |
| GPQA-D | 85.2 | 90.0 | 91.0 | 90.0 |
| HLE w/o tools | 19.4 | 30.7 | 37.2 | 31.4 |
| SciCode | 44.4 | 52.0 | 56.0 | 52.0 |
| IFBench | 70.0 | 53.0 | 70.0 | 75.0 |
Usage
llama.cpp (GGUF)
Build the latest master of llama.cpp and run:
```sh
~/llama.cpp/build/bin/llama-cli \
  -m ../outputs/MiniMax-M2.5-PRISM-PRO-[QUANT].gguf \
  --jinja \
  -ngl 999 \
  --repeat-penalty 1.15 \
  --temp 1.0 \
  --top-p 0.95 \
  --top-k 40
```
Replace `[QUANT]` with your quantization level (e.g. `Q8_0`).
Recommended Parameters
| Use Case | Temperature | Top-P | Top-K | Repeat Penalty | Max New Tokens |
|---|---|---|---|---|---|
| Reasoning / Coding | 1.0 | 0.95 | 40 | 1.15 | 32768 |
| General Chat | 0.6 | 0.95 | 40 | 1.15 | 4096 |
| Agentic / Tool Use | 1.0 | 0.95 | 40 | 1.15 | 32768 |
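These settings map directly onto llama-cpp-python's sampling arguments. A minimal sketch of the Reasoning / Coding profile, reusing the model loaded in the llama-cpp-python snippet above:

```python
# Apply the recommended Reasoning / Coding sampling profile via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Ex0bit/MiniMax-M2.5-PRISM-PRO",
    filename="MiniMax-M2.5-PRISM-PRO-IQ2_XXS.gguf",
    n_ctx=32768,           # room for long reasoning outputs
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=1.0,       # Reasoning / Coding row of the table above
    top_p=0.95,
    top_k=40,
    repeat_penalty=1.15,
    max_tokens=32768,
)
print(result["choices"][0]["message"]["content"])
```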
Available Versions
| Version | Description | Access |
|---|---|---|
| PRISM-LITE | Abliterated with the PRISM-LITE pipeline – removes over-refusal while preserving core capabilities | Free on Hugging Face |
| PRISM-PRO | Full PRISM-PRO ablation – full production-level suppression of propaganda/refusal mechanisms with maximum capability retention | Ko-fi |
License
This model is released under the PRISM Research License.
The base model MiniMax-M2.5 is released under a Modified-MIT License.
Acknowledgments
Based on MiniMax-M2.5 by MiniMax AI.
Available Quantizations
2-bit, 3-bit, 4-bit, 6-bit, 8-bit