Qwen3.5-40B-RoughHouse-Claude-4.6-Opus — oQ4e ⭐ Recommended (4.8 bpw)

oQ4e mixed-precision quantization of DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking

Quantized using oMLX oQ — data-driven sensitivity-aware mixed-precision quantization for Apple Silicon. Standard mlx-lm compatible safetensors. Works with oMLX, LM Studio, mlx-lm, and any MLX-compatible inference server.

This is the recommended quant. Near-lossless quality at 70% size reduction. Three benchmarks actually improved over bf16. Fits on a 32GB MacBook with TurboQuant KV cache compression.


⚠️ Content Warning

This is a fully uncensored model. The original was abliterated via Heretic and fine-tuned without safety alignment. It may produce graphic, offensive, or inappropriate content. Use responsibly.


About the Original Model

Created by DavidAU. This is the "RoughHouse" variant: the raw release of two 27B Qwen 3.5 fine-tunes expanded to 40B parameters, published without the final training pass that normally follows expansion.

  • Architecture: 40B dense (not MoE), 96 layers, 1275 tensors
  • Base: Qwen3.5-27B expanded to 40B (50% more layers than base)
  • Training: Claude/Polaris (5 datasets) + Deckard/PDK (5 datasets) + Heretic uncensoring
  • RoughHouse: Raw release without final training step after expansion

All Available Quants

| Quant | BPW | Size | MMLU | TruthfulQA | ARC-C | HellaSwag | Status |
|---|---|---|---|---|---|---|---|
| bf16 (reference) | 16.0 | 73.6 GB | 86.2% | 85.3% | 94.3% | 90.5% | Source |
| oQ4e ⭐ (this) | ~4.8 | ~22 GB | 85.4% | 85.7% | 96.0% | 91.5% | ✅ Recommended |
| oQ2e | ~3.1 | ~14.3 GB | 43.2% | 40.0% | 40.3% | 20.6% | Available |
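
The listed sizes follow directly from parameter count and bits per weight. A quick back-of-the-envelope check, assuming 40e9 parameters (from the model name) and ignoring mixed-precision overhead such as embeddings kept at higher precision, so the figures are approximate:

```python
# Back-of-the-envelope size check: bytes = params * bpw / 8.
# Assumes 40e9 parameters; ignores mixed-precision overhead, so results
# land within ~1 GB of the table above.

def quant_size_gib(n_params: float, bpw: float) -> float:
    """Approximate on-disk size in GiB at `bpw` bits per weight."""
    return n_params * bpw / 8 / (1024 ** 3)

print(round(quant_size_gib(40e9, 4.8), 1))   # 22.4 -> matches the ~22 GB listed
print(round(quant_size_gib(40e9, 3.1), 1))   # 14.4 -> close to the ~14.3 GB listed
print(round(quant_size_gib(40e9, 16.0), 1))  # 74.5 -> close to the 73.6 GB bf16 checkpoint
```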

Benchmark Results — oQ4e (4.8 bpw, ~22 GB)

Full Comparison: bf16 vs oQ4e vs oQ2e

| Benchmark | Samples | bf16 | oQ4e (this) | Delta | oQ2e | Delta |
|---|---|---|---|---|---|---|
| MMLU | 1000 | 86.2% | 85.4% | -0.8 | 43.2% | -43.0 |
| HellaSwag | 200 | 90.5% | 91.5% | +1.0 | 20.6% | -69.9 |
| TruthfulQA | Full (817) | 85.3% | 85.7% | +0.4 | 40.0% | -45.3 |
| ARC-Challenge | 300 | 94.3% | 96.0% | +1.7 | 40.3% | -54.0 |
| Winogrande | 300 | 83.0% | 78.3% | -4.7 | 44.0% | -39.0 |
| GSM8K | 100 | 97.0% | 95.0% | -2.0 | 30.5% | -66.5 |
| HumanEval | Full (164) | 82.3% | 82.3% | 0.0 | 29.3% | -53.0 |
| MBPP | 200 | 75.0% | 73.0% | -2.0 | 2.0% | -73.0 |
| LiveCodeBench | 100 | 28.0% | 27.0% | -1.0 | 5.0% | -23.0 |

Key Findings

  • oQ4e is essentially lossless — average delta of -0.8% across all benchmarks
  • Three benchmarks improved over bf16: HellaSwag (+1.0), TruthfulQA (+0.4), ARC-C (+1.7)
  • HumanEval identical at 82.3% — coding ability fully preserved
  • 73.6 GB → 22 GB — 70% size reduction with zero meaningful quality loss
  • Fits on 32GB devices with TurboQuant KV cache compression enabled
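
The -0.8% headline figure can be reproduced directly from the oQ4e delta column of the benchmark table:

```python
# Reproducing the average delta from the oQ4e column above
# (percentage-point changes vs bf16).
deltas = {
    "MMLU": -0.8, "HellaSwag": 1.0, "TruthfulQA": 0.4,
    "ARC-Challenge": 1.7, "Winogrande": -4.7, "GSM8K": -2.0,
    "HumanEval": 0.0, "MBPP": -2.0, "LiveCodeBench": -1.0,
}
avg_delta = sum(deltas.values()) / len(deltas)
print(round(avg_delta, 1))  # -0.8
```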

MMLU Category Breakdown — oQ4e

Top (100%): Astronomy, College Biology, Computer Security, Conceptual Physics, HS Computer Science, HS Government & Politics, HS World History, International Law, Logical Fallacies, Medical Genetics, Sociology, US Foreign Policy

Bottom 5: College Chemistry (42.9%), Global Facts (57.1%), Virology (58.3%), Anatomy (60.0%), Electrical Engineering (60.0%)

Comparison with GLM-5 (4.8-bit MLX, 744B MoE)

| Benchmark | RoughHouse oQ4e (22GB) | GLM-5 4.8bit (~430GB) |
|---|---|---|
| MMLU | 85.4% | 87.4% |
| TruthfulQA | 85.7% | 90.5% |
| HumanEval | 82.3% | 84.2% |
| GSM8K | 95.0% | |
| ARC-C | 96.0% | |

This 22GB oQ4e quant scores within 2% of GLM-5 (a 744B MoE model at ~430GB) on MMLU while being 20x smaller.
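
The two numbers behind that claim, worked out from the table:

```python
# Checking the closing claim from the sizes and MMLU scores above.
roughhouse_gb, glm5_gb = 22, 430
size_ratio = glm5_gb / roughhouse_gb
mmlu_gap = 87.4 - 85.4
print(round(size_ratio, 1))  # 19.5 -> "20x smaller" is a slight round-up
print(round(mmlu_gap, 1))    # 2.0 percentage points on MMLU
```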


Quantization Settings

| Parameter | Value |
|---|---|
| Method | oQ (oMLX Universal Dynamic Quantization) |
| Level | oQ4e (Enhanced) |
| Enhanced (+) | Yes (GPTQ error compensation) |
| Effective BPW | ~4.8 |
| Calibration Dataset | Code + Multilingual + Tool Calling |
| Calibration Samples | 128 |
| Sequence Length | 512 |
| Hardware | Apple M3 Ultra, 512GB Unified Memory |
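
The "Enhanced (+)" row refers to GPTQ-style error compensation. A toy one-dimensional sketch of the core idea follows; the real algorithm works per-layer using second-order (Hessian) information, so this is illustrative only, not the oQ implementation:

```python
# Toy sketch of error compensation in sequential weight quantization
# (the idea behind GPTQ-style "Enhanced" quantization; NOT the oQ algorithm).

def quantize_row(weights, step=0.5):
    """Round each weight to the nearest multiple of `step`, folding the
    rounding error into the next not-yet-quantized weight."""
    out = []
    carry = 0.0
    for w in weights:
        target = w + carry            # weight plus accumulated error
        q = round(target / step) * step
        carry = target - q            # residual error, compensated later
        out.append(q)
    return out

row = [0.23, 0.61, -0.4, 0.9]
print(quantize_row(row))  # [0.0, 1.0, -0.5, 1.0]
```

Naive rounding of the same row gives [0.0, 0.5, -0.5, 1.0], whose sum is off by 0.34; with error feedback the total error stays bounded by half a quantization step no matter how long the row is.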

How to Use

oMLX

Drop the model folder into your oMLX models directory. Auto-detected on server start.

mlx-lm

```python
from mlx_lm import load, generate

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ4e")

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```

LM Studio

Search for the model and download. Works with MLX backend on Apple Silicon.

Recommended Settings

Per DavidAU's guidance for the RoughHouse variant:

  • Temperature: 0.5 - 1.0 (lower for factual, higher for creative)
  • Min context window: 8k - 16k
  • Rep penalty: 1.05 - 1.1 (if looping occurs)
  • System prompt: Even a single sentence helps stabilize the "wild" nature
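
As a concrete starting point, the guidance above maps onto an OpenAI-compatible chat request (the API style that oMLX and LM Studio expose). Note that `repetition_penalty` is a common local-server extension rather than part of the OpenAI schema, so the exact field name may vary by server:

```json
{
  "model": "Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ4e",
  "messages": [
    {"role": "system", "content": "You are a helpful, grounded assistant."},
    {"role": "user", "content": "Your prompt here"}
  ],
  "temperature": 0.7,
  "repetition_penalty": 1.05,
  "max_tokens": 2048
}
```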

Credits

  • Original Model: DavidAU — fine-tuning, expansion to 40B, Heretic uncensoring
  • Base Architecture: Qwen3.5-27B by Alibaba/Qwen Team
  • Quantization: oQ by jundot/oMLX
  • Benchmarks & Quantization by: Hunterx — oMLX v0.2.20 Intelligence Benchmark suite on M3 Ultra (512GB)

License

Apache 2.0 (inherited from original model)
