# Qwen3.5-40B-RoughHouse-Claude-4.6-Opus — oQ4e ⭐ Recommended (4.8 bpw)
oQ4e mixed-precision quantization of DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking
Quantized using oMLX oQ — data-driven sensitivity-aware mixed-precision quantization for Apple Silicon. Standard mlx-lm compatible safetensors. Works with oMLX, LM Studio, mlx-lm, and any MLX-compatible inference server.
⭐ This is the recommended quant. Near-lossless quality at 70% size reduction. Three benchmarks actually improved over bf16. Fits on a 32GB MacBook with TurboQuant KV cache compression.
## ⚠️ Content Warning
This is a fully uncensored model. The original was abliterated via Heretic and fine-tuned without safety alignment. It may produce graphic, offensive, or inappropriate content. Use responsibly.
## About the Original Model
Created by DavidAU. This is the "RoughHouse" variant: the raw version, released without its final training pass, after two 27B Qwen 3.5 fine-tunes were expanded to 40B parameters.
- Architecture: 40B dense (not MoE), 96 layers, 1275 tensors
- Base: Qwen3.5-27B expanded to 40B (50% more layers than the base; see the sketch below)
- Training: Claude/Polaris (5 datasets) + Deckard/PDK (5 datasets) + Heretic uncensoring
- RoughHouse: Raw release without final training step after expansion
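DavidAU's exact expansion recipe is not documented in this card. As a rough illustration of passthrough-style depth expansion, the sketch below maps a 64-layer base (inferred from "50% more layers" and the 96-layer total) onto 96 target layers by duplicating every second block; the duplication pattern is an assumption for illustration only.

```python
# Hypothetical sketch of depth expansion via layer duplication (passthrough merging).
# The 64-layer base and the every-second-layer pattern are illustrative assumptions,
# not DavidAU's actual recipe.

def expand_layer_map(n_source: int = 64, n_target: int = 96) -> list[int]:
    """Map target layer slots onto source layers, reusing some weights twice."""
    stride = n_source // (n_target - n_source)  # 64 // 32: duplicate every 2nd layer
    mapping = []
    for i in range(n_source):
        mapping.append(i)
        if i % stride == stride - 1:
            mapping.append(i)  # this source layer's weights appear twice in the stack
    return mapping

layer_map = expand_layer_map()
assert len(layer_map) == 96  # 50% more layers than the 64-layer base
```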
## All Available Quants
| Quant | BPW | Size | MMLU | TruthfulQA | ARC-C | HellaSwag | Status |
|---|---|---|---|---|---|---|---|
| bf16 (reference) | 16.0 | 73.6 GB | 86.2% | 85.3% | 94.3% | 90.5% | Source |
| oQ4e ⭐ (this) | ~4.8 | ~22 GB | 85.4% | 85.7% | 96.0% | 91.5% | ✅ Recommended |
| oQ2e | ~3.1 | ~14.3 GB | 43.2% | 40.0% | 40.3% | 20.6% | Available |
## Benchmark Results — oQ4e (4.8 bpw, ~22 GB)

### Full Comparison: bf16 vs oQ4e vs oQ2e
| Benchmark | Samples | bf16 | oQ4e (this) | oQ4e Δ | oQ2e | oQ2e Δ |
|---|---|---|---|---|---|---|
| MMLU | 1000 | 86.2% | 85.4% | -0.8 | 43.2% | -43.0 |
| HellaSwag | 200 | 90.5% | 91.5% | +1.0 | 20.6% | -69.9 |
| TruthfulQA | Full (817) | 85.3% | 85.7% | +0.4 | 40.0% | -45.3 |
| ARC-Challenge | 300 | 94.3% | 96.0% | +1.7 | 40.3% | -54.0 |
| Winogrande | 300 | 83.0% | 78.3% | -4.7 | 44.0% | -39.0 |
| GSM8K | 100 | 97.0% | 95.0% | -2.0 | 30.5% | -66.5 |
| HumanEval | Full (164) | 82.3% | 82.3% | 0.0 | 29.3% | -53.0 |
| MBPP | 200 | 75.0% | 73.0% | -2.0 | 2.0% | -73.0 |
| LiveCodeBench | 100 | 28.0% | 27.0% | -1.0 | 5.0% | -23.0 |
### Key Findings
- oQ4e is essentially lossless: an average delta of -0.8 points across all nine benchmarks
- Three benchmarks improved over bf16: HellaSwag (+1.0), TruthfulQA (+0.4), ARC-C (+1.7)
- HumanEval identical at 82.3% — coding ability fully preserved
- 73.6 GB → 22 GB — 70% size reduction with zero meaningful quality loss
- Fits on 32GB devices with TurboQuant KV cache compression enabled (see the sizing sketch below)
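A back-of-the-envelope sizing sketch makes the 32GB claim concrete. The card does not publish the attention geometry, so the KV head count and head dimension below are illustrative guesses, and 4-bit KV compression is assumed for TurboQuant; macOS typically lets the GPU use roughly three quarters of unified memory, so a 32GB machine has around 24GB to work with.

```python
# Rough KV-cache sizing for a 32GB Mac (~24GB usable by the GPU). The GQA
# geometry (8 KV heads x 128 head_dim) is an illustrative assumption; the card
# does not publish the model's attention config.

N_LAYERS = 96         # from the architecture notes above
KV_HEADS = 8          # assumed grouped-query KV heads
HEAD_DIM = 128        # assumed head dimension
WEIGHTS_GB = 22.0     # oQ4e weights, from the table above

def kv_cache_gb(context_tokens: int, bytes_per_elem: float) -> float:
    per_token = 2 * N_LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem  # keys + values
    return per_token * context_tokens / 1024**3

for ctx in (8_192, 16_384):
    fp16 = kv_cache_gb(ctx, 2.0)   # uncompressed fp16 cache
    q4 = kv_cache_gb(ctx, 0.5)     # ~4-bit compressed cache (assumed)
    print(f"{ctx:>6} tokens: fp16 KV {fp16:.2f} GB vs 4-bit KV {q4:.2f} GB "
          f"-> total ~{WEIGHTS_GB + q4:.1f} GB with compression")
```

Under these assumptions, a 16k context adds about 6 GB of uncompressed cache on top of the 22 GB of weights, blowing the budget, while the compressed cache keeps the total near 23.5 GB.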
### MMLU Category Breakdown — oQ4e
- Top (100%): Astronomy, College Biology, Computer Security, Conceptual Physics, HS Computer Science, HS Government & Politics, HS World History, International Law, Logical Fallacies, Medical Genetics, Sociology, US Foreign Policy
- Bottom 5: College Chemistry (42.9%), Global Facts (57.1%), Virology (58.3%), Anatomy (60.0%), Electrical Engineering (60.0%)
## Comparison with GLM-5 (4.8-bit MLX, 744B MoE)
| Benchmark | RoughHouse oQ4e (22GB) | GLM-5 4.8bit (~430GB) |
|---|---|---|
| MMLU | 85.4% | 87.4% |
| TruthfulQA | 85.7% | 90.5% |
| HumanEval | 82.3% | 84.2% |
| GSM8K | 95.0% | — |
| ARC-C | 96.0% | — |
This 22 GB oQ4e quant scores within 2 points of GLM-5 (a 744B MoE model at ~430 GB) on MMLU while being roughly 20x smaller.
## Quantization Settings
| Parameter | Value |
|---|---|
| Method | oQ (oMLX Universal Dynamic Quantization) |
| Level | oQ4e (Enhanced) |
| Enhanced (+) | Yes (GPTQ error compensation) |
| Effective BPW | ~4.8 |
| Calibration Dataset | Code + Multilingual + Tool Calling |
| Calibration Samples | 128 |
| Sequence Length | 512 |
| Hardware | Apple M3 Ultra, 512GB Unified Memory |
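The card does not publish oQ's algorithm, but the general shape of data-driven, sensitivity-aware bit allocation can be sketched. Everything below (the round-trip error metric, the greedy budget loop, the candidate bit widths) is a generic stand-in, not oMLX code:

```python
import numpy as np

# Generic sketch of sensitivity-aware mixed-precision allocation. This is NOT
# oMLX's oQ implementation; it only illustrates spending bits where the
# calibration error is highest.

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric round-trip quantization used to probe sensitivity."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def allocate_bits(layers, calib, budget_bpw=4.8, choices=(2, 3, 4, 6, 8)):
    """Pick a per-layer bit width so the size-weighted average meets the budget."""
    idx = {name: 0 for name in layers}  # every layer starts at the cheapest width

    def err(name, i):
        w = layers[name]
        delta = calib @ w - calib @ fake_quantize(w, choices[i])
        return float(np.mean(delta**2))  # output error on the calibration batch

    def avg_bpw():
        total = sum(w.size for w in layers.values())
        return sum(choices[idx[n]] * layers[n].size for n in layers) / total

    # Greedily upgrade whichever layer gains the most from one step up in
    # precision, until the average bits-per-weight budget (~4.8 here) is spent.
    while avg_bpw() < budget_bpw:
        gains = {n: err(n, idx[n]) - err(n, idx[n] + 1)
                 for n in layers if idx[n] + 1 < len(choices)}
        if not gains:
            break
        idx[max(gains, key=gains.get)] += 1
    return {n: choices[idx[n]] for n in layers}

# Toy usage: four equally shaped layers, 128 calibration rows (as in the table)
rng = np.random.default_rng(0)
layers = {f"layers.{i}.mlp": rng.normal(size=(256, 256)) for i in range(4)}
calib = rng.normal(size=(128, 256))
print(allocate_bits(layers, calib))
```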
## How to Use

### oMLX
Drop the model folder into your oMLX models directory. Auto-detected on server start.
### mlx-lm
```python
from mlx_lm import load, generate

# Downloads the quantized weights from the Hub on first use
model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ4e")

messages = [{"role": "user", "content": "Your prompt here"}]

# enable_thinking switches on the Qwen thinking mode in the chat template
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
```
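For incremental output, recent mlx-lm releases also provide `stream_generate`. A minimal sketch, assuming a current mlx-lm version where the generator yields response chunks with a `.text` field:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ4e")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your prompt here"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Print tokens as they are generated instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=2048):
    print(chunk.text, end="", flush=True)
```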
### LM Studio

Search for the model and download it. Works with the MLX backend on Apple Silicon.
## Recommended Settings
Per DavidAU's guidance for the RoughHouse variant:
- Temperature: 0.5 - 1.0 (lower for factual, higher for creative)
- Min context window: 8k - 16k
- Rep penalty: 1.05 - 1.1 (if looping occurs)
- System prompt: Even a single sentence helps stabilize the "wild" nature
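In recent mlx-lm releases, these settings can be wired in through `make_sampler` and `make_logits_processors` from `mlx_lm.sample_utils`; a minimal sketch (the kwargs below assume a current mlx-lm version):

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ4e")

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},  # stabilizer
    {"role": "user", "content": "Your prompt here"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=2048,
    sampler=make_sampler(temp=0.8),  # 0.5 for factual work, up to 1.0 for creative
    logits_processors=make_logits_processors(repetition_penalty=1.05),  # only if looping
)
```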
## Credits
- Original Model: DavidAU — fine-tuning, expansion to 40B, Heretic uncensoring
- Base Architecture: Qwen3.5-27B by Alibaba/Qwen Team
- Quantization: oQ by jundot/oMLX
- Benchmarks & Quantization by: Hunterx — oMLX v0.2.20 Intelligence Benchmark suite on M3 Ultra (512GB)
## License
Apache 2.0 (inherited from original model)