Released uncensored versions of all four Gemma 4 models. bf16 + GGUF for each.
Collection: https://huggingface.co/collections/TrevorJS/gemma-4-uncensor...
Code: https://github.com/TrevorS/gemma-4-abliteration
Results
Refusal rates from 686 prompts across 4 datasets (JailbreakBench, tulu-harmbench, NousResearch, mlabonne). Manually audited: most flagged refusals are actually the model complying with a disclaimer attached.
E2B (2.3B): 98% → 0.4%, KL Div 0.346
E4B (4.5B): 99% → 0.7%, KL Div 0.068
26B MoE: 98% → 0.7%, KL Div 0.090
31B: 100% → 3.2%, KL Div 0.124
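To make the false-positive issue concrete, here is a minimal sketch of marker-based refusal counting. The marker list and the sample responses are made up for illustration (the post does not list the markers actually used), and the labels stand in for the manual audit.

```python
# Hypothetical marker list; the real evaluation may use different strings.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "i'm sorry")


def is_flagged(response):
    """Return True if any refusal marker appears anywhere in the response."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def flagged_rate(responses):
    """Fraction of responses flagged by marker matching (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_flagged(r) for r in responses) / len(responses)


# (response, audited_as_refusal) pairs; the second field mimics a manual audit.
samples = [
    ("I can't help with that.", True),
    ("I cannot stress enough that this is dangerous. That said, here are the steps: ...", False),
    ("Disclaimer: I'm sorry if this is upsetting. Here is the full answer: ...", False),
    ("Sure, here is the full answer: ...", False),
]

marker_rate = flagged_rate([text for text, _ in samples])
audited_rate = sum(label for _, label in samples) / len(samples)
print(f"marker-based: {marker_rate:.2f}, after audit: {audited_rate:.2f}")
```

Two of the three flagged responses here comply with a disclaimer attached, which is the pattern the manual audit is meant to catch.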
26B MoE
Standard abliteration only touches dense layers, which gets you from 98% -> 29% on the MoE. The remaining refusals are in the expert weights. Used Expert-Granular Abliteration (EGA, concept from OBLITERATUS [1]) with norm-preserving biprojection [2] on each of the 128 expert slices per layer. That gets it to 3%.
[1] https://github.com/elder-plinius/OBLITERATUS
[2] https://huggingface.co/blog/grimjim/abliteration-biprojectio...
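A rough sketch of the per-expert idea, assuming each expert's output projection is a (d_model, d_ff) matrix that writes into the residual stream. This is plain directional ablation with a column-norm restore, not the exact biprojection recipe from [2], and every name and shape here is hypothetical rather than taken from the repo.

```python
import numpy as np


def ablate_slice(W, r):
    """Project direction r out of W's outputs, then restore each column's norm.

    W: (d_model, d_ff) output projection of one expert (assumed layout).
    r: (d_model,) refusal direction; normalized here.
    """
    r = r / np.linalg.norm(r)
    col_norms = np.linalg.norm(W, axis=0, keepdims=True)
    # (I - r r^T) W removes the component of every column along r.
    W_proj = W - np.outer(r, r @ W)
    new_norms = np.linalg.norm(W_proj, axis=0, keepdims=True)
    # Rescaling a column keeps it orthogonal to r, so the ablation survives.
    scale = col_norms / np.maximum(new_norms, 1e-12)
    return W_proj * scale


def ablate_experts(experts, r):
    """Apply the same ablation to every expert slice: (n_experts, d_model, d_ff)."""
    return np.stack([ablate_slice(W, r) for W in experts])


# Toy shapes for the demo; the real model has 128 expert slices per layer.
rng = np.random.default_rng(0)
experts = rng.normal(size=(4, 8, 16))
r = rng.normal(size=8)

ablated = ablate_experts(experts, r)
r_unit = r / np.linalg.norm(r)
leak = np.abs(np.einsum("d,edf->ef", r_unit, ablated)).max()
print("max residual along r is near zero:", leak < 1e-9)
```

Looping over experts is the whole point of the "expert-granular" part: a dense-only pass never touches these slices, so the direction stays intact inside them.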
How it was built
Set up an automated research loop -- an AI agent reads the current results and idea backlog, picks the next experiment, runs it on the GPU, records results, and repeats. It ran 22 experiments across the 4 models, discovered the false-positive problem in standard refusal markers, built the cross-dataset evaluation, and implemented the MoE expert abliteration when dense-only wasn't enough.
Full experiment history and code in the repo.
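The loop described above could look roughly like this. It is a skeleton under assumptions: `choose_next` and `run_experiment` are hypothetical stand-ins for the agent and the GPU run, and the demo numbers just echo the 29% and 3% figures from the MoE section.

```python
def research_loop(backlog, choose_next, run_experiment, max_experiments):
    """Pick an idea, run it, record the result, and repeat until out of ideas or budget."""
    backlog = list(backlog)  # copy so the caller's list is not mutated
    history = []
    while backlog and len(history) < max_experiments:
        idea = choose_next(backlog, history)  # the agent reads results + backlog
        backlog.remove(idea)
        result = run_experiment(idea)
        history.append({"idea": idea, "result": result})
        # New ideas discovered during a run go back into the backlog.
        backlog.extend(result.get("follow_ups", []))
    return history


# Stand-ins for the demo only.
def pick_first(backlog, history):
    return backlog[0]


def fake_run(idea):
    if idea == "dense-only abliteration":
        return {"refusal_rate": 0.29, "follow_ups": ["per-expert abliteration"]}
    return {"refusal_rate": 0.03, "follow_ups": []}


history = research_loop(["dense-only abliteration"], pick_first, fake_run, max_experiments=10)
for entry in history:
    print(entry["idea"], entry["result"]["refusal_rate"])
```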
Downloads
Each model has bf16 safetensors + GGUF (Q4_K_M, Q8_0):
Quick start:
What about the sampling parameters? You can't just run llama-server with no CLI arguments (other than a uselessly small context size) and expect useful results.
Is this the best uncensored model to date, or are there better ones?
You could try this one against the defending Qwen 3.5 champion: https://huggingface.co/HauhauCS/models