Gemma 4 Uncensored (autoresearch results)

(huggingface.co)

4 points | by adefa 2 hours ago

4 comments

$adefa 2 hours ago
Released uncensored versions of all four Gemma 4 models. bf16 + GGUF for each.
Collection: https://huggingface.co/collections/TrevorJS/gemma-4-uncensor...
Code: https://github.com/TrevorS/gemma-4-abliteration
Results
Refusal rates measured on 686 prompts across 4 datasets (JailbreakBench, tulu-harmbench, NousResearch, mlabonne). Results were manually audited: most flagged refusals are actually the model complying with a disclaimer attached.
```
  E2B (2.3B): 98% → 0.4%, KL Div 0.346
  E4B (4.5B): 99% → 0.7%, KL Div 0.068
  26B MoE:    98% → 0.7%, KL Div 0.090
  31B:       100% → 3.2%, KL Div 0.124
```
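For anyone unfamiliar with the KL Div column: it measures how far the edited model's next-token distribution drifts from the original model's, so lower means less collateral change. The sketch below shows one way to compute it from raw logits. The direction KL(base || edited), the averaging over positions, and the function names are my assumptions for illustration; the repo may measure it differently.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vector of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(base_logits, edited_logits):
    # KL(P_base || P_edited) for a single next-token distribution.
    lp = log_softmax(base_logits)
    lq = log_softmax(edited_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def mean_kl(base_batch, edited_batch):
    # Average the per-position KL over a batch of logit vectors.
    pairs = list(zip(base_batch, edited_batch))
    return sum(kl_divergence(b, e) for b, e in pairs) / len(pairs)
```

Identical logits give a KL of zero, and the value grows as the edited model reweights tokens the original considered likely.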
26B MoE
Standard abliteration only touches dense layers, which only gets the MoE from 98% → 29%. The remaining refusals are in the expert weights. Used Expert-Granular Abliteration (EGA, concept from OBLITERATUS [1]) with norm-preserving biprojection [2] on each of the 128 expert slices per layer. That gets it to 3%.
[1] https://github.com/elder-plinius/OBLITERATUS
[2] https://huggingface.co/blog/grimjim/abliteration-biprojectio...
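To make the per-expert idea concrete, here is a minimal NumPy sketch of removing a refusal direction from each expert's output projection. It uses plain orthogonal projection followed by a simple per-column norm rescale, which is a simplification I am assuming for illustration and not necessarily the biprojection described in [2]. The tensor layout `(n_experts, d_model, d_ff)` and the function names are hypothetical, not taken from the repo.

```python
import numpy as np

def ablate_direction(W, r, preserve_norm=True):
    """Remove direction r from the output space of W (shape: d_model x d_in)."""
    r = r / np.linalg.norm(r)
    # Subtract the component of every column of W that lies along r.
    W_new = W - np.outer(r, r @ W)
    if preserve_norm:
        # Assumed simplification: rescale each column back to its original norm.
        old = np.linalg.norm(W, axis=0, keepdims=True)
        new = np.linalg.norm(W_new, axis=0, keepdims=True)
        W_new = W_new * (old / np.maximum(new, 1e-12))
    return W_new

def ablate_experts(expert_W, r):
    """expert_W: (n_experts, d_model, d_ff). Ablate every expert slice separately."""
    return np.stack([ablate_direction(W, r) for W in expert_W])
```

After this edit no expert can write along `r`, since every column of every slice is orthogonal to it, while the column norms stay where they were.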
How it was built
Set up an automated research loop -- an AI agent reads the current results and idea backlog, picks the next experiment, runs it on the GPU, records results, and repeats. It ran 22 experiments across the 4 models, discovered the false-positive problem in standard refusal markers, built the cross-dataset evaluation, and implemented the MoE expert abliteration when dense-only wasn't enough.
Full experiment history and code in the repo.
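As a rough illustration of the false-positive problem with refusal markers: a substring check flags any response containing a refusal-sounding phrase, even when the model goes on to comply. The marker list, the sample responses, and the hand-assigned labels below are all made up for this sketch and are not the ones used in the repo.

```python
# Hypothetical marker list; real evaluations tend to use longer lists.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def flagged_by_markers(response):
    # True if any marker phrase appears anywhere in the response.
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

# (response, audited_refusal) pairs; the labels stand in for a manual audit.
samples = [
    ("I'm sorry, but I can't help with that.", True),
    ("I can't recommend this, but here is how it works: step 1 ...", False),
    ("Sure, here is an overview: ...", False),
]

marker_rate = sum(flagged_by_markers(text) for text, _ in samples) / len(samples)
audited_rate = sum(label for _, label in samples) / len(samples)
```

Here the marker check flags two of the three responses while the audit counts only one real refusal, because the second response opens with a disclaimer and then complies.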
Downloads
Each model has bf16 safetensors + GGUF (Q4_K_M, Q8_0):
```
  E2B bf16: https://huggingface.co/TrevorJS/gemma-4-E2B-it-uncensored
  E2B GGUF: https://huggingface.co/TrevorJS/gemma-4-E2B-it-uncensored-GGUF
  E4B bf16: https://huggingface.co/TrevorJS/gemma-4-E4B-it-uncensored
  E4B GGUF: https://huggingface.co/TrevorJS/gemma-4-E4B-it-uncensored-GGUF
  26B bf16: https://huggingface.co/TrevorJS/gemma-4-26B-A4B-it-uncensored
  26B GGUF: https://huggingface.co/TrevorJS/gemma-4-26B-A4B-it-uncensored-GGUF
  31B bf16: https://huggingface.co/TrevorJS/gemma-4-31B-it-uncensored
  31B GGUF: https://huggingface.co/TrevorJS/gemma-4-31B-it-uncensored-GGUF
```
Quick start:
```
  llama-server -hf TrevorJS/gemma-4-26B-A4B-it-uncensored-GGUF -c 8192
```
- $CamperBob2 33 minutes ago
  
  What about the sampling parameters? You can't just run llama-server with no CLI arguments (other than a uselessly-small context size) and expect useful results.
$stochtinkerer 2 hours ago

Is this the best uncensored model to date? or are there better ones?
- $CamperBob2 an hour ago
  
  You could try this one against the defending Qwen 3.5 champion: https://huggingface.co/HauhauCS/models