Summary of "Run YOUR own UNCENSORED AI & Use it for Hacking"
Overview
Goal: run an “uncensored” large language model (LLM) you control on a cloud VPS so it will answer any prompt (including hacking-related prompts shown in the demo).
High level approach:
- Find a compatible model on HuggingFace.
- Deploy to a cloud server preconfigured with Ollama + Open Web UI.
- Download and run the model on that server.
- Interact with the model through a browser from any device.
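Once the server is up, the "interact from any device" step does not require the browser UI at all: the sketch below talks to Ollama's documented `/api/generate` HTTP endpoint directly. The server URL is a hypothetical placeholder, and the live request is shown only as a comment, since it needs a running instance.

```python
import json
from urllib import request

OLLAMA_URL = "http://my-vps.example.com:11434"  # hypothetical VPS address

def generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks the server for a single complete response
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST a prompt to a remote Ollama server and return the reply text."""
    body = json.dumps(generate_payload(model, prompt)).encode()
    req = request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # With a live server you would call: print(ask("llama3.2:1b", "Hello"))
    print(generate_payload("llama3.2:1b", "Hello"))
```

The same pattern works from any machine that can reach the VPS, which is the portability benefit the video emphasizes.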
Key platforms, components and concepts
- HuggingFace.co: used as the model catalog. Filters used in the video: text generation, library/app, inference provider, plus "uncensored" and Ollama compatibility.
- Ollama: the LLM runtime/framework used to host and serve models on the server.
- Open Web UI: web interface preinstalled on the VPS to manage models, start chats, upload files, use voice/dictation, enable integrations, toggle a code interpreter, etc.
- Hostinger VPS (sponsor): shown because it offers prebuilt images with Ollama + Open Web UI. Recommended tiers in the demo: KVM4/KVM8 (32 GB RAM, 8 vCPU) for heavier models.
- Model quantization / sizes: larger models typically give better quality but require more RAM, disk, and CPU/GPU. Example quantized variants mentioned:
  - Q4_K_M (medium): ~18.7 GB
  - Smaller variants: ~11.9 GB and ~6.5 GB
- Hardware compatibility feature on HuggingFace: add your CPU/RAM spec to see which model sizes are feasible locally.
- Cloud benefits: offloads compute/storage, provides always-on availability, and is accessible from any browser-enabled device.
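The size figures above follow a rough rule of thumb (an approximation, not something stated in the video): a quantized model's file size is about parameter count × bits per weight ÷ 8. The bits-per-weight values below are illustrative estimates; 4-bit "K_M" quants tend to average closer to 5 effective bits per weight once metadata is included.

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model in GB: params x bits / 8."""
    return params_billion * bits_per_weight / 8

# A ~30B model at ~5 effective bits/weight lands near the ~18.7 GB
# Q4_K_M figure quoted in the video.
print(approx_model_size_gb(30, 5.0))            # 18.75
# A more aggressive ~3.2 bits/weight approaches the 11.9 GB variant.
print(round(approx_model_size_gb(30, 3.2), 1))  # 12.0
```

This is also a quick sanity check against the HuggingFace hardware-compatibility feature: the file must fit on disk, and roughly that much RAM is needed to keep the model loaded.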
Step-by-step process (high-level tutorial)
- Browse HuggingFace models → filter for text generation, Ollama-compatible, and "uncensored".
- Choose a model/version (note quantization and required storage/RAM).
- Provision a cloud VPS (the video uses Hostinger) with Ollama + Open Web UI preinstalled, choosing a server tier with sufficient RAM/CPU.
- Log into Open Web UI and create an admin account.
- In Open Web UI → Admin panel → Settings → Manage Models, paste the model pull command (from the model's HuggingFace/Ollama page) to download the model directly to the cloud server.
- Wait for the model to download and finish installing on the server.
- In the web UI: rename/edit models, select which model to chat with, interact via text/voice/file upload, and use the code interpreter if available.
- Optional: download multiple models (e.g., a coder model + a reasoning model) and switch between them in the interface.
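The pull command pasted into Open Web UI has a programmatic equivalent: Ollama exposes a `/api/pull` endpoint that downloads a model to the server. A minimal sketch, assuming a hypothetical VPS address (the model tag mirrors the coder model from the demo; the actual request is shown only as a comment since it needs a live instance):

```python
import json
from urllib import request

OLLAMA_URL = "http://my-vps.example.com:11434"  # hypothetical VPS address

def pull_payload(model: str) -> dict:
    """Build the JSON body for Ollama's /api/pull endpoint."""
    # stream=False makes the server respond once, when the pull completes,
    # rather than streaming progress updates.
    return {"model": model, "stream": False}

def pull_model(model: str) -> str:
    """Ask a remote Ollama instance to download a model; returns final status."""
    body = json.dumps(pull_payload(model)).encode()
    req = request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["status"]

if __name__ == "__main__":
    # With a live server: print(pull_model("qwen3-coder:30b"))
    print(pull_payload("qwen3-coder:30b"))
```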
Features demonstrated in the UI
- Chat interface with model selection
- File upload, screenshots, web page attachments and notes
- Voice/dictation input
- Code interpreter toggle (if model supports it)
- Chain-of-thought / reasoning display for models that expose internal “thinking”
- Model management: rename, change icons, view quantization info, unload/eject models to free resources
Models demonstrated (as named in the video)
- Default small model: "Llama 3.2" (1B), included on the instance
- Coding model: "Qwen3-Coder" (presenter states ~30B; used for code generation)
- Reasoning model: a larger "thinking" model (shows chain-of-thought and gives slower, more reflective responses)
Example sizes shown during selection: ~6.5 GB, ~11.9 GB, and ~18.7 GB (Q4_K_M quantization).
Demo behavior shown
- The presenter downloads two models to the cloud instance and runs chats.
- The coding model immediately generates a Python Windows keylogger example (demo shows uncensored output).
- The reasoning model displays chain-of-thought style reasoning and produces more reflective (slower) responses.
- Emphasis: using the cloud avoids local resource constraints and makes models accessible anywhere through a browser.
Resource and cost notes
- Recommended VPS: 32 GB RAM / 8 vCPU (KVM4/KVM8 on Hostinger in the demo).
- Model disk/RAM requirements depend on quantization; check HuggingFace hardware compatibility before selecting.
- Sponsor claim: certain Hostinger tiers can be cheaper than some subscription-based LLM services (promo code referenced in the video).
Additional content / courses referenced
- A "masterclass" and a hacking course are mentioned; they cover:
- Running models locally
- Using AI for hacking tasks
- Agents that automate actions
- Prompt injection techniques
Main speaker / sources
- Presenter: Zaid (channel/brand: zSecurity; the auto-generated subtitles render this as "Zade from Gecurity")
- Platforms/technologies shown: HuggingFace.co, Ollama, Open Web UI, Hostinger (cloud/VPS)
- LLMs referenced: Llama 3.2, Qwen3-Coder, and a larger reasoning model
Note: subtitles were auto-generated and contain some name/number inconsistencies; model names and sizes above are summarized as presented in the video.