Summary of "Run YOUR own UNCENSORED AI & Use it for Hacking"
Overview
Goal: run an “uncensored” large language model (LLM) you control on a cloud VPS so it will answer any prompt (including hacking-related prompts shown in the demo).
High level approach:
- Find a compatible model on HuggingFace.
- Deploy to a cloud server preconfigured with Ollama + Open Web UI.
- Download and run the model on that server.
- Interact with the model through a browser from any device.
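Once the server is up, the "interact from any device" step does not require the browser UI at all: the sketch below talks to Ollama's documented `/api/generate` HTTP endpoint directly. The server URL is a hypothetical placeholder, and the live request is shown only as a comment, since it needs a running instance.

```python
import json
from urllib import request

OLLAMA_URL = "http://my-vps.example.com:11434"  # hypothetical VPS address

def generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks the server for a single complete response
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST a prompt to a remote Ollama server and return the reply text."""
    body = json.dumps(generate_payload(model, prompt)).encode()
    req = request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # With a live server you would call: print(ask("llama3.2:1b", "Hello"))
    print(generate_payload("llama3.2:1b", "Hello"))
```

The same pattern works from any machine that can reach the VPS, which is the portability benefit the video emphasizes.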
Key platforms, components and concepts
- HuggingFace.co: used as the model catalog. Filters used in the video: text generation, library/app, inference provider, plus "uncensored" and Ollama compatibility.
- Ollama: the LLM runtime/framework used to host and serve models on the server.
- Open Web UI: web interface preinstalled on the VPS to manage models, start chats, upload files, use voice/dictation, enable integrations, toggle a code interpreter, etc.
- Hostinger VPS (sponsor): shown because it offers prebuilt images with Ollama + Open Web UI. Recommended tiers in the demo: KVM4/KVM8 (32 GB RAM, 8 vCPU) for heavier models.
- Model quantization / sizes: larger models typically give better quality but require more RAM, disk, and CPU/GPU. Example quantized variants mentioned:
  - Q4_K_M (medium): ~18.7 GB
  - Smaller variants: ~11.9 GB and ~6.5 GB
- Hardware compatibility feature on HuggingFace: add your CPU/RAM spec to see which model sizes are feasible locally.
- Cloud benefits: offloads compute/storage, provides always-on availability, and is accessible from any browser-enabled device.
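The size figures above follow a rough rule of thumb (an approximation, not something stated in the video): a quantized model's file size is about parameter count × bits per weight ÷ 8. The bits-per-weight values below are illustrative estimates; 4-bit "K_M" quants tend to average closer to 5 effective bits per weight once metadata is included.

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model in GB: params x bits / 8."""
    return params_billion * bits_per_weight / 8

# A ~30B model at ~5 effective bits/weight lands near the ~18.7 GB
# Q4_K_M figure quoted in the video.
print(approx_model_size_gb(30, 5.0))            # 18.75
# A more aggressive ~3.2 bits/weight approaches the 11.9 GB variant.
print(round(approx_model_size_gb(30, 3.2), 1))  # 12.0
```

This is also a quick sanity check against the HuggingFace hardware-compatibility feature: the file must fit on disk, and roughly that much RAM is needed to keep the model loaded.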
Step-by-step process (high-level tutorial)
- Browse HuggingFace models → filter for text generation, Ollama-compatible, and "uncensored".
- Choose a model/version (note quantization and required storage/RAM).
- Provision a cloud VPS (the video uses Hostinger) with Ollama + Open Web UI preinstalled, choosing a server tier with sufficient RAM/CPU.
- Log into Open Web UI and create an admin account.
- In Open Web UI → Admin panel → Settings → Manage Models, paste the model pull command (from the model's HuggingFace/Ollama page) to download the model directly to the cloud server.
- Wait for the model to download and finish installing on the server.
- In the web UI: rename/edit models, select which model to chat with, interact via text/voice/file upload, and use the code interpreter if available.
- Optional: download multiple models (e.g., a coder model + a reasoning model) and switch between them in the interface.
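The pull command pasted into Open Web UI has a programmatic equivalent: Ollama exposes a `/api/pull` endpoint that downloads a model to the server. A minimal sketch, assuming a hypothetical VPS address (the model tag mirrors the coder model from the demo; the actual request is shown only as a comment since it needs a live instance):

```python
import json
from urllib import request

OLLAMA_URL = "http://my-vps.example.com:11434"  # hypothetical VPS address

def pull_payload(model: str) -> dict:
    """Build the JSON body for Ollama's /api/pull endpoint."""
    # stream=False makes the server respond once, when the pull completes,
    # rather than streaming progress updates.
    return {"model": model, "stream": False}

def pull_model(model: str) -> str:
    """Ask a remote Ollama instance to download a model; returns final status."""
    body = json.dumps(pull_payload(model)).encode()
    req = request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["status"]

if __name__ == "__main__":
    # With a live server: print(pull_model("qwen3-coder:30b"))
    print(pull_payload("qwen3-coder:30b"))
```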
Features demonstrated in the UI
- Chat interface with model selection
- File upload, screenshots, web page attachments and notes
- Voice/dictation input
- Code interpreter toggle (if model supports it)
- Chain-of-thought / reasoning display for models that expose internal “thinking”
- Model management: rename, change icons, view quantization info, unload/eject models to free resources
Models demonstrated (as named in the video)
- Default small model: "Llama 3.2" (1B), included on the instance
- Coding model: "Qwen3-Coder" (presenter states ~30B; used for code generation)
- Reasoning model: a larger "thinking" model (shows chain-of-thought and gives slower, more reflective responses)
Example sizes shown during selection: ~6.5 GB, ~11.9 GB, and ~18.7 GB (Q4_K_M quantization).
Demo behavior shown
- The presenter downloads two models to the cloud instance and runs chats.
- The coding model immediately generates a Python Windows keylogger example (demo shows uncensored output).
- The reasoning model displays chain-of-thought style reasoning and produces more reflective (slower) responses.
- Emphasis: using the cloud avoids local resource constraints and makes models accessible anywhere through a browser.
Resource and cost notes
- Recommended VPS: 32 GB RAM / 8 vCPU (KVM4/KVM8 on Hostinger in the demo).
- Model disk/RAM requirements depend on quantization; check HuggingFace hardware compatibility before selecting.
- Sponsor claim: certain Hostinger tiers can be cheaper than some subscription-based LLM services (promo code referenced in the video).
Additional content / courses referenced
- A "masterclass" and a hacking course are mentioned; they cover:
- Running models locally
- Using AI for hacking tasks
- Agents that automate actions
- Prompt injection techniques
Main speaker / sources
- Presenter: Zaid (channel/brand: zSecurity; the auto-generated subtitles render this as "Zade from Gecurity")
- Platforms/technologies shown: HuggingFace.co, Ollama, Open Web UI, Hostinger (cloud/VPS)
- LLMs referenced: Llama 3.2, Qwen3-Coder, and a larger reasoning model
Note: subtitles were auto-generated and contain some name/number inconsistencies; model names and sizes above are summarized as presented in the video.