Skip to main content
New to RAXE? Start with the Quickstart and learn how detection works.

Overview

RAXE integrates with Hugging Face Transformers to provide automatic security scanning for local model pipelines.

Installation

pip install raxe transformers torch

RaxePipeline Wrapper

Use the RAXE pipeline wrapper for automatic scanning:
from raxe import RaxePipeline

# Wrap any Hugging Face pipeline
pipe = RaxePipeline(
    task="text-generation",
    model="gpt2"
)

# All inputs and outputs are automatically scanned
result = pipe("Once upon a time")

Supported Pipelines

TaskExample ModelScanning
text-generationgpt2, llama-2, mistralInput + Output
text2text-generationt5-base, flan-t5Input + Output
conversationalDialoGPTMessages
question-answeringdistilbert-squadQuestion + Context
summarizationbart-large-cnnInput + Summary
translationopus-mt-en-deInput + Translation

Configuration

from raxe import Raxe
from raxe import RaxePipeline

pipe = RaxePipeline(
    task="text-generation",
    model="gpt2",

    # RAXE options
    raxe=Raxe(telemetry=False),           # Custom client
    raxe_block_on_input_threats=False,    # Log-only (default)
    raxe_block_on_output_threats=False,   # Log-only (default)

    # Pipeline options
    device="cuda",                         # GPU acceleration
    max_length=100,                        # Generation params
)

Blocking Mode

Enable blocking to prevent malicious inputs:
from raxe import RaxePipeline
from raxe import RaxeBlockedError

pipe = RaxePipeline(
    task="text-generation",
    model="gpt2",
    raxe_block_on_input_threats=True,
    raxe_block_on_output_threats=True,
)

try:
    result = pipe(user_input)
except RaxeBlockedError as e:
    print(f"Blocked: {e.message}")

Pipeline Examples

Text Generation

pipe = RaxePipeline(task="text-generation", model="gpt2")

result = pipe(
    "Once upon a time",
    max_length=50,
    num_return_sequences=3,
)

Question Answering

pipe = RaxePipeline(
    task="question-answering",
    model="distilbert-base-cased-distilled-squad"
)

result = pipe(
    question="What is the capital of France?",
    context="France is a country in Europe. Its capital is Paris."
)

print(result["answer"])  # "Paris"

Summarization

pipe = RaxePipeline(
    task="summarization",
    model="facebook/bart-large-cnn"
)

result = pipe(long_article, max_length=100, min_length=30)
print(result[0]["summary_text"])

Factory Function

from raxe import create_huggingface_pipeline

# Quick setup with blocking
pipe = create_huggingface_pipeline(
    task="text-generation",
    model="gpt2",
    block_on_threats=True,
)

Performance Tips

GPU Acceleration

pipe = RaxePipeline(
    task="text-generation",
    model="gpt2",
    device="cuda:0",  # Use first GPU
)

Large Models

pipe = RaxePipeline(
    task="text-generation",
    model="meta-llama/Llama-2-7b-hf",
    pipeline_kwargs={
        "torch_dtype": "float16",
        "device_map": "auto",
    },
)

LiteLLM

200+ cloud providers through LiteLLM

OpenAI

Drop-in OpenAI wrapper

What’s Next

OpenAI Wrapper

Use RAXE with the OpenAI-compatible API

Production Checklist

Deploy RAXE safely to production