Getting Started
This guide will help you set up Arandu and run your first pipeline.
Prerequisites
Required
- Python 3.13+
- FFmpeg (for audio/video processing)
- uv (recommended) or pip
Optional
- Google Drive credentials (for Drive integration)
- Ollama or OpenAI API key (for QA/KG pipelines)
- Docker (for containerized deployment)
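A quick way to confirm the required tools before installing is a small shell check. The sketch below is illustrative and not part of Arandu: the helper names `check_tool` and `python_ok` are made up for this example, and it assumes the interpreter is available on your PATH as `python3`.

```shell
# Report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Succeed if the version string "$1" (e.g. "3.13.1") is at least 3.13.
python_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 13 ]; }
}

# Check the required tools (only one of uv or pip is needed).
for tool in ffmpeg uv pip; do
    check_tool "$tool"
done

# Check the Python version (assumes the interpreter is called python3).
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 --version 2>&1 | awk '{print $2}')
    if python_ok "$version"; then
        echo "ok: python $version"
    else
        echo "too old: python $version (need 3.13+)"
    fi
else
    echo "missing: python3"
fi
```

Add `docker` or `ollama` to the loop if you plan to use the optional features.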
Installation
Using uv (Recommended)

    # Clone repository
    git clone https://github.com/FredDsR/arandu.git
    cd arandu

    # Install dependencies
    uv sync

    # Verify installation
    uv run arandu --help

Using pip
    # Clone repository
    git clone https://github.com/FredDsR/arandu.git
    cd arandu

    # Install in editable mode
    pip install -e .

    # Verify installation
    arandu --help

Install FFmpeg
    # Ubuntu/Debian
    sudo apt-get install ffmpeg

    # macOS
    brew install ffmpeg

    # Verify installation
    ffmpeg -version

Quick Start
1. Transcribe a Local File

    arandu transcribe audio.mp3

2. Check System Info

    arandu info

This shows your hardware configuration (CPU, GPU, memory).
3. Transcribe with Options
    # Use the faster turbo model
    arandu transcribe audio.mp3 --model-id openai/whisper-large-v3-turbo

    # Use quantization for reduced VRAM
    arandu transcribe audio.mp3 --quantize

    # Force CPU execution
    arandu transcribe audio.mp3 --cpu

Google Drive Setup (Optional)
For processing files from Google Drive:
- Get credentials from Google Cloud Console
- Enable the Google Drive API
- Create OAuth2 credentials and download as credentials.json
- Place in project root
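Before running the Drive command, it can help to confirm that the credentials file is actually where the steps above put it. The following is a minimal sketch, not something Arandu provides: `credentials_ready` and `drive_transcribe_checked` are hypothetical helper names, and the check only verifies that the file exists and is non-empty, not that its contents are valid.

```shell
# Succeed only if the credentials file exists and is non-empty.
# "$1" is the path to check; defaults to credentials.json in the current directory.
credentials_ready() {
    [ -s "${1:-credentials.json}" ]
}

# Run the Drive transcription only when the credentials file is in place.
# "$1" is the Drive file ID; "$2" is an optional path to the credentials file.
drive_transcribe_checked() {
    creds="${2:-credentials.json}"
    if credentials_ready "$creds"; then
        arandu drive-transcribe "$1" --credentials "$creds"
    else
        echo "credentials file not found or empty: $creds" >&2
        return 1
    fi
}

# Usage (from the project root):
#   drive_transcribe_checked <file-id>
```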
    # Transcribe from Google Drive
    arandu drive-transcribe <file-id> --credentials credentials.json

LLM Setup (For QA/KG Pipelines)
Section titled “LLM Setup (For QA/KG Pipelines)”Using Ollama (Recommended for Local)
Section titled “Using Ollama (Recommended for Local)”# Install Ollamacurl -fsSL https://ollama.ai/install.sh | sh
# Pull a modelollama pull qwen3:14b
# Start Ollama serverollama serveUsing OpenAI
Section titled “Using OpenAI”export OPENAI_API_KEY=sk-...What’s Next?
| Task | Guide |
|---|---|
| Process multiple files | Transcription Guide |
| Validate transcriptions | Transcription Validation Guide |
| Generate QA pairs | QA Generation Guide |
| Build knowledge graphs | KG Construction Guide |
| Evaluate quality | Evaluation Guide |
| Configure settings | Configuration Reference |
Pipeline Overview
    Audio/Video Files
            │
            ▼
    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
    │ Transcription│ ──▶ │      QA      │ ──▶ │      KG      │
    │   Pipeline   │     │  Generation  │     │ Construction │
    └──────────────┘     └──────────────┘     └──────────────┘
            │                    │                    │
            └────────────────────┴────────────────────┘
                                 │
                                 ▼
                         ┌──────────────┐
                         │  Evaluation  │
                         └──────────────┘

Troubleshooting
"No module named 'arandu'"

    pip install -e .
    # or
    uv sync

"FFmpeg not found"

    sudo apt-get install ffmpeg  # Linux
    brew install ffmpeg          # macOS

"CUDA out of memory"

    # Use quantization
    arandu transcribe audio.mp3 --quantize

    # Or force CPU
    arandu transcribe audio.mp3 --cpu

See also: Transcription | Transcription Validation | Configuration | CLI Reference
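The two "CUDA out of memory" workarounds from Troubleshooting can be chained so that a run falls back automatically. This is only a sketch under stated assumptions: `transcribe_with_fallback` is a hypothetical helper, it assumes `arandu` exits with a non-zero status when a run fails, and it retries on any failure rather than on out-of-memory errors specifically.

```shell
# Try the default run first, then quantization, then CPU execution.
# "$1" is the audio file to transcribe.
transcribe_with_fallback() {
    file="$1"
    arandu transcribe "$file" && return 0
    echo "default run failed, retrying with --quantize" >&2
    arandu transcribe "$file" --quantize && return 0
    echo "quantized run failed, retrying with --cpu" >&2
    arandu transcribe "$file" --cpu
}

# Usage:
#   transcribe_with_fallback audio.mp3
```

Because every failure triggers a retry, a missing file or bad argument will be attempted three times; check the first error message before relying on the fallback.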