Getting Started

This guide will help you set up Arandu and run your first pipeline.

Prerequisites

Required

Python 3.13+
FFmpeg (for audio/video processing)
uv (recommended) or pip

Optional

Google Drive credentials (for Drive integration)
Ollama or OpenAI API key (for QA/KG pipelines)
Docker (for containerized deployment)
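
The required tools above can be checked in one pass before installing. The following is a minimal sketch, not part of Arandu: the function names have_cmd, version_at_least, and check_prereqs are hypothetical helpers, and the script assumes Python is reachable as python3.

```shell
# Hypothetical helper: succeed if a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed if a dotted version string (e.g. "3.13.1")
# is at least MAJOR.MINOR.
version_at_least() {
    local version="$1" want_major="$2" want_minor="$3"
    local major minor
    major="${version%%.*}"      # text before the first dot
    minor="${version#*.}"       # text after the first dot
    minor="${minor%%.*}"        # keep only the minor component
    if [ "$major" -gt "$want_major" ]; then
        return 0
    fi
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Report each required tool and return non-zero if anything is missing.
check_prereqs() {
    local rc=0 py_version
    if have_cmd python3; then
        py_version="$(python3 -c 'import platform; print(platform.python_version())')"
        if version_at_least "$py_version" 3 13; then
            echo "ok: Python $py_version"
        else
            echo "too old: Python $py_version (need 3.13+)"
            rc=1
        fi
    else
        echo "missing: python3"
        rc=1
    fi
    if have_cmd ffmpeg; then
        echo "ok: ffmpeg"
    else
        echo "missing: ffmpeg"
        rc=1
    fi
    if have_cmd uv || have_cmd pip; then
        echo "ok: uv or pip"
    else
        echo "missing: uv or pip"
        rc=1
    fi
    return "$rc"
}
```

After pasting the functions into a shell session, run check_prereqs; a non-zero exit status means at least one required tool is missing or too old.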

Installation

Using uv (Recommended)

# Clone repository
git clone https://github.com/FredDsR/arandu.git
cd arandu

# Install dependencies
uv sync

# Verify installation
uv run arandu --help

Using pip

# Clone repository
git clone https://github.com/FredDsR/arandu.git
cd arandu

# Install in editable mode
pip install -e .

# Verify installation
arandu --help

Install FFmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Verify installation
ffmpeg -version

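Quick Start

If you installed with uv, prefix each command below with uv run (for example, uv run arandu transcribe audio.mp3), or first activate the virtual environment that uv sync created in .venv.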

1. Transcribe a Local File

arandu transcribe audio.mp3

2. Check System Info

arandu info

This shows your hardware configuration (CPU, GPU, memory).

3. Transcribe with Options

# Use faster turbo model
arandu transcribe audio.mp3 --model-id openai/whisper-large-v3-turbo

# Use quantization for reduced VRAM
arandu transcribe audio.mp3 --quantize

# Force CPU execution
arandu transcribe audio.mp3 --cpu
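
To apply the same command to several files, a plain shell loop is enough. The sketch below is an illustration rather than an Arandu feature: transcribe_dir is a hypothetical helper name, and the .mp3 pattern is only an example that can be changed to match other file types.

```shell
# Hypothetical helper: transcribe every .mp3 file in a directory, one at a time.
# Extra arguments (for example --quantize) are passed through to each call.
transcribe_dir() {
    if [ "$#" -lt 1 ]; then
        echo "usage: transcribe_dir <directory> [options...]" >&2
        return 2
    fi
    local dir="$1" file failed=0
    shift
    for file in "$dir"/*.mp3; do
        # Skip the unexpanded pattern when the directory has no .mp3 files.
        [ -e "$file" ] || continue
        echo "Transcribing: $file"
        # Keep going after a failure, but remember that it happened.
        arandu transcribe "$file" "$@" || failed=$((failed + 1))
    done
    echo "Done, $failed file(s) failed"
    [ "$failed" -eq 0 ]
}
```

For example, transcribe_dir ./recordings --quantize would run arandu transcribe with --quantize on each .mp3 file in ./recordings.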

Google Drive Setup (Optional)

For processing files from Google Drive:

1. Open the Google Cloud Console
2. Enable the Google Drive API
3. Create OAuth2 credentials and download them as credentials.json
4. Place credentials.json in the project root

# Transcribe from Google Drive
arandu drive-transcribe <file-id> --credentials credentials.json
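
A missing or empty credentials file is an easy mistake to make at this step, so it can help to check for it before calling the command. This is a small sketch under that assumption: drive_transcribe_checked is a hypothetical wrapper name, and it only verifies that the file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical wrapper: run the Drive transcription only if the
# credentials file exists and is not empty.
drive_transcribe_checked() {
    local file_id="${1:-}" creds="${2:-credentials.json}"
    if [ -z "$file_id" ]; then
        echo "usage: drive_transcribe_checked <file-id> [credentials-path]" >&2
        return 2
    fi
    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$creds" ]; then
        echo "Credentials file missing or empty: $creds" >&2
        return 1
    fi
    arandu drive-transcribe "$file_id" --credentials "$creds"
}
```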

LLM Setup (For QA/KG Pipelines)

Using Ollama (Recommended for Local)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull qwen3:14b

# Start Ollama server
ollama serve
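
Before running a QA or KG pipeline, it can be worth confirming that the server answers and the model is present. The sketch below assumes Ollama's default local address, http://localhost:11434, and uses its /api/tags endpoint and the ollama list command; ollama_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a local Ollama server answers and the
# given model appears in the list of pulled models.
ollama_ready() {
    local model="$1" url="${2:-http://localhost:11434}"
    # /api/tags lists local models; any successful response means the server is up.
    if ! curl -fsS "$url/api/tags" >/dev/null 2>&1; then
        echo "Ollama server not reachable at $url" >&2
        return 1
    fi
    # Plain substring match against the model list printed by ollama list.
    if ! ollama list | grep -qF "$model"; then
        echo "Model $model not found; try: ollama pull $model" >&2
        return 1
    fi
    echo "Ollama is ready with $model"
}
```

For example, ollama_ready qwen3:14b checks the model pulled above. Because ollama serve keeps running in the foreground, run the check from a second terminal.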

Using OpenAI

export OPENAI_API_KEY=sk-...
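
An exported variable only exists in the shell session where it was set, so a new terminal will not see it. The following sketch confirms the key is visible without printing it; openai_key_set is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if OPENAI_API_KEY is set to a non-empty
# value. Only the length is printed, never the key itself.
openai_key_set() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set in this shell" >&2
        return 1
    fi
    echo "OPENAI_API_KEY is set (${#OPENAI_API_KEY} characters)"
}
```

To keep the key across sessions, add the export line to your shell profile instead of typing it each time.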

What’s Next?

Task	Guide
Process multiple files	Transcription Guide
Validate transcriptions	Transcription Validation Guide
Generate QA pairs	QA Generation Guide
Build knowledge graphs	KG Construction Guide
Evaluate quality	Evaluation Guide
Configure settings	Configuration Reference

Pipeline Overview

Audio/Video Files
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Transcription│ ──▶ │      QA      │ ──▶ │      KG      │
│   Pipeline   │     │  Generation  │     │ Construction │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
                           │
                           ▼
                   ┌──────────────┐
                   │  Evaluation  │
                   └──────────────┘

Troubleshooting

"No module named 'arandu'"

pip install -e .
# or
uv sync

"FFmpeg not found"

sudo apt-get install ffmpeg  # Linux
brew install ffmpeg          # macOS

"CUDA out of memory"

# Use quantization
arandu transcribe audio.mp3 --quantize

# Or force CPU
arandu transcribe audio.mp3 --cpu
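
The two workarounds above can be chained so that each one is tried only when the previous attempt fails. This is a rough sketch, not an Arandu feature: transcribe_with_fallback is a hypothetical helper, and it assumes the command exits with a non-zero status on failure. It retries on any failure, not only on out-of-memory errors.

```shell
# Hypothetical helper: try the default settings first, then --quantize,
# then --cpu, stopping at the first attempt that succeeds.
transcribe_with_fallback() {
    local file="$1" flag
    # The empty first entry means "no extra flag".
    for flag in "" --quantize --cpu; do
        echo "Trying: arandu transcribe $file${flag:+ $flag}"
        # $flag is left unquoted so the empty entry expands to no argument.
        if arandu transcribe "$file" $flag; then
            return 0
        fi
    done
    echo "All attempts failed for $file" >&2
    return 1
}
```

For example, transcribe_with_fallback audio.mp3 ends on the CPU only if both GPU attempts fail.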