Arandu
Composable pipelines for transcription, QA generation, and knowledge graph construction — designed for ethnographic media collections and climate-impact research.
What is Arandu?
Arandu (Guarani: ara = time + endu = to hear/feel, hence “wisdom through perceiving time”) is a research pipeline for processing ethnographic media collections.
It combines audio/video transcription, cognitively-scaffolded QA generation, and knowledge graph construction into composable, checkpoint-resilient pipelines.
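The checkpoint-resilient behavior can be illustrated with a minimal sketch. It assumes each stage is a plain function whose output is JSON-serializable and that each stage's result is saved under the stage's name. `run_pipeline`, `transcribe`, and `generate_qa` are hypothetical names chosen for illustration, not Arandu's API.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

# Hypothetical stage type: takes the previous stage's output, returns new output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: list[tuple[str, Stage]], data: Any, checkpoint_dir: Path) -> Any:
    """Run named stages in order, persisting each result as a JSON checkpoint.

    If a stage's checkpoint file already exists (e.g. from an interrupted
    earlier run), its saved output is loaded and the stage is skipped.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for name, stage in stages:
        ckpt = checkpoint_dir / f"{name}.json"
        if ckpt.exists():
            data = json.loads(ckpt.read_text(encoding="utf-8"))
            continue
        data = stage(data)
        # Write to a temporary file first, then rename, so a crash mid-write
        # never leaves a truncated checkpoint behind.
        tmp = ckpt.with_suffix(".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        tmp.replace(ckpt)
    return data


# Stand-in stages that record when they actually execute (demonstration only).
calls: list[str] = []


def transcribe(path: str) -> dict:
    calls.append("transcribe")
    return {"source": path, "text": "hello world"}


def generate_qa(doc: dict) -> list[dict]:
    calls.append("qa")
    return [{"question": "What was said?", "answer": doc["text"]}]


with tempfile.TemporaryDirectory() as d:
    stages = [("transcribe", transcribe), ("qa", generate_qa)]
    first = run_pipeline(stages, "audio.mp3", Path(d))
    # Second run finds both checkpoints and executes no stage.
    second = run_pipeline(stages, "audio.mp3", Path(d))
```

Because checkpoints in this sketch are keyed only by stage name, rerunning with a different input would reuse stale results; a real pipeline would also key checkpoints by input, for example by a hash of the source file.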
Key Features
| Feature | Description |
|---|---|
| 🎙️ Transcription | Whisper-based transcription with Google Drive integration, parallel workers, and automatic quality validation |
| 🧠 CEP QA Generation | Bloom’s Taxonomy-scaffolded QA pairs with LLM-as-a-Judge validation |
| 🕸️ KG Construction | AutoSchemaKG-powered entity and relation extraction with GraphML export |
| ✅ Evaluation | Retrieval-strategy benchmarking using CEP QA pairs as ground truth |
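The Evaluation row can be made concrete with a small sketch of how generated QA pairs can serve as retrieval ground truth. It assumes every pair records the id of the transcript chunk it was generated from (`source_id`, a hypothetical field) and scores a retriever with hit rate@k and mean reciprocal rank (MRR). These metrics, `QAPair`, `evaluate_retrieval`, and the toy keyword retriever are illustrative choices, not Arandu's evaluation API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    question: str
    answer: str
    source_id: str  # hypothetical: id of the chunk the pair was generated from


# A retriever maps (question, k) to a ranked list of chunk ids.
Retriever = Callable[[str, int], list[str]]


def evaluate_retrieval(qa_pairs: list[QAPair], retrieve: Retriever, k: int = 5) -> dict[str, float]:
    """Score a retriever, treating each pair's source chunk as the correct result.

    hit_rate@k: fraction of questions whose source chunk is in the top k.
    mrr: mean of 1/rank of the source chunk within the top k (0 if absent).
    """
    if not qa_pairs:
        raise ValueError("need at least one QA pair")
    hits = 0
    rr_sum = 0.0
    for pair in qa_pairs:
        ranked = retrieve(pair.question, k)[:k]
        if pair.source_id in ranked:
            hits += 1
            rr_sum += 1.0 / (ranked.index(pair.source_id) + 1)
    n = len(qa_pairs)
    return {"hit_rate@k": hits / n, "mrr": rr_sum / n}


# Toy corpus and keyword-overlap retriever, for demonstration only.
chunks = {
    "c1": "the river flooded the village in may",
    "c2": "fishermen changed their routes after the drought",
    "c3": "elders describe the seasons shifting earlier",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(question: str, k: int) -> list[str]:
    # Rank chunks by how many distinct words they share with the question.
    q = tokens(question)
    ranked = sorted(chunks, key=lambda cid: len(q & tokens(chunks[cid])), reverse=True)
    return ranked[:k]


pairs = [
    QAPair("When did the river flood the village?", "In May", "c1"),
    QAPair("What did fishermen change after the drought?", "Their routes", "c2"),
]
scores = evaluate_retrieval(pairs, keyword_retrieve, k=1)
```

Any other retrieval strategy with the same `(question, k)` signature can be passed in place of `keyword_retrieve`, so several strategies can be compared on the same QA set.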
Quick Start
```bash
# Install
git clone https://github.com/FredDsR/arandu.git && cd arandu
uv sync

# Transcribe a file
arandu transcribe audio.mp3

# Generate QA pairs
arandu generate-cep-qa results/
```

See the Getting Started guide for full setup instructions.