
Arandu

Composable pipelines for transcription, QA generation, and knowledge graph construction — designed for ethnographic media collections and climate-impact research.

Arandu (Guarani: ara = time + endu = to hear/feel — “wisdom through perceiving time”) is a research pipeline for processing ethnographic media collections.

It combines audio/video transcription, cognitively-scaffolded QA generation, and knowledge graph construction into composable, checkpoint-resilient pipelines.

| Feature | Description |
| --- | --- |
| 🎙️ Transcription | Whisper-based transcription with Google Drive integration, parallel workers, and automatic quality validation |
| 🧠 CEP QA Generation | Bloom’s Taxonomy-scaffolded QA pairs with LLM-as-a-Judge validation |
| 🕸️ KG Construction | AutoSchemaKG-powered entity and relation extraction with GraphML export |
| Evaluation | Retrieval-strategy benchmarking using CEP QA pairs as ground truth |
# Install
git clone https://github.com/FredDsR/arandu.git && cd arandu
uv sync
# Transcribe a file
arandu transcribe audio.mp3
# Generate QA pairs
arandu generate-cep-qa results/
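
The two commands above can be chained for a whole folder of recordings. The following is a minimal sketch, not part of Arandu itself: it uses only the `arandu transcribe` and `arandu generate-cep-qa` invocations shown above, with no additional flags. The function name `run_pipeline` and the `ARANDU_BIN` override variable are hypothetical, and it assumes that transcripts end up in `results/`, which the quick start suggests but does not state.

```shell
#!/usr/bin/env bash
set -eu

# ARANDU_BIN is a hypothetical override hook (not an Arandu feature) so the
# function can be dry-run with a stub command; it defaults to the real CLI.
ARANDU_BIN="${ARANDU_BIN:-arandu}"

# Transcribe every .mp3 in a directory, then generate QA pairs once.
# Assumption: transcripts are written to results/ (adjust if your setup differs).
run_pipeline() {
  local media_dir="$1"
  local results_dir="${2:-results/}"
  local file
  for file in "$media_dir"/*.mp3; do
    [ -e "$file" ] || continue          # skip the literal pattern when nothing matches
    "$ARANDU_BIN" transcribe "$file"    # same call as in the quick start
  done
  "$ARANDU_BIN" generate-cep-qa "$results_dir"
}
```

For example, `run_pipeline interviews/` would transcribe each `.mp3` file in a (hypothetical) `interviews/` directory and then run QA generation over `results/`. Because of `set -e`, the run stops at the first command that fails.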

See the Getting Started guide for full setup instructions.