# Dependencies

Complete documentation of all project dependencies for the Knowledge Graph Construction Pipeline.
## Table of Contents

- Existing Dependencies
- New Dependencies
- Dependency Groups
- Installation Instructions
- Version Compatibility
- License Information
## Existing Dependencies

Dependencies from the original transcription pipeline:

### Core Dependencies

| Package | Version | Purpose |
|---|---|---|
| accelerate | >=1.12.0 | Model acceleration and device management |
| bitsandbytes | >=0.49.1 | Quantization and memory optimization |
| google-api-python-client | >=2.100.0 | Google Drive API integration |
| google-auth-httplib2 | >=0.1.0 | Google authentication |
| google-auth-oauthlib | >=1.0.0 | OAuth2 flow |
| pydantic | >=2.0.0 | Data validation, schemas, and JSON serialization |
| pydantic-settings | >=2.0.0 | Configuration management with env var support |
| rich | >=13.0.0 | Terminal UI and formatting |
| sentencepiece | >=0.2.1 | Tokenization for Whisper |
| tenacity | >=8.0.0 | Retry logic with exponential backoff |
| transformers | >=4.57.3 | Hugging Face transformers |
| typer[all] | >=0.9.0 | CLI framework |
### ML/AI Dependencies

| Package | Version | Purpose |
|---|---|---|
| torch | (via uv sources) | PyTorch deep learning framework (CUDA 12.4) |
| torchvision | (via uv sources) | Vision processing utilities |
| torchaudio | (via uv sources) | Audio processing |
## New Dependencies

Dependencies added for P2 functionality:

### LLM Integration

| Package | Version | Purpose | Used By |
|---|---|---|---|
| openai | >=1.0.0 | OpenAI API client (also supports Ollama and other OpenAI-compatible endpoints) | llm_client.py |
| httpx | >=0.27.0 | HTTP client (used by the openai SDK) | llm_client.py |
## Planned Dependencies (Not Yet Added)

The following dependencies are planned for future phases but are not currently in pyproject.toml:

### Knowledge Graph Construction (Planned for KG Phase)

| Package | Version | Purpose | Planned Module |
|---|---|---|---|
| atlas-rag | >=0.0.5 | AutoSchemaKG framework | kg_builder.py |
| networkx | >=3.1 | Graph data structures and algorithms | kg_builder.py, metrics.py |
### Evaluation and Metrics (Planned for Evaluation Phase)

| Package | Version | Purpose | Planned Module |
|---|---|---|---|
| scikit-learn | >=1.3.0 | Machine learning metrics (F1, etc.) | metrics.py |
| sentence-transformers | >=2.2.0 | Semantic embeddings | metrics.py, evaluator.py |
| nltk | >=3.8.0 | NLP utilities (tokenization) | metrics.py |
| sacrebleu | >=2.3.0 | BLEU score calculation | metrics.py |
**Note:** These dependencies will be added to `pyproject.toml` when their respective implementation phases begin.
## Dependency Groups

### Production Dependencies

Required for running the application:

```toml
[project]
dependencies = [
    "accelerate>=1.12.0",
    "bitsandbytes>=0.49.1",
    "google-api-python-client>=2.100.0",
    "google-auth-httplib2>=0.1.0",
    "google-auth-oauthlib>=1.0.0",
    "pydantic>=2.0.0",
    "pydantic-settings>=2.0.0",
    "rich>=13.0.0",
    "sentencepiece>=0.2.1",
    "tenacity>=8.0.0",
    "transformers>=4.57.3",
    "typer[all]>=0.9.0",
    # LLM Integration (OpenAI SDK supports Ollama and other compatible endpoints)
    "openai>=1.0.0",
    "httpx>=0.27.0",
]
```

### Development Dependencies

Dependencies for code quality and linting:

```toml
[dependency-groups]
dev = [
    "ruff>=0.8.0",
]
```

### Testing Dependencies

Dependencies for running tests:

```toml
[dependency-groups]
test = [
    "pytest>=8.0.0",
    "pytest-cov>=5.0.0",
    "pytest-mock>=3.14.0",
]
```

## Installation Instructions

### Build System

This project uses uv as the build backend and package manager:

```toml
[build-system]
requires = ["uv_build>=0.9.26,<0.10.0"]
build-backend = "uv_build"
```

### Basic Installation

Install production dependencies:

```sh
uv sync
```

### Development Installation

Install with development dependencies:

```sh
uv sync --group dev
```

### Testing Installation

Install with test dependencies:

```sh
uv sync --group test
```

### Complete Installation

Install all dependency groups:

```sh
uv sync --all-groups
```

### Alternative: pip Installation

If not using uv:

```sh
pip install -e .
```

### Docker Installation

Dependencies are automatically installed in Docker:

```sh
docker compose build
```
## Version Compatibility

### Python Version

Required: Python >= 3.13

Tested on:

- Python 3.13.0
- Python 3.13.1
### PyTorch Version

CUDA Support (configured via `[tool.uv.sources]`):

The project is configured to use PyTorch with CUDA 12.4 support. This is managed through uv's custom index configuration:

```toml
[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
priority = "supplemental"

[tool.uv.sources]
torch = { index = "pytorch-cu124" }
torchvision = { index = "pytorch-cu124" }
torchaudio = { index = "pytorch-cu124" }
```

When running `uv sync`, PyTorch packages are automatically installed from the CUDA 12.4 index.

Manual installation (if not using uv):

```sh
# CUDA 12.4
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# ROCm (AMD GPUs)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm5.7

# CPU only
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
```
### Transformers Version Constraints

For general use:

```
transformers>=4.57.3
```

### bitsandbytes Version

Purpose: Quantization and memory optimization for LLMs

Minimum version: >=0.49.1

Features Used:

- 8-bit and 4-bit quantization
- Memory-efficient model loading
- CUDA optimization
## Detailed Dependency Information

### pydantic (>=2.0.0)

Purpose: Data validation, schema definitions, and JSON serialization

Features Used:

- `BaseModel` - all data schemas (QAPair, QARecord, EvaluationReport, etc.)
- `Field()` - validation constraints (ge, le, pattern, default_factory)
- `@field_validator` - custom field validation
- `@model_validator` - cross-field validation
- `@computed_field` - derived/calculated properties
- `model_dump_json()` - JSON serialization
- `model_validate_json()` - JSON deserialization
- `model_json_schema()` - JSON Schema export
Why Pydantic over dataclasses:
- Built-in validation with declarative constraints
- Automatic JSON serialization/deserialization with datetime support
- Better error messages with field paths
- Ecosystem alignment (OpenAI SDK uses Pydantic)
- Computed fields for derived values
- Rust-based validation core for performance (Pydantic v2)
Documentation: https://docs.pydantic.dev/latest/
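
A minimal sketch of how these features combine. The class name echoes the QAPair schema mentioned above, but the fields and validators here are illustrative assumptions, not the project's actual schema:

```python
from pydantic import BaseModel, Field, computed_field, field_validator

class QAPair(BaseModel):
    # Field names are illustrative, not the project's actual schema.
    question: str = Field(min_length=1)
    answer: str = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0, default=1.0)

    @field_validator("question")
    @classmethod
    def ensure_question_mark(cls, v: str) -> str:
        # Custom per-field validation/normalization.
        return v if v.endswith("?") else v + "?"

    @computed_field  # derived value, included in serialized output
    @property
    def answer_words(self) -> int:
        return len(self.answer.split())

pair = QAPair(question="What is a knowledge graph",
              answer="A graph of entities and relations")
json_str = pair.model_dump_json()                # JSON serialization
restored = QAPair.model_validate_json(json_str)  # JSON deserialization
```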
### pydantic-settings (>=2.0.0)

Purpose: Configuration management with environment variable support

Features Used:

- `BaseSettings` - configuration class with env var loading
- `SettingsConfigDict` - configuration for env prefix and .env file support
- Automatic type coercion from string env vars
Documentation: https://docs.pydantic.dev/latest/concepts/pydantic_settings/
### openai (>=1.0.0)

Purpose: OpenAI API client for GPT models

Features Used:

- Chat completions API
- Streaming responses
- Token usage tracking

API Models:

- `gpt-4o` (recommended)
- `gpt-4o-mini` (cost-effective)
- `gpt-4`
- `gpt-3.5-turbo`

Authentication: Requires the `OPENAI_API_KEY` environment variable
Documentation: https://platform.openai.com/docs/api-reference
### httpx (>=0.27.0)

Purpose: HTTP client for the Ollama API

Features Used:

- Async requests
- Timeout handling
- Connection pooling

Default URL: `http://localhost:11434`
Documentation: https://www.python-httpx.org/
## Planned Dependency Information

Note: The following packages are not yet installed but are documented for future implementation phases.

### atlas-rag (>=0.0.5)

Purpose: AutoSchemaKG framework for knowledge graph construction

Features Planned:

- Triple extraction
- Dynamic schema induction
- Graph construction
- NetworkX integration

Key Modules:

- `atlas_rag.kg_construction`
- `atlas_rag.llm_generator`
- `atlas_rag.utils`
Documentation: https://hkust-knowcomp.github.io/AutoSchemaKG/
Paper: https://arxiv.org/abs/2505.23628
### networkx (>=3.1)

Purpose: Graph data structures and algorithms

Features Planned:

- Graph creation and manipulation
- Node and edge attributes
- Graph algorithms (connectivity, density)
- JSON serialization
- GraphML export
Documentation: https://networkx.org/documentation/stable/
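
networkx is not yet a project dependency, but a sketch of the planned usage (KG triples stored as attributed edges, plus the density metric) would look like this; the node and relation names are made up for illustration:

```python
import networkx as nx

# Tiny example KG: subject/object as nodes, relation as an edge attribute.
g = nx.Graph()
g.add_edge("whisper", "transcription", relation="used_for")
g.add_edge("whisper", "openai", relation="developed_by")

# Undirected density = 2m / (n(n-1)) = 2*2 / (3*2) = 2/3.
density = nx.density(g)

# GraphML export (one of the planned serialization targets):
# nx.write_graphml(g, "kg.graphml")
```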
### scikit-learn (>=1.3.0)

Purpose: Machine learning metrics

Features Planned:

- F1 score calculation
- Precision and recall
- Token-level comparison
Documentation: https://scikit-learn.org/stable/
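
For intuition, token-level F1 (the harmonic mean of precision and recall over overlapping tokens) can be written in plain Python. This is a sketch of the metric itself, not the project's metrics.py:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # shared tokens / predicted tokens
    recall = overlap / len(ref)      # shared tokens / reference tokens
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat", "the dog")` shares one of two tokens on each side, giving precision = recall = 0.5 and F1 = 0.5.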
### sentence-transformers (>=2.2.0)

Purpose: Semantic embeddings for text

Features Planned:

- Sentence embeddings
- Semantic similarity
- Coherence scoring

Models Planned:

- `all-MiniLM-L6-v2` (384 dims, fast)
- `all-mpnet-base-v2` (768 dims, high quality)
Documentation: https://www.sbert.net/
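
Once a model such as `all-MiniLM-L6-v2` returns embedding vectors, semantic similarity is typically the cosine between them; the formula itself needs nothing beyond the standard library:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = (a · b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With sentence-transformers, real embeddings would be produced first, e.g.:
# model = SentenceTransformer("all-MiniLM-L6-v2")
# a, b = model.encode(["first sentence", "second sentence"])
```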
### nltk (>=3.8.0)

Purpose: Natural language processing utilities

Features Planned:

- Tokenization
- Word counting
- Text preprocessing

Data Required:

```python
import nltk
nltk.download('punkt')
nltk.download('stopwords')
```

Documentation: https://www.nltk.org/
### sacrebleu (>=2.3.0)

Purpose: BLEU score calculation

Features Planned:

- Sentence-level BLEU
- Corpus-level BLEU
- Multiple references
Documentation: https://github.com/mjpost/sacrebleu
## License Information

### MIT Licensed

- openai
- httpx
- pydantic
- typer
- rich

### Apache 2.0 Licensed

- transformers
- torch

### BSD Licensed

- accelerate
- sentencepiece

### Planned Dependency Licenses

Note: For dependencies not yet added to the project.

#### MIT Licensed (Planned)

- networkx
- nltk
- sacrebleu
- atlas-rag

#### Apache 2.0 Licensed (Planned)

- sentence-transformers
- scikit-learn
### AutoSchemaKG Citation (When Added)

If using atlas-rag for published research, citation is required:

```bibtex
@article{huang2025autoschemakg,
  title={AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora},
  author={Huang, Haoyu and others},
  journal={arXiv preprint arXiv:2505.23628},
  year={2025}
}
```

## Dependency Size Information

### Installation Sizes (Current Dependencies)
Section titled “Installation Sizes (Current Dependencies)”| Package | Disk Space |
|---|---|
| torch | ~2.5 GB |
| transformers | ~500 MB |
| bitsandbytes | ~50 MB |
| openai | ~5 MB |
| accelerate | ~30 MB |
| Other packages | ~100 MB |
| Total | ~3.2 GB |
### Additional Sizes (Planned Dependencies)

| Package | Disk Space |
|---|---|
| sentence-transformers | ~400 MB |
| atlas-rag | ~50 MB |
| networkx | ~10 MB |
| scikit-learn | ~40 MB |
| nltk | ~20 MB + data |
| Total (with planned) | ~3.7 GB |
### Model Sizes (Downloaded at Runtime)

| Model | Size | Used By |
|---|---|---|
| Whisper Large V3 | ~3 GB | Transcription |
| all-MiniLM-L6-v2 | ~80 MB | Evaluation |
| Llama 3.1 8B (Ollama) | ~4.7 GB | QA/KG (if using Ollama) |
## Troubleshooting Dependencies

### Common Issues

**Issue:** torch installation fails with CUDA

**Solution:**

```sh
# With uv (automatic via uv sources)
uv sync

# Manual installation
pip install torch --index-url https://download.pytorch.org/whl/cu124
```

**Issue:** bitsandbytes installation fails

**Solution:**

```sh
# Ensure CUDA is available
nvidia-smi

# Reinstall
pip install bitsandbytes --upgrade
```
### Dependency Conflicts

**Conflict:** pydantic v1 vs v2

**Resolution:** Project requires Pydantic v2:

```sh
pip install "pydantic>=2.0.0" --upgrade
```

**Conflict:** transformers version mismatch

**Resolution:**

```sh
pip install "transformers>=4.57.3" --upgrade
```

## Updating Dependencies

### Check for Updates

```sh
uv lock --upgrade
```

### Update All Dependencies

```sh
uv sync --upgrade
```

### Update Specific Package

```sh
uv add openai --upgrade
```

### Lock Dependencies

For reproducible builds, `uv.lock` is automatically maintained by uv.
Document Version: 1.0 Last Updated: 2026-01-14