CoGenAI

CoGenAI is a collaborative Generative AI platform designed for research labs and enterprises that need their own secure, tailored AI environment. It makes research-grade AI accessible through an intuitive UI and pre-made API calls, enabling non-NLP experts and non-coders to build, deploy, and serve custom Small Language Models (SLMs) and RAG pipelines.

CoGenAI is in active development and internal testing. If you're interested in testing it or collaborating, get in touch.

Platform architecture: seven horizontal layers from users to model lifecycle

Features

Data Ingestion & Storage

Upload documents (PDF, DOCX, PPTX, images) to MinIO object storage with bucket/category organization
Web document discovery: search and download from arXiv, PubMed, Semantic Scholar, and Brave Search directly into storage
MCP server integration for deep paper analysis: full-text extraction, citation graphs, code repo linking, patent search
Shared dataset cache with sync management: link datasets to projects without duplicating files

Processing Pipeline

Six sequential steps, fully versioned with lineage tracking
Text extraction with per-file quality metrics (readability, gibberish detection, language confidence)
Cleaning & filtering with multiple filter versions from the same extracted text
Chunking with 4 strategies: fixed-size, sentence, paragraph, semantic (registry-based, extensible)
Tokenization with 3 strategies: HuggingFace BPE, character, whitespace
Embedding with sentence-transformers and OpenAI Ada, stored in pgvector (PostgreSQL-native vector search)
QA pair generation from chunked text with configurable LLM provider and model
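
The registry-based chunking step could look roughly like the sketch below. It is an illustration only, not CoGenAI's actual code: the names `CHUNKERS`, `register_chunker`, and `chunk` are hypothetical, and only two of the four strategies (fixed-size and paragraph) are shown.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a strategy name to a chunking function.
CHUNKERS: Dict[str, Callable[..., List[str]]] = {}


def register_chunker(name: str):
    """Decorator that records a chunking function under `name`."""
    def decorator(fn: Callable[..., List[str]]) -> Callable[..., List[str]]:
        CHUNKERS[name] = fn
        return fn
    return decorator


@register_chunker("fixed-size")
def fixed_size(text: str, size: int = 200, overlap: int = 0) -> List[str]:
    """Split text into windows of `size` characters, sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


@register_chunker("paragraph")
def paragraph(text: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk(text: str, strategy: str, **kwargs) -> List[str]:
    """Look up a strategy by name and apply it."""
    try:
        chunker = CHUNKERS[strategy]
    except KeyError:
        raise ValueError(f"unknown chunking strategy {strategy!r}") from None
    return chunker(text, **kwargs)
```

With this layout, adding a strategy means writing one decorated function; callers keep using `chunk(text, "my-strategy")` and nothing else changes.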

RAG & Retrieval

End-to-end RAG pipeline: document retrieval → context assembly → LLM answer generation
FAISS/pgvector retrieval with similarity scoring and chunk display
Real metrics: retrieval latency, chunks retrieved, answer generation latency
Push metrics to Prometheus/Grafana for monitoring
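
As a rough illustration of the retrieval → context assembly → generation flow and the metrics listed above, here is a minimal in-memory sketch. It assumes embeddings are already computed and uses a brute-force cosine search in place of FAISS or pgvector; the function names and the `generate` callable are hypothetical stand-ins for the configured LLM provider.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: Sequence[Tuple[str, Vector]],
             k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, with their scores."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def answer(question: str, query_vec: Vector,
           index: Sequence[Tuple[str, Vector]],
           generate: Callable[[str], str], k: int = 3) -> Dict[str, object]:
    """Run retrieval, assemble the context, call the LLM, and time each stage."""
    t0 = time.perf_counter()
    hits = retrieve(query_vec, index, k)
    retrieval_latency = time.perf_counter() - t0

    context = "\n".join(text for text, _ in hits)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"

    t1 = time.perf_counter()
    reply = generate(prompt)  # hypothetical LLM call
    generation_latency = time.perf_counter() - t1

    return {
        "answer": reply,
        "chunks": hits,
        "chunks_retrieved": len(hits),
        "retrieval_latency_s": retrieval_latency,
        "generation_latency_s": generation_latency,
    }
```

The returned dictionary carries the same three measurements named in the list (retrieval latency, chunks retrieved, answer generation latency), so they could be displayed or pushed to a metrics backend after each query.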

Chat & Collaboration

Multi-turn chat with configurable provider + model + RAG + Agent toggles, session persistence
Side-by-side model comparison with delta metrics (latency, tokens, chunks, relevance, agent iterations)
QA evaluation mode: rate answered QA pairs from generated sets with progress tracking
Summarize, regenerate, delete messages; RAG chunks panel with similarity scores
Optimistic send, stop generation, markdown rendering

Agent Framework

Simple Chat: baseline direct LLM call
RAG Assistant: context retrieval + grounded synthesis with reasoning steps
ReAct Agent: Thought → Action → Observation loop with configurable iterations
Conversational Refinement: clarity evaluation → MCQ clarification → refined query → answer
Agent template generator for custom agents (registry-based, extensible)
Fully extensible: drop in custom RAG strategies, chunking algorithms, embedding models, or agent workflows without modifying core code
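
To make the Thought → Action → Observation loop concrete, here is a minimal sketch of a ReAct-style agent. It is not CoGenAI's implementation: the `Action: tool[input]` and `Final Answer:` text conventions, the `react` function, and the `llm` and `tools` parameters are all assumptions chosen for illustration.

```python
import re
from typing import Callable, Dict, Optional

# Assumed text conventions for the model's output (not taken from the platform).
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react(question: str, llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]],
          max_iterations: int = 5) -> Optional[str]:
    """Alternate model steps and tool calls until a final answer or the limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        step = llm(transcript)  # expected to contain a Thought plus an Action or Final Answer
        transcript += step + "\n"

        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(step)
        if action is None:
            observation = "no action found; reply with an Action or a Final Answer"
        else:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name!r}"

        # Feed the tool result back so the next model step can use it.
        transcript += f"Observation: {observation}\n"
    return None  # iteration limit reached without a final answer
```

The `max_iterations` argument plays the role of the configurable iteration count: it bounds cost and guarantees termination even if the model never produces a final answer.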

Model Management

HuggingFace search & download with GGUF quantization support, progress tracking, validation
Tokenizer + embedding model registry with bulk operations
Local inference: Ollama + LM Studio auto-discovery with health indicators
API providers: OpenAI, Anthropic, DeepSeek, Google Gemini, NVIDIA NIM, Zhipu GLM
Cloud deployment: GCP Vertex AI, AWS SageMaker, Azure ML
Missing/broken model detection with re-download

Fine-Tuning & Training

Unsloth-powered SFT/LoRA/DPO training with real GPU support
Simulation fallback for development without GPU
Training job management: config, progress, cancel, logs, Grafana dashboard per job
Post-training evaluation: perplexity, sample generation, base vs fine-tuned comparison
Fine-tuned models auto-registered in model catalog
Training metrics history in DB + Prometheus for visualization
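
For the base vs fine-tuned comparison, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below uses that standard definition; how CoGenAI computes it internally is not stated here, and the function name and input format (natural-log probabilities, one per token) are assumptions.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp(-mean(log p))."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def compare(base_log_probs: Sequence[float],
            tuned_log_probs: Sequence[float]) -> float:
    """Return base minus fine-tuned perplexity; positive means the fine-tune improved."""
    return perplexity(base_log_probs) - perplexity(tuned_log_probs)
```

Lower perplexity on the same held-out text means the model assigns higher probability to it, so a positive difference from `compare` indicates that fine-tuning helped on that text.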

QA Validation & Human-in-the-Loop

Contributor role with limited access for QA validation
Thumbs up/neutral/down rating with comments
Admin task assignment: assign contributors to projects
Refined datasets: export high-rated QA pairs for training (SFT, DPO, RLHF)
Full feedback loop: validation → refined datasets → fine-tuning
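
One way the validation → refined dataset step could work is sketched below. The `RatedQA` record, the rating labels `"up"`, `"neutral"`, and `"down"`, and the output field names are assumptions for illustration; the `prompt`/`chosen`/`rejected` layout is a commonly used shape for preference data, not necessarily the one CoGenAI exports.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RatedQA:
    question: str
    answer: str
    rating: str  # assumed labels: "up", "neutral", or "down"


def to_sft(items: Iterable[RatedQA]) -> List[Dict[str, str]]:
    """Keep only thumbs-up pairs as supervised fine-tuning examples."""
    return [{"prompt": i.question, "completion": i.answer}
            for i in items if i.rating == "up"]


def to_dpo(items: Iterable[RatedQA]) -> List[Dict[str, str]]:
    """Pair each thumbs-up answer with each thumbs-down answer to the same question."""
    by_question: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"up": [], "down": []})
    for item in items:
        if item.rating in ("up", "down"):
            by_question[item.question][item.rating].append(item.answer)

    pairs = []
    for question, group in by_question.items():
        for chosen in group["up"]:
            for rejected in group["down"]:
                pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs
```

Neutral ratings are dropped in both exports here; a question contributes to the preference set only when it has at least one accepted and one rejected answer.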

Deployment

Local deployment: vLLM / Ollama Docker containers
Cloud deployment: GCP Vertex AI, AWS SageMaker, Azure ML
Model inference testing via built-in chat interface

Metrics & Monitoring

Prometheus + Pushgateway + Grafana stack with 11 registered metrics across training, RAG, and processing
3 auto-provisioned Grafana dashboards: Training (loss curves, duration, progress), RAG (latency, chunks, queries/sec), System (health, scrape status)
Modular @register_metric decorator: add a new metric by dropping a class in metrics/builtin/
Management API: list registered metrics, query Prometheus via PromQL proxy, purge Pushgateway data
Frontend dashboard with 5 tabs (Overview, Values, Pushgateway, PromQL, Purge) + status cards
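
A class-registering decorator of this kind can be sketched as follows. Only the decorator name `register_metric` comes from the feature list; its real signature is not documented here, so the `METRICS` dictionary, the `name`/`kind`/`help` attributes, and the example metric are assumptions.

```python
from typing import Dict, List

# Hypothetical registry: metric name -> metric class.
METRICS: Dict[str, type] = {}


def register_metric(cls: type) -> type:
    """Class decorator that adds the metric to the registry under its `name`."""
    name = getattr(cls, "name", None)
    if not name:
        raise ValueError(f"{cls.__name__} must define a non-empty 'name'")
    if name in METRICS:
        raise ValueError(f"metric {name!r} is already registered")
    METRICS[name] = cls
    return cls


@register_metric
class RagRetrievalLatency:
    # Illustrative metric definition; field names are assumptions.
    name = "rag_retrieval_latency_seconds"
    kind = "gauge"
    help = "Time spent retrieving chunks for one query."


def list_metrics() -> List[str]:
    """Names of all registered metrics, sorted for stable output."""
    return sorted(METRICS)
```

Because registration happens when the class body is imported, a new metric file only needs to be imported once at startup for it to appear in a listing such as `list_metrics()`.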

Platform Infrastructure

Role-based access control: super_admin > admin > researcher > contributor
Backup & restore: PostgreSQL, MinIO, workspaces with scheduler and memo
Database: PostgreSQL 16 with pgvector extension
Task queue: Celery + Redis for all heavy processing
Containerized: Docker Compose with 15+ services including 4 MCP sidecars
Responsive frontend with collapsible sidebar, gradient stat cards, design system abstraction layer
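
The role ordering super_admin > admin > researcher > contributor can be modeled as an ordered enum, as in the sketch below. It assumes a strictly hierarchical model in which each role inherits everything the roles beneath it can do; the `Role` enum and `has_access` helper are hypothetical names, and the platform's actual permission checks may be finer-grained.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered from least to most privileged."""
    contributor = 1
    researcher = 2
    admin = 3
    super_admin = 4


def has_access(user_role: Role, required: Role) -> bool:
    """True if the user's role is at least as privileged as the required role."""
    return user_role >= required
```

For example, `has_access(Role.admin, Role.researcher)` is true, while `has_access(Role.contributor, Role.researcher)` is false.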
