CoGenAI

CoGenAI is a collaborative Generative AI platform designed for research labs and enterprises that need their own secure, tailored AI environment. It makes research-grade AI accessible through an intuitive UI and pre-made API calls, enabling non-NLP experts and non-coders to build, deploy, and serve custom Small Language Models (SLMs) and RAG pipelines.

CoGenAI is in active development and internal testing. If you're interested in testing it or collaborating, get in touch.

[Figure: CoGenAI platform architecture, seven horizontal layers from users to model lifecycle]

Features

Data Ingestion & Storage

  • Upload documents (PDF, DOCX, PPTX, images) to MinIO object storage with bucket/category organization
  • Web document discovery: search and download from arXiv, PubMed, Semantic Scholar, and Brave Search directly into storage
  • MCP server integration for deep paper analysis: full-text extraction, citation graphs, code repo linking, patent search
  • Shared dataset cache with sync management: link datasets to projects without duplicating files

Processing Pipeline

  • Six sequential steps, fully versioned with lineage tracking
  • Text extraction with per-file quality metrics (readability, gibberish detection, language confidence)
  • Cleaning & filtering with multiple filter versions from the same extracted text
  • Chunking with 4 strategies: fixed-size, sentence, paragraph, semantic (registry-based, extensible)
  • Tokenization with 3 strategies: HuggingFace BPE, character, whitespace
  • Embedding with sentence-transformers or OpenAI Ada models, stored in pgvector (vector search inside PostgreSQL)
  • QA pair generation from chunked text with configurable LLM provider and model
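
The registry-based chunking step above could look roughly like the following. This is a minimal sketch, not CoGenAI's actual code: the names `CHUNKERS`, `register_chunker`, and `chunk` are hypothetical, and only three of the four listed strategies are shown (semantic chunking needs an embedding model and is left out).

```python
import re
from typing import Callable, Dict, List

# Hypothetical registry mapping a strategy name to a chunking function.
CHUNKERS: Dict[str, Callable[..., List[str]]] = {}


def register_chunker(name: str):
    """Decorator that stores a chunking function in the registry under `name`."""
    def decorator(func):
        CHUNKERS[name] = func
        return func
    return decorator


@register_chunker("fixed-size")
def fixed_size(text: str, size: int = 200, overlap: int = 0) -> List[str]:
    # Slide a window of `size` characters, stepping by `size - overlap`.
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


@register_chunker("sentence")
def by_sentence(text: str) -> List[str]:
    # Naive split on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


@register_chunker("paragraph")
def by_paragraph(text: str) -> List[str]:
    # Paragraphs are separated by one or more blank lines.
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def chunk(text: str, strategy: str, **options) -> List[str]:
    """Look up a strategy by name and apply it to the text."""
    try:
        chunker = CHUNKERS[strategy]
    except KeyError:
        raise ValueError(f"unknown chunking strategy: {strategy!r}") from None
    return chunker(text, **options)
```

With this layout, adding a strategy means writing one decorated function; callers keep using `chunk(text, "my-strategy")` without any change to the dispatch code.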

RAG & Retrieval

  • End-to-end RAG pipeline: document retrieval → context assembly → LLM answer generation
  • FAISS/pgvector retrieval with similarity scoring and chunk display
  • Measured metrics: retrieval latency, chunks retrieved, answer generation latency
  • Push metrics to Prometheus/Grafana for monitoring
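
To make the retrieval → context assembly flow and its metrics concrete, here is a small pure-Python sketch. It is illustrative only: a brute-force cosine search over an in-memory list stands in for FAISS or pgvector, and the function names and metric keys (`retrieve`, `assemble_prompt`, `retrieval_latency_ms`, `chunks_retrieved`) are assumptions, not the platform's API.

```python
import math
import time
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> Tuple[List[Tuple[str, float]], Dict[str, float]]:
    """Return the top_k (chunk, score) pairs plus simple retrieval metrics."""
    start = time.perf_counter()
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    hits = scored[:top_k]
    latency_ms = (time.perf_counter() - start) * 1000
    metrics = {"retrieval_latency_ms": latency_ms, "chunks_retrieved": len(hits)}
    return hits, metrics


def assemble_prompt(question: str, hits: List[Tuple[str, float]]) -> str:
    """Build the LLM prompt from retrieved chunks, showing each similarity score."""
    context = "\n".join(f"[{score:.2f}] {text}" for text, score in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The answer-generation step would pass the assembled prompt to whichever LLM provider is configured and time that call the same way.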

Chat & Collaboration

  • Multi-turn chat with configurable provider + model + RAG + Agent toggles, session persistence
  • Side-by-side model comparison with delta metrics (latency, tokens, chunks, relevance, agent iterations)
  • QA evaluation mode: rate answered QA pairs from generated sets with progress tracking
  • Summarize, regenerate, delete messages; RAG chunks panel with similarity scores
  • Optimistic send, stop generation, markdown rendering

Agent Framework

  • Simple Chat: baseline direct LLM call
  • RAG Assistant: context retrieval + grounded synthesis with reasoning steps
  • ReAct Agent: Thought → Action → Observation loop with configurable iterations
  • Conversational Refinement: clarity evaluation → MCQ clarification → refined query → answer
  • Agent template generator for custom agents (registry-based, extensible)
  • Fully extensible: drop in custom RAG strategies, chunking algorithms, embedding models, or agent workflows without modifying core code
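
The Thought → Action → Observation loop of the ReAct Agent can be sketched as below. This is a simplified illustration under stated assumptions: `react_loop`, the `policy` callable, and the `"finish"` action name are hypothetical, and a scripted function stands in for the LLM that would normally produce each thought and action.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One step of the agent: (thought, action name, action input).
Step = Tuple[str, str, str]


def react_loop(
    question: str,
    policy: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 3,
) -> Dict[str, Optional[object]]:
    """Run Thought -> Action -> Observation until 'finish' or the iteration cap."""
    observations: List[str] = []
    for iteration in range(1, max_iterations + 1):
        # Thought + Action: in a real agent the LLM produces these.
        thought, action, action_input = policy(question, observations)
        if action == "finish":
            return {"answer": action_input, "iterations": iteration}
        # Observation: run the chosen tool and feed the result back.
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        observations.append(observation)
    # Iteration budget exhausted without a final answer.
    return {"answer": None, "iterations": max_iterations}


def scripted_policy(question: str, observations: List[str]) -> Step:
    """Stand-in for the LLM: look something up once, then answer with it."""
    if not observations:
        return ("I need to look this up.", "lookup", question)
    return ("I have what I need.", "finish", observations[-1])
```

The `max_iterations` cap corresponds to the "configurable iterations" setting: it bounds cost and prevents an agent from looping indefinitely when it never reaches a final answer.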

Model Management

  • HuggingFace search & download with GGUF quantization support, progress tracking, validation
  • Tokenizer + embedding model registry with bulk operations
  • Local inference: Ollama + LM Studio auto-discovery with health indicators
  • API providers: OpenAI, Anthropic, DeepSeek, Google Gemini, NVIDIA NIM, Zhipu GLM
  • Cloud deployment: GCP Vertex AI, AWS SageMaker, Azure ML
  • Missing/broken model detection with re-download

Fine-Tuning & Training

  • Unsloth-powered SFT/LoRA/DPO training on GPU
  • Simulation fallback for development without GPU
  • Training job management: config, progress, cancel, logs, Grafana dashboard per job
  • Post-training evaluation: perplexity, sample generation, base vs fine-tuned comparison
  • Fine-tuned models auto-registered in model catalog
  • Training metrics history in DB + Prometheus for visualization

QA Validation & Human-in-the-Loop

  • Contributor role with limited access for QA validation
  • Thumbs up/neutral/down rating with comments
  • Admin task assignment: assign contributors to projects
  • Refined datasets: export high-rated QA pairs for training (SFT, DPO, RLHF)
  • Full feedback loop: validation → refined datasets → fine-tuning
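
As an illustration of the validation → refined dataset step, the sketch below turns rated QA pairs into training records. The record fields (`question`, `answer`, `rating`) and the output layouts are assumptions: prompt/completion for SFT and prompt/chosen/rejected for DPO are common conventions, not a confirmed CoGenAI export format.

```python
import json
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]  # hypothetical shape: question, answer, rating


def to_sft(records: List[Record]) -> List[Dict[str, str]]:
    """Keep only thumbs-up pairs as prompt/completion examples."""
    return [
        {"prompt": r["question"], "completion": r["answer"]}
        for r in records
        if r["rating"] == "up"
    ]


def to_dpo(records: List[Record]) -> List[Dict[str, str]]:
    """Pair each up-rated answer with each down-rated answer to the same question."""
    by_question = defaultdict(lambda: {"up": [], "down": []})
    for r in records:
        if r["rating"] in ("up", "down"):
            by_question[r["question"]][r["rating"]].append(r["answer"])
    pairs = []
    for question, answers in by_question.items():
        for chosen in answers["up"]:
            for rejected in answers["down"]:
                pairs.append(
                    {"prompt": question, "chosen": chosen, "rejected": rejected}
                )
    return pairs


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as JSON Lines, one object per line."""
    return "\n".join(json.dumps(row) for row in rows)
```

Neutral ratings are dropped in both exports here; whether they should count as weak positives is a design choice this sketch does not try to settle.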

Deployment

  • Local deployment: vLLM / Ollama Docker containers
  • Cloud deployment: GCP Vertex AI, AWS SageMaker, Azure ML
  • Model inference testing via built-in chat interface

Metrics & Monitoring

  • Prometheus + Pushgateway + Grafana stack with 11 registered metrics across training, RAG, and processing
  • 3 auto-provisioned Grafana dashboards: Training (loss curves, duration, progress), RAG (latency, chunks, queries/sec), System (health, scrape status)
  • Modular @register_metric decorator: add a new metric by dropping a class into metrics/builtin/
  • Management API: list registered metrics, query Prometheus via PromQL proxy, purge Pushgateway data
  • Frontend dashboard with 5 tabs (Overview, Values, Pushgateway, PromQL, Purge) + status cards
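
A decorator-based metric registry like the one mentioned above might be structured as follows. Only the decorator name `@register_metric` comes from the feature list; the `Metric` base class, its attributes, the `compute` method, and the example metric name are assumptions made for illustration.

```python
from typing import Dict, List, Type


class Metric:
    """Hypothetical base class; subclasses set `name` and implement `compute`."""
    name: str = ""
    description: str = ""

    def compute(self, payload: dict) -> float:
        raise NotImplementedError


# Registry of metric classes keyed by metric name.
METRIC_REGISTRY: Dict[str, Type[Metric]] = {}


def register_metric(cls: Type[Metric]) -> Type[Metric]:
    """Class decorator that records a metric class under its `name`."""
    if not cls.name:
        raise ValueError(f"{cls.__name__} must define a non-empty name")
    if cls.name in METRIC_REGISTRY:
        raise ValueError(f"metric already registered: {cls.name}")
    METRIC_REGISTRY[cls.name] = cls
    return cls


@register_metric
class RetrievalLatency(Metric):
    # Illustrative metric; the name follows Prometheus naming conventions.
    name = "rag_retrieval_latency_seconds"
    description = "Time spent retrieving chunks for one query"

    def compute(self, payload: dict) -> float:
        return payload["end"] - payload["start"]


def list_metrics() -> List[str]:
    """Names of all registered metrics, e.g. for a management API listing."""
    return sorted(METRIC_REGISTRY)
```

Because registration happens at import time, importing every module in a package such as metrics/builtin/ would be enough to populate the registry; how the platform actually discovers those modules is not stated in this document.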

Platform Infrastructure

  • Role-based access control: super_admin > admin > researcher > contributor
  • Backup & restore: PostgreSQL, MinIO, workspaces with scheduler and memo
  • Database: PostgreSQL 16 with pgvector extension
  • Task queue: Celery + Redis for all heavy processing
  • Containerized: Docker Compose with 15+ services including 4 MCP sidecars
  • Responsive frontend with collapsible sidebar, gradient stat cards, design system abstraction layer
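
The role ordering listed above (super_admin > admin > researcher > contributor) can be checked with a simple rank comparison. This sketch assumes a strictly hierarchical model in which each role inherits everything below it; the helper name `has_at_least` is hypothetical, and the real platform may use finer-grained permissions.

```python
# Roles from lowest to highest privilege, per the ordering in the feature list.
ROLE_ORDER = ["contributor", "researcher", "admin", "super_admin"]
RANK = {role: index for index, role in enumerate(ROLE_ORDER)}


def has_at_least(user_role: str, required_role: str) -> bool:
    """True if user_role is the required role or any role above it."""
    for role in (user_role, required_role):
        if role not in RANK:
            raise ValueError(f"unknown role: {role!r}")
    return RANK[user_role] >= RANK[required_role]
```

For example, an endpoint restricted to researchers would call `has_at_least(user.role, "researcher")`, which admits admins and super admins but rejects contributors.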