Advancing Medical AI Science

Our research spans large language model training, biomedical NLP, genomic AI, and safe clinical reasoning. We publish openly and collaborate with leading institutions.

Where we push boundaries

🧬
Biomedical NLP
Named entity recognition, relation extraction, and semantic reasoning across clinical and biomedical text corpora.
🔬
LLM Alignment
Medical-specific RLHF and DPO pipelines that align model outputs to be safe, grounded, and factually accurate.
🧪
Genomic AI
Deep learning on genomic sequences, structural variant calling, and gene-phenotype association modeling.
💊
Drug Discovery AI
Generative and predictive AI for molecular property optimization, ADMET modeling, and target identification.
🏥
Clinical Reasoning
Chain-of-thought clinical decision support, differential diagnosis generation, and evidence-based reasoning.
👁️
Medical Vision
Multimodal models for pathology slide analysis, radiology interpretation, and visual clinical grounding.

How we build trustworthy models

Every DeepCog model follows a rigorous four-stage pipeline, from data curation through clinical validation.

01 //
📚
Data Curation
Multi-source biomedical corpus assembly with quality filtering, deduplication, and expert annotation.
02 //
⚙️
Pre-training
Domain-specific continued pre-training on curated corpora with medical tokenizer optimization.
03 //
🎯
DPO Alignment
Direct Preference Optimization using expert clinician preference data for safe, accurate outputs.
04 //
Clinical Eval
Benchmark evaluation on MedQA, USMLE, PubMedQA, and internal clinical validation sets.
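
To make the DPO Alignment stage more concrete, here is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is an illustration of the published DPO objective, not DeepCog's training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for this example. The inputs are summed log-probabilities of the clinician-preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All names and the default beta are illustrative assumptions.
    Each argument is the summed log-probability of a full response.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled preference logit: positive when the policy favors the
    # chosen response more strongly than the reference does.
    z = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), computed stably.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# A policy identical to the reference gives the neutral loss log(2).
neutral = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Shifting probability toward the chosen response lowers the loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(neutral, improved)
```

In practice this quantity would be averaged over a batch of preference pairs and minimized with a gradient-based optimizer; `beta` controls how far the trained model may drift from the reference.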

Recent research papers

ARXIV · 2026 · BIOMEDICAL NLP
OpenBioLLM: Advancing Open-Source Biomedical Large Language Models with Expert-Curated Preference Data
DeepCog Research Team · IIT Madras Collaboration
We introduce OpenBioLLM-70B, a state-of-the-art open-source biomedical LLM that achieves 91.2% on MedQA. Our novel DPO pipeline leverages expert medical preference annotations across 120K instruction pairs, and the resulting model surpasses GPT-4 on 7 of 9 medical benchmarks.
MedQA 91.2% · USMLE 89.4% · Open Source
NATURE METHODS · 2025 · GENOMICS
GenomicLLM: A Domain-Specific Language Model for Variant Interpretation and Gene-Disease Association
DeepCog Genomics Lab · Chennai AI Research Center
We present GenomicLLM-7B, trained on 180M genomic sequences from NCBI and Ensembl. The model achieves 87% accuracy on clinical variant classification tasks, enabling automated interpretation of VCF files and generation of clinical genomics reports.
Genomics · Variant Calling
NeurIPS 2025 · LLM ALIGNMENT
MedDPO: Scaling Direct Preference Optimization for Clinical Safety in Medical Language Models
DeepCog AI Research · Anna University
We introduce MedDPO, a clinical-safety-focused DPO framework that reduces medical hallucination by 68% while maintaining benchmark performance. Our 120K preference dataset is annotated by board-certified physicians across 24 specialties.
Safety Alignment · Hallucination ↓68%
ICLR 2025 · DRUG DISCOVERY
MolLLM: Unified Molecular Language Modeling for ADMET Prediction and Lead Optimization
DeepCog Chemistry AI Lab
MolLLM bridges natural language and molecular representations using a unified tokenizer for SMILES, InChI, and IUPAC names. It achieves top performance on 14 ADMET benchmarks from TDC, with a 3x speedup over existing graph neural network approaches.
Drug Discovery · ADMET · Molecules