AIDec 13, 2025Code
AI Transparency Atlas: Framework, Scoring, and Real-Time Model Card Evaluation PipelineAkhmadillo Mamirov, Faiaz Azmain, Hanyu Wang
AI model documentation is fragmented across platforms and inconsistent in structure, preventing policymakers, auditors, and users from reliably assessing safety claims, data provenance, and version-level changes. We analyzed documentation from five frontier models (Gemini 3, Grok 4.1, Llama 4, GPT-5, and Claude 4.5) and 100 Hugging Face model cards, identifying 947 unique section names with extreme naming variation. Usage information alone appeared under 97 distinct labels. Using the EU AI Act Annex IV and the Stanford Transparency Index as baselines, we developed a weighted transparency framework with 8 sections and 23 subsections that prioritizes safety-critical disclosures (Safety Evaluation: 25%, Critical Risk: 20%) over technical specifications. We implemented an automated multi-agent pipeline that extracts documentation from public sources and scores completeness through LLM-based consensus. Evaluating 50 models across vision, multimodal, open-source, and closed-source systems cost less than $3 in total and revealed systematic gaps. Frontier labs (xAI, Microsoft, Anthropic) achieve approximately 80% compliance, while most providers fall below 60%. Safety-critical categories show the largest deficits: deception behaviors, hallucinations, and child safety evaluations account for 148, 124, and 116 aggregate points lost, respectively, across all evaluated models.
DCDec 4, 2025
Reducing Fragmentation and Starvation in GPU Clusters through Dynamic Multi-Objective SchedulingAkhmadillo Mamirov
GPU clusters have become essential for training and deploying modern AI systems, yet real deployments continue to report average utilization near 50%. This inefficiency is largely caused by fragmentation, heterogeneous workloads, and the limitations of static scheduling policies. This work presents a systematic evaluation of these issues and introduces three specialized dynamic schedulers: Hybrid Priority (HPS), Predictive Backfill (PBS), and Smart Batch (SBS). These schedulers are designed to improve utilization, fairness, and overall throughput in multi-tenant GPU clusters. We evaluate all schedulers using a controlled simulation of 1,000 AI jobs on a 64-GPU, 8-node cluster that includes a realistic mix of training, inference, and research workloads. Static baselines (FIFO, SJF, Shortest, Shortest-GPU) achieve 45 to 67% GPU utilization and 12.5 to 18.3 jobs per hour and experience severe starvation, with as many as 156 jobs waiting longer than 30 minutes. The dynamic schedulers significantly outperform these policies. HPS achieves the highest utilization (78.2%), highest throughput (25.8 jobs per hour), and the lowest fairness variance among dynamic methods (457), reducing starvation to 12 jobs. PBS improves fragmentation handling and reaches 76.1% utilization, while SBS increases efficiency for structurally similar jobs and reaches 74.6% utilization. Across all key metrics, including throughput, job wait times, fairness variance, and starvation, dynamic multi-objective schedulers consistently outperform single-objective heuristics. These results show that targeted and transparent scheduling strategies can meaningfully increase GPU efficiency in heterogeneous AI clusters and provide a practical foundation for future production scheduling frameworks.