Efficient LLM Serving Through Priority-based Scheduling

Aug 2024 - Present

Serving large language models is costly because of infrastructure requirements and ongoing operational expenses. Maximizing hardware utilization while complying with application-specific service level agreements (SLAs) is therefore crucial. This research focuses on optimizing system throughput at the GPU cluster level while consistently meeting SLA requirements.

Low Contrast Brain CT Segmentation with Group Equivariant Network

July 2023 - Present

Non-contrast CT (NCCT) is critical for stroke diagnosis in emergencies due to its accessibility and cost-effectiveness. However, its lower soft tissue contrast and higher noise limit its use in brain tissue segmentation. To improve segmentation accuracy in NCCT, we integrate a novel 3D group equivariant network into a 3D U-Net architecture to achieve robust performance under various conditions.