View my Google Scholar profile
Co-authored a research paper on long video-audio understanding via late fusion of LMMs/LLMs with ASR, reporting a 38.75% improvement on VideoMME (with subtitles) over baselines such as VideoLLaMA2 and InternVL2. Accepted at the 28th International Conference on Information Fusion (FUSION 2025), Rio de Janeiro, Brazil.
Feel free to reach out to me through the following channels: