San Francisco, California, United States
1K followers
500+ connections
Explore more posts
-
Abhijeet Singh
Raven Bionics • 2K followers
TIL that Iran launched ballistic missiles at the Weizmann Institute of Science in June 2025, destroying labs and years of work. I don't know how you justify targeting research campuses. I was looking for the best places to conduct SOTA research in computer science. I'm putting off my Master's until these wars subside, but this is so not cool. No student or researcher should have to factor "will my university get bombed?" into their career planning. That should be a basic global consensus. For those that don't know, the Weizmann Institute of Science offers top-notch MSc programs free of cost so that students can devote their time entirely to scientific research and study. I was getting excited right before this news infuriated me. To the students and researchers at Weizmann: I’m deeply sorry for your loss. Years of work should not be erased by a missile salvo. The least the rest of us can do is not look away, and keep insisting that universities must be off-limits everywhere.
1
-
Safoura jolfaei
Iranian Research Institute… • 1K followers
Sharing a new tool for the Persian NLP community! I've just released PersianSciQA-Qwen2.5-14B, a specialized model fine-tuned for extractive question answering on Persian scientific documents. Key features: Precision-Focused: It's trained to answer questions based only on the provided context. Reduces Hallucination: If the answer isn't in the text, it reliably responds with CANNOT_ANSWER—a crucial feature for factual accuracy. Built on Qwen 2.5: Leverages the latest architecture for potentially stronger performance. This new model joins its "sibling" 😉 , PersianSciQA-LLaMA-13B, which was trained on the exact same PersianSciQA-Extractive dataset. Which one performs better? My working hypothesis is that the newer Qwen 2.5 architecture may offer some advantages, but only a rigorous, data-driven evaluation will tell the true story. That's why I'm preparing a deep-dive analysis to benchmark both models head-to-head. The full evaluation will be published soon! Stay tuned. 📊 I invite all researchers and developers in the Persian NLP space to test this new model and share your feedback. 👇 Explore the new Qwen model here: https://lnkd.in/d_uQZVxP 🔬 Dataset Link: https://lnkd.in/dHu87fAN #PersianNLP #NLP #LLM #FineTuning #AI #MachineLearning #Qwen #LLaMA #HuggingFace #OpenSourceAI #Evaluation #DataScience #Farsi #زبان_فارسی
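For anyone who wants to try a model like this, here is a minimal usage sketch with the Hugging Face transformers library. The repo id and prompt layout below are assumptions for illustration only; check the model card for the exact identifier and recommended prompt format.

```python
# Hedged sketch: load the fine-tuned model from the Hugging Face Hub and ask a
# context-grounded question. The repo id and prompt template are assumptions,
# not taken from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PersianSciQA-Qwen2.5-14B"  # replace with the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

context = "..."   # a Persian scientific passage
question = "..."  # a question about that passage
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)  # per the post, expect CANNOT_ANSWER when the context lacks the answer
```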
2
2 Comments -
Morteza Alikhani
Hamrahe Aval (MCI) همراه اول • 533 followers
I’m excited to share that our work, "FaMTEB: Massive Text Embedding Benchmark in Persian Language", has been accepted to EMNLP 2025 (Findings)! 🚀 The lack of a comprehensive benchmark for evaluating text embeddings in Persian has been a major challenge. Our team at MCINEXT, with the goal of addressing this gap, introduced FaMTEB, a benchmark for evaluating text embeddings in Persian, which is now featured on the reputable MTEB leaderboard. I’m grateful to have collaborated with Erfan Zeinivand and Mehran Sarmadi on this project. A special thanks to everyone who shared their knowledge and offered support throughout this journey. 🔗 Explore the leaderboard: https://lnkd.in/dHfmhzW7 📄 Read our paper: https://lnkd.in/dwWVbsD4 #EMNLP #EMNLP2025 #MCINEXT #FaMTEB #MTEB #PersianNLP
16
-
Zhide Wang
Southern Methodist University • 387 followers
Excited to share our new paper published in Human Factors: “Inferring Hidden Attentional States in Driving: A Bayesian Approach to Modeling Distraction and Secondary Task Engagement.” This work was led by Lekhapriya Dheeraj Kashyap as part of her PhD research, and I’m grateful to have collaborated on it. Many real-world systems face the same challenge: the most important states are hidden. In driving, we can observe signals such as speed, eye movements, or pupil dilation — but not the driver’s true attentional state. In this work, we develop a Bayesian decision framework based on a Partially Observable Semi-Markov Decision Process (POSMDP) to infer latent attentional states and model how drivers allocate attention between competing tasks. The model detects distraction earlier than common heuristic rules and reveals substantial heterogeneity in attention strategies. Beyond driving safety, problems like this appear in many AI systems where human states are latent and decisions unfold sequentially. Great collaboration with Lekhapriya Kashyap, Yanling Chang, Maryam Zahabi, and Alfredo Garcia. 📄 Paper: https://lnkd.in/g23ubzht
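The core inference step described here, maintaining a belief over a hidden attentional state from observable signals, can be illustrated with a toy discrete Bayes filter. This is not the paper's POSMDP: the states, transition matrix, and observation model below are invented purely to show the update.

```python
# Toy Bayes filter over a hidden attentional state ("attentive" vs "distracted").
# All probabilities are made up for illustration.
import numpy as np

states = ["attentive", "distracted"]

transition = np.array([[0.95, 0.05],    # P(s_t | s_{t-1} = attentive)
                       [0.10, 0.90]])   # P(s_t | s_{t-1} = distracted)
# Observations: 0 = eyes on road, 1 = long off-road glance
emission = np.array([[0.9, 0.1],        # P(obs | attentive)
                     [0.4, 0.6]])       # P(obs | distracted)

belief = np.array([0.9, 0.1])           # prior belief over the hidden state

for obs in [0, 0, 1, 1, 1]:
    belief = transition.T @ belief          # predict step
    belief = belief * emission[:, obs]      # update with the observation
    belief /= belief.sum()                  # normalize
    print(dict(zip(states, belief.round(3))))
```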
9
-
Alessandro Epasto
Google • 2K followers
Thrilled to see our ICML 2025 paper on Differentially Private Partition Selection featured on the Google Research blog! A huge shout-out to my amazing co-authors: Justin Y. Chen from MIT, and my colleagues Vincent Cohen-Addad and Morteza Zadimoghaddam at Google Research. Our work presents new algorithms to better protect user data at scale. Check out the post to learn more. Read it here: https://lnkd.in/dkZhHWAm #ICML2025 #DifferentialPrivacy #DataSecurity #MachineLearning #Research
85
1 Comment -
Anup S.
Amazon Web Services (AWS) • 5K followers
Self-Reflection in LLMs: Unreliable, But Useful Self-reflection in large language models isn't reliable yet - but that doesn't mean it's not useful. In a recent experiment, I found that asking an LLM to rate its own responses and then filtering based on that rating meaningfully improves result quality. It's not perfect, but it works as a practical quality gate. Here's what I observed: - Higher self-rating thresholds → higher accuracy and precision - But that comes at a cost → lower coverage - There's a sweet spot where you get meaningfully better quality without throwing away too many results It's essentially a confidence-based filter. The model isn't always right about how good its response is, but it's right often enough that the signal is worth using. The tradeoff is real though. Crank the threshold too high and you reject good responses along with the bad ones. Too low and you're barely filtering at all. Tuning that threshold becomes the actual engineering problem. Self-reflection isn't a silver bullet, but as a lightweight quality layer on top of generation - it pulls its weight.
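A minimal sketch of that confidence gate follows. Here `generate` and `self_rate` are hypothetical stand-ins for your own LLM calls: one produces an answer, the other asks the model to score that answer (say, 1 to 10).

```python
# Minimal sketch of a self-rating quality gate. `generate` and `self_rate` are
# placeholders for your own LLM calls; the threshold of 7 is arbitrary.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GatedResult:
    answer: Optional[str]
    rating: int
    accepted: bool

def quality_gate(prompt: str,
                 generate: Callable[[str], str],
                 self_rate: Callable[[str, str], int],
                 threshold: int = 7) -> GatedResult:
    answer = generate(prompt)
    rating = self_rate(prompt, answer)      # model rates its own response
    accepted = rating >= threshold          # keep only sufficiently confident answers
    return GatedResult(answer if accepted else None, rating, accepted)

# Raising `threshold` trades coverage for precision, exactly the tradeoff described above.
```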
8
-
Ramin Mehran
Google DeepMind • 4K followers
In this episode, we discuss "Why Language Models Hallucinate." The authors of the paper are Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. The paper explains that hallucinations in large language models arise because training and evaluation reward guessing over admitting uncertainty, framing the issue as errors in binary classification. It shows that models become incentivized to produce plausible but incorrect answers to perform well on benchmarks. The authors propose that addressing hallucinations requires changing how benchmarks are scored so that uncertain responses are no longer penalized, promoting more trustworthy AI.
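The incentive the paper describes is easy to see with a toy calculation: under binary grading, a guess has expected score equal to its probability of being correct, while abstaining scores zero, so a benchmark-optimizing model should always guess. A small illustrative snippet (numbers invented):

```python
# Toy illustration of the guessing incentive under binary benchmark scoring.
def expected_score(p_correct: float, abstain: bool) -> float:
    return 0.0 if abstain else p_correct  # 1 point if right, 0 otherwise, in expectation

for p in (0.1, 0.3, 0.6):
    print(f"p={p}: guess -> {expected_score(p, False):.1f}, abstain -> {expected_score(p, True):.1f}")
```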
7
1 Comment -
MohammadMahdi Parchamijalal
TAPSI • 936 followers
📢 New Paper Published! Excited to share our latest work, "LitANFIS: Literal-aware Adaptive Neuro-Fuzzy Inference System to learn Conjunctive Normal Form," now published in Neurocomputing. This model brings negation into fuzzy rule learning, enabling more compact and interpretable systems. Thanks to Dr. Armin Salimi-Badr for his support. Check out the publication here: https://lnkd.in/eJ-WgnJz (Available until Feb. 06, 2025)
87
24 Comments -
Taylor Black
Microsoft • 8K followers
Can scaling reward models at inference time outperform traditional training-time scaling in LLMs? A new paper from researchers proposes just that. By introducing a novel method called Self-Principled Critique Tuning (SPCT), they show how generative reward models (GRMs) can be made more scalable and flexible using reinforcement learning—right at inference time. Their DeepSeek-GRM models leverage parallel sampling and meta reward models to enhance reward accuracy for general queries. This could shift how we think about reward modeling in AI—making it more adaptable and efficient. 🔗 https://lnkd.in/gPbtZSdt #AIresearch #ReinforcementLearning #LLM
2
-
Emrah Gürlek
Kirsehir Ahi Evran University • 391 followers
I'm thrilled to share that our PRISMA-guided systematic review has been published in Neural Computing and Applications! Together with my supervisor Dr. Inci ZAIM GÖKBAY, we investigated the true clinical reliability of Machine Learning in diagnosing diseases like Alzheimer's and Parkinson's using structural MRI. Key Highlights: AI models show tremendous diagnostic potential (up to 0.96 accuracy). The Catch: We identified a significant "optimistic bias": studies with lower methodological transparency tend to inflate their performance. The Future: To move AI from the lab to the clinic, we urgently need standardized processing pipelines and open-data initiatives. Grateful to TÜBİTAK (2211-C Scholarship Program) for supporting my PhD research. Read the full open-access article here: https://lnkd.in/d4y6iNKM #MedicalAI #Neuroimaging #MachineLearning #Alzheimers #Parkinsons #PRISMA #TUBITAK #IstanbulUniversity Springer Nature #NeuralComputingAndApplications #NCAA #SpringerNature
20
-
Joseph Fuge
LocoLingua • 1K followers
The "Neural" in Neural Machine Translation (NMT) doesn't just refer to the act of translating at inference time. Modern Quality Evaluation or Quality Estimation (QE) systems are using straightforward metrics like BLEU, sacreBLEU, chrF, and more - but they also employ more recent Neural QE (NQE) metrics like COMET. With the right annotated data, such as human-annotated MQM scores on translations, COMET achieves impressive results even in quality estimation where a reference translation is not provided - this is especially critical if you want to provide a sort of "confidence score" to the end user consuming your machine translations, since those inference-time translations are unlikely to exactly match a training-set reference translation. Of course, human-annotated scores are hard to come by, and when you ask for bilingual annotators you likely want a high level of fluency to ensure the annotations are accurate. That kind of skilled work is likely expensive. Calling back to a previous post where I mentioned the potential applications of MT evaluation methods for LLM outputs, I wonder if human annotation of LLM outputs could be used to train an NQE system for GenAI rather than for an MT system. Testing that theory requires more expertise than I currently possess, and potentially a significant amount of man-hours to annotate LLM outputs, but I'll be curious to see where AI quality evaluation goes beyond simply throwing another LLM at the output of the first one. Speaking of "throwing another LLM at the output of the first one," some research has been done on using LLMs to assess translations. While they were effective at assessing the overall quality of a given MT system, individual "segment-level" translations were still better evaluated by other QE methods already mentioned - https://lnkd.in/g29mZuVU. I expect to see more progress in both LLM and MT evaluation, which will provide better quality outputs and better feedback to users, enhancing their control and confidence by informing them of the calibre of output they received. This may be a major distinguishing factor in GenAI and MT systems moving forward. Take care #machinetranslation
6
-
Erik Miehling
IBM • 1K followers
Algorithms for aligning (or steering) LLMs toward some target behavior or capability often introduce unintended side-effects, that is, behaviors that were not targeted but were nevertheless modified. Obtaining a complete understanding of what changed is not easy, and is further complicated by the fact that alignment efforts in practice consist of multiple “stacked” operations, e.g., SFT followed by DPO, DPO followed by CoT prompting, etc. To help address these challenges, we’ve developed the AI Steerability 360 toolkit, a Hugging Face native framework for both steering LLMs and comprehensively evaluating the impact of steering. Features include: - A taxonomy for implementing steering methods across four points in the prompt-response lifecycle: input, structure, state, and output - Functionality to construct composite steering methods via steering pipelines - Ability to compare steering pipelines on a given task (e.g., instruction following) via use case and benchmark classes We feel that the toolkit will be helpful in providing a more guided and principled approach to alignment. We’ve prepared some notebooks for you to get started with the toolkit, as well as a variety of tutorials for how to add new steering methods, use cases, and evaluation metrics. Check it out! Repo: https://lnkd.in/ekuBbufr Docs: https://lnkd.in/ehKggyt4 Blog post: https://lnkd.in/eQf5f9Bx Karthikeyan Natesan Ramamurthy Praveen Venkateswaran Ching-Yun (Irene) Ko Pierre Dognin Tejaswini Pedapati Moninder Singh Avinash Balakrishnan Inge Vejsbjerg Elizabeth Daly Kush Varshney
58
-
Shubham Vora
Nutsovertech • 20K followers
Stanford’s CME 295 is one of the most practical LLM courses available today. These 8 lectures explain how modern LLMs work, from core concepts to agents and evaluation. If you want depth without noise, this series is worth your time. Lecture 1: End-to-End LLM Overview - Covers the full LLM stack, from transformers and training to agents, vision models, and evaluation. 🔗 https://lnkd.in/dGnQW39t Lecture 2: Transformer Internals and Optimizations - Explains positional embeddings, attention improvements, normalization, and BERT architecture in depth. 🔗 https://lnkd.in/dT_VEpVH Lecture 3: Large Language Models in Practice - Focuses on LLM design, MoE models, decoding methods, prompting, and inference optimizations. 🔗 https://lnkd.in/dwjjpjaP Lecture 4: LLM Training and Finetuning - Breaks down pretraining, scaling laws, optimization techniques, and efficient finetuning methods. 🔗 https://lnkd.in/dSi_xCEN Lecture 5: Preference and Alignment Tuning - Walks through RLHF, reward models, PPO, and modern alternatives like DPO. 🔗 https://lnkd.in/dUK5djpB Lecture 6: LLM Reasoning - Covers reasoning benchmarks, reinforcement learning scaling, and advanced optimization methods. 🔗 https://lnkd.in/dAGQTNAM Lecture 7: Agentic LLMs and RAG - Explains retrieval systems, tool calling, agent frameworks, and safety considerations. 🔗 https://lnkd.in/dWD4j7vm Lecture 8: LLM and Agent Evaluation - Details evaluation metrics, LLM-as-judge methods, and benchmarks for reasoning, coding, and safety. 🔗 https://lnkd.in/ddxE5zvb Start with this today. Follow Shubham Vora to learn more about developing AI agents and how to implement them in your business.
84
44 Comments -
Yao-Yi Chiang
University of Minnesota • 3K followers
Excited to share that our paper “TICLS: Tightly Coupled Language Text Spotter” has been accepted to WACV 2026! Starting from Jerod Weinman's idea, TICLS introduces a new way to integrate pretrained language models into scene text spotting, enabling the model to recognize short, fragmented, and ambiguous text more robustly. By tightly coupling visual and linguistic representations through a character-level pretrained language model, TICLS achieves state-of-the-art performance on benchmark datasets. This work led by Leeje Jang, together with Yijun Lin and Jerod Weinman, highlights how external linguistic knowledge can complement visual understanding—bridging computer vision and natural language processing for richer multimodal perception. Congratulations to Leeje Jang and the team for their hard work and creativity in making this happen! #WACV2026 #ComputerVision #SceneTextRecognition #DeepLearning #MultimodalAI #LanguageModels
59
-
Pernille Tranberg
Digital Identity • 10K followers
With the rise in AI slop, we tend to distrust more and more content, even when it is true. And if you fact-check with generative AI, you risk getting an authoritative-sounding answer from the machine that may itself be wrong. Academics, journalists and others have a growing responsibility to fact-check while big tech makes money on slop.
7
1 Comment -
Aanal Patel
Capria Ventures • 839 followers
I have been building AI agents for quite some time, and the biggest hurdle has not been the "intelligence" of the models; it's the reliability of the system. A new paper from researchers at UIUC, Stanford, and Harvard, "Adaptation of Agentic AI," just gave me a new vocabulary for the problems I face every day. The Core Idea: The "Brain" vs. The "Toolbox": The paper says there are only two main things you can change to make an AI agent better: 1) The Agent (The Brain): You retrain the AI model itself to think differently. 2) The Tool (The Toolbox): You keep the AI model exactly as it is but upgrade the tools it uses (like its search engine or database). The 4 Simple Ways to Adapt The researchers broke every modern AI system down into four basic categories: A1 (Learning from Doing): The agent practices using a tool and gets better based on the result. Basic Example: An AI tries to write code. If the code runs, it learns that was a good move. If it crashes, it learns to try something else next time. A2 (Learning from the Answer): The agent is judged only on its final answer. Basic Example: It doesn't matter how the AI solved a math problem; if the final number is right, it gets a "reward" and learns to repeat that logic. T1 (Plug-and-Play Tools): You use high-quality tools that weren't specifically made for that AI. Basic Example: You give an AI a standard calculator. The calculator doesn't know who the AI is, but it helps the AI do math better. T2 (Customized Tools): This is the "clever" one. You train a small tool to work perfectly for one specific AI. Basic Example: Instead of a general search engine, you build a "mini-searcher" that learns exactly what kind of information your specific AI likes to see. Some of the terms I heard a lot: R1 (Reasoning RL): Inspired by DeepSeek-R1, this refers to agents that are trained to "think" out loud and check their own work before giving an answer. T2 (The Efficient Way): As mentioned above, it's about making the tools adapt to the AI. It is often much cheaper and faster than retraining a giant AI "brain". s3 (The Game-Changer): This is a specific new system that uses the T2 approach. It proved you could train an incredibly good search agent using 70x less data than older methods just by making the tool serve the agent. The paper's "big point" is that we shouldn't just try to build one giant AI that does everything. Instead, the most reliable systems are modular: a stable "brain" surrounded by a bunch of smart, specialized "mini-agents" (tools) that all work together. I have attached the research paper; please go through it if you want to dive deeper. It seems interesting to me! Let me know your take on it.
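To make the A1 "learning from doing" signal concrete, here is a generic sketch (not code from the paper) that turns whether a generated snippet runs cleanly into a scalar reward an agent could learn from.

```python
# Generic sketch of an execution-based reward: 1.0 if the generated snippet runs
# without error, else 0.0. In practice you would sandbox this; running model
# output directly is unsafe.
import subprocess, sys, tempfile

def execution_reward(generated_code: str, timeout: float = 5.0) -> float:
    """Return 1.0 if the snippet exits cleanly, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

print(execution_reward("print('hello')"))   # 1.0
print(execution_reward("1 / 0"))            # 0.0
```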
35
-
Steve Nemzer
TELUS International • 1K followers
Our innovative benchmark study by principal researcher MohammadReza Saadat explores the behaviors of four SOTA LLM models in the areas of certainty under pressure, sycophancy, and alignment. From the intro: “Consistency and adaptability are desirable: The model should remain steadfast when it is correct, yet appropriately open to self-correction when it is wrong. Understanding these behaviors is critical for establishing trustworthiness and safety in AI. An LLM that sounds confident but yields to user pressure can mislead users or be manipulated into incorrect or harmful statements. For instance, hallucinations — false or made-up facts stated confidently — become more pernicious if the model cannot be steered back to truth when challenged. Conversely, if a model is too eager to please the user, it might agree with false premises or unsafe requests, a phenomenon related to sycophantic behavior…” Learn more about the study design and the model results here: https://lnkd.in/gnGerebB
41
1 Comment -
Gunaputra Nagendra Pavan Yedida
Discensys • 5K followers
I just read “Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs”, which extends Chinchilla-style laws to include architecture choices like hidden size, MLP/attention balance, and GQA, then uses them to find designs that are both cheaper and more accurate than LLaMA-3.2 under the same training budget. 🔗 https://lnkd.in/gYnqdc6R Insights from the paper: 🔹 Conditional scaling law: Augments classic compute-optimal scaling with architectural variables (hidden size, mlp-to-attention ratio, GQA groups) so loss becomes a function of both scale and shape, not just total params/tokens. 🔹 Architecture–efficiency tradeoffs: Training 200+ models (80M–3B, 8B–100B tokens) shows that shifting parameters from MLP to attention, and using well-tuned GQA, can significantly improve accuracy at the same FLOPs and memory footprint. 🔹 Search framework: Fits the conditional law, then searches over design choices to predict Pareto-optimal points, yielding models with up to 2.1% higher accuracy and 42% higher inference throughput than LLaMA-3.2 for the same training compute. 🔹 Practical takeaway: Instead of “Chinchilla but bigger,” the work argues for architecture-aware scaling as a new axis—optimizing how parameters are wired can matter as much as how many you have. This is a good resource if you’re designing custom LLMs, worrying about serving costs, or exploring how scaling laws and architecture search can be combined for inference-efficient models. #AI #MachineLearning #ScalingLaws #LLM #ModelArchitecture #InferenceEfficiency
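The "conditional" idea is easiest to see as loss depending on shape as well as scale. Below is an illustrative sketch with an invented functional form and coefficients (not the paper's fitted law), comparing two shapes at the same budget.

```python
# Illustrative conditional scaling law: loss as a function of scale (params N,
# tokens D) and a shape variable (MLP-to-attention ratio r). The functional form
# and all coefficients are invented for illustration only.
import numpy as np

def conditional_loss(N, D, r, E=1.7, a=400, alpha=0.34, b=410, beta=0.28,
                     c=0.05, r_opt=3.0):
    # Chinchilla-style irreducible + scale terms, plus a penalty for straying
    # from a hypothetical optimal MLP/attention ratio r_opt.
    return E + a / N**alpha + b / D**beta + c * (np.log(r / r_opt)) ** 2

# Same compute budget, two different shapes:
for r in (1.0, 3.0):
    print(f"r={r}: predicted loss {conditional_loss(3e9, 1e11, r):.3f}")
# A search over (N, D, r) under a fixed FLOP budget would pick Pareto-optimal shapes.
```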
3
-
Shashank Gaur
Lendbuzz • 902 followers
Exploring the Foundations of LLMs — Through Stanford CS336 Recently, I’ve been following Stanford’s CS336: Large Language Models course—an advanced graduate-level class now publicly available via Stanford Online. As someone deeply interested in machine learning systems and NLP, this has been an incredible opportunity to expand my understanding from the ground up. 📚 Course website: https://lnkd.in/gY_fzNU4 📺 Lectures (free on YouTube): StanfordOnline – CS336 🎯 Key Learnings So Far: 🔹 Tokenization isn’t just preprocessing—it's a critical design decision that affects model performance, vocabulary efficiency, and even generalization. Exploring approaches like byte pair encoding gave me a new appreciation for the intricacies behind input representation. 🔹 Profiling resource usage is essential. The lectures on GPU memory tracking and PyTorch profiling offered valuable insights into how large models interact with hardware—and how to reason about performance bottlenecks early in development. 🔹 LLMs are a full-stack problem. From tokenizer to transformer block to GPU kernel, this course emphasizes that building scalable models requires a systems perspective, not just a model-centric one. 🔹 Hands-on implementation matters. The code-oriented structure of the course pushes learners to go beyond theory and actually build core components of a language model—something I’m already integrating into my own learning process. 🙌 Grateful to Stanford University and the CS336 teaching team—for making this world-class curriculum openly accessible. It's a fantastic resource for any student or practitioner eager to understand the building blocks of today’s AI systems. I’m looking forward to diving deeper into topics like parallelism, custom GPU kernels, evaluation techniques, and alignment strategies in the weeks ahead. If you’re following the course too or working on similar topics, I’d love to connect and learn together! #StanfordCS336 #LLMs #MachineLearning #DeepLearning #NLP #Transformers #AI #PyTorch #OpenLearning #AIEducation #DataScience
6