University of Pennsylvania scientists are using AI to mine Reddit threads to evaluate the safety and side effects of prescription drugs.

Their approach takes advantage of recent advances in large language models (LLMs), a subset of artificial intelligence, to rapidly analyze self-reported information on side effects. This allowed the team to assess 410,198 Reddit posts on GLP-1 drugs like Ozempic published between May 2019 and June 2025, as the drugs were becoming wildly popular for weight loss and diabetes.

The study, published last month in the scientific journal Nature Health, showcased how the technology can provide larger-scale, longer-term information on the drugs' real-world use.

This approach is being tested across a wide range of drugs and treatments. But mining social media for medical insights still faces limitations, such as access to platforms and bias in who uses them. Reddit, for example, skews male, educated, and U.S.-based.

“The accuracy and efficiency and speed at which you can do this has really changed recently,” said Neil Sehgal, first author of the study and a third-year PhD student in computer science.

Researchers aim to complement the Food and Drug Administration’s existing reporting system for unexpected negative outcomes, or adverse events. Regulators primarily rely on mandatory reports from manufacturers and voluntary reports from healthcare professionals and consumers.

The FDA itself is leveraging advances in AI to accelerate early-phase clinical trials by extracting study data in real time from electronic health records, in a pilot announced this week involving Penn Medicine.

And online search data has been used in public health to study seasonal flu and COVID-19 trends.

Sehgal is leading projects at Penn that analyze social media data on GLP-1 agonists and related drugs. His recently published study, looking at semaglutide and tirzepatide, found that among 67,008 users, 43.5% self-reported at least one side effect. Most were gastrointestinal-related.

That’s in line with existing research, which has found gastrointestinal adverse events to affect 40 to 70% of patients in clinical trials.


He was intrigued by a very small number of people posting about menstrual changes, a potential side effect that hasn't been well studied.

His team found that almost 4% of the GLP-1 users with side effects on the Reddit threads analyzed reported symptoms related to menstrual changes like heavy bleeding and irregular cycles.

But that doesn't prove the side effects are real, he said, explaining that the people posting on social media may not represent all GLP-1 users. People could also be lying online, or inadvertently spreading misinformation.

He sees the findings as signals worth investigating with more rigorous research methods.

“It just deserves more research,” Sehgal said.

A complement

AI-powered research on social media posts about a drug is very different from the clinical trials typically required for FDA approval.

A standard randomized controlled trial randomly assigns participants to two groups: one takes the drug and the other doesn't. Outcomes are closely monitored and compared to evaluate safety and efficacy.

“Clinical trials are definitely the gold standard, but they have limitations,” Sehgal said.

Most are relatively short, Sehgal said, and may miss longer-term symptoms.

They’re also pretty expensive, usually on the order of millions of dollars, which can limit the number of people enrolled.

One of the larger semaglutide trials had about 17,000 people, Sehgal said. About half the participants were getting the placebo rather than the drug.

His Reddit study, by comparison, captured roughly 70,000 people using the drugs in real life, where use can look different from the tightly controlled environment of a trial.

“It’s just a very large sample, and so we can potentially see things that in a clinical trial are very rare,” Sehgal said.

Many online users discuss how they use their drugs, how they get them, and how they might be combining them with other drugs.

Still, it's hard to draw conclusions from social media data, because those posting do not necessarily reflect the range of ages, genders, and racial backgrounds of the patients who will use the medication.

Researchers also cannot establish that a drug caused the outcomes people report, since there is no placebo group for comparison.

Potential impact

Graciela Gonzalez-Hernandez, who directs the health artificial intelligence PhD program at Cedars-Sinai Medical Center in Los Angeles, tapped social media research to explore reducing the dosage of an addiction treatment.

The medical guidelines at the time called for cutting pills in halves or fourths, a challenge for patients.

Through Reddit posts, her physician collaborator discovered patients were advising each other to create mini doses by pulverizing their pills, dissolving them in water, and taking little drops.

The patients' technique was eventually incorporated into the guidelines.

“It was literally from the patients to the clinicians to the guidelines, rather than the other way around,” Gonzalez-Hernandez said.

Now Gonzalez-Hernandez is analyzing what patients post on online forums like Reddit and WebMD to understand why they stop taking medications and how they deviate from instructions. Her work covers GLP-1 drugs as well as those used for heart and gastrointestinal diseases, bladder cancer, and HIV.

Despite AI advances, social media researchers increasingly face roadblocks to access. Platforms like X (formerly Twitter), Facebook, and Instagram have largely restricted data sharing in recent years, Sehgal said.

Reddit remains one of the few large social media sites that is still readily accessible for research.

In some cases, the Penn team recruits participants online to donate their data. That's particularly useful for studying ChatGPT interactions, as more health conversations move to chatbots.

“You could imagine a lot of people are talking about their GLP-1 side effects straight to ChatGPT instead of posting online or going to Google,” Sehgal said.