Improving the models — smaller data, bigger dreams
Part of the Learning with the Machines feature on how UIC educators and researchers are exploring the impact of large language models
For 30 years, Barbara Di Eugenio has studied natural-language processing, the computer science research field that examines how computers make sense of text. She has developed new methods and explored applications for tutoring students, summarizing complex health information and answering patients’ medical questions through chatbots. But when the latest generation of large language models set off a wave of media hype, she had mixed feelings.
“I’ve been doing NLP all my life. So, I should have been happy when ChatGPT came out, because it became so much easier to explain to people what NLP is about,” said Di Eugenio, professor of computer science at UIC. “I’m not against using this latest technology, but the black box aspect of it really bothers me.”
Her misgivings stem from the way these models operate. The model beneath the current iteration of ChatGPT was trained on text from the internet and digitized books. Through machine learning, the statistical process of learning associations between points of data, it tunes billions of parameters that capture patterns in that text. It then creates new language through a complicated process of prediction, guessing the most likely next word given the preceding text.
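For readers who want a concrete picture, here is a deliberately tiny sketch of next-word prediction. It uses simple word-pair counts in place of the billions of learned parameters described above, and the toy corpus is invented purely for illustration:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the web-scale text real models are trained on.
corpus = "the patient asked the patient educator about the low salt diet".split()

# "Training": count which word follows each word (a bigram model).
# A real large language model learns billions of parameters instead of
# simple counts, but the prediction step is conceptually similar.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word seen most often after `word` in the training text."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "patient", the most frequent follower
```

Even in this miniature version, the model only guesses what usually comes next; it has no record of why, which is the heart of the black-box problem Di Eugenio describes.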
Asking these massive models a simple question is a bit of data overkill — using a model trained on everything to come up with an answer on one specific topic. How it arrives at its response using those billions of parameters is almost impossible to explain, which limits its use in high-stakes situations.
“I want to know why they come to a certain conclusion,” Di Eugenio said. “That’s very important to understand, especially in a health care setting. I don’t want to use a black box. I want to know what’s going in and how it was trained.”
In Di Eugenio’s work developing AI assistants for heart failure patients, the chatbots are trained only on ecologically valid data: interactions between patient educators and real patients. Because of the diverse patient population served by UI Health, these conversations reflect a wide range of educational, socioeconomic and cultural backgrounds, voices that can be drowned out in a less discriminate language model trained on text aimed primarily at highly educated audiences. Her approach emphasizes the value of small data in an era of big data.
“ChatGPT can provide you answers on the low-salt diet; it’s very well written, it feels like you’re reading a website, right? It is like a lecture. It’s not like an interactive conversation,” Di Eugenio said. “We want to personalize these interactions for different populations and be effective for these populations.”
Another common criticism of large language models is their tendency to “hallucinate”: providing fake citations, quoting fake statistics and confidently stating incorrect facts. Scientists hope that continued training and feedback from users flagging wrong answers will eventually eliminate these errors. But Moontae Lee, assistant professor of information and decision sciences in the UIC College of Business Administration, wants to revisit this tendency toward plausible but imperfect responses as a way to expand the utility of generative AI.
“For the creative process, hallucination could be our friend because one can generate something new from already existing pieces,” Lee said.
This phenomenon has already been embraced by some users of generative image models, where the unexpected results of a particular prompt can produce fascinating weirdness. In text models, Lee sees a role for these unanticipated outputs in idea generation. A scientist, for example, could ask a chatbot to propose new experiments or pose counterfactual questions; the interpolations and extrapolations a large language model generates might turn out to be promising hypotheses.
The trick is creating language models that can be tuned to a user’s expectations, whether they want well-supported facts or more imaginative replies — what Lee calls the insightful process versus the inspirational process.
“For information seeking or scientific reasoning, hallucination is obviously not our friend,” Lee said. “These two different things are going to be nontrivial to be implemented by one model. So, that’s why my goal is to build a new model that can perform adaptively on both and possibly various scenarios.”
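One standard mechanism for this kind of tuning, offered here only as an illustrative sketch and not as a description of Lee’s model, is the sampling “temperature” that controls how adventurous a model’s word choices are. The candidate scores below are hypothetical:

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Sample the index of a next word from model scores (logits).

    Low temperature sharpens the distribution, so the model sticks to
    its best-supported guess (the insightful mode). High temperature
    flattens it, letting unlikely words surface (the inspirational mode).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

# Hypothetical scores for three candidate next words.
logits = [4.0, 2.0, 0.5]
print(sample_next(logits, temperature=0.2))  # almost always index 0
print(sample_next(logits, temperature=2.0))  # far more varied choices
```

A single knob like this only trades reliability for variety at sampling time; building one model that adapts across both modes, as Lee proposes, is a harder problem.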
While UIC computer scientists look to the future of language models in their research, they must also grapple with what the models mean for education in the field. Since the models have already proven adept at writing working code for a variety of tasks, the challenge is to make students more than just operators who feed commands into AI.
For Chris Kanich, associate professor of computer science, the solution is making courses more engaging and inspiring, while focusing on the higher-level thinking that is supported by tools such as ChatGPT, rather than teaching students to merely become “AI drivers.”
“When somebody is an engineer, we don’t call them a calculator operator. But you will inevitably need to learn how to use GPT, just like you had to learn how to use a calculator in order to accomplish things before,” Kanich said. “I want us to be able to train people to do that special, unique thing that’s above that, because they’re going to be the ones that are driving knowledge and research forward.”