Modern Australian

PolyU-led research reveals that sensory and motor inputs help large language models represent complex concepts

HONG KONG SAR - Media OutReach Newswire - 9 June 2025 - Can one truly understand what "flower" means without smelling a rose, touching a daisy or walking through a field of wildflowers? This question is at the core of a rich debate in philosophy and cognitive science.

While embodied cognition theorists argue that physical, sensory experience is essential to concept formation, studies of the rapidly evolving large language models (LLMs) suggest that language alone can build deep, meaningful representations of the world.

A research team led by Prof. Li Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, explored the similarities between large language models and human representations, shedding new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge.

By exploring the similarities between LLMs and human representations, researchers at The Hong Kong Polytechnic University (PolyU) and their collaborators have shed new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge. Their findings also revealed how the use of sensory input for grounding or embodiment – connecting abstract with concrete concepts during learning – affects the ability of LLMs to understand complex concepts and form human-like representations. The study, in collaboration with scholars from Ohio State University, Princeton University and City University of New York, was recently published in Nature Human Behaviour.

Led by Prof. LI Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, the research team selected conceptual word ratings produced by state-of-the-art LLMs, namely ChatGPT (GPT-3.5, GPT-4) and Google LLMs (PaLM and Gemini). They compared them with human-generated word ratings of around 4,500 words across non-sensorimotor (e.g., valence, concreteness, imageability), sensory (e.g., visual, olfactory, auditory) and motor domains (e.g., foot/leg, mouth/throat) from the highly reliable and validated Glasgow Norms and Lancaster Norms datasets.

The research team first compared pairs of data from individual humans and individual LLM runs to measure the similarity between word ratings on each dimension in the three domains, using results from human-human pairs as the benchmark. This approach could, for instance, highlight to what extent humans and LLMs agree that certain concepts are more concrete than others. However, such analyses might overlook how multiple dimensions jointly contribute to the overall representation of a word. For example, the words "pasta" and "roses" might receive equally high olfactory ratings, but "pasta" is in fact more similar to "noodles" than to "roses" when appearance and taste are considered. The team therefore conducted representational similarity analysis, treating each word as a vector of ratings across the non-sensorimotor, sensory and motor dimensions, for a more complete comparison between humans and LLMs.
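To make the idea concrete, here is a minimal sketch of a representational similarity analysis on three words. It is an illustration only, not the study's code: the release does not say which distance or correlation measure the team used, so Euclidean distance and a Spearman-style rank correlation are assumptions, the function names are hypothetical, and the ratings are toy numbers invented for this example rather than values from the Glasgow or Lancaster Norms.

```python
from itertools import combinations
from math import sqrt

# Toy ratings invented for illustration only (NOT taken from the Glasgow or
# Lancaster Norms). Each vector is (olfactory, visual, gustatory).
human_ratings = {
    "pasta":   [3.5, 3.0, 4.5],
    "noodles": [3.0, 3.0, 4.5],
    "roses":   [3.5, 4.5, 0.5],
}
model_ratings = {
    "pasta":   [3.0, 3.5, 4.0],
    "noodles": [3.0, 3.0, 4.0],
    "roses":   [4.0, 4.5, 1.0],
}


def euclidean(u, v):
    """Distance between two rating vectors (assumed measure)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(ratings, words):
    """Distances for every word pair, in a fixed pair order."""
    return [euclidean(ratings[a], ratings[b]) for a, b in combinations(words, 2)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(ratings_a, ratings_b):
    """Rank-correlate the two sets of pairwise word distances."""
    words = sorted(set(ratings_a) & set(ratings_b))
    da = pairwise_distances(ratings_a, words)
    db = pairwise_distances(ratings_b, words)
    return pearson(ranks(da), ranks(db))


if __name__ == "__main__":
    print(rsa_score(human_ratings, model_ratings))
```

In these toy vectors "pasta" and "roses" share the same olfactory rating, yet the full vectors place "pasta" much closer to "noodles". The two sets of pairwise distances rank identically here, so the score comes out close to 1; on real norms, a lower score would indicate that the model organises the words differently from humans.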

The representational similarity analyses revealed that word representations produced by the LLMs were most similar to human representations in the non-sensorimotor domain, less similar for words in the sensory domain and most dissimilar for words in the motor domain. This highlights the limitations of LLMs in fully capturing humans' conceptual understanding. LLMs represent non-sensorimotor concepts well but fall short on concepts involving sensory information, such as visual appearance and taste, and on concepts involving body movement. Motor concepts, which are described less often in language and rely heavily on embodied experience, are even more challenging for LLMs than sensory concepts like colour, which can be learned from textual data.

In light of the findings, the researchers examined whether grounding would improve the LLMs' performance. They compared the performance of more grounded LLMs trained on both language and visual input (GPT-4, Gemini) with that of LLMs trained on language alone (GPT-3.5, PaLM). They discovered that the more grounded models incorporating visual input exhibited a much higher similarity with human representations.

Prof. Li Ping said, "The availability of both LLMs trained on language alone and those trained on language and visual input, such as images and videos, provides a unique setting for research on how sensory input affects human conceptualisation. Our study exemplifies the potential benefits of multimodal learning, a human ability to simultaneously integrate information from multiple dimensions in the learning and formation of concepts and knowledge in general. Incorporating multimodal information processing in LLMs can potentially lead to a more human-like representation and more efficient human-like performance in LLMs in the future."

Interestingly, this finding is also consistent with previous human studies indicating representational transfer. Humans acquire object-shape knowledge through both visual and tactile experiences, with seeing and touching objects activating the same regions in human brains. The researchers pointed out that – as in humans – multimodal LLMs may use multiple types of input to merge or transfer representations embedded in a continuous, high-dimensional space. Prof. Li added, "The smooth, continuous structure of embedding space in LLMs may underlie our observation that knowledge derived from one modality could transfer to other related modalities. This could explain why congenitally blind and normally sighted people can have similar representations in some areas. Current limits in LLMs are clear in this respect."

Ultimately, the researchers envision a future in which LLMs are equipped with grounded sensory input, for example, through humanoid robotics, allowing them to actively interpret the physical world and act accordingly. Prof. Li said, "These advances may enable LLMs to fully capture embodied representations that mirror the complexity and richness of human cognition, and a rose in LLM's representation will then be indistinguishable from that of humans."

Hashtag: #PolyU #HumanCognition #LargeLanguageModels #LLMs #GenerativeAI

The issuer is solely responsible for the content of this announcement.
