How I Perceive It augments machine visual interpretation with
human memory, shifting from superficial seeing to deep perceiving.
Although today’s vision language models (VLMs) can generate
image captions with a degree of subjectivity, they still struggle to
explain the reasons or experiential basis behind that subjectivity.
Machines can see, but they do not perceive as humans do: humans
link perception with prior experience and memory. To bridge
this gap, this paper introduces a visual interpretation system that
integrates individual memory into machine perception, founded on
structure-mapping theory. By merging what the machine sees with
what the individual remembers, the system produces individualized
interpretations that uncover more insightful meanings among visual
elements, meanings that are not immediately visible on the surface.
How I Perceive It: Human Memory-Augmented Analogical Reasoning for Machine Visual Interpretation
Zhuodi Cai. 2025. How I Perceive It: Human Memory-Augmented Analogical Reasoning for Machine Visual Interpretation. In SA ’25: SIGGRAPH Asia 2025 Art Papers, Hong Kong, Hong Kong. ACM, New York, NY, USA. https://doi.org/10.1145/3757369.3767595