Data visualizations and real-world scenes differ fundamentally in information type, perception modes, and
communicative goals.
This divergence creates tensions in their coordination:
- Semantic gaps between abstract data encodings and concrete scene
semantics.
- Perceptual competition when the visual channels of charts and scene
imagery overlap.
To address these issues, we conducted a formative study to analyze design components in data
visualizations and real-world scenes,
and to derive coordination relationships from both visual and semantic perspectives.
We collected 54 data videos that integrate visualizations with real-world imagery, identifying recurring
patterns and common coordination strategies.
The full set of cases and coding results is available in the accompanying online table.
Based on this corpus analysis, we identified the key visual and semantic components of both data
visualizations and real-world scenes and organized them into a design space structured along two main
dimensions:
- Visual alignment, which ensures spatial and perceptual consistency between
chart marks and scene structures.
- Semantic coherence, which maintains meaningful links between data
content and scene context.
Building on these insights, we developed SceneLoom, a prototype system that implements the coordination
strategies through a structured workflow:
- Data preparation: SceneLoom takes narrative text, structured data, and
real-world images as input, extracts narrative features, generates candidate visualizations, and filters
segmented image elements for design coordination (sketched in the first example after this list).
- Visual perception: A specification format is introduced to encode the
visual and semantic properties of both data visualizations and image elements, enabling consistent
interpretation by vision-language models (VLMs); see the example specifications after this list.
- Reasoning and mapping: The system aligns visual components using
spatial and semantic cues, supports both data-level and view-level adjustments, invokes tools through
structured prompts, and evaluates results for accuracy, clarity, and salience (see the mapping sketch
after this list).
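
To ground the data-preparation stage, the sketch below shows, under assumed names, how narrative features
might be extracted, candidate chart types proposed, and segmented scene elements filtered by size. Every
function and field name here is an illustrative assumption, not SceneLoom's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentedElement:
    label: str          # semantic label from segmentation, e.g. "building"
    bbox: tuple         # (x, y, width, height) in image coordinates
    area_ratio: float   # fraction of the image the element covers

@dataclass
class PreparedInputs:
    narrative_features: dict
    candidate_charts: list
    scene_elements: list = field(default_factory=list)

def prepare_inputs(narrative: str, data: list,
                   elements: list,
                   min_area: float = 0.01) -> PreparedInputs:
    """Extract narrative features, propose candidate chart types, and
    filter out segmented scene elements too small to anchor marks."""
    text = narrative.lower()
    # Toy narrative-feature extraction: keyword spotting for intent cues.
    features = {
        "temporal": any(w in text for w in ("year", "over time", "growth")),
        "comparative": any(w in text for w in ("versus", "compared", "more than")),
        "n_records": len(data),
    }
    # Candidate visualizations follow the detected narrative intent.
    candidates = []
    if features["temporal"]:
        candidates.append("line")
    if features["comparative"]:
        candidates.append("bar")
    candidates = candidates or ["bar"]  # fallback when no cue is detected
    # Keep only scene elements large enough to host visual marks.
    usable = [e for e in elements if e.area_ratio >= min_area]
    return PreparedInputs(features, candidates, usable)
```

For example, a narrative such as "Revenue growth over the years" would trigger the temporal cue and yield
a line chart as the leading candidate.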
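
The dictionaries below illustrate how such a specification might encode a chart mark and a segmented scene
element with parallel visual and semantic properties; the field names are assumptions for illustration,
not SceneLoom's published schema.

```python
# Parallel specs for a chart mark and a scene element (hypothetical fields).
chart_element_spec = {
    "id": "bar_0",
    "kind": "mark",                   # drawn visualization element
    "visual": {
        "shape": "rect",
        "bbox": [120, 300, 40, 180],  # x, y, width, height in view pixels
        "color": "#4C78A8",
        "orientation": "vertical",
    },
    "semantic": {
        "encodes": {"field": "revenue", "channel": "height"},
        "datum": {"year": 2021, "revenue": 4.2},
    },
}

scene_element_spec = {
    "id": "building_2",
    "kind": "scene",                  # segmented real-world element
    "visual": {
        "shape": "rect-like",
        "bbox": [115, 280, 50, 210],
        "dominant_color": "#8A8F98",
        "orientation": "vertical",
    },
    "semantic": {"label": "building", "caption": "office tower in a skyline"},
}
```

Encoding both sides in one parallel schema lets a single VLM pass compare a mark against a scene element
without modality-specific handling.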
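
Finally, a minimal sketch of the reasoning-and-mapping stage, assuming bounding-box overlap (IoU) as the
spatial cue and a structured JSON prompt for the VLM; both the heuristic and the prompt wording are
illustrative, not SceneLoom's exact method.

```python
import json

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def build_mapping_prompt(chart_specs, scene_specs):
    """Assemble a structured prompt asking a VLM to propose mappings."""
    return (
        "You coordinate a data visualization with a real-world scene.\n"
        "Chart elements:\n" + json.dumps(chart_specs, indent=2) + "\n"
        "Scene elements:\n" + json.dumps(scene_specs, indent=2) + "\n"
        "Return JSON pairs [chart_id, scene_id] that align spatially "
        "and semantically."
    )

def rank_candidate_mappings(chart_specs, scene_specs):
    """Deterministic spatial baseline: pair each mark with the scene
    element of highest IoU, ranked by overlap strength.
    Assumes scene_specs is non-empty."""
    pairs = []
    for c in chart_specs:
        best = max(scene_specs,
                   key=lambda s: iou(c["visual"]["bbox"], s["visual"]["bbox"]))
        score = iou(c["visual"]["bbox"], best["visual"]["bbox"])
        pairs.append((c["id"], best["id"], score))
    return sorted(pairs, key=lambda p: -p[2])
```

In this sketch, build_mapping_prompt feeds the VLM, while rank_candidate_mappings serves as a
deterministic spatial baseline against which VLM proposals could be checked for the accuracy criterion
above.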