SceneLoom: Communicating Data with Scene Context
Fudan University, Shanghai, China
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Introduction

In data-driven storytelling contexts such as data journalism and data videos, data visualizations are often presented alongside real-world imagery to provide narrative context. However, the visualizations and contextual images typically remain separated, limiting their combined narrative expressiveness and engagement. Coordinating them is challenging because it requires fine-grained alignment and creative ideation. To address this, we present SceneLoom, a Vision-Language Model (VLM)-powered system that coordinates data visualizations with real-world imagery according to narrative intent.

Methods

Data visualizations and real-world scenes differ fundamentally in information types, perceptual modes, and communicative goals. This divergence creates two tensions in their coordination:

  • Semantic gaps between abstract data encodings and concrete scene semantics.
  • Perceptual competition when the visual channels of the chart overlap with those of the scene (see the toy check sketched below).
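As a concrete illustration of the second tension, the following toy check flags chart colors that sit too close to the dominant colors of the image region they would overlay. This is a minimal sketch for intuition only; the function names, palettes, and threshold are all invented here and are not part of SceneLoom.

# A toy illustration (not SceneLoom's actual method) of detecting perceptual
# competition: if a chart's palette is too close to the colors of the scene
# region it overlays, the marks will fight the background for attention.

def rgb_distance(c1, c2):
    """Euclidean distance between two RGB colors (0-255 per channel)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

def competing_colors(chart_palette, region_colors, threshold=60.0):
    """Return chart colors within `threshold` of any dominant color
    in the underlying image region (a made-up cutoff)."""
    return [c for c in chart_palette
            if any(rgb_distance(c, r) < threshold for r in region_colors)]

# Hypothetical palettes: chart marks vs. dominant colors of a sky region.
chart_palette = [(76, 120, 168), (228, 87, 86)]    # blue and red marks
sky_region    = [(98, 140, 190), (200, 210, 230)]  # sky blues

print(competing_colors(chart_palette, sky_region))  # -> [(76, 120, 168)]: the blue mark competes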

Formative Study

To address these issues, we conducted a formative study to analyze design components in data visualizations and real-world scenes, and to derive coordination relationships from both visual and semantic perspectives.

We collected a corpus of 54 data videos that integrate visualizations with real-world imagery and coded them to identify recurring patterns and common coordination strategies. The full set of cases and coding results is available in an accompanying online table.

Based on this corpus analysis, we identified key visual and semantic components of data visualizations and real-world scenes and organized them into a design space structured along two main dimensions (a data-structure sketch follows the list):

  • Visual alignment, which ensures spatial and perceptual consistency.
  • Semantic coherence, which maintains meaningful links between data content and scene context.
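To make the two dimensions concrete, here is a minimal sketch of how a single coordination between a visualization component and a scene element might be recorded along them. The class and field names are illustrative assumptions, not SceneLoom's actual schema.

from dataclasses import dataclass

@dataclass
class Coordination:
    vis_component: str     # chart part, e.g. "bar marks" or "axis label"
    scene_element: str     # segmented image element, e.g. "building"
    visual_alignment: str  # spatial/perceptual relation between the two
    semantic_link: str     # meaning-level relation between data and scene

# Hypothetical example: bar heights fitted to a city skyline.
example = Coordination(
    vis_component="bar marks",
    scene_element="skyscraper silhouettes",
    visual_alignment="bar tops aligned with building outlines",
    semantic_link="urban-growth data mapped onto the cityscape it describes",
)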

Prototype System

Building on these insights, we developed SceneLoom, a prototype system that implements the coordination strategies through a structured workflow:

  • Data preparation: SceneLoom takes narrative text, structured data, and real-world images as input, extracts narrative features, generates candidate visualizations, and filters segmented image elements for design coordination.
  • Visual perception: A unified specification format encodes the visual and semantic properties of both data visualizations and segmented image elements, enabling consistent interpretation by VLMs (a hypothetical instance is sketched after this list).
  • Reasoning and mapping: The system aligns visual components using spatial and semantic cues, supports both data-level and view-level adjustments, invokes tools through structured prompts, and evaluates candidate results for accuracy, clarity, and salience.
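The sketch below shows what one instance of such a specification might look like, paired with a structured prompt for the VLM. Every field name and value here is a made-up assumption for illustration; the system's actual format may differ.

import json

# Hypothetical unified specification: visual and semantic properties of one
# chart mark and one segmented scene element, plus the narrative text.
spec = {
    "visualization": {
        "type": "bar_chart",
        "marks": [{"id": "bar_0", "bbox": [40, 120, 30, 180],
                   "value": 8.2, "color": "#4C78A8", "label": "2019"}],
    },
    "scene": {
        "elements": [{"id": "seg_3", "category": "building",
                      "bbox": [35, 60, 42, 240], "salience": 0.87}],
    },
    "narrative": "Skyscraper construction peaked in 2019.",
}

# A structured prompt asking the VLM to propose a mapping, whose output
# would then be checked for accuracy, clarity, and salience.
prompt = (
    "Given the specification below, map each chart mark to a compatible "
    "scene element, preserving data accuracy and visual alignment:\n"
    + json.dumps(spec, indent=2)
)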

Example Gallery

Below are representative designs generated by SceneLoom.

BibTeX
@misc{gao2025sceneloomcommunicatingdatascene,
      title={SceneLoom: Communicating Data with Scene Context}, 
      author={Lin Gao and Leixian Shen and Yuheng Zhao and Jiexiang Lan and Huamin Qu and Siming Chen},
      year={2025},
      eprint={2507.16466},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2507.16466}, 
}