How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors

Abstract

Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, a systematic understanding of how visual attributes shape agent decision-making remains limited. To address this, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. VAF consists of three stages: (i) variant generation, which ensures that each variant shares the semantics of the original item and differs only in visual attributes; (ii) browsing interaction, in which agents navigate the page by scrolling and clicking the item of interest, mirroring how human users browse online; and (iii) validation through both the agents' click actions and their reasoning, for which we use the Target Click Rate and Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the difference in decisions between the original and each variant, we identify which visual attributes influence agents' behavior most. Extensive experiments across 8 variant families (48 variants total), 5 real-world websites (including shopping, travel, and news browsing), and 4 representative web agents show that background color contrast, item size, position, and card clarity have a strong influence on agents' actions, whereas font styling, text color, and item image clarity exhibit minor effects.

Method Overview

Our pipeline consists of three main phases:

Variant Generation: Automatically generate HTML variants by modifying CSS attributes (color, position, typography, size) while preserving semantic content.
Realistic Browsing Simulation: Simulate realistic web browsing with viewport-based scrolling and interaction, mirroring how humans navigate web pages.
Dual Evaluation: Assess agent behavior using both coordinate-based click accuracy and semantic understanding metrics.
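
The variant generation phase can be illustrated with a minimal sketch. This is not the authors' implementation: the `VARIANTS` table, the `make_variant` helper, and the choice to inject a `<style>` rule are all assumptions made for illustration. Only the CSS values (pink `#e91e63`, 1.5x scale, 4px blur) echo examples listed on this page. The point it shows is that a variant can change presentation while leaving the item's text and markup untouched.

```python
import re

# Hypothetical variant table. The values echo examples from this page, but the
# mapping from variant name to CSS declarations is an illustrative assumption.
VARIANTS = {
    "background_pink": "background-color: #e91e63;",
    "size_large": "transform: scale(1.5); transform-origin: top left;",
    "card_blur": "filter: blur(4px);",
}


def make_variant(html: str, target_selector: str, variant: str) -> str:
    """Return a copy of `html` with one CSS rule applied to the target item.

    Only presentation changes: the rule is added as a <style> element and the
    item's own text and markup are not modified.
    """
    rule = f"<style>{target_selector} {{ {VARIANTS[variant]} }}</style>"
    # Insert the rule just before </head>; a callable replacement avoids any
    # backslash-escape handling of the rule text.
    patched, count = re.subn(
        r"</head>", lambda m: rule + m.group(0), html, count=1, flags=re.IGNORECASE
    )
    # Fall back to prepending the rule when the page has no </head>.
    return patched if count else rule + html
```

Calling `make_variant(page_html, "#item-3", "card_blur")` (with a hypothetical selector) would return the same page with a blur rule targeting that one item, so the original and the variant differ only in the injected style.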

Key Findings

Our extensive experiments across 8 variant families (48 variants total), 5 real-world websites, and 4 representative web agents reveal:

🎨

Strong Influence Factors

Background color contrast, item size, position, and card clarity have a strong influence on agents' actions and decision-making patterns.

📝

Minor Influence Factors

Font styling, text color, and item image clarity exhibit minor effects on agent behavior, suggesting current VLMs process text in abstracted forms.

📊

Evaluation Metrics

We use Target Click Rate and Target Mention Rate to jointly evaluate visual attribute effects, measuring both action and reasoning capabilities.
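
As a rough sketch of how the two metrics could be computed, assume that Target Click Rate is the fraction of trials in which the agent's click lands inside the target item's bounding box, and that Target Mention Rate is the fraction of trials in which the agent's reasoning names the target. These definitions, the `Trial` record, and the substring-based mention check are assumptions for illustration; this page does not give the formal definitions.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One agent run on one page (hypothetical record layout)."""
    click_xy: tuple[float, float]  # page coordinates of the agent's click
    reasoning: str                 # the agent's free-text rationale


def click_hits(click_xy, bbox) -> bool:
    """True if the click falls inside bbox = (left, top, width, height)."""
    x, y = click_xy
    left, top, width, height = bbox
    return left <= x <= left + width and top <= y <= top + height


def target_click_rate(trials, bbox) -> float:
    """Assumed definition: share of trials whose click lands on the target."""
    return sum(click_hits(t.click_xy, bbox) for t in trials) / len(trials)


def target_mention_rate(trials, target_name: str) -> float:
    """Assumed definition: share of trials whose reasoning names the target.

    A case-insensitive substring match stands in for whatever mention
    detection the actual pipeline uses.
    """
    name = target_name.lower()
    return sum(name in t.reasoning.lower() for t in trials) / len(trials)
```

Reporting both rates separates what the agent did from what it said: a variant could raise clicks on the target without the agent ever citing the target in its rationale, or the reverse.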

8 Variant Families - Representative Examples

🎨 Background Color

Strong influence: Color contrast variations

Example: Pink (#e91e63)

📍 Position

Strong influence: Spatial positioning changes

Example: Spotlight Position

📏 Item Size

Strong influence: Different size scales

Example: Large Size (1.5x)

🔍 Card Clarity

Strong influence: Visual saliency

Example: Blur Effect (4px)

📝 Font Styling

Minor influence: Typography changes

Font variations tested:
Comic Sans, Times, Arial, etc.

🎨 Text Color

Minor influence: Text color variations

Color variations tested:
Red, Blue, Purple, Green, etc.

🖼️ Image Clarity

Minor influence: Image quality variations

Blur levels tested:
1px, 2px, 4px, 8px, sharp

🔗 Combinations

Multiple attributes combined

Testing interactions between multiple visual attributes

📌 Original Baseline

All variants are compared against this original page

📊 Summary: Our experiments across 48 variants (8 families) show that Background Color, Position, Item Size, and Card Clarity strongly influence web agent behavior, while Font Styling, Text Color, and Image Clarity have minimal impact. The Combinations family tests multi-attribute interactions.

Quantitative Results

🎯 Primary Result: Variant Success Rate Heatmap

Statistical significance analysis (p-values) showing how visual attributes influence web agent performance

🔬 Key Insight: This heatmap displays the p-values from statistical tests comparing each variant's Target Click Rate (TCR) against the original page across 5 websites (Amazon, Booking, eBay, NPR, Expedia) and 4 agents. Lower p-values (darker colors) indicate stronger statistical significance, revealing which visual attributes have the most significant impact on agent behavior.
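
This page does not say which statistical test produces the p-values. As one plausible way to compare a variant's TCR with the original's, the sketch below uses a two-sided two-proportion z-test, implemented with the standard library only. The choice of test and the function name `two_proportion_p_value` are assumptions, not the authors' procedure.

```python
import math


def two_proportion_p_value(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both conditions share the same click rate.

    hits_* are target clicks and n_* are trial counts for the original page (a)
    and one variant (b). Uses the pooled normal approximation, which is only
    reasonable when both samples are fairly large.
    """
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every trial hit, or every trial missed, in both groups:
        # there is no evidence of a difference.
        return 1.0
    z = (hits_a / n_a - hits_b / n_b) / se
    # Two-sided standard normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

With small trial counts per cell, an exact test such as Fisher's would be a safer choice than the normal approximation. When many variants are tested against one baseline, a multiple-comparison correction is also worth considering before reading individual cells of such a heatmap.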

Click Distribution Heatmaps Across Scenarios

Booking.com - Original Page

eBay - Click Heatmap

NPR - Click Heatmap

Agent Click Distribution by Scenario

Amazon - Click Distribution

Expedia - Click Distribution

Variant Performance Comparison

Comparison between the most effective (Best) and least effective (Worst) visual variants in influencing agent decisions.

Top 10 Best Performing Variants

Top 10 Worst Performing Variants

Detailed Analysis: Booking.com

Comprehensive click distribution analysis on Booking.com showing agent attention patterns

Citation

@misc{yu2026visualattributesinfluenceweb,
  title={How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors}, 
  author={Kuai Yu and Naicheng Yu and Han Wang and Rui Yang and Huan Zhang},
  year={2026},
  eprint={2601.21961},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21961}
}