How do Visual Attributes Influence Web Agents?

A Comprehensive Evaluation of User Interface Design Factors

Kuai Yu², Naicheng Yu³, Han Wang¹, Rui Yang¹, Huan Zhang¹
¹University of Illinois Urbana-Champaign
²Columbia University
³University of California San Diego

Abstract

Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, systematic understanding of how visual attributes shape agent decision-making remains limited. To address this, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. VAF consists of three stages: (i) variant generation, which ensures that each variant shares the semantics of the original item and differs only in visual attributes; (ii) browsing interaction, in which agents navigate the page by scrolling and clicking the item of interest, mirroring how human users browse online; and (iii) validation through both the agents' click actions and their reasoning, using the Target Click Rate and Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the difference in decisions between the original page and each variant, we identify which visual attributes influence agents' behavior most. Extensive experiments across 8 variant families (48 variants total), 5 real-world websites (covering shopping, travel, and news browsing), and 4 representative web agents show that background color contrast, item size, position, and card clarity have a strong influence on agents' actions, whereas font styling, text color, and item image clarity have only minor effects.

Method Overview

Our pipeline consists of three main phases:

  1. Variant Generation: Automatically generate HTML variants by modifying CSS attributes (color, position, typography, size) while preserving semantic content.
  2. Realistic Browsing Simulation: Simulate realistic web browsing with viewport-based scrolling and interaction, mirroring how humans navigate web pages.
  3. Dual Evaluation: Assess agent behavior using both coordinate-based click accuracy (Target Click Rate) and the agent's stated reasoning (Target Mention Rate).
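
As a rough illustration of the variant-generation phase, the sketch below injects a CSS override for a single item while leaving the page's text and markup untouched. It is a minimal sketch under stated assumptions, not the pipeline's actual implementation: the function name `make_variant`, the use of an element `id` to locate the item, and the specific CSS declarations are all hypothetical choices made for this example.

```python
import re


def make_variant(html: str, item_id: str, css: dict) -> str:
    """Return a copy of `html` with a <style> rule that overrides visual
    properties of the element whose id is `item_id` (hypothetical helper).

    Only a style rule is added, so the item's text and structure are unchanged.
    """
    declarations = " ".join(f"{prop}: {value} !important;" for prop, value in css.items())
    rule = f"<style>#{item_id} {{ {declarations} }}</style>"
    # Insert just before </head> so the rule comes after the page's own styles.
    if re.search(r"</head>", html, flags=re.IGNORECASE):
        return re.sub(
            r"</head>",
            lambda m: rule + m.group(0),
            html,
            count=1,
            flags=re.IGNORECASE,
        )
    return rule + html  # no <head> element: prepend the rule instead


# Example variants; the CSS used for each family is an assumption.
page = "<html><head><title>t</title></head><body><div id='item-3'>Hotel A</div></body></html>"
pink_background = make_variant(page, "item-3", {"background-color": "#e91e63"})
blurred_card = make_variant(page, "item-3", {"filter": "blur(4px)"})
large_item = make_variant(page, "item-3", {"transform": "scale(1.5)"})
```

Keeping the change in a separate style rule makes it easy to check that a variant differs from the original only in presentation: stripping the injected rule should give back the original HTML.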

Key Findings

Our extensive experiments across 8 variant families (48 variants total), 5 real-world websites, and 4 representative web agents reveal:

🎨 Strong Influence Factors

Background color contrast, item size, position, and card clarity have a strong influence on agents' actions and decision-making patterns.

📝 Minor Influence Factors

Font styling, text color, and item image clarity have only minor effects on agent behavior, suggesting that current VLMs process text in an abstracted form.

📊 Evaluation Metrics

We use Target Click Rate and Target Mention Rate to jointly evaluate visual attribute effects, measuring both action and reasoning capabilities.
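
One plausible reading of the two metrics is sketched below. The exact definitions are not given on this page, so this is an assumption: Target Click Rate is taken as the fraction of trials whose click lands inside the target item's bounding box, and Target Mention Rate as the fraction of trials whose reasoning text names the target. The `Trial` type, the bounding-box format, and the substring match are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in page pixels


@dataclass
class Trial:
    click: Optional[Tuple[float, float]]  # (x, y) of the click, or None if no click
    reasoning: str                        # free-text reasoning returned by the agent


def target_click_rate(trials: List[Trial], target_box: Box) -> float:
    """Fraction of trials whose click falls inside the target's bounding box."""
    if not trials:
        raise ValueError("need at least one trial")
    left, top, right, bottom = target_box
    hits = sum(
        1
        for t in trials
        if t.click is not None
        and left <= t.click[0] <= right
        and top <= t.click[1] <= bottom
    )
    return hits / len(trials)


def target_mention_rate(trials: List[Trial], target_name: str) -> float:
    """Fraction of trials whose reasoning mentions the target (case-insensitive)."""
    if not trials:
        raise ValueError("need at least one trial")
    name = target_name.lower()
    return sum(1 for t in trials if name in t.reasoning.lower()) / len(trials)


trials = [
    Trial((150.0, 250.0), "I choose Hotel A because of its rating."),
    Trial((10.0, 10.0), "Hotel B looks cheaper."),
    Trial(None, "hotel a stands out, but I will keep scrolling."),
    Trial((300.0, 400.0), "This one seems fine."),
]
tcr = target_click_rate(trials, (100.0, 200.0, 300.0, 400.0))
tmr = target_mention_rate(trials, "Hotel A")
```

Reporting the two rates side by side separates cases where an agent clicks the target without naming it from cases where it discusses the target but clicks elsewhere.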

8 Variant Families - Representative Examples

🎨 Background Color

Strong influence: Color contrast variations

Example: Pink (#e91e63)

📍 Position

Strong influence: Spatial positioning changes

Example: Spotlight Position

📏 Item Size

Strong influence: Different size scales

Example: Large Size (1.5x)

🔍 Card Clarity

Strong influence: Visual saliency

Example: Blur Effect (4px)

📝 Font Styling

Minor influence: Typography changes

Font variations tested:
Comic Sans, Times, Arial, etc.

🎨 Text Color

Minor influence: Text color variations

Color variations tested:
Red, Blue, Purple, Green, etc.

🖼️ Image Clarity

Minor influence: Image quality variations

Blur levels tested:
1px, 2px, 4px, 8px, sharp

🔗 Combinations

Multiple attributes combined

Testing interactions between
multiple visual attributes

📌 Original Baseline

All variants are compared against this original page

📊 Summary: Our experiments across 48 variants (8 families) show that Background Color, Position, Item Size, and Card Clarity strongly influence web agent behavior, while Font Styling, Text Color, and Image Clarity have minimal impact. The Combinations family tests multi-attribute interactions.

Quantitative Results

🎯 Primary Result: Variant Success Rate Heatmap

Statistical significance analysis (p-values) showing how visual attributes influence web agent performance

🔬 Key Insight: This heatmap displays the p-values from statistical tests comparing each variant's Target Click Rate (TCR) against that of the original page, across 5 websites (Amazon, Booking, eBay, NPR, Expedia) and 4 agents. Lower p-values (darker colors) indicate stronger statistical evidence that a variant changes the TCR, highlighting which visual attributes affect agent behavior most.
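
The page does not say which statistical test produces these p-values. As one illustrative possibility, the sketch below uses a two-sided two-proportion z-test to compare the click rate on the original page with the click rate on a variant. The choice of test, the function name, and the trial counts are assumptions made for this example.

```python
import math


def two_proportion_p_value(clicks_orig: int, n_orig: int,
                           clicks_var: int, n_var: int) -> float:
    """Two-sided p-value for H0: the original and the variant have the same
    Target Click Rate, using the pooled two-proportion z-test."""
    rate_orig = clicks_orig / n_orig
    rate_var = clicks_var / n_var
    pooled = (clicks_orig + clicks_var) / (n_orig + n_var)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_orig + 1 / n_var))
    if std_err == 0:
        return 1.0  # both rates are 0 or both are 1: no evidence of a difference
    z = (rate_var - rate_orig) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up counts: target clicked in 10 of 100 original trials and 40 of 100 variant trials.
p_changed = two_proportion_p_value(10, 100, 40, 100)
p_same = two_proportion_p_value(20, 100, 20, 100)
```

With small trial counts, an exact test such as Fisher's would be a more conservative choice. The z-test is used here only because it needs nothing beyond the standard library.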

Click Distribution Heatmaps Across Scenarios

Booking.com - Original Page

eBay - Click Heatmap

NPR - Click Heatmap

Agent Click Distribution by Scenario

Amazon - Click Distribution

Expedia - Click Distribution

Variant Performance Comparison

Comparison between the most effective (Best) and least effective (Worst) visual variants in influencing agent decisions.

Top 10 Best Performing Variants

Top 10 Worst Performing Variants

Detailed Analysis: Booking.com

Comprehensive click distribution analysis on Booking.com showing agent attention patterns

Citation

@misc{yu2026visualattributesinfluenceweb,
  title={How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors}, 
  author={Kuai Yu and Naicheng Yu and Han Wang and Rui Yang and Huan Zhang},
  year={2026},
  eprint={2601.21961},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21961}
}