# Web Scraping Functions

Extract content from webpages with the VISIT, GETSELECTOR, and GETIMG functions.
## Overview
SheetMagic provides powerful web scraping functions to extract content from any webpage. These functions handle JavaScript-rendered pages and include intelligent caching for performance.
## VISIT
Extract content from any webpage for research and analysis. Returns clean Markdown by default, or raw HTML if specified. Use a CSS selector to extract specific elements from HTML.
### Syntax

```
=VISIT(url, [format], [selector])
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Website URL to extract content from |
| format | string | No | Output format: `"markdown"` (default) or `"html"` for raw HTML. Case-insensitive. |
| selector | string | No | CSS selector to extract specific elements. Only works with the `"html"` format. |
### Returns
Content from the webpage (up to 50,000 characters) in the specified format:
- Markdown (default): Clean, readable text with Markdown formatting
- HTML: Raw HTML content from the page
- HTML with selector: Inner HTML of matching elements (multiple matches separated by newlines)
### Examples
Basic usage (returns Markdown):

```
=VISIT("https://example.com/article")
```

Get raw HTML:

```
=VISIT("https://example.com/article", "html")
```

Extract specific elements with a CSS selector:

```
=VISIT("https://example.com", "html", "div.content > p.intro")
```

Extract all prices from a page:

```
=VISIT(A1, "html", ".price")
```

Extract product titles:

```
=VISIT(A1, "html", "h2.product-title")
```

From a cell reference:

```
=VISIT(A1)
```

From a cell reference with HTML output:

```
=VISIT(A1, "HTML")
```

### Features
- Handles JavaScript-rendered pages
- 30-minute user-specific caching
- Flexible output format (Markdown or HTML)
- CSS selector support for extracting specific elements
- Removes navigation, ads, and boilerplate (in Markdown mode)
### CSS Selector Examples
| Selector | Description |
|---|---|
| `.classname` | Elements with specific class |
| `#idname` | Element with specific ID |
| `div.product` | `div` elements with the `product` class |
| `div.content > p` | Direct child `<p>` elements of `div.content` |
| `[data-price]` | Elements with specific attribute |
The `selector` parameter only works with the `"html"` format; if used with `"markdown"`, the selector is ignored.
### Use Cases
- Content research
- Competitor analysis
- Article extraction
- Documentation scraping
- Data collection
- HTML parsing and analysis
- Price monitoring
- Targeted element extraction
## GETSELECTOR
Get specific elements from a page using a CSS selector.
### Syntax

```
=GETSELECTOR(url, selector)
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape |
| selector | string | Yes | CSS selector to match elements |
### Returns
An array of text content from all matching elements.
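To make the "array of text content" behavior concrete, here is a stdlib-only Python sketch of what a simple class selector such as `.price` matches. This is purely illustrative and assumes nothing about SheetMagic's internal selector engine; it handles only class selectors, whereas GETSELECTOR supports the full selector types listed below.

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Collect the text content of every element carrying a given class
    attribute -- an illustrative sketch of what a selector like ".price"
    matches, not SheetMagic's actual implementation."""
    def __init__(self, classname):
        super().__init__()
        self.classname = classname
        self.depth = 0          # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.classname in classes:
            if self.depth == 0:
                self.results.append("")   # start a new match
            self.depth += 1
        elif self.depth:
            self.depth += 1               # nested tag inside a match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data      # accumulate the element's text

html = '<ul><li class="price">$10</li><li class="price">$12</li></ul>'
parser = ClassTextCollector("price")
parser.feed(html)
print(parser.results)  # -> ['$10', '$12']
```

Each matching element contributes one entry, which is why GETSELECTOR spills into multiple cells when several elements match.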
### Supported Selectors
| Selector Type | Example | Description |
|---|---|---|
| Class | `.classname` | Elements with specific class |
| ID | `#idname` | Element with specific ID |
| Element | `div`, `span`, `p` | All elements of that type |
| Attribute | `[data-price]` | Elements with specific attribute |
| Combined | `div.product` | `div` elements with the `product` class |
| Nested | `div.product h2` | `<h2>` elements inside product divs |
### Examples
Get all prices:

```
=GETSELECTOR("https://example.com/products", ".price")
```

Get product titles:

```
=GETSELECTOR(A1, "h2.product-title")
```

Get elements by attribute:

```
=GETSELECTOR(A1, "[data-product-id]")
```

Get nested elements:

```
=GETSELECTOR(A1, ".product-card .description")
```

### Use Cases
- Price monitoring
- Product data extraction
- Review scraping
- Contact information extraction
- Structured data collection
## GETIMG
Extract all images from a webpage.
### Syntax

```
=GETIMG(url)
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape images from |
### Returns
An array of image URLs (capped at 1,000 images).
### Features
- Handles responsive images (`srcset`, `data-srcset`)
- Lazy-loading support (`data-src`)
- Automatic relative URL resolution
- Selects the largest image from `srcset`
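The last two features can be pictured with a short Python sketch: parse an HTML `srcset` string, keep the candidate with the largest width descriptor, and resolve relative paths against the page URL. This is an illustration of the documented behavior under those assumptions, not SheetMagic's actual code.

```python
from urllib.parse import urljoin

def largest_from_srcset(srcset: str, base_url: str) -> str:
    """Pick the widest candidate from a srcset string and resolve it
    against the page URL (illustrative sketch of GETIMG's documented
    behavior, not the real implementation)."""
    best_url, best_width = "", -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        # width descriptors look like "800w"; treat a missing one as 0
        width = 0
        if len(parts) > 1 and parts[1].endswith("w"):
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return urljoin(base_url, best_url)

print(largest_from_srcset(
    "/img/s.jpg 320w, /img/m.jpg 800w, /img/l.jpg 1600w",
    "https://example.com/gallery",
))  # -> https://example.com/img/l.jpg
```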
### Examples
Get all images from a page:

```
=GETIMG("https://example.com/gallery")
```

From a cell reference:

```
=GETIMG(A1)
```

### Use Cases
- Image collection
- Asset auditing
- Visual content analysis
- Media library building
## Caching Behavior
All web scraping functions include intelligent caching to improve performance and reduce redundant requests.
| Cache Type | Duration | Description |
|---|---|---|
| User Cache | 30 minutes | Results cached per user |
| Script Cache | 10 minutes | Shared across all users |
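The caching model amounts to a time-to-live store keyed by the full URL. The sketch below is a minimal Python illustration of that model (an assumption about the behavior described above, not SheetMagic's actual code); because the key is the full URL, any change to the query string produces a cache miss.

```python
import time

class TTLCache:
    """Minimal time-to-live cache keyed by URL. Illustrates the caching
    model described above; not SheetMagic's actual implementation."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (expiry_timestamp, content)

    def get(self, url):
        entry = self._store.get(url)
        if entry and entry[0] > time.time():
            return entry[1]
        return None  # missing or expired

    def put(self, url, content):
        self._store[url] = (time.time() + self.ttl, content)

user_cache = TTLCache(30 * 60)  # per-user tier, 30 minutes
user_cache.put("https://example.com", "<html>...</html>")
# The key is the full URL, so a new query parameter is a cache miss:
print(user_cache.get("https://example.com?refresh=123"))  # -> None
```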
To force a fresh fetch, append a unique query parameter to the URL:
```
=VISIT(A1 & "?refresh=" & NOW())
```

## Best Practices
### 1. Use Specific Selectors
```
// Less effective - too broad
=GETSELECTOR(A1, "div")

// More effective - specific class
=GETSELECTOR(A1, "div.product-price")
```

### 2. Handle Multiple Results
GETSELECTOR returns an array. Use with other Sheets functions:
```
// Get first result
=INDEX(GETSELECTOR(A1, ".price"), 1)

// Count results
=COUNTA(GETSELECTOR(A1, ".price"))
```

### 3. Combine with AI Functions
```
// Summarize scraped content
=AITEXT("Summarize: " & VISIT(A1))

// Analyze extracted data
=AITEXT("Analyze these prices: " & JOIN(", ", GETSELECTOR(A1, ".price")))
```

### 4. Respect Rate Limits
Avoid scraping too many URLs simultaneously. Space out requests to prevent blocking.
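The spacing-out advice can be sketched in Python: fetch URLs sequentially with a fixed pause between requests. The `fetch` callable and the one-second delay are illustrative stand-ins; tune the delay to the site you are scraping.

```python
import time

def fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch URLs one at a time with a fixed pause between requests,
    to avoid triggering rate limits (illustrative sketch; `fetch` is
    any callable that takes a URL and returns its content)."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results

# Usage with a stand-in fetcher:
print(fetch_all(["a", "b"], lambda u: u.upper(), delay_seconds=0.1))  # -> ['A', 'B']
```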
## Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| #ERROR! | URL not accessible | Verify the URL is valid and publicly accessible |
| #BLOCKED! | Site blocking requests | Some sites block automated access |
| #TIMEOUT! | Request timed out | Try again later or check the URL |
| #EMPTY! | No content found | Verify the selector matches elements on the page |
## Limitations
- JavaScript-heavy sites: Some dynamic content may not be captured
- Authentication: Cannot access pages requiring login
- Rate limiting: Sites may block excessive requests
- Size limits: Content truncated at 50,000 characters
- Image limits: Maximum 1,000 images returned