# Web Scraping Functions

Extract content from webpages with the VISIT, GETSELECTOR, and GETIMG functions.
## Overview
SheetMagic provides powerful web scraping functions to extract content from any webpage. These functions handle JavaScript-rendered pages and include intelligent caching for performance.
## VISIT
Extract content from any webpage for research and analysis. Returns clean Markdown by default, or raw HTML if specified. Use a CSS selector to extract specific elements from HTML.
### Syntax

```
=VISIT(url, [format], [selector])
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Website URL to extract content from |
| format | string | No | Output format: `"markdown"` (default) or `"html"` for raw HTML. Case-insensitive. |
| selector | string | No | CSS selector to extract specific elements. Only works with the `"html"` format. |
### Returns
Content from the webpage (up to 50,000 characters) in the specified format:
- Markdown (default): Clean, readable text with Markdown formatting
- HTML: Raw HTML content from the page
- HTML with selector: Inner HTML of matching elements (multiple matches separated by newlines)
### Examples
Basic usage (returns Markdown):

```
=VISIT("https://example.com/article")
```

Get raw HTML:

```
=VISIT("https://example.com/article", "html")
```

Extract specific elements with a CSS selector:

```
=VISIT("https://example.com", "html", "div.content > p.intro")
```

Extract all prices from a page:

```
=VISIT(A1, "html", ".price")
```

Extract product titles:

```
=VISIT(A1, "html", "h2.product-title")
```

From a cell reference:

```
=VISIT(A1)
```

From a cell reference with HTML output:

```
=VISIT(A1, "HTML")
```

### Features
- Handles JavaScript-rendered pages
- 30-minute user-specific caching
- Flexible output format (Markdown or HTML)
- CSS selector support for extracting specific elements
- Removes navigation, ads, and boilerplate (in Markdown mode)
### CSS Selector Examples
| Selector | Description |
|---|---|
| `.classname` | Elements with specific class |
| `#idname` | Element with specific ID |
| `div.product` | `div` elements with the `product` class |
| `div.content > p` | Direct child `<p>` elements of `div.content` |
| `[data-price]` | Elements with specific attribute |
The `selector` parameter only works with the `"html"` format; if used with `"markdown"`, the selector is ignored.
### Use Cases
- Content research
- Competitor analysis
- Article extraction
- Documentation scraping
- Data collection
- HTML parsing and analysis
- Price monitoring
- Targeted element extraction
## GETSELECTOR
Get specific elements from a page using a CSS selector.
### Syntax

```
=GETSELECTOR(url, selector)
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape |
| selector | string | Yes | CSS selector to match elements |
### Returns
An array of text content from all matching elements.
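To make the "array of text content" behavior concrete, here is a stdlib-only Python sketch of what a simple class selector such as `.price` matches. This is purely illustrative and assumes nothing about SheetMagic's internal selector engine; it handles only class selectors, whereas GETSELECTOR supports the full selector types listed below.

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Collect the text content of every element carrying a given class
    attribute -- an illustrative sketch of what a selector like ".price"
    matches, not SheetMagic's actual implementation."""
    def __init__(self, classname):
        super().__init__()
        self.classname = classname
        self.depth = 0          # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.classname in classes:
            if self.depth == 0:
                self.results.append("")   # start a new match
            self.depth += 1
        elif self.depth:
            self.depth += 1               # nested tag inside a match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data      # accumulate the element's text

html = '<ul><li class="price">$10</li><li class="price">$12</li></ul>'
parser = ClassTextCollector("price")
parser.feed(html)
print(parser.results)  # -> ['$10', '$12']
```

Each matching element contributes one entry, which is why GETSELECTOR spills into multiple cells when several elements match.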
### Supported Selectors
| Selector Type | Example | Description |
|---|---|---|
| Class | `.classname` | Elements with specific class |
| ID | `#idname` | Element with specific ID |
| Element | `div`, `span`, `p` | All elements of that type |
| Attribute | `[data-price]` | Elements with specific attribute |
| Combined | `div.product` | `div` elements with the `product` class |
| Nested | `div.product h2` | `<h2>` elements inside product divs |
### Examples
Get all prices:

```
=GETSELECTOR("https://example.com/products", ".price")
```

Get product titles:

```
=GETSELECTOR(A1, "h2.product-title")
```

Get elements by attribute:

```
=GETSELECTOR(A1, "[data-product-id]")
```

Get nested elements:

```
=GETSELECTOR(A1, ".product-card .description")
```

### Use Cases
- Price monitoring
- Product data extraction
- Review scraping
- Contact information extraction
- Structured data collection
## GETIMG
Extract all images from a webpage.
### Syntax

```
=GETIMG(url)
```

### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape images from |
### Returns
An array of image URLs (capped at 1,000 images).
### Features
- Handles responsive images (`srcset`, `data-srcset`)
- Lazy-loading support (`data-src`)
- Automatic relative URL resolution
- Selects the largest image from `srcset`
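The last two features can be pictured with a short Python sketch: parse an HTML `srcset` string, keep the candidate with the largest width descriptor, and resolve relative paths against the page URL. This is an illustration of the documented behavior under those assumptions, not SheetMagic's actual code.

```python
from urllib.parse import urljoin

def largest_from_srcset(srcset: str, base_url: str) -> str:
    """Pick the widest candidate from a srcset string and resolve it
    against the page URL (illustrative sketch of GETIMG's documented
    behavior, not the real implementation)."""
    best_url, best_width = "", -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        # width descriptors look like "800w"; treat a missing one as 0
        width = 0
        if len(parts) > 1 and parts[1].endswith("w"):
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return urljoin(base_url, best_url)

print(largest_from_srcset(
    "/img/s.jpg 320w, /img/m.jpg 800w, /img/l.jpg 1600w",
    "https://example.com/gallery",
))  # -> https://example.com/img/l.jpg
```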
### Examples
Get all images from a page:

```
=GETIMG("https://example.com/gallery")
```

From a cell reference:

```
=GETIMG(A1)
```

### Use Cases
- Image collection
- Asset auditing
- Visual content analysis
- Media library building
## Caching Behavior
All web scraping functions include intelligent caching to improve performance and reduce redundant requests.
| Cache Type | Duration | Description |
|---|---|---|
| User Cache | 30 minutes | Results cached per user |
| Script Cache | 10 minutes | Shared across all users |
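The caching model amounts to a time-to-live store keyed by the full URL. The sketch below is a minimal Python illustration of that model (an assumption about the behavior described above, not SheetMagic's actual code); because the key is the full URL, any change to the query string produces a cache miss.

```python
import time

class TTLCache:
    """Minimal time-to-live cache keyed by URL. Illustrates the caching
    model described above; not SheetMagic's actual implementation."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (expiry_timestamp, content)

    def get(self, url):
        entry = self._store.get(url)
        if entry and entry[0] > time.time():
            return entry[1]
        return None  # missing or expired

    def put(self, url, content):
        self._store[url] = (time.time() + self.ttl, content)

user_cache = TTLCache(30 * 60)  # per-user tier, 30 minutes
user_cache.put("https://example.com", "<html>...</html>")
# The key is the full URL, so a new query parameter is a cache miss:
print(user_cache.get("https://example.com?refresh=123"))  # -> None
```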
To force a fresh fetch, append a unique query parameter to the URL:
```
=VISIT(A1 & "?refresh=" & NOW())
```

## Best Practices
### 1. Use Specific Selectors
```
// Less effective - too broad
=GETSELECTOR(A1, "div")

// More effective - specific class
=GETSELECTOR(A1, "div.product-price")
```

### 2. Handle Multiple Results
GETSELECTOR returns an array. Use with other Sheets functions:
```
// Get first result
=INDEX(GETSELECTOR(A1, ".price"), 1)

// Count results
=COUNTA(GETSELECTOR(A1, ".price"))
```

### 3. Combine with AI Functions
```
// Summarize scraped content
=AITEXT("Summarize: " & VISIT(A1))

// Analyze extracted data
=AITEXT("Analyze these prices: " & JOIN(", ", GETSELECTOR(A1, ".price")))
```

### 4. Respect Rate Limits
Avoid scraping too many URLs simultaneously. Space out requests to prevent blocking.
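The spacing-out advice can be sketched in Python: fetch URLs sequentially with a fixed pause between requests. The `fetch` callable and the one-second delay are illustrative stand-ins; tune the delay to the site you are scraping.

```python
import time

def fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch URLs one at a time with a fixed pause between requests,
    to avoid triggering rate limits (illustrative sketch; `fetch` is
    any callable that takes a URL and returns its content)."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results

# Usage with a stand-in fetcher:
print(fetch_all(["a", "b"], lambda u: u.upper(), delay_seconds=0.1))  # -> ['A', 'B']
```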
## Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| #ERROR! | URL not accessible | Verify the URL is valid and publicly accessible |
| #BLOCKED! | Site blocking requests | Some sites block automated access |
| #TIMEOUT! | Request timed out | Try again later or check the URL |
| #EMPTY! | No content found | Verify the selector matches elements on the page |
## Limitations
- JavaScript-heavy sites: Some dynamic content may not be captured
- Authentication: Cannot access pages requiring login
- Rate limiting: Sites may block excessive requests
- Size limits: Content truncated at 50,000 characters
- Image limits: Maximum 1,000 images returned