@mcp-b/smart-dom-reader extracts DOM structure optimized for AI/LLM consumption. It provides stable CSS selectors, interactive element maps, and semantic page structure while minimizing token usage. Zero runtime dependencies.
Installation
Minimal example
Two extraction approaches
Full extraction (SmartDOMReader)
Single-pass extraction of all elements. Use when you need everything upfront.
Progressive extraction (ProgressiveExtractor)
Step-by-step extraction for token-sensitive AI workflows.
extractStructure accepts a Document or an Element to scope the extraction.
Extraction modes
| Mode | Includes |
|---|---|
| `interactive` | Buttons, links, form inputs, clickable elements, form structures |
| `full` | Everything in `interactive` plus headings, images, tables, lists, articles, metadata |

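
The relationship between the two modes can be sketched as a simple set union. This is only an illustration of the table above: `categoriesFor` and the category labels are hypothetical names, not part of the package's API.

```typescript
// Illustrative only: category labels are taken from the modes table,
// and categoriesFor is a hypothetical helper, not a package export.
type ExtractionMode = 'interactive' | 'full';

const INTERACTIVE_CATEGORIES = [
  'buttons',
  'links',
  'form inputs',
  'clickable elements',
  'form structures',
];

const FULL_ONLY_CATEGORIES = [
  'headings',
  'images',
  'tables',
  'lists',
  'articles',
  'metadata',
];

// 'full' is a superset of 'interactive'.
function categoriesFor(mode: ExtractionMode): string[] {
  return mode === 'full'
    ? [...INTERACTIVE_CATEGORIES, ...FULL_ONLY_CATEGORIES]
    : [...INTERACTIVE_CATEGORIES];
}

console.log(categoriesFor('interactive').includes('tables')); // false
console.log(categoriesFor('full').includes('tables')); // true
```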
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `mode` | `'interactive' \| 'full'` | `'interactive'` | Extraction mode |
| `maxDepth` | `number` | `5` | Maximum DOM traversal depth |
| `includeHidden` | `boolean` | `false` | Include hidden elements |
| `includeShadowDOM` | `boolean` | `true` | Traverse shadow DOM roots |
| `includeIframes` | `boolean` | `false` | Traverse iframes |
| `viewportOnly` | `boolean` | `false` | Only extract elements in the visible viewport |
| `mainContentOnly` | `boolean` | `false` | Focus on the detected main content area |
| `customSelectors` | `string[]` | `[]` | Additional CSS selectors to extract |

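
To make the defaults concrete, here is a minimal sketch of how partial options could be merged with the defaults listed above. `ResolvedOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are local, hypothetical names for illustration; the package's own `ExtractionOptions` type and merging logic may differ.

```typescript
// Local stand-in mirroring the options table; not the package's actual type.
type ExtractionMode = 'interactive' | 'full';

interface ResolvedOptions {
  mode: ExtractionMode;
  maxDepth: number;
  includeHidden: boolean;
  includeShadowDOM: boolean;
  includeIframes: boolean;
  viewportOnly: boolean;
  mainContentOnly: boolean;
  customSelectors: string[];
}

// Default values copied from the table above.
const DEFAULT_OPTIONS: ResolvedOptions = {
  mode: 'interactive',
  maxDepth: 5,
  includeHidden: false,
  includeShadowDOM: true,
  includeIframes: false,
  viewportOnly: false,
  mainContentOnly: false,
  customSelectors: [],
};

// Hypothetical helper: caller-supplied values override the defaults.
function resolveOptions(overrides: Partial<ResolvedOptions> = {}): ResolvedOptions {
  return {
    ...DEFAULT_OPTIONS,
    ...overrides,
    // Copy the array so callers never share the default instance.
    customSelectors: [...(overrides.customSelectors ?? DEFAULT_OPTIONS.customSelectors)],
  };
}

const options = resolveOptions({ mode: 'full', viewportOnly: true });
console.log(options.mode, options.maxDepth, options.viewportOnly);
```

Only the keys passed in change; everything else keeps its documented default.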
Output structure
SmartDOMResult
ExtractedElement
true, saving tokens.
Selector ranking
Selectors are ranked by stability (higher score = more reliable):

| Strategy | Score | Example |
|---|---|---|
| ID | 100 | `#unique-id` |
| `data-testid` | 90 | `[data-testid="submit"]` |
| ARIA (role + label) | 80 | `[role="button"][aria-label="Submit"]` |
| Name/ID attributes | 70 | `input[name="email"]` |
| Class paths | 50 | `.form-container .submit-btn` |

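
The ranking can be sketched as a sort over candidate selectors by strategy score. The scores come from the table above; `Strategy`, `SelectorCandidate`, and `rankSelectors` are hypothetical names for this sketch and are not the `SelectorGenerator` API.

```typescript
// Strategy scores copied from the ranking table; all names are illustrative.
type Strategy = 'id' | 'data-testid' | 'aria' | 'name' | 'class-path';

const STRATEGY_SCORES: Record<Strategy, number> = {
  id: 100,
  'data-testid': 90,
  aria: 80,
  name: 70,
  'class-path': 50,
};

interface SelectorCandidate {
  selector: string;
  strategy: Strategy;
}

// Returns a new array ordered from most to least stable.
function rankSelectors(candidates: SelectorCandidate[]): SelectorCandidate[] {
  return [...candidates].sort(
    (a, b) => STRATEGY_SCORES[b.strategy] - STRATEGY_SCORES[a.strategy],
  );
}

const ranked = rankSelectors([
  { selector: '.form-container .submit-btn', strategy: 'class-path' },
  { selector: '[data-testid="submit"]', strategy: 'data-testid' },
  { selector: '[role="button"][aria-label="Submit"]', strategy: 'aria' },
]);

console.log(ranked.map((c) => c.selector));
```

Given these three candidates for the same button, the `data-testid` selector sorts first and the class path last.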
Exported modules
| Export | Kind | Description |
|---|---|---|
| `SmartDOMReader` | Class | Full single-pass extraction |
| `ProgressiveExtractor` | Class | Step-by-step extraction |
| `SelectorGenerator` | Class | CSS/XPath selector generation with ranking |
| `ContentDetection` | Class | Landmark detection and main content identification |
| `MarkdownFormatter` | Class | Format extraction results as Markdown |
| `MarkdownFormatOptions` | Type | Options for Markdown formatting |
| `SmartDOMResult` | Type | Full extraction result shape |
| `ExtractedElement` | Type | Single extracted element |
| `ExtractionOptions` | Type | Options accepted by the reader |
| `ExtractionMode` | Type | `'interactive' \| 'full'` |
| `FormInfo` | Type | Form metadata (selector, inputs, buttons, action, method) |
| `PageState` | Type | Current page state (URL, title, loading, errors) |
| `PageLandmarks` | Type | Detected page landmarks |

Bundle string export
A self-contained IIFE bundle is available for injection into pages (e.g. via `chrome.userScripts.execute`).
MCP server
An optional MCP server returns XML-wrapped Markdown. It exposes the following tools:

| Tool | Parameters |
|---|---|
| `browser_connect` | `headless?`, `executablePath?` |
| `browser_navigate` | `url` |
| `dom_extract_structure` | `selector?`, `detail?`, `maxTextLength?`, `maxElements?` |
| `dom_extract_region` | `selector`, `options?` |
| `dom_extract_content` | `selector`, `options?` |
| `dom_extract_interactive` | `selector?`, `options?` |
| `browser_screenshot` | `path?`, `fullPage?` |
| `browser_close` | (none) |

- `dom_extract_structure` to get the page outline
- `dom_extract_region` to get selectors for a target area
- Write a script using those selectors
- Optionally `dom_extract_content` for readable text
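
The workflow above can be sketched as a sequence of MCP tool calls. This assumes the server accepts the standard MCP JSON-RPC `tools/call` request shape; `toolCall` is a hypothetical helper, and the argument values (`https://example.com`, the `main` selector) are placeholders chosen for illustration. The transport that would actually send these requests is not shown.

```typescript
// Standard MCP "tools/call" request shape (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that builds one request payload per tool invocation.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Tool names come from the table above; argument values are placeholders.
const workflow: ToolCallRequest[] = [
  toolCall('browser_navigate', { url: 'https://example.com' }),
  toolCall('dom_extract_structure'), // 1. page outline
  toolCall('dom_extract_region', { selector: 'main' }), // 2. selectors for a target area
  toolCall('dom_extract_content', { selector: 'main' }), // 3. optional readable text
];

console.log(JSON.stringify(workflow, null, 2));
```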
Related
- @mcp-b/extension-tools (uses `smart-dom-reader` for DOM extraction tools)
- @mcp-b/chrome-devtools-mcp (DevTools-based browser automation)
