Reading the page
Sometimes a script needs to look at the page rather than change it — to feed an LLM the current state, diff a layout between two snapshots, figure out what's under a coordinate, or read back what the user highlighted. WRC ships a small set of read-only methods for exactly that. They all return data and never poke the page.
- GetObservation is the easy on-ramp: a compact, agent-friendly summary of what's visible right now.
- GetDOM is the full CDP DOM tree as JSON. Use it when you need every node, not when you need a quick overview.
- GetDOMHash is a 16-char fingerprint of the DOM tree. Pair it with GetDOM to skip unchanged snapshots.
- InspectAtPosition, HighlightNode and GetSelection cover the live-UI cases: hit-test, debug overlay, copy what's selected.
- Reading methods don't wait. Pair them with Wait whenever you depend on something specific being there.
GetObservation — the agent-friendly summary
The first thing to reach for, especially in agent/LLM loops. The server filters the page down to elements that are actually visible and interactable, then hands back two views of that same data:
- Text: a compact human-readable rendering, ideal for stuffing straight into a prompt.
- Json: the same data structured (frames, elements, counts, limits), ideal for code that needs to walk it.
obs, err := browser.GetObservation(ctx, 0, 0) // 0/0 = server defaults
if err != nil {
	log.Fatal(err)
}
fmt.Println(obs.Text) // for the LLM
_ = obs.Json // structured payload

const obs = await browser.getObservation(); // all defaults
console.log(obs.text); // for the LLM
const _structured = obs.json; // structured payload

Two knobs cap the size of the output so big pages don't blow up your prompt budget:
- maxElementsPerFrame: hard cap on how many elements the server emits per frame. Default 500.
- maxTextLength: per-element text content cap, in characters. Default 240.
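To get a feel for what the two caps buy you, here is a back-of-the-envelope sketch. It only multiplies the two limits to get an upper bound on element text per frame; the helper name worstCaseTextChars is made up for illustration, and the bound ignores per-element overhead (tag names, ids), which is not quantified here.

```go
package main

import "fmt"

// worstCaseTextChars is a hypothetical helper: it returns an upper bound on
// the element text emitted for one frame, assuming every one of the
// maxElementsPerFrame elements carries a full maxTextLength characters.
// Per-element overhead (tag names, ids, ...) is deliberately left out.
func worstCaseTextChars(maxElementsPerFrame, maxTextLength int) int {
	return maxElementsPerFrame * maxTextLength
}

func main() {
	fmt.Println(worstCaseTextChars(500, 240)) // server defaults: 120000
	fmt.Println(worstCaseTextChars(200, 80))  // tighter caps: 16000
}
```

Treat the result as a ceiling for budgeting, not a prediction: most elements carry far less text than the cap allows.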
// Keep it tight for a chat model.
obs, _ := browser.GetObservation(ctx, 200, 80)

// Keep it tight for a chat model.
const obs = await browser.getObservation({
  maxElementsPerFrame: 200,
  maxTextLength: 80,
});

Every element in the observation carries a backendNodeId and the
frameId it lives in, which is exactly what you need to act on it
afterwards with Node(...):
// Parse the JSON, pick the element your agent decided on,
// then act on it directly; no second lookup is required.
_, _ = browser.Click(ctx, wrc.Node(decision.BackendNodeId).InFrame(decision.FrameId))

// Parse the JSON, pick the element your agent decided on,
// then act on it directly; no second lookup is required.
await browser.click(node(decision.backendNodeId).inFrame(decision.frameId));

GetDOM — the full CDP tree
When the observation isn't enough — you need every node, the
nesting, the full attributes — switch to GetDOM. The payload is a
JSON string in standard CDP DOM.Node shape, with same-origin
<iframe> / <frame> / <object> children inlined into the same
tree. Cross-origin (out-of-process) frames stop the tree; call
GetDOM again with that frame's frameId to descend.
// Full tree of the main frame.
domJson, err := browser.GetDOM(ctx, "", -1)
if err != nil {
	log.Fatal(err)
}
// domJson is a CDP DOM.Node tree; feed it into anything that speaks CDP.

// Full tree of the main frame.
const { dom } = await browser.getDOM();
// dom is a CDP DOM.Node tree; feed it into anything that speaks CDP.

The return shape differs across the two SDKs in one cosmetic way: Go
gives you the JSON string directly, while TypeScript wraps it in a
{ dom, hash } object where hash is reserved for future use and is
not populated by this call. Either way, when you want the
fingerprint, call GetDOMHash.
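Since the payload is plain JSON in CDP DOM.Node shape, you can walk it with nothing but the standard library. The sketch below decodes a small subset of the standard DOM.Node fields (nodeName, backendNodeId, children, contentDocument) and collects the backendNodeId of every node with a given name. The sample string is hand-written, and findByName and domNode are illustrative names, not part of WRC. Whether inlined frame documents show up under contentDocument (as in plain CDP) or directly under children is an assumption, so the walk follows both.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// domNode mirrors a small subset of the CDP DOM.Node fields.
type domNode struct {
	NodeName        string     `json:"nodeName"`
	BackendNodeID   int64      `json:"backendNodeId"`
	Children        []*domNode `json:"children"`
	ContentDocument *domNode   `json:"contentDocument"` // frame owner's document in plain CDP
}

// collectBackendIDs walks the tree depth-first and appends the backendNodeId
// of every node whose nodeName matches.
func collectBackendIDs(n *domNode, nodeName string, out []int64) []int64 {
	if n == nil {
		return out
	}
	if n.NodeName == nodeName {
		out = append(out, n.BackendNodeID)
	}
	for _, c := range n.Children {
		out = collectBackendIDs(c, nodeName, out)
	}
	return collectBackendIDs(n.ContentDocument, nodeName, out)
}

// findByName parses a DOM.Node JSON string and returns matching backend ids.
func findByName(domJSON, nodeName string) ([]int64, error) {
	var root domNode
	if err := json.Unmarshal([]byte(domJSON), &root); err != nil {
		return nil, err
	}
	return collectBackendIDs(&root, nodeName, nil), nil
}

// sampleDOM is a hand-written stand-in for a real GetDOM payload.
const sampleDOM = `{"nodeName":"#document","backendNodeId":1,"children":[
  {"nodeName":"HTML","backendNodeId":2,"children":[
    {"nodeName":"BODY","backendNodeId":3,"children":[
      {"nodeName":"A","backendNodeId":4},
      {"nodeName":"A","backendNodeId":5}
    ]}
  ]}
]}`

func main() {
	ids, err := findByName(sampleDOM, "A")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // backend ids of the two links in the sample
}
```

CDP reports HTML element names in upper case, so match on "A" rather than "a".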
Two parameters tune the call:
- frameId (Go: positional, TS: positional): empty / omitted targets the main frame. Use a specific frame's id to descend into an OOPIF.
- depth (Go: positional, TS: opts.depth): -1 for the full tree, 0 for the root only, N for the root plus N descendant levels.
// Just the top two levels: a cheap probe before pulling the whole thing.
shallow, _ := browser.GetDOM(ctx, "", 2)

// Just the top two levels: a cheap probe before pulling the whole thing.
const { dom } = await browser.getDOM("", { depth: 2 });

GetDOMHash — the change detector
GetDOMHash returns the first 8 bytes of sha256(dom) as a 16-char
hex string. Computing the hash on the server is dramatically cheaper
than transferring the full tree, which makes it the right primitive
for "did anything change since I last looked?" polling loops.
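For intuition, the fingerprint format can be reproduced locally in a few lines. This is a sketch of the format only: it assumes the input is whatever serialized DOM the server hashes, and the exact bytes the server feeds into SHA-256 are not specified here, so a locally computed value is not guaranteed to match what GetDOMHash returns. The name domFingerprint is illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// domFingerprint returns the first 8 bytes of SHA-256 over the input,
// rendered as a 16-character lowercase hex string.
func domFingerprint(dom string) string {
	sum := sha256.Sum256([]byte(dom))
	return hex.EncodeToString(sum[:8])
}

func main() {
	before := domFingerprint(`{"nodeName":"#document"}`)
	after := domFingerprint(`{"nodeName":"#document","children":[]}`)
	fmt.Println(before, after, before != after)
}
```

Eight bytes is plenty for "did it change?" checks between consecutive snapshots; it is not meant as a collision-resistant content address.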
var lastHash string
for {
	hash, err := browser.GetDOMHash(ctx, "")
	if err != nil {
		return err
	}
	if hash != lastHash {
		lastHash = hash
		domJson, _ := browser.GetDOM(ctx, "", -1)
		process(domJson)
	}
	time.Sleep(500 * time.Millisecond)
}

let lastHash = "";
while (running) {
  const hash = await browser.getDOMHash();
  if (hash !== lastHash) {
    lastHash = hash;
    const { dom } = await browser.getDOM();
    process(dom);
  }
  await sleep(500);
}

Don't reach for this as a wait substitute. If you're trying to wait
for "the page to stop changing", use Wait with a CSS or JS
condition for the actual element you care about (see Waiting).
InspectAtPosition — what's under (x, y)?
A hit-test at a viewport-relative pixel coordinate, returning the
topmost element under that point. Elements with
pointer-events: none are skipped, so the result is the actual click
target — not the visually-topmost node. This is what the live-UI
hover overlay calls under the hood.
res, err := browser.InspectAtPosition(ctx, 200, 300)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.TagName, res.TextContent)
// res.BackendNodeId == 0 means nothing was found.

const r = await browser.inspectAtPosition(200, 300);
console.log(r.tagName, r.textContent);
// r.backendNodeId === 0 means nothing was found.

The result carries everything you need to act on the element next:
BackendNodeId, FrameId, TagName, trimmed TextContent,
post-scroll IsVisible and the bounding rect.
Typical use cases:
- Stream-based UIs where the user clicks on the video feed and the script needs to translate the click into a real DOM target.
- Coordinate-driven recipes (canvas/captcha tile, HTML5 game)
where you want to verify what's actually there before firing a
Click(at(...)).
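For the stream-based case, the click arrives in video-element pixels and has to be scaled to viewport pixels before the hit-test. A minimal sketch of that mapping, assuming the stream shows the whole viewport with no letterboxing or cropping; streamToViewport and all the sizes below are hypothetical, and the integer result type is an assumption about what InspectAtPosition accepts.

```go
package main

import (
	"fmt"
	"math"
)

// streamToViewport maps a click at (sx, sy) on a video element of size
// streamW x streamH to viewport-relative pixels for a viewport of size
// viewW x viewH. It assumes the stream shows the full viewport, unletterboxed.
func streamToViewport(sx, sy, streamW, streamH, viewW, viewH float64) (int, int) {
	x := math.Round(sx * viewW / streamW)
	y := math.Round(sy * viewH / streamH)
	// Clamp so a click on the far edge still lands inside the viewport.
	x = math.Max(0, math.Min(x, viewW-1))
	y = math.Max(0, math.Min(y, viewH-1))
	return int(x), int(y)
}

func main() {
	// A 640x360 preview of a 1280x720 viewport: every stream pixel covers 2x2.
	x, y := streamToViewport(100, 150, 640, 360, 1280, 720)
	fmt.Println(x, y) // 200 300
}
```

The resulting pair is what you would hand to InspectAtPosition, checking for BackendNodeId == 0 before acting on the result.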
HighlightNode — the debug overlay
The visual companion to InspectAtPosition. Paints a coloured
overlay on top of the node identified by backendNodeId. The overlay
stays until the next call — pass a non-positive backendNodeId to
clear it.
r, _ := browser.InspectAtPosition(ctx, 400, 250)
_ = browser.HighlightNode(ctx, r.BackendNodeId, r.FrameId)
// ... screenshot or just watch the live stream ...
_ = browser.HighlightNode(ctx, 0, "") // clear the overlay

const r = await browser.inspectAtPosition(400, 250);
await browser.highlightNode(r.backendNodeId, r.frameId);
// ... screenshot or just watch the live stream ...
await browser.highlightNode(0); // clear the overlay

Pure debugging affordance: HighlightNode only paints visuals, it
doesn't change the page's behaviour. Use it freely in development; in
production scripts there's usually no reason to call it.
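Because the overlay stays until the next call, it helps to tie the clear to a defer so it runs even when the work in between fails. The sketch below is one way to do that. The highlighter interface, withHighlight, and the recorder fake are all hypothetical; the parameter types (int64 id, string frame id) are assumptions inferred from the call shape shown above, not the SDK's declared signature.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// highlighter is a hypothetical one-method view of the browser handle.
// The parameter types are assumed, not taken from the SDK.
type highlighter interface {
	HighlightNode(ctx context.Context, backendNodeID int64, frameID string) error
}

// withHighlight paints the overlay, runs fn, and always clears the overlay
// afterwards. fn's error wins if both fn and the clear call fail.
func withHighlight(ctx context.Context, h highlighter, backendNodeID int64, frameID string, fn func() error) (err error) {
	if paintErr := h.HighlightNode(ctx, backendNodeID, frameID); paintErr != nil {
		return paintErr
	}
	defer func() {
		clearErr := h.HighlightNode(ctx, 0, "") // non-positive id clears the overlay
		if err == nil {
			err = clearErr
		}
	}()
	return fn()
}

// recorder is a fake that just remembers which ids it was asked to paint.
type recorder struct{ calls []int64 }

func (r *recorder) HighlightNode(_ context.Context, id int64, _ string) error {
	r.calls = append(r.calls, id)
	return nil
}

// demoHighlight runs withHighlight against the fake and reports the calls made.
func demoHighlight(fail bool) ([]int64, error) {
	rec := &recorder{}
	err := withHighlight(context.Background(), rec, 42, "", func() error {
		if fail {
			return errors.New("screenshot failed")
		}
		return nil
	})
	return rec.calls, err
}

func main() {
	calls, err := demoHighlight(true)
	fmt.Println(calls, err) // the clear call (id 0) is recorded even though fn failed
}
```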
GetSelection — read what's highlighted
Walks every frame and returns the first non-empty text selection it
finds. Returns "" when nothing is selected anywhere. Useful for
"copy what the user highlighted" flows and for tests that exercise
selection-based UI (e.g. "did our right-click translate this text?").
sel, err := browser.GetSelection(ctx)
if err != nil {
	log.Fatal(err)
}
if sel == "" {
	fmt.Println("nothing selected")
} else {
	fmt.Println("user selected:", sel)
}

const sel = await browser.getSelection();
if (sel === "") {
  console.log("nothing selected");
} else {
  console.log("user selected:", sel);
}

When to reach for which
| You want… | Use |
|---|---|
| A short summary an LLM can read | GetObservation |
| Every node and attribute | GetDOM |
| "Did the DOM change since I last looked?" | GetDOMHash (then GetDOM if it did) |
| The element under a specific pixel | InspectAtPosition |
| A visual marker on a node while debugging | HighlightNode |
| The text the user is currently highlighting | GetSelection |
| To wait for something to appear | Not these — use Wait |
Gotchas
- They don't wait. None of these methods polls for an element to appear. If you call GetObservation before the page has rendered, you get whatever was visible at that instant. Use Wait first for the anchor element you care about.
- TS getDOM returns { dom, hash }; Go returns just the DOM string. Cosmetic wrapper difference only: in TS the hash field is reserved for future use and is not populated by getDOM. Call getDOMHash explicitly in both SDKs.
- OOPIFs stop the DOM tree. GetDOM inlines same-origin frames but stops at cross-origin ones. Recurse by calling GetDOM again with the OOPIF's frameId (look it up via GetPages, covered in Frames & iframes).
- InspectAtPosition returns BackendNodeId == 0 for misses. Treat that as "nothing there" rather than an error.
- HighlightNode is sticky. The overlay persists until you call again, so make sure your cleanup path clears it (backendNodeId = 0) or your screenshots will keep showing stale highlights.
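The OOPIF gotcha usually turns into a small loop: one GetDOM call for the main frame, then one per cross-origin frame. A rough sketch, assuming you already have the OOPIF frame ids in hand (how they come back from GetPages is not shown here). The domGetter interface, collectTrees, and fakeBrowser are hypothetical stand-ins; the int depth parameter is an assumption based on the calls shown earlier on this page.

```go
package main

import (
	"context"
	"fmt"
)

// domGetter is a hypothetical one-method view of the browser handle.
type domGetter interface {
	GetDOM(ctx context.Context, frameID string, depth int) (string, error)
}

// collectTrees fetches the full main-frame tree plus one full tree per known
// OOPIF, keyed by frame id ("" is the main frame). oopifFrameIDs is assumed
// to have been looked up beforehand.
func collectTrees(ctx context.Context, b domGetter, oopifFrameIDs []string) (map[string]string, error) {
	trees := make(map[string]string, len(oopifFrameIDs)+1)
	root, err := b.GetDOM(ctx, "", -1)
	if err != nil {
		return nil, fmt.Errorf("main frame: %w", err)
	}
	trees[""] = root
	for _, id := range oopifFrameIDs {
		tree, err := b.GetDOM(ctx, id, -1)
		if err != nil {
			return nil, fmt.Errorf("frame %q: %w", id, err)
		}
		trees[id] = tree
	}
	return trees, nil
}

// fakeBrowser serves canned trees so the sketch runs without a real browser.
type fakeBrowser map[string]string

func (f fakeBrowser) GetDOM(_ context.Context, frameID string, _ int) (string, error) {
	dom, ok := f[frameID]
	if !ok {
		return "", fmt.Errorf("unknown frame %q", frameID)
	}
	return dom, nil
}

// demoCollect runs collectTrees against the fake with the given frame ids.
func demoCollect(frameIDs []string) (map[string]string, error) {
	b := fakeBrowser{"": `{"nodeName":"#document"}`, "F1": `{"nodeName":"#document"}`}
	return collectTrees(context.Background(), b, frameIDs)
}

func main() {
	trees, err := demoCollect([]string{"F1"})
	fmt.Println(len(trees), err)
}
```

This handles a flat list of frames only; an OOPIF nested inside another OOPIF would need its id in the list as well.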
Related pages
- Evaluation: when you need to run JavaScript instead of (or after) reading the structure.
- Waiting: the explicit pause to put in front of any read that depends on a specific element.
- API reference: Go DOM helpers · TS DOM helpers.