
Reading the page

Sometimes a script needs to look at the page rather than change it: to feed an LLM the current state, diff a layout between two snapshots, find out what's under a coordinate, or read back what the user highlighted. WRC ships a small set of read-only methods for exactly that. They return data and leave the page's DOM and behaviour untouched; the only visible side effect is HighlightNode's debugging overlay.

TL;DR
  • GetObservation is the easy on-ramp: a compact, agent-friendly summary of what's visible right now.
  • GetDOM is the full CDP DOM tree as JSON — use when you need every node, not when you need a quick overview.
  • GetDOMHash is a 16-char fingerprint of the DOM tree — pair it with GetDOM to skip unchanged snapshots.
  • InspectAtPosition, HighlightNode and GetSelection cover the live-UI cases: hit-test, debug overlay, copy what's selected.
  • Reading methods don't wait. Pair them with Wait whenever you depend on something specific being there.

GetObservation — the agent-friendly summary

The first thing to reach for, especially in agent/LLM loops. The server filters the page down to elements that are actually visible and interactable, then hands back two views of that same data:

  • Text — a compact human-readable rendering, ideal for stuffing straight into a prompt.
  • Json — the same data structured (frames, elements, counts, limits), ideal for code that needs to walk it.

obs, err := browser.GetObservation(ctx, 0, 0)   // 0/0 = server defaults
if err != nil {
  log.Fatal(err)
}
fmt.Println(obs.Text)        // for the LLM
_ = obs.Json                 // structured payload

Two knobs cap the size of the output so big pages don't blow up your prompt budget:

  • maxElementsPerFrame — hard cap on how many elements the server emits per frame. Default 500.
  • maxTextLength — per-element text content cap, in characters. Default 240.

// Keep it tight for a chat model.
obs, _ := browser.GetObservation(ctx, 200, 80)

Every element in the observation carries a backendNodeId and the frameId it lives in — exactly what you need to act on it afterwards with Node(...):

// Parse the JSON, pick the element your agent decided on,
// then act on it directly — no second lookup required.
_, _ = browser.Click(ctx, wrc.Node(decision.BackendNodeId).InFrame(decision.FrameId))

GetDOM — the full CDP tree

When the observation isn't enough (you need every node, the nesting, the full attributes), switch to GetDOM. The payload is a JSON string in the standard CDP DOM.Node shape, with same-origin <iframe> / <frame> / <object> children inlined into the same tree. Cross-origin (out-of-process) frames are not inlined: the tree stops at them, and you descend by calling GetDOM again with that frame's frameId.

// Full tree of the main frame.
domJson, err := browser.GetDOM(ctx, "", -1)
if err != nil {
  log.Fatal(err)
}
// domJson is a CDP DOM.Node tree — feed it into anything that speaks CDP.

The return shape differs between the two SDKs in one cosmetic way: Go returns the JSON string directly, while TypeScript wraps it in a { dom, hash } object whose hash field is reserved for future use and is not populated by this call. Either way, when you want the fingerprint, call GetDOMHash.

Two parameters tune the call:

  • frameId (positional in both SDKs): an empty or omitted value targets the main frame. Pass a specific frame's id to descend into an OOPIF.
  • depth (Go: positional, TS: opts.depth): -1 for the full tree, 0 for the root only, N for the root plus N levels of descendants.

// Just the top two levels — cheap probe before pulling the whole thing.
shallow, _ := browser.GetDOM(ctx, "", 2)

GetDOMHash — the change detector

GetDOMHash returns the first 8 bytes of sha256(dom) as a 16-character hex string. The hash is computed on the server, so only those 16 characters cross the wire instead of the full tree. That makes it the right primitive for "did anything change since I last looked?" polling loops.

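var lastHash string
for {
  hash, err := browser.GetDOMHash(ctx, "")
  if err != nil {
    return err
  }
  if hash != lastHash {
    domJson, err := browser.GetDOM(ctx, "", -1)
    if err != nil {
      return err
    }
    lastHash = hash  // record the hash only once the DOM was actually fetched
    process(domJson)
  }
  select {
  case <-ctx.Done():
    return ctx.Err()  // stop polling when the caller cancels
  case <-time.After(500 * time.Millisecond):
  }
}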

Don't reach for this as a wait substitute. If you're trying to wait for "the page to stop changing", use Wait with a CSS or JS condition for the actual element you care about — see Waiting.

InspectAtPosition — what's under (x, y)?

A hit-test at a viewport-relative pixel coordinate, returning the topmost element under that point that can receive pointer events. Elements with pointer-events: none are skipped, so the result is the element a click would actually land on, which is not necessarily the visually topmost node. This is what the live-UI hover overlay calls under the hood.

res, err := browser.InspectAtPosition(ctx, 200, 300)
if err != nil {
  log.Fatal(err)
}
if res.BackendNodeId == 0 {
  fmt.Println("nothing at (200, 300)")  // a miss, not an error
} else {
  fmt.Println(res.TagName, res.TextContent)
}

The result carries everything you need to act on the element next: BackendNodeId, FrameId, TagName, trimmed TextContent, post-scroll IsVisible and the bounding rect.

Typical use cases:

  • Stream-based UIs where the user clicks on the video feed and the script needs to translate the click into a real DOM target.
  • Coordinate-driven recipes (canvas/captcha tile, HTML5 game) where you want to verify what's actually there before firing a Click(at(...)).

HighlightNode — the debug overlay

The visual companion to InspectAtPosition. Paints a coloured overlay on top of the node identified by backendNodeId. The overlay stays until the next call — pass a non-positive backendNodeId to clear it.

r, err := browser.InspectAtPosition(ctx, 400, 250)
if err != nil {
  log.Fatal(err)
}
_ = browser.HighlightNode(ctx, r.BackendNodeId, r.FrameId)
// ... screenshot or just watch the live stream ...
_ = browser.HighlightNode(ctx, 0, "")             // clear the overlay

Pure debugging affordance — HighlightNode only paints visuals, it doesn't change the page's behaviour. Use it freely in development; in production scripts there's usually no reason to call it.

GetSelection — read what's highlighted

Walks every frame and returns the first non-empty text selection it finds. Returns "" when nothing is selected anywhere. Useful for "copy what the user highlighted" flows and for tests that exercise selection-based UI (e.g. "did our right-click translate this text?").

sel, err := browser.GetSelection(ctx)
if err != nil {
  log.Fatal(err)
}
if sel == "" {
  fmt.Println("nothing selected")
} else {
  fmt.Println("user selected:", sel)
}

When to reach for which

You want… | Use
A short summary an LLM can read | GetObservation
Every node and attribute | GetDOM
"Did the DOM change since I last looked?" | GetDOMHash (then GetDOM if it did)
The element under a specific pixel | InspectAtPosition
A visual marker on a node while debugging | HighlightNode
The text the user is currently highlighting | GetSelection
To wait for something to appear | Not these; use Wait

Gotchas

  • They don't wait. None of these methods polls for an element to appear. If you call GetObservation before the page has rendered, you get whatever was visible at that instant. Use Wait first for the anchor element you care about.
  • TS getDOM returns { dom, hash }; Go returns just the DOM string. Cosmetic wrapper difference only — in TS the hash field is reserved for future use and is not populated by getDOM. Call getDOMHash explicitly in both SDKs.
  • OOPIFs stop the DOM tree. GetDOM inlines same-origin frames but stops at cross-origin ones. Recurse by calling GetDOM again with the OOPIF's frameId (look it up via GetPages — covered in Frames & iframes).
  • InspectAtPosition returns BackendNodeId == 0 for misses. Treat that as "nothing there" rather than an error.
  • HighlightNode is sticky. The overlay persists until you call again, so make sure your cleanup path clears it (backendNodeId = 0) or your screenshots will keep showing stale highlights.

See also
  • Evaluation — when you need to run JavaScript instead of (or after) reading the structure.
  • Waiting — the explicit pause to put in front of any read that depends on a specific element.
  • API reference: Go DOM helpers · TS DOM helpers.