Interaction

Interaction is where a script stops describing the page and starts changing it. WRC's action surface is deliberately small — eight verbs do everything from "click that button" to "drag a slider 120 px right". The point of this guide is to give you a feel for which verb fits what, what each one does for you behind the scenes, and the handful of options that come up often enough to be worth memorising.

TL;DR
  • Every element action takes a Locator (see Targeting elements) and dispatches exactly once; there is no implicit retry.
  • Pre-flight: the server scrolls the element into view, then animates the cursor along a human-like path before the actual gesture fires.
  • Go splits the bare and the customised variant into two methods (Click / ClickWith). TypeScript keeps one method and an optional opts argument.
  • Three keyboard-level escape hatches sit alongside the element verbs: InsertText (paste at caret), PressKey / ReleaseKey (raw down/up events).

The shared shape

Every element action follows the same pattern, so once you know one you know all of them:

  1. You pass a Locator — built with CSS(...) / JS(...) / Node(...) / At(...). The locator's modifiers (.InFrame(...), .InAllFrames()) are honoured.
  2. The server resolves the element, scrolls it into view if needed (walking nested scroll containers and out-of-process iframe chains), then animates the mouse cursor along a Perlin-noise path to a random point inside the element's bounding rect.
  3. The gesture (click, fill, drag…) fires once. No retry, no implicit wait — if the element wasn't there, you get ELEMENT_NOT_FOUND straight away.
  4. You get back an ElementResult (or a DragResult / a SelectOptionResult) carrying the resolved frameId, backendNodeId, post-scroll isVisible, the element's bounds and the root-viewport coordinates (rootX / rootY) where the gesture actually landed.

The naming pattern differs across the two SDKs. Whenever an action has options, Go ships two methods — the bare one and a …With(opts) variant — so you don't pay for an empty struct in the common case. TypeScript keeps the single method and tucks the options behind a trailing optional argument.

// Bare call — no options.
_, _ = browser.Click(ctx, wrc.CSS("button.submit"))

// Customised call — same target, right double-click.
_, _ = browser.ClickWith(ctx, wrc.CSS("li.menu"), wrc.ClickOpts{
  Button:     "right",
  ClickCount: 2,
})

Click

The workhorse. By default: scroll into view, human-like mouse path, full mouseDown + mouseUp at a randomised point inside the element's bounding rect.

ClickOpts covers the three things you might want to vary:

  • Button: "left" (default), "right", "middle".
  • ClickCount: 1 (default) or 2 for a double-click.
  • Action: "click" (default, full press + release), "press" (mouseDown only), "release" (mouseUp only; no mouse movement, fires at the current cursor position).

The press / release pair is the escape hatch for long-press gestures, custom drag implementations, or any flow where you need to hold the button down across other actions:

// Long-press: down, hold while doing something else, up.
_, _ = browser.ClickWith(ctx, wrc.CSS(".thumb"), wrc.ClickOpts{Action: "press"})
// ... do other things while the button is held ...
_, _ = browser.ClickWith(ctx, wrc.At(0, 0), wrc.ClickOpts{Action: "release"})

Note that release ignores any coordinates you pass and fires at wherever the cursor currently is — exactly the behaviour you need to close out a press started elsewhere on the page.

Click and MoveTo are the only actions that accept At(x, y). Coordinate clicks are how you reach canvas elements, captcha tiles, and HTML5-game targets that aren't addressable via the DOM.

Fill

Anywhere you'd type into an input: <input>, <textarea>, contenteditable divs. Under the hood: scroll into view, mouse path, click to focus, optionally clear (Ctrl+A then Delete), then character-by-character key events with QWERTZ keyboard simulation and human-like timing.

The single option worth knowing is NoClear / noClear. The default is to clear, which is what you want 95 % of the time — append-only is the surprise:

// Default: clear the field first, then type.
_, _ = browser.Fill(ctx, wrc.CSS("input[name=email]"), "user@example.com")

// Keep the existing value, append to it.
_, _ = browser.FillWith(ctx, wrc.CSS("textarea"), " — appended", wrc.FillOpts{
  NoClear: true,
})

Fill only accepts element locators — At(x, y) is rejected because typing needs an actual focus target.

Because every keystroke goes through a real keydown / keypress / keyup cycle, any JS handler the page has on the input (autocomplete, live validation, inline search…) fires exactly as if a person typed the value. That's the whole point — but it also means Fill is the slowest action in WRC. For pasting large blocks of text without firing per-key events, jump straight to InsertText (further down).

MoveTo

Move the mouse cursor over an element (or to coordinates) without clicking. The same scroll-into-view + Perlin-noise animation as Click, just stopping at the hover.

Useful for triggering CSS / JS hover states (dropdown menus, tooltips), warming up a page that only reveals an action on mouseover, or staging a cursor before a Click(At(...)):

// Hover to reveal a dropdown.
_, _ = browser.MoveTo(ctx, wrc.CSS("nav .menu-trigger"))
_, _ = browser.Wait(ctx, wrc.CSS("nav .submenu"))
_, _ = browser.Click(ctx, wrc.CSS("nav .submenu li:first-child"))

Like Click, MoveTo also accepts At(x, y).

ScrollTo

Bring an element into the viewport without clicking it. WRC actions already scroll into view as part of their pre-flight, so ScrollTo exists for the two cases that don't:

  • You want the element visible for a GetDOM / GetObservation / Evaluate pass that won't trigger a scroll on its own.
  • You're chaining multiple At(x, y) coordinate clicks and need to position the page relative to known coordinates first.

_, _ = browser.ScrollTo(ctx, wrc.CSS("#footer"))
// Footer is now in view — go inspect, evaluate, or click coordinates.

The server walks nested scroll containers (the inner div that actually scrolls inside a paginated UI) and out-of-process iframe chains (so an element deep inside an OOPIF still gets brought into the root viewport) automatically. You don't have to think about it.

At(x, y) is rejected here — scrolling needs an actual element to target.

Drag

Two variants depending on whether you want to drop relative to the pickup (slider handles, range inputs) or at a fixed page position (reordering cards, file-tree nodes).

// Drag by an offset — slider 120 px to the right.
_, _ = browser.DragBy(ctx, wrc.CSS(".slider .handle"), 120, 0)

// Drag to absolute root-viewport coordinates — reorder a card.
_, _ = browser.DragTo(ctx, wrc.CSS(".card.draggable"), 800, 400)

Both variants go through the same gesture: mouse-move to the pickup point inside the element, mouseDown, drag along a Perlin-noise path to the target, mouseUp. The DragResult carries both endpoints — startX / startY for the pickup, endX / endY for the drop — so you can verify where the gesture actually started and ended.
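One way to use those endpoints is to check that a DragBy covered the offset you asked for. A minimal sketch: dragResult is a stand-in type whose field names are Go-style guesses for startX / startY / endX / endY, and the tolerance is an assumption about how closely the drop tracks the requested offset.

```go
package main

import "math"

// dragResult is a stand-in carrying the four endpoint fields described above.
// The field names are guesses, not confirmed SDK identifiers.
type dragResult struct{ StartX, StartY, EndX, EndY float64 }

// movedBy reports whether the drag covered (dx, dy) within tol pixels on each axis.
func movedBy(r dragResult, dx, dy, tol float64) bool {
	return math.Abs((r.EndX-r.StartX)-dx) <= tol &&
		math.Abs((r.EndY-r.StartY)-dy) <= tol
}
```

For the slider example, movedBy(res, 120, 0, 1) would accept a drop that lands within one pixel of 120 px to the right of the pickup.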

Drag only accepts element locators. To drag between specific coordinate pairs, fall back to two Click(At(...)) calls in press / release mode.

Select (for <select> elements)

Three variants depending on what you know about the option to pick. Same <select> Locator, different option key:

// Zero-based index.
_, _ = browser.SelectByIndex(ctx, wrc.CSS("select#country"), 2)

// Match the option's value attribute.
_, _ = browser.SelectByValue(ctx, wrc.CSS("select#country"), "DE")

// Match the option's visible text (trimmed, exact match).
_, _ = browser.SelectByText(ctx, wrc.CSS("select#country"), "Germany")

Same pre-flight as every other action: the <select> is scrolled into view and the cursor animates over to it along a human-like path — picking an option goes through the same motions a real user would. Once the option is chosen, the standard input and change events fire so the page's listeners run exactly as if a person had picked it.

Opt out of the events for the (rare) case where you want to mutate state silently — note that Go's flag is NoEvents=true (negative polarity), while TS uses fireEvents=false:

// Silent — no input/change events.
_, _ = browser.SelectByIndexWith(ctx, wrc.CSS("select#hidden"), 0, wrc.SelectOpts{
  NoEvents: true,
})

All three variants return a SelectOptionResult with the resolved SelectedIndex, SelectedValue and SelectedText — useful as a sanity check that the right option was actually chosen.
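That sanity check can be as small as the sketch below. selectResult is a stand-in that mirrors only the three fields named above; the real SelectOptionResult may carry more fields or different types, and checkSelected is a hypothetical helper.

```go
package main

import "fmt"

// selectResult mirrors the three fields named above (stand-in type).
type selectResult struct {
	SelectedIndex int
	SelectedValue string
	SelectedText  string
}

// checkSelected returns an error when the resolved option's value
// differs from wantValue, and nil when they match.
func checkSelected(res selectResult, wantValue string) error {
	if res.SelectedValue != wantValue {
		return fmt.Errorf("selected value %q (index %d, text %q), want %q",
			res.SelectedValue, res.SelectedIndex, res.SelectedText, wantValue)
	}
	return nil
}
```

This is most useful after SelectByIndex or SelectByText, where the value you end up with is not the key you selected by.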

Keyboard fallbacks

For the cases that don't fit into a single element action, three keyboard-level primitives target whatever currently has focus:

InsertText — fast paste

Commits the entire string at once via the browser's IME path. No per-character key events fire — which is what you want for big text blocks where you don't need to trigger autocomplete or live validation:

// Focus the field first…
_, _ = browser.Click(ctx, wrc.CSS("textarea.bio"))
// …then paste the whole block in one shot.
_ = browser.InsertText(ctx, "Lorem ipsum dolor sit amet, …")

Because no keyboard events fire, anything the page listens for on keydown / keyup will not see this input. That's the trade-off: fast and silent, or slow and observable.

PressKey / ReleaseKey — raw key events

For keyboard shortcuts (Ctrl+A, Ctrl+S, Enter to submit) and any flow where you need a key event to be the actual signal. Both fire a single event — pair them for a full press cycle.

The modifier bitmask is Alt=1, Ctrl=2, Meta=4, Shift=8. Combine with | for multi-modifier shortcuts.

// Ctrl+A on the focused element.
_ = browser.PressKey(ctx, "a", "KeyA", 2, 0)
_ = browser.ReleaseKey(ctx, "a", "KeyA", 2, 0)

// Just an Enter — to submit a form without clicking the button.
_ = browser.PressKey(ctx, "Enter", "Enter", 0, 0)
_ = browser.ReleaseKey(ctx, "Enter", "Enter", 0, 0)

The key value follows the DOM KeyboardEvent.key standard ("Enter", "ArrowLeft", "a", …). code follows KeyboardEvent.code ("Enter", "KeyA", "ArrowLeft", …) and falls back to key when omitted.
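If the raw numbers in the modifier argument feel opaque, you can name them yourself. The constants below use the bit values listed above; the names and the hasModifier helper are illustrative, not SDK identifiers.

```go
package main

// Modifier bits as listed above. The constant names are illustrative.
const (
	modAlt   = 1
	modCtrl  = 2
	modMeta  = 4
	modShift = 8
)

// hasModifier reports whether mask includes every bit set in mod.
func hasModifier(mask, mod int) bool {
	return mask&mod == mod
}
```

Ctrl+Shift is then modCtrl|modShift, which evaluates to 10, passed in the same position where the examples above pass 2 for Ctrl alone.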

Auto-behaviours, at a glance

A quick recap of what the server does for you so you don't have to:

Action                 Scrolls into view      Animates cursor         Fires events
Click, Fill            yes                    yes (human-like path)   full DOM event stream
MoveTo                 yes                    yes (human-like path)   mouseover, mouseenter
Drag*                  yes (pickup)           yes (pickup → drop)     mousedown / mousemove / mouseup
ScrollTo               yes (the whole point)  -                       scroll events on the container
Select*                yes                    yes (human-like path)   input + change (unless suppressed)
InsertText             -                      -                       input (no key events)
PressKey / ReleaseKey  -                      -                       keydown / keyup

If you find yourself reaching for a manual ScrollTo before every Click, you don't need to — Click already does it.

Gotchas

  • Actions don't wait. Pair every action with a Wait for the element you're about to touch. See Waiting for the full discussion.
  • Validation is client-side, not server-side. Drag / Fill / Select* / ScrollTo reject At(x, y) before the request ever leaves your machine — you get an immediate, descriptive error rather than a confusing server-side failure.
  • Polarity surprises on the option flags. Go uses negative polarity throughout (NoClear, NoEvents: true opts out). TS splits: noClear is negative (true opts out), but fireEvents is positive (false opts out). The defaults are identical across both SDKs; only the spelling differs.
  • InsertText and PressKey need focus. Neither targets an element — they go to whatever currently has focus on the page. Do a Click first if you need a specific input to receive them.
  • release ignores its target. A Click with Action: "release" fires at the current cursor position regardless of the locator you pass — that's the whole point, so a long-press started elsewhere can be closed cleanly.