
crawl

Creates, updates, deletes, gets or lists a crawl resource.

Overview

| Name | Type | Id |
|---|---|---|
| `crawl` | Resource | `cloudflare.browser_rendering.crawl` |

Fields

The following fields are returned by SELECT queries (the `get` method, which returns the result of a crawl job):

| Name | Datatype | Description |
|---|---|---|
| `id` | string | Crawl job ID. |
| `browserSecondsUsed` | number | Total seconds spent in the browser so far. |
| `cursor` | string | Cursor for pagination. |
| `finished` | number | Total number of URLs crawled so far. |
| `records` | array | List of crawl job records. |
| `skipped` | number | Total number of URLs skipped due to include/exclude/subdomain filters. Skipped URLs are included in `records` but are not counted toward `total` or `finished`. |
| `status` | string | Current crawl job status. |
| `total` | number | Current total number of URLs in the crawl job. |
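To check on a running job with compact output, a minimal sketch (assuming the job already exists and its ID is known) projects only the counter fields instead of the full `records` array. Because skipped URLs are excluded from both `total` and `finished`, comparing `finished` with `total` reflects progress on in-scope URLs only, and since `total` is a current count it may grow as child URLs are discovered.

```sql
-- Sketch: poll crawl progress, leaving the records array out of the result set
SELECT
  id,
  status,
  finished,
  total,
  skipped,
  browserSecondsUsed
FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
;
```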

Methods

The following methods are available for this resource:

| Name | Accessible by | Required Params | Optional Params | Description |
|---|---|---|---|---|
| `get` | `SELECT` | `account_id`, `job_id` | `cacheTTL`, `status`, `cursor`, `limit` | Returns the result of a crawl job. |
| `create` | `INSERT` | `account_id`, `url` | `cacheTTL` | Starts a crawl job for the provided URL and its children. See options such as `gotoOptions` and `waitFor*` for controlling page-load behaviour. |
| `delete` | `DELETE` | `account_id`, `job_id` | | Cancels an ongoing crawl job by setting its status to cancelled and stopping all queued URLs. |

Parameters

Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.

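| Name | Datatype | Description |
|---|---|---|
| `account_id` | string | The Cloudflare account ID. |
| `job_id` | string | The crawl job ID. |
| `cacheTTL` | number | Cache TTL in seconds; defaults to 5. Set to 0 to disable caching. |
| `cursor` | number | Cursor for pagination. |
| `limit` | number | Maximum number of records to return per page. |
| `status` | string | Filter records by URL status. |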
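As a sketch of paging through a large job, the first query below sets only `limit`, and a follow-up query passes back the `cursor` value returned by the previous page. It assumes the returned `cursor` is the token for the next page; the page size of 100 and the placeholder name `cursor_from_previous_page` are illustrative, not part of the API. Setting `cacheTTL` to 0 disables the default 5-second cache so each request returns fresh results. Note that `cursor` is listed as a string among the returned fields but as a number among the parameters, so the accepted form is worth confirming against the API.

```sql
-- Page 1: fetch the first batch of records, bypassing the cache
SELECT cursor, records
FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
AND cacheTTL = '0'
AND limit = '100' -- illustrative page size
;

-- Next page: pass back the cursor returned by the previous query
SELECT cursor, records
FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
AND cacheTTL = '0'
AND cursor = '{{ cursor_from_previous_page }}' -- hypothetical placeholder name
AND limit = '100'
;
```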

SELECT examples

Returns the result of a crawl job.

```sql
SELECT
  id,
  browserSecondsUsed,
  cursor,
  finished,
  records,
  skipped,
  status,
  total
FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
AND cacheTTL = '{{ cacheTTL }}'
AND status = '{{ status }}'
AND cursor = '{{ cursor }}'
AND limit = '{{ limit }}'
;
```

INSERT examples

Starts a crawl job for the provided URL and its children. See options such as `gotoOptions` and `waitFor*` for controlling page-load behaviour.

```sql
INSERT INTO cloudflare.browser_rendering.crawl (
  actionTimeout,
  addScriptTag,
  addStyleTag,
  allowRequestPattern,
  allowResourceTypes,
  authenticate,
  bestAttempt,
  cookies,
  crawlPurposes,
  depth,
  emulateMediaType,
  formats,
  gotoOptions,
  jsonOptions,
  limit,
  maxAge,
  modifiedSince,
  options,
  rejectRequestPattern,
  rejectResourceTypes,
  render,
  setExtraHTTPHeaders,
  setJavaScriptEnabled,
  source,
  url,
  viewport,
  waitForSelector,
  waitForTimeout,
  account_id,
  cacheTTL
)
SELECT
  {{ actionTimeout }},
  '{{ addScriptTag }}',
  '{{ addStyleTag }}',
  '{{ allowRequestPattern }}',
  '{{ allowResourceTypes }}',
  '{{ authenticate }}',
  {{ bestAttempt }},
  '{{ cookies }}',
  '{{ crawlPurposes }}',
  {{ depth }},
  '{{ emulateMediaType }}',
  '{{ formats }}',
  '{{ gotoOptions }}',
  '{{ jsonOptions }}',
  {{ limit }},
  {{ maxAge }},
  {{ modifiedSince }},
  '{{ options }}',
  '{{ rejectRequestPattern }}',
  '{{ rejectResourceTypes }}',
  {{ render }},
  '{{ setExtraHTTPHeaders }}',
  {{ setJavaScriptEnabled }},
  '{{ source }}',
  '{{ url }}' /* required */,
  '{{ viewport }}',
  '{{ waitForSelector }}',
  {{ waitForTimeout }},
  '{{ account_id }}' /* required */,
  '{{ cacheTTL }}'
RETURNING
  errors,
  result,
  success
;
```
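Most of the columns above are optional: per the Methods table, `create` requires only `account_id` and `url`. A minimal sketch of starting a crawl therefore supplies just those two; the start URL is a placeholder. The shape of the returned `result` is not documented on this page, so where the new job ID appears in it is left as an assumption to verify.

```sql
-- Minimal crawl job: only the required columns are supplied
INSERT INTO cloudflare.browser_rendering.crawl (
  url,
  account_id
)
SELECT
  'https://example.com/', -- placeholder start URL
  '{{ account_id }}'
RETURNING
  errors,
  result,
  success
;
```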

DELETE examples

Cancels an ongoing crawl job by setting its status to cancelled and stopping all queued URLs.

```sql
DELETE FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
;
```
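Because cancellation works by setting the job's status to cancelled, one way to confirm it took effect is to read the job back through the `get` method. This is a sketch: it assumes the stored status string is `cancelled`, matching the wording above, and sets `cacheTTL` to 0 so the default 5-second cache does not return the pre-cancellation status.

```sql
-- Confirm the job was cancelled after the DELETE
SELECT
  id,
  status,
  finished,
  total
FROM cloudflare.browser_rendering.crawl
WHERE account_id = '{{ account_id }}' -- required
AND job_id = '{{ job_id }}' -- required
AND cacheTTL = '0' -- bypass the default 5s cache
;
```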