Node.js Example
Learn how to build a Worker using Node.js.
GitHub Repository
Node.js Script Demo Repository: Node-Worker-Demo
Required Files (Project Root Directory)
```
├── main.js              # Script entry file
├── package.json         # Node.js dependencies
├── input_schema.json    # Input form configuration
├── output_schema.json   # Output table configuration
├── sdk.js               # CoreClaw SDK - Core functionality
├── sdk_pb.js            # Data processing module
└── sdk_grpc_pb.js       # Network communication module
```

File Overview

| File | Description |
|---|---|
| main.js | Script entry file (execution entry); must be named main.js |
| package.json | Node.js dependency management file |
| input_schema.json | UI input form configuration file |
| output_schema.json | Output table structure configuration file |
| sdk.js | Core SDK functionality |
| sdk_pb.js | Enhanced data processing module |
| sdk_grpc_pb.js | Network communication module |
These three SDK files (sdk.js, sdk_pb.js, sdk_grpc_pb.js) are required and must be placed in the root directory of the project. Together they form the script’s toolbox, providing all essential capabilities for Worker execution and interaction with the platform backend.
Core SDK Usage
The CoreClaw SDK (coresdk) provides three core capabilities that every Worker needs:
1. Parameter Retrieval — Get Input Configuration
When a Worker starts, the platform passes it input parameters (such as URLs or keywords). Retrieve them as follows:

```javascript
const coresdk = require('./sdk')

// Get all input parameters as a JSON object.
// CommonJS has no top-level await, so call this inside an async function.
const inputJson = await coresdk.parameter.getInputJSONObject()

// Example: retrieve a specific parameter
const url = inputJson?.url
```

Use case: Pass different parameters for different tasks without modifying code.
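Because the input object comes from a user-filled form, it can help to validate it before doing any work. The following is a minimal sketch, not part of the SDK: `normalizeInput`, the `maxItems` field, and its default of 10 are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: validate and normalize the object returned by
// coresdk.parameter.getInputJSONObject(). Field names are assumptions.
function normalizeInput(inputJson) {
  const input = inputJson ?? {}

  // `url` is treated as required here; the URL constructor throws on invalid input.
  let url
  try {
    url = new URL(String(input.url ?? '')).toString()
  } catch {
    throw new Error(`Invalid or missing "url" parameter: ${JSON.stringify(input.url)}`)
  }

  // `maxItems` is an assumed optional field with an assumed default of 10.
  const parsed = Number.parseInt(input.maxItems, 10)
  const maxItems = Number.isInteger(parsed) && parsed > 0 ? parsed : 10

  return { url, maxItems }
}
```

Inside an async function, this could be called as `normalizeInput(await coresdk.parameter.getInputJSONObject())`, so that a bad form value fails early with a readable message.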
2. Logging — Record Execution Progress
Record log messages at different levels during execution. These logs appear in the Console, making it easy to monitor status and debug issues:

```javascript
// Debug info (most detailed, for troubleshooting)
await coresdk.log.debug("Connecting to target website...")

// General info (normal process recording)
await coresdk.log.info("Successfully retrieved 10 data items")

// Warning (notable but non-error situations)
await coresdk.log.warn("Slow network connection, may affect speed")

// Error (execution failures)
await coresdk.log.error("Cannot access target website")
```

Log levels:
- debug — Most detailed, ideal for development
- info — Normal process recording, recommended for key steps
- warn — Warning, indicates potential issues
- error — Error, requires attention
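One way to apply these levels is to send per-item detail to debug and periodic summaries to info. The sketch below assumes only that the logger exposes async `debug` and `info` methods, as `coresdk.log` does in the snippets above; `createProgressReporter` and its parameters are hypothetical.

```javascript
// Hypothetical helper: wraps any logger with async debug/info methods
// and reports progress at a fixed interval.
function createProgressReporter(log, total, every = 10) {
  let done = 0
  return async function tick(label) {
    done += 1
    // Per-item detail goes to debug; summaries go to info.
    await log.debug(`Processed ${label}`)
    if (done % every === 0 || done === total) {
      await log.info(`Progress: ${done}/${total} items`)
    }
    return done
  }
}
```

With the real SDK this might be created as `createProgressReporter(coresdk.log, items.length, 5)` and called once per item.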
3. Result Output — Push Data Back to Platform
After collecting data, push it back to the platform in two steps: set the table headers, then push the rows. A third, optional step upserts rows that may already exist.
Step 1: Set Table Headers
Define the table structure before pushing data, similar to defining column headers in a spreadsheet:

```javascript
const headers = [
  { label: "Title", key: "title", format: "text" },
  { label: "URL", key: "url", format: "text" },
  { label: "Category", key: "category", format: "text" },
]
await coresdk.result.setTableHeader(headers)
```

Field descriptions:
- label: Column title displayed to users
- key: Unique identifier used in code (must match the keys passed to pushData)
- format: Data type, one of "text", "integer", "boolean", "array", "object"
Step 2: Push Data Row by Row
Push each collected data item individually:

```javascript
for (const item of collectedData) {
  const obj = {
    title: item.title,
    url: item.url,
    category: item.category,
  }
  await coresdk.result.pushData(obj)
}
```

Important:
- Set table headers before pushing data
- Keys in pushData must match keys in table headers exactly
- Data must be pushed one row at a time
- Add logging after each push to track progress
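Since row keys have to match header keys exactly, a small pre-flight check can catch mismatches before anything is pushed. The sketch below is illustrative only: `inferFormat`, `headersFromSample`, and `checkRowKeys` are hypothetical helpers, and the fallback of non-integer numbers and null values to "text" is an assumption, since the guide lists no dedicated format for them.

```javascript
// Hypothetical helper: guess a `format` value from a JavaScript value.
// Only the five formats listed in this guide are produced.
function inferFormat(value) {
  if (Array.isArray(value)) return 'array'
  if (typeof value === 'boolean') return 'boolean'
  if (Number.isInteger(value)) return 'integer'
  if (value !== null && typeof value === 'object') return 'object'
  // Strings, non-integer numbers, null and undefined fall back to text (an assumption).
  return 'text'
}

// Build a header array from one sample row, capitalizing the key as the label.
function headersFromSample(row) {
  return Object.keys(row).map((key) => ({
    label: key.charAt(0).toUpperCase() + key.slice(1),
    key,
    format: inferFormat(row[key]),
  }))
}

// Compare a row's keys with the header keys before pushing it.
function checkRowKeys(headers, row) {
  const headerKeys = new Set(headers.map((h) => h.key))
  const rowKeys = new Set(Object.keys(row))
  return {
    missing: [...headerKeys].filter((k) => !rowKeys.has(k)), // in headers, absent from row
    extra: [...rowKeys].filter((k) => !headerKeys.has(k)), // in row, absent from headers
  }
}
```

A row would then be pushed only when both `missing` and `extra` are empty; otherwise a `coresdk.log.warn` call naming the offending keys makes the mismatch visible in the Console.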
Step 3: Upsert Data (Update or Insert)
Use upsertData to update existing records or insert new ones based on a unique key. This is useful when you need to re-scrape and update previously collected data:

```javascript
const data = {
  id: "test-1",
  title: "Updated Title",
  description: "Updated description",
}
await coresdk.result.upsertData(data, 'id')
```

How it works:
- If a record with the same unique key exists, it will be updated
- If no matching record is found, a new record will be inserted
- The unique key must exist in the data object
- Important: The unique key field must also be defined in output_schema.json; otherwise the platform cannot match and update rows correctly
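The difference between appending and upserting can be modelled locally with a plain array. This is a sketch of the documented behaviour, not the platform's implementation: `pushRow` and `upsertRow` are hypothetical names, and merging old and new fields on update is an assumption, since this guide does not say whether an update merges or replaces a row.

```javascript
// Local model of the documented behaviour, NOT the platform's implementation.
function pushRow(rows, data) {
  rows.push({ ...data }) // always appends a new row
}

function upsertRow(rows, data, uniqueKey) {
  if (!(uniqueKey in data)) {
    throw new Error(`Unique key "${uniqueKey}" is missing from the data object`)
  }
  const index = rows.findIndex((row) => row[uniqueKey] === data[uniqueKey])
  if (index === -1) {
    rows.push({ ...data }) // no match: insert
    return 'inserted'
  }
  // Match: update in place. Merging fields is an assumption of this model.
  rows[index] = { ...rows[index], ...data }
  return 'updated'
}
```

Running `upsertRow` twice with the same `id` leaves one row in the array, while running `pushRow` twice leaves two.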
Script Entry File (main.js)
Complete Example

```javascript
#!/usr/bin/env node
'use strict'

const coresdk = require('./sdk')

async function run() {
  try {
    // 1. Get input parameters
    const inputJson = await coresdk.parameter.getInputJSONObject()
    await coresdk.log.debug(`Input parameters: ${JSON.stringify(inputJson)}`)

    // 2. Proxy configuration (read from environment variables)
    const proxyAuth = process.env.PROXY_AUTH || null
    // Log only whether the credential is set, never its value
    await coresdk.log.info(`Proxy auth configured: ${proxyAuth !== null}`)

    // 3. Business logic
    const url = inputJson?.url
    await coresdk.log.info(`Processing URL: ${url}`)

    const result = {
      url,
      status: 'success',
    }

    // 4. Set table headers
    const headers = [
      { label: 'URL', key: 'url', format: 'text' },
      { label: 'Status', key: 'status', format: 'text' },
    ]
    await coresdk.result.setTableHeader(headers)

    // 5. Push result data
    await coresdk.result.pushData(result)

    await coresdk.log.info('Script execution completed')
  } catch (err) {
    await coresdk.log.error(`Execution error: ${err.message}`)

    await coresdk.result.pushData({
      error: err.message,
      error_code: '500',
      status: 'failed',
    })
    throw err
  }
}

// Mark the run as failed if run() rejects; the error was already logged above
run().catch(() => {
  process.exitCode = 1
})
```

How It Works

The script follows four stages:
- Receive instructions: Get input parameters (URLs, keywords, etc.) from the platform
- Network setup: Configure the proxy via the PROXY_AUTH environment variable for accessing external websites
- Execute task: Run the core scraping logic on target pages
- Report results: Set table headers first, then push collected data back to the platform
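To exercise these four stages without the platform, the SDK can be passed in as an argument and replaced by an in-memory stand-in. The sketch below is an assumption-laden illustration: `createFakeSdk` and `runStages` are hypothetical, and the fake mirrors only the method names used in this guide, so real SDK behaviour may differ.

```javascript
// Hypothetical in-memory stand-in for the parts of coresdk used in this guide.
function createFakeSdk(input) {
  const state = { logs: [], headers: null, rows: [] }
  const logAt = (level) => async (message) => {
    state.logs.push({ level, message })
  }
  return {
    state,
    parameter: { getInputJSONObject: async () => input },
    log: { debug: logAt('debug'), info: logAt('info'), warn: logAt('warn'), error: logAt('error') },
    result: {
      setTableHeader: async (headers) => {
        state.headers = headers
      },
      pushData: async (row) => {
        // Enforce the "headers before data" rule locally.
        if (!state.headers) throw new Error('setTableHeader must be called before pushData')
        state.rows.push(row)
      },
    },
  }
}

// The four stages, written against an injected sdk so they can run with the fake.
async function runStages(sdk, env = process.env) {
  // 1. Receive instructions
  const input = await sdk.parameter.getInputJSONObject()
  // 2. Network setup: record only whether a proxy credential is present
  await sdk.log.info(`Proxy configured: ${Boolean(env.PROXY_AUTH)}`)
  // 3. Execute task (placeholder for real scraping logic)
  const result = { url: input?.url, status: 'success' }
  // 4. Report results: headers first, then data
  await sdk.result.setTableHeader([
    { label: 'URL', key: 'url', format: 'text' },
    { label: 'Status', key: 'status', format: 'text' },
  ])
  await sdk.result.pushData(result)
  return result
}
```

In production the same function would be called as `runStages(require('./sdk'))`; in a local check, the fake's `state` object shows which headers, rows, and log lines were produced.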
Node.js Dependency Management (package.json)
This file declares all Node.js dependencies required to run the script. The platform automatically installs all dependencies specified in this file.
Example
```json
{
  "name": "node",
  "version": "1.0.0",
  "main": "main.js",
  "type": "commonjs",
  "dependencies": {
    "@grpc/grpc-js": "^1.13.4",
    "google-protobuf": "^4.0.0"
  }
}
```

Important Notes

Required Dependencies

- @grpc/grpc-js and google-protobuf are required (needed by the SDK)
- All third-party libraries must be listed in dependencies
Versioning
- Use fixed versions (e.g. "1.13.4") for core dependencies to ensure stability
- Use caret ranges (e.g. "^1.13.4") for compatible updates
Installation
- Dependencies are installed automatically by the platform
- The type field should be set to "commonjs" (the SDK uses CommonJS modules)
- The main field must point to your entry file (main.js)
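The notes above can be turned into a quick check to run before uploading. This is a minimal sketch under the rules stated in this section; `checkPackageJson` is a hypothetical helper, not a platform tool.

```javascript
// Hypothetical pre-upload check for a parsed package.json object.
function checkPackageJson(pkg) {
  const problems = []
  const deps = pkg.dependencies ?? {}

  // The SDK needs both of these packages.
  for (const name of ['@grpc/grpc-js', 'google-protobuf']) {
    if (!(name in deps)) problems.push(`missing required dependency: ${name}`)
  }
  // This guide asks for an explicit "commonjs" type.
  if (pkg.type !== 'commonjs') problems.push('"type" should be set to "commonjs"')
  // The entry file is main.js.
  if (pkg.main !== 'main.js') problems.push('"main" should point to main.js')

  return problems
}
```

It could be fed with `JSON.parse(require('node:fs').readFileSync('package.json', 'utf8'))`; an empty array means none of the listed rules were violated.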
Q: Why must I use CommonJS?
A: The CoreClaw SDK uses CommonJS (require) module format. If you use ES modules (import), the SDK will not load correctly.
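If a third-party dependency is published only as an ES module, Node.js still lets a CommonJS file load it through the dynamic `import()` expression, so main.js can keep using `require` for the SDK. The sketch below is illustrative: `loadEsModule` is a hypothetical helper, and the built-in `node:path` stands in for an ESM-only package so the example runs without installing anything.

```javascript
// main.js stays CommonJS; an ES module is loaded on demand with import().
async function loadEsModule(specifier) {
  const mod = await import(specifier)
  // import() resolves to a namespace object; a default export, if any, is under `default`.
  return mod.default ?? mod
}

// Usage sketch: 'node:path' is only a stand-in for an ESM-only dependency.
async function demo() {
  const path = await loadEsModule('node:path')
  return path.basename('/tmp/report.json')
}
```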
Q: How do I add new dependencies?
A: Add the package to the dependencies field in package.json and re-upload the ZIP package. The platform will install them on the next run.
Q: What if installation fails?
A: Check that the package name and version are correct. Verify network connectivity or try an alternative version.
Q: What is the difference between pushData and upsertData?
A: pushData always appends a new row. upsertData updates an existing row if the unique key matches, or inserts a new row if no match is found. Use upsertData when you need to update previously scraped data.