Selenium
Selenium is a browser automation framework based on the WebDriver protocol. By using Remote WebDriver, it can control real browsers running remotely.
Positioning
Selenium provides:
- Control of real Chromium browsers
- Page loading and JavaScript execution
- DOM querying and basic event simulation
- Support for connecting to remote fingerprint browser clusters
Selenium does not imitate a browser's HTTP requests the way an HTTP client such as requests does. It drives a real browser over the WebDriver protocol, so the page's own loading and JavaScript logic actually run.
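Under the hood, each Selenium call used on this page becomes one HTTP command sent to the WebDriver endpoint, and the browser at the other end does the real work. The sketch below only lists the W3C WebDriver routes behind those calls; it sends nothing, and the session ID passed to it is a placeholder.

```python
from typing import List, Tuple

# (Selenium call, HTTP method, W3C WebDriver route)
COMMANDS: List[Tuple[str, str, str]] = [
    ("webdriver.Remote(...)", "POST", "/session"),
    ("driver.get(url)", "POST", "/session/{session_id}/url"),
    ("driver.execute_script(script)", "POST", "/session/{session_id}/execute/sync"),
    ("driver.page_source", "GET", "/session/{session_id}/source"),
    ("driver.find_element(...)", "POST", "/session/{session_id}/element"),
    ("driver.quit()", "DELETE", "/session/{session_id}"),
]


def describe_commands(session_id: str) -> List[str]:
    """Render each Selenium call as 'METHOD /route' for one session ID."""
    return [
        f"{method} {route.format(session_id=session_id)}  <- {call}"
        for call, method, route in COMMANDS
    ]


if __name__ == "__main__":
    for line in describe_commands("example-session-id"):
        print(line)
```

Because every call is a network round trip to the remote browser, the number of calls a script makes matters more here than with a local driver.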
Connecting to Remote Fingerprint Browser
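The endpoint URL in the snippet below carries PROXY_AUTH as its userinfo part. Two details are easy to miss: if PROXY_AUTH is unset, the f-string silently produces http://None@..., and the full URL should not be written to logs because it contains the credentials. A minimal sketch of two helpers for both cases; the names build_browser_url and redact_browser_url are hypothetical, and the code assumes PROXY_AUTH is already URL-safe "user:password" text that can be placed before the @ unchanged.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit

# Default endpoint, taken from the examples on this page
DEFAULT_CHROME_HTTP = "chrome-http-inner.coreclaw.com"


def build_browser_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: build the executor URL, failing fast without credentials."""
    env = os.environ if env is None else env
    auth = env.get("PROXY_AUTH")
    if not auth:
        # Without this check the f-string would yield "http://None@..."
        raise RuntimeError("PROXY_AUTH is not set")
    chrome_http = env.get("ChromeHttp") or DEFAULT_CHROME_HTTP
    # Assumption: PROXY_AUTH is already URL-safe "user:password" text
    return f"http://{auth}@{chrome_http}"


def redact_browser_url(url: str) -> str:
    """Hypothetical helper: replace the userinfo part with *** before logging."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    if parts.username or parts.password:
        host = f"***@{host}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

For example, redact_browser_url("http://user:secret@chrome-http-inner.coreclaw.com") returns "http://***@chrome-http-inner.coreclaw.com", which is safe to pass to a logger.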
```python
import os

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# Browser credentials, provided through the environment
auth = os.environ.get("PROXY_AUTH")

# WebDriver endpoint of the remote browser
chrome_http = os.environ.get("ChromeHttp") or "chrome-http-inner.coreclaw.com"
browser_url = f'http://{auth}@{chrome_http}'

# Configure Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--window-size=1920,1080')

# Connect to the remote browser
driver = webdriver.Remote(
    command_executor=browser_url,
    options=chrome_options
)

# Navigate to the page and wait until the document has finished loading
url = "https://example.com"  # placeholder: the page to open
driver.get(url)
WebDriverWait(driver, 180).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)
html = driver.page_source
```

Complete Example
```python
import os

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from sdk import CoreSDK


def run():
    CoreSDK.Log.info("Starting Selenium demo...")

    # Define output table headers
    headers = [
        {"label": "url", "key": "url", "format": "text"},
        {"label": "html", "key": "html", "format": "text"},
        {"label": "resp_status", "key": "resp_status", "format": "text"},
    ]
    CoreSDK.Result.set_table_header(headers)

    # Read input parameters
    input_json = CoreSDK.Parameter.get_input_json_dict()
    url = input_json['url']

    # Build the authenticated WebDriver endpoint
    auth = os.environ.get("PROXY_AUTH")
    chrome_http = os.environ.get("ChromeHttp") or "chrome-http-inner.coreclaw.com"
    browser_url = f'http://{auth}@{chrome_http}'

    # WebDriver does not expose the page's HTTP status code, so resp_status
    # records whether this script succeeded ("200") or failed ("500")
    result = {"url": url, "html": "", "resp_status": "200"}

    # Configure Chrome options
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')

    driver = None
    try:
        driver = webdriver.Remote(
            command_executor=browser_url,
            options=chrome_options
        )
        driver.get(url)
        WebDriverWait(driver, 180).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        result["html"] = driver.page_source
    except Exception as e:
        CoreSDK.Log.error(f"Failed: {e}")
        result['resp_status'] = "500"
    finally:
        # Always release the remote browser session
        if driver is not None:
            driver.quit()

    CoreSDK.Result.push_data(result)


if __name__ == "__main__":
    run()
```

DOM Operations
Single Element
```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# CSS selector (recommended); raises NoSuchElementException if nothing matches
element = driver.find_element(By.CSS_SELECTOR, '.product-title')

# Element ID
element = driver.find_element(By.ID, 'main-content')

# XPath
element = driver.find_element(By.XPATH, '//div[@class="container"]')

# Wait up to 10 seconds for the element to be present in the DOM
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.product-title'))
)

# Read properties: .text returns the visible (rendered) text only
text = element.text
html = element.get_attribute('outerHTML')
```

Batch Elements
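```python
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Get all matching elements (an empty list if none match)
items = driver.find_elements(By.CSS_SELECTOR, '.product-item')

# Iterate over the items and read each field
products = []
for item in items:
    try:
        name = item.find_element(By.CSS_SELECTOR, '.name').text
        price = item.find_element(By.CSS_SELECTOR, '.price').text
        products.append({'name': name, 'price': price})
    except NoSuchElementException:
        # Skip items that lack a name or a price
        continue
```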
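Each find_element call and each .text read in the loop above is a separate WebDriver command, that is, a separate HTTP round trip to the remote browser, whereas an execute_script call sends one command for all items. A rough count, assuming every item has both fields and ignoring time spent inside the browser; the function names are hypothetical and the 50 ms round trip below is an example figure, not a measurement.

```python
def loop_commands(n_items: int, fields_per_item: int = 2) -> int:
    """Commands sent by the per-item loop: one find_elements call, then one
    find_element plus one .text read for every field of every item."""
    return 1 + n_items * fields_per_item * 2


def js_commands() -> int:
    """The execute_script version sends a single command for all items."""
    return 1


def estimated_seconds(commands: int, round_trip_ms: float) -> float:
    """Network time only, since commands are sent one after another."""
    return commands * round_trip_ms / 1000
```

With 50 items, loop_commands(50) is 201, so at an assumed 50 ms per round trip the loop spends about 10 s on the network (estimated_seconds(201, 50) is 10.05), against roughly 50 ms for a single execute_script call.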
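```python
# JavaScript-based extraction (higher performance): one WebDriver command
# returns every item. Note that textContent also includes hidden text,
# unlike element.text. Optional chaining (?.) avoids a JS error when an
# item has no .name or .price.
products = driver.execute_script('''
    const items = document.querySelectorAll('.product-item');
    return Array.from(items).map(item => ({
        name: item.querySelector('.name')?.textContent.trim(),
        price: item.querySelector('.price')?.textContent.trim()
    }));
''')
```

Anti-Patterns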
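❌ Don’t use sleep to wait:

```python
time.sleep(5)  # Unreliable: too short on slow pages, wasted time on fast ones
```

Wait for a concrete condition with WebDriverWait instead, as in the examples above.

❌ Don’t use requests to simulate a browser:

```python
requests.get(url)  # Incomplete content (no JavaScript runs), easily detected
```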
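The alternative to a fixed sleep is polling, which is what WebDriverWait(driver, timeout).until(condition) does: it re-evaluates the condition at a fixed interval (0.5 s by default) and returns the first truthy result, raising TimeoutException if the timeout passes. A simplified pure-Python sketch of that loop; wait_until is a hypothetical name, it raises the built-in TimeoutError rather than Selenium's TimeoutException, and unlike WebDriverWait it does not ignore NoSuchElementException raised by the condition.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll_frequency: float = 0.5) -> T:
    """Poll condition until it returns a truthy value or timeout expires.

    Returns that truthy value, mirroring WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # condition met: stop waiting immediately
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)  # short pause between checks
```

In a Selenium script the condition would be something like lambda: driver.execute_script("return document.readyState") == "complete". WebDriverWait already provides this loop, so the sketch only shows why polling returns as soon as the page is ready while a fixed sleep always costs its full duration.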