Glossary

Key terms in E2E testing, test automation, and compliance.

Flaky Test

A flaky test is a test that alternates between passing and failing without any code changes. The same test, same code, same environment - different result each run.
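
Race conditions between the test and the application are the most common source of flakiness. A minimal Playwright sketch of the pattern (URL and selectors are illustrative):

```ts
import { test, expect } from '@playwright/test';

// Flaky: asserts against a list that may still be loading. Passes when the
// API responds quickly, fails when it responds slowly - no code change needed.
test('search results appear (flaky)', async ({ page }) => {
  await page.goto('https://app.example.com/search?q=widgets');
  const count = await page.locator('.result-row').count(); // runs immediately, no waiting
  expect(count).toBeGreaterThan(0);
});

// Deterministic: waits for the UI to reach the expected state before asserting.
test('search results appear (stable)', async ({ page }) => {
  await page.goto('https://app.example.com/search?q=widgets');
  await expect(page.locator('.result-row').first()).toBeVisible(); // auto-waits
});
```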

Read more →

E2E Testing

End-to-end (E2E) testing verifies that an application works correctly from the user’s perspective by simulating real user flows through the full stack - UI, API, database, and third-party integrations.
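
A minimal Playwright sketch of an E2E flow, assuming a hypothetical shop at shop.example.com; note how it crosses from the UI into the backend API:

```ts
import { test, expect } from '@playwright/test';

test('guest can add an item to the cart and reach checkout', async ({ page, request }) => {
  // Drive the UI the way a user would.
  await page.goto('https://shop.example.com');
  await page.getByRole('link', { name: 'Blue Widget' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await expect(page.getByText('Blue Widget')).toBeVisible();

  // Beyond the UI: confirm the backend actually recorded the cart
  // (hypothetical API endpoint, for illustration).
  const res = await request.get('https://shop.example.com/api/cart');
  expect(res.ok()).toBeTruthy();
});
```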

Read more →

Self-Healing Test

A self-healing test automatically adapts when the application's UI changes, rather than breaking because a CSS selector or element attribute was modified. The concept has been central to AI testing marketing since 2019, but real-world reliability remains disputed.

Self-healing tools work by recording multiple attributes of each UI element (CSS selector, XPath, text content, aria labels, relative position) and using ML models to find the "best match" when the primary selector breaks. Tools like Mabl, Testim (Tricentis), and Healenium all use variations of this approach.

The promise: tests that never break on UI changes. The reality is more nuanced. Self-healing works well for minor changes (a CSS class rename, a small DOM restructure). It struggles with major redesigns, component replacements, or flow changes where the original element no longer exists in any recognizable form. The fundamental question developers raise: if the AI silently finds a different element and passes the test, how do you know it validated the right thing?
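
A minimal sketch of the multi-attribute matching idea, using a hand-written score in place of the trained ML models real tools use:

```ts
// Simplified element fingerprint; real tools record many more attributes.
interface ElementFingerprint {
  css: string;
  xpath: string;
  text: string;
  ariaLabel?: string;
}

// Score a candidate element against the recorded fingerprint.
// Weights here are illustrative, not any vendor's actual model.
function matchScore(recorded: ElementFingerprint, candidate: ElementFingerprint): number {
  let score = 0;
  if (candidate.css === recorded.css) score += 0.4;
  if (candidate.text === recorded.text) score += 0.3;
  if (candidate.ariaLabel && candidate.ariaLabel === recorded.ariaLabel) score += 0.2;
  if (candidate.xpath === recorded.xpath) score += 0.1;
  return score;
}

// "Heal" by picking the best-scoring candidate above a threshold;
// below it, return null and fail loudly instead of guessing silently.
function heal(
  recorded: ElementFingerprint,
  candidates: ElementFingerprint[],
): ElementFingerprint | null {
  const best = candidates
    .map((c) => ({ c, score: matchScore(recorded, c) }))
    .sort((a, b) => b.score - a.score)[0];
  return best && best.score >= 0.5 ? best.c : null;
}
```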

Read more →

Test Flakiness

Test flakiness is the rate at which tests produce inconsistent results - passing on some runs and failing on others without code changes. It is commonly measured as the percentage of pass/fail transitions that occur without any corresponding code change.
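
A sketch of that measurement, assuming an illustrative run-history shape:

```ts
// A run is flaky-suspect when the result flips with no code change in between.
interface TestRun {
  passed: boolean;
  commit: string; // code version the run executed against
}

function flakeRate(runs: TestRun[]): number {
  let transitions = 0;
  let flakyTransitions = 0;
  for (let i = 1; i < runs.length; i++) {
    if (runs[i].passed !== runs[i - 1].passed) {
      transitions++;
      if (runs[i].commit === runs[i - 1].commit) flakyTransitions++;
    }
  }
  return transitions === 0 ? 0 : flakyTransitions / transitions;
}

// Two transitions, one on the same commit -> 0.5 (50% of flips are flaky).
console.log(flakeRate([
  { passed: true, commit: 'a1' },
  { passed: false, commit: 'a1' }, // flipped with no code change: flaky
  { passed: true, commit: 'b2' },  // flipped with new code: legitimate
]));
```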

Read more →

Selector Brittleness

Selector brittleness is the tendency of CSS/XPath selectors used in automated tests to break when the UI changes, even when the application behavior is unchanged.
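
An illustrative Playwright contrast between a structure-coupled selector and a user-facing one (page details hypothetical):

```ts
import { test } from '@playwright/test';

test('place an order', async ({ page }) => {
  await page.goto('https://app.example.com/checkout');

  // Brittle: encodes DOM structure. Breaks when a class is renamed or a
  // wrapper div is added, even though the application behavior is unchanged.
  // await page.locator('div.cart > form > button.btn.btn-primary:nth-child(3)').click();

  // More resilient: tied to what the user sees (role + accessible name),
  // which survives most DOM restructures.
  await page.getByRole('button', { name: 'Place order' }).click();
});
```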

Read more →

Visual Testing

Visual testing compares screenshots of an application’s UI against approved baselines to detect unintended visual changes - layout shifts, font changes, color regressions, or responsive breakpoint issues.
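
Playwright supports this natively via toHaveScreenshot; a minimal example with an illustrative URL and baseline name:

```ts
import { test, expect } from '@playwright/test';

// The first run records a baseline image; later runs fail when the rendered
// page differs from it beyond the configured threshold.
test('pricing page matches visual baseline', async ({ page }) => {
  await page.goto('https://app.example.com/pricing');
  await expect(page).toHaveScreenshot('pricing.png', {
    maxDiffPixelRatio: 0.01, // tolerate tiny anti-aliasing noise
  });
});
```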

Read more →

Test Maintenance

Test maintenance is the ongoing effort required to keep automated tests working as the application under test evolves - updating selectors, fixing flaky tests, refreshing test data, and adapting to new features.

Read more →

PR Gating

PR gating (also called merge gating or branch protection) is the practice of requiring automated checks to pass before a pull request can be merged. If tests fail, the PR is blocked until the issue is resolved.

Read more →

Diff-Aware Testing

Diff-aware testing is an approach where the code changes in a pull request are analyzed to identify affected user flows and coverage gaps. In Zerocheck today, that analysis suggests tests for review while the PR check runs the active, approved suite. The result is clearer coverage feedback without letting generated tests gate the merge before they are approved.
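
A heavily simplified sketch of the diff-to-flow mapping idea (the path-to-flow map and names are hypothetical; the actual analysis is more involved):

```ts
// Hand-maintained map from source areas to the user flows they affect.
const flowsByPath: Record<string, string[]> = {
  'src/checkout/': ['guest checkout', 'saved-card checkout'],
  'src/auth/': ['login', 'password reset'],
};

// Given the files changed in a PR, return the flows worth re-testing.
function affectedFlows(changedFiles: string[]): string[] {
  const flows = new Set<string>();
  for (const file of changedFiles) {
    for (const [prefix, mapped] of Object.entries(flowsByPath)) {
      if (file.startsWith(prefix)) mapped.forEach((f) => flows.add(f));
    }
  }
  return [...flows];
}

console.log(affectedFlows(['src/checkout/cart.ts']));
// -> ['guest checkout', 'saved-card checkout']
```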

Read more →

Agentic Testing

Agentic testing is a testing approach where an AI agent autonomously generates, executes, and maintains end-to-end tests by interacting with an application the way a human would: visually, through rendered UI, using intent rather than code-level selectors. This is an architectural distinction, not a marketing label.

Traditional test automation (Playwright, Cypress, Selenium) works by locating DOM elements via CSS selectors, XPath expressions, or data-testid attributes, then performing actions on those elements programmatically. Self-healing tools (Mabl, Testim, Healenium) sit one layer above: they still capture selectors, but use ML models to guess replacement selectors when the originals break. Agentic testing removes selectors from the equation entirely.

The clearest way to understand the difference is the intent-based vs selector-based spectrum. On one end, a Playwright test says: click the element matching 'button.primary-cta[data-testid="submit-order"]'. If a designer renames that class or a developer removes the data-testid, the test breaks. A self-healing tool might recover by finding a nearby button with similar attributes, but it is still searching the DOM for selector matches. An agentic test says: click the Submit Order button. The agent renders the page, identifies the button visually (by its label, position, and context), and clicks it. No selector is ever created, stored, or repaired.

This matters because the failure mode is fundamentally different. Selector-based tools fail when the DOM changes. Self-healing tools fail silently when they guess wrong, clicking the wrong element and reporting a pass. Agentic tools fail when they cannot confidently identify the intended element, which maps directly to what would confuse a real user. A button that moved from the header to a sidebar will still be found by an agent that reads the page. A button whose label changed from "Submit" to "Place Order" will trigger a confidence check, not a silent guess.
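
The contrast in code, with a hypothetical intent-based agent interface alongside the real Playwright call:

```ts
import type { Page } from '@playwright/test';

// Hypothetical agent interface, for illustration only - not a real library.
interface Agent {
  act(instruction: string): Promise<void>;
  verify(expectation: string): Promise<void>;
}

// Selector-based: the step encodes DOM structure and breaks with it.
async function selectorBased(page: Page) {
  await page.locator('button.primary-cta[data-testid="submit-order"]').click();
}

// Intent-based: the step encodes user intent; the agent resolves the element
// visually at runtime and fails loudly when it cannot identify it confidently.
async function intentBased(agent: Agent) {
  await agent.act('Click the Submit Order button');
  await agent.verify('The order confirmation page is shown');
}
```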

Read more →

QA Agent

A QA agent is an AI system that autonomously handles testing tasks that traditionally required human QA engineers: writing tests, executing them, triaging failures, maintaining test suites, and reporting results. Unlike a testing tool that automates execution, a QA agent makes decisions about what to test, when to test it, and how to interpret results.

The term emerged in 2025 as AI agent architectures matured beyond simple automation. Microsoft published an official "Automated QA testing agent" template. LambdaTest launched KaneAI as the "world's first AI QA agent." Momentic, Spur, and QA.tech all position their products using agent terminology.

The distinction from traditional test automation is meaningful. A Playwright script executes a fixed sequence of steps. A QA agent receives a goal ("verify the checkout flow works after this PR"), decides which steps to take, adapts when the UI doesn't match expectations, and reports whether the goal was achieved with a confidence assessment.
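
A sketch of that goal-plan-act-report loop, with an entirely hypothetical agent interface:

```ts
// All types and methods here are illustrative, not a real product's API.
interface StepResult {
  ok: boolean;
  note: string;
}

interface QaAgent {
  plan(goal: string): Promise<string[]>;              // decide which steps to take
  execute(step: string): Promise<StepResult>;          // act, adapting to the live UI
  report(goal: string, results: StepResult[]): string; // confidence-weighted verdict
}

async function run(agent: QaAgent, goal: string): Promise<string> {
  const steps = await agent.plan(goal);
  const results: StepResult[] = [];
  for (const step of steps) {
    const result = await agent.execute(step);
    results.push(result);
    if (!result.ok) break; // stop and triage instead of blindly continuing
  }
  return agent.report(goal, results);
}

// run(agent, 'verify the checkout flow works after this PR');
```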

Read more →

Test Automation

Test automation is the practice of using software to execute tests, compare actual outcomes to expected outcomes, and report results without manual intervention. Instead of a human clicking through a web application to verify a login flow, a script drives the browser, fills in credentials, asserts that the dashboard loads, and logs the result to CI.

The concept dates back to the early 2000s, when Selenium first allowed engineers to script browser interactions. For over a decade, Selenium was synonymous with test automation. Cypress (launched in 2017) and Playwright (released by Microsoft in 2020) then emerged as modern alternatives with better developer ergonomics: auto-waiting, built-in assertions, trace viewers, and TypeScript support. Playwright now sees over 10 million npm downloads per month, making it the de facto standard for new projects.

Test automation spans multiple levels: unit tests (testing individual functions), integration tests (testing component interactions), and end-to-end tests (testing full user flows through the stack). Each level has its own tools and trade-offs. Unit tests are fast and cheap but miss UI and integration bugs. E2E tests catch real user-facing issues but are slower and harder to maintain.
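
The login flow described above, written as a Playwright test (application details illustrative):

```ts
import { test, expect } from '@playwright/test';

// Drives the browser, fills in credentials, asserts that the dashboard
// loads; CI records the pass/fail result automatically.
test('login flow', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery');
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```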

Read more →

AI Testing

AI testing is the application of artificial intelligence to software test generation, execution, maintenance, and failure triage. Rather than manually scripting every assertion, AI models analyze application behavior, generate test cases, identify elements visually or semantically, and classify failures as real bugs or false alarms.

Three architectures dominate the current landscape. Selector-healing tools (Mabl, Testim, Healenium) use ML to repair broken CSS/XPath selectors when the UI changes. Intent-based tools (testRigor, Zerocheck) accept plain-language test descriptions and interact with the application based on what elements look like and do, bypassing selectors entirely. Vision-based tools (Applitools Eyes, Momentic) use computer vision to detect visual regressions by comparing rendered screenshots against baselines.

The term covers a wide spectrum: from simple AI-assisted code completion (Copilot generating a Playwright test from a comment) to fully autonomous QA agents that generate, run, and maintain entire test suites without human intervention.

Read more →

No-Code Testing

No-code testing is an approach to software testing that does not require writing programming code. Testers create and maintain automated tests through visual interfaces, record-and-playback tools, or natural language descriptions instead of scripting in JavaScript, Python, or Java.

The spectrum ranges from simple recorders (Selenium IDE, Playwright Codegen) that capture browser interactions and generate code behind the scenes, to drag-and-drop builders (Katalon, Leapwork) that represent test steps as visual blocks, to natural language tools (testRigor, Zerocheck) where tests are written as plain English sentences.

No-code testing emerged to address a fundamental bottleneck: the State of Testing 2024 report found that 42% of testers do not feel comfortable writing automation scripts. Meanwhile, QA teams are shrinking relative to development teams, with ratios moving from 1:3 to 1:8 or even 1:15 at fast-moving startups. The math does not work when the only people who can create automated tests are senior SDET engineers who are in short supply.

Read more →

Test Coverage

Test coverage measures the percentage of an application that is exercised by automated tests. It quantifies how much of your code, logic, or user flows have corresponding test verification. The higher the coverage, the more of your application is protected against regressions.

Three types of coverage are commonly tracked. Line coverage measures the percentage of code lines executed during testing. Branch coverage measures the percentage of conditional branches (if/else, switch) that are exercised. Flow coverage (also called path coverage or user journey coverage) measures the percentage of end-to-end user flows that are tested from start to finish.

Line and branch coverage are measured with instrumentation tools like Istanbul/nyc (JavaScript), JaCoCo (Java), and coverage.py (Python). Flow coverage is harder to measure because there is no standard tooling. Most teams track it manually with a spreadsheet of critical user flows and which ones have corresponding E2E tests.
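
Flow coverage can be tracked as code instead of a spreadsheet; a tiny sketch with example flows:

```ts
// Map each critical user flow to whether it has a corresponding E2E test.
// Flow names are examples.
const criticalFlows: Record<string, boolean> = {
  'sign up': true,
  'login': true,
  'checkout': true,
  'cancel subscription': false, // coverage gap
  'export data': false,         // coverage gap
};

const covered = Object.values(criticalFlows).filter(Boolean).length;
const total = Object.keys(criticalFlows).length;
console.log(`Flow coverage: ${covered}/${total} (${Math.round((covered / total) * 100)}%)`);
// -> Flow coverage: 3/5 (60%)
```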

Read more →

CI/CD Testing

CI/CD testing is the practice of running automated tests as part of continuous integration and continuous deployment pipelines. Every code commit or pull request triggers a sequence of test stages, from unit tests (seconds) to integration tests (minutes) to E2E tests (minutes to tens of minutes), with each stage gating the next.

In a typical setup, a developer pushes code to a branch, which triggers a CI workflow in GitHub Actions, GitLab CI, CircleCI, or Jenkins. The workflow installs dependencies, builds the application, runs unit tests, deploys to a staging environment, runs E2E tests, and reports results back to the PR. If any stage fails, the pipeline stops and the PR is blocked from merging.

This pattern has become the standard for professional software teams. GitHub reports that 85% of repositories with more than 10 contributors use some form of CI. The shift from "run tests locally before pushing" to "CI runs tests automatically on every push" was one of the most impactful changes in software engineering practices over the past decade.

Read more →

Regression Testing

Regression testing is the practice of re-running tests after code changes to verify that existing functionality still works correctly. The word "regression" means moving backward: a regression bug is a feature that previously worked but is now broken due to a new change.

Regression testing happens at every level of the testing pyramid. Unit-level regression tests verify that individual functions still return expected outputs. Integration-level regression tests verify that APIs and services still communicate correctly. E2E regression tests verify that complete user flows still work from start to finish.

E2E regression testing is the strongest regression safety net because it catches failures that span multiple components. A database migration that changes a column name might pass all unit tests but break the registration flow because the API layer was not updated to match. Only an E2E test that exercises the full registration path would catch this.

Read more →

Smoke Testing

Smoke testing is a quick, surface-level validation that the most critical paths of an application work after a deployment or build. The term comes from hardware engineering: when you power on a new circuit board, you check whether smoke comes out; if it does not, you proceed to deeper testing.

A smoke test suite typically covers 5 to 10 critical user flows: can users log in, can they access the main dashboard, does the primary feature load, does the checkout process initiate? These tests are not comprehensive. They do not check edge cases, error handling, or secondary features. Their purpose is to answer one question: is the application fundamentally working?

Smoke tests are distinct from sanity tests and regression tests. Sanity tests verify that a specific fix or feature works as intended (narrowly scoped). Smoke tests verify that the overall application is not broken (broadly scoped but shallow). Regression tests verify that all existing functionality still works (broadly scoped and deep).
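
A sketch of a smoke suite in Playwright, tagged via test titles so CI can run it alone with npx playwright test --grep @smoke (URLs and labels illustrative):

```ts
import { test, expect } from '@playwright/test';

// Smoke tests stay small and shallow: critical paths only, no edge cases.
test('@smoke user can log in', async ({ page }) => {
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('smoke@example.com');
  await page.getByLabel('Password').fill('smoke-password');
  await page.getByRole('button', { name: 'Log in' }).click();
  await expect(page).toHaveURL(/\/dashboard/);
});

test('@smoke dashboard loads', async ({ page }) => {
  await page.goto('https://app.example.com/dashboard');
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```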

Read more →

SOC 2 Compliance Testing

SOC 2 compliance testing is the practice of generating verifiable evidence that application changes are tested before reaching production, as required by SOC 2 Trust Services Criteria. Specifically, control CC7.2 (the entity monitors system components for anomalies) and CC8.1 (the entity authorizes, designs, develops, configures, documents, tests, approves, and implements changes) require documented proof that software changes undergo testing.

SOC 2 Type II audits examine whether these controls operated effectively over a review period (typically 6 to 12 months). Auditors want to see timestamped evidence of test execution tied to specific code changes. A passing CI run linked to a commit hash, with test results and artifacts, satisfies CC8.1. A flaky test classification system that demonstrates anomaly detection satisfies aspects of CC7.2.

Most SaaS companies pursuing SOC 2 use platforms like Vanta, Drata, or Secureframe to automate infrastructure-level evidence collection: access controls, encryption settings, vulnerability scanning. But application-level testing evidence is a gap. Vanta can prove your AWS configuration is compliant, but it cannot prove that your checkout flow was tested before the last release.
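
A sketch of the kind of evidence record auditors look for under CC8.1, with illustrative field names rather than any compliance platform's schema:

```ts
// Test-execution evidence tied to a specific change: timestamped, linked to
// a commit, with artifacts an auditor can inspect.
interface TestEvidence {
  commitSha: string;       // the change under test
  pullRequest: number;     // illustrative PR reference
  suite: string;
  passed: boolean;
  executedAt: string;      // ISO 8601 timestamp
  artifactUrls: string[];  // traces, screenshots, reports
}

const evidence: TestEvidence = {
  commitSha: process.env.GITHUB_SHA ?? 'unknown', // set by GitHub Actions
  pullRequest: 1234,                              // example value
  suite: 'e2e-regression',
  passed: true,
  executedAt: new Date().toISOString(),
  artifactUrls: ['https://ci.example.com/runs/5678/trace.zip'],
};

console.log(JSON.stringify(evidence, null, 2));
```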

Read more →

Shift-Left Testing

Shift-left testing is the practice of moving testing activities earlier in the software development lifecycle. Instead of testing after code is complete and deployed to a staging environment, teams test at the PR level, during development, or even during design. The term "shift left" refers to moving testing leftward on a timeline that reads from planning (left) to production (right).

The concept gained traction as DevOps practices made deployment faster. When teams deployed quarterly, post-deployment QA made sense. When teams deploy daily, waiting until after deployment to test means bugs ship to production before they are caught. Shift-left testing closes this gap by catching bugs at the earliest possible stage.

The most impactful shift-left practice is PR-level E2E testing: running end-to-end tests on every pull request before code is merged. This catches integration bugs, visual regressions, and flow breakages while the developer still has context on the change, rather than days or weeks later when a QA engineer discovers the issue.

Read more →

Continuous Testing

Continuous testing is the practice of integrating automated tests into every stage of the CI/CD pipeline so that each code change receives immediate quality feedback. It goes beyond "run tests in CI" to mean testing at every commit, every PR, every merge, every deploy, and every production release, with results feeding back to the team in real time.

The term is often confused with "running tests in CI," but the distinction matters. Running tests in CI means a test suite executes when triggered. Continuous testing means the testing process is woven into every phase of delivery with appropriate test types at each stage: static analysis at commit, unit tests at push, integration tests at PR, E2E tests at merge, smoke tests at deploy, and synthetic monitoring in production.

Continuous testing requires infrastructure: reliable CI pipelines, fast test execution, test data management, environment provisioning, result aggregation, and feedback channels (PR comments, Slack alerts, dashboards). Teams that attempt continuous testing without this infrastructure end up with slow pipelines, flaky results, and alert fatigue.

Read more →

Test Orchestration

Test orchestration is the coordination of test execution across environments, configurations, and parallelized runners to produce a single, coherent result. It covers the logistics of testing that go beyond writing and running individual tests: which tests run where, in what order, with what data, across how many parallel workers, and how results are aggregated.

Orchestration becomes necessary as test suites grow. A 10-test suite can run sequentially on a single machine in 5 minutes. A 500-test suite running sequentially takes 4+ hours. Orchestration splits those 500 tests across 20 parallel workers, manages test data isolation between workers, handles retries for infrastructure failures, aggregates results into a single report, and determines the overall pass/fail status.

Common orchestration tools include Buildkite (with its agent-based parallelism), CircleCI's test splitting, GitHub Actions' matrix strategy, and dedicated orchestration platforms like Currents.dev and Sorry Cypress (open-source Cypress parallelization). These tools handle the infrastructure layer but leave the test strategy (what to run, when, how to interpret results) to the team.
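
The core splitting idea in a few lines (real orchestrators also balance shards by historical test duration); Playwright exposes the same concept natively via npx playwright test --shard=1/20:

```ts
// Deterministically assign tests to a shard so every worker runs a
// disjoint subset and, together, the workers cover the whole suite.
function shard<T>(tests: T[], shardIndex: number, totalShards: number): T[] {
  return tests.filter((_, i) => i % totalShards === shardIndex);
}

const allTests = ['login', 'checkout', 'search', 'profile', 'billing'];
console.log(shard(allTests, 0, 2)); // worker 0 of 2 -> ['login', 'search', 'billing']
console.log(shard(allTests, 1, 2)); // worker 1 of 2 -> ['checkout', 'profile']
```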

Read more →

Cross-Browser Testing

Cross-browser testing is the practice of verifying that a web application works correctly across different browsers: Chrome, Firefox, Safari, and Edge. Each browser uses a different rendering engine (Chromium/Blink, Gecko, WebKit), which means the same HTML, CSS, and JavaScript can produce different results in each browser.

The scope of cross-browser testing has narrowed significantly since the Internet Explorer era. IE's quirks mode and non-standard implementations required extensive browser-specific workarounds. Modern browsers have much higher standards compliance, but differences still exist. Safari's WebKit engine handles CSS grid, flexbox, and web APIs differently from Chromium in many edge cases. Firefox's Gecko engine has its own set of rendering quirks.

Cross-browser testing covers functional testing (do buttons, forms, and navigation work?), visual testing (does the layout render correctly?), and performance testing (does the application load and respond within acceptable times?). Most teams prioritize Chrome (65% global browser share), then Safari (18%), Edge (5%), and Firefox (3%), allocating testing effort proportional to their user base's browser distribution.
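
In Playwright, cross-browser coverage is largely a configuration concern; a typical playwright.config.ts runs the same suite against all three rendering engines:

```ts
import { defineConfig, devices } from '@playwright/test';

// One suite, three engines: Blink (chromium), Gecko (firefox), WebKit.
export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```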

Read more →

Test Data Management

Test data management (TDM) is the practice of creating, maintaining, and cleaning up the data that automated tests depend on. Every E2E test needs some form of data: user accounts to log in with, products to add to a cart, orders to view in a dashboard, configuration flags that enable specific features. How that data is created, isolated, and removed after tests run determines whether your test suite is reliable or plagued by flakes.

TDM covers three core activities. Data creation involves seeding the application's database with test-specific records before test execution. This can happen through API calls, database fixtures, factory functions, or application UI flows. Data isolation ensures that tests running in parallel do not interfere with each other by reading or modifying the same records. Data cleanup removes test-generated data after execution so it does not accumulate and cause side effects in future runs.

The complexity of TDM scales with application complexity. A simple CRUD app might need a test user and a few records. A financial application might need accounts with specific balances, transaction histories, compliance flags, and multi-currency configurations, all in a specific state.
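
A sketch of the create/isolate/clean-up cycle as a Playwright fixture, assuming a hypothetical seeding API at /api/test-users:

```ts
import { test as base, expect } from '@playwright/test';

type TestUser = { id: string; email: string };

const test = base.extend<{ testUser: TestUser }>({
  testUser: async ({ request }, use) => {
    // Creation + isolation: seed a unique user so parallel workers never collide.
    const email = `qa-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
    const created = await request.post('https://app.example.com/api/test-users', {
      data: { email },
    });
    const user = (await created.json()) as TestUser;

    await use(user); // the test body runs here

    // Cleanup: remove the record so it cannot leak into future runs.
    await request.delete(`https://app.example.com/api/test-users/${user.id}`);
  },
});

test('dashboard greets the seeded user', async ({ page, testUser }) => {
  await page.goto('https://app.example.com/dashboard');
  await expect(page.getByText(testUser.email)).toBeVisible();
});
```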

Read more →

Codeless Test Automation

Codeless test automation is the practice of creating and maintaining automated tests without writing programming code. The term is used interchangeably with "no-code testing" but carries a slight enterprise connotation, often appearing in vendor positioning for mid-market and enterprise QA teams that have manual testers without coding skills.

The codeless test automation market includes established players like Testsigma (open-source, NLP-based), Katalon (visual builder with scripting fallback), Leapwork (visual flow designer for enterprises), and Virtuoso (NLP + visual AI). Newer entrants include testRigor (plain English specs), Zerocheck (plain English with visual interaction), and Momentic (vision-based AI).

The tools differ primarily in how they abstract away code. Recorder-based tools (Katalon Recorder, Selenium IDE) capture browser interactions and generate scripts. Visual builders (Leapwork, Tosca) let testers construct flows by dragging and connecting blocks. NLP-based tools (testRigor, Testsigma, Zerocheck) accept test descriptions in natural language and handle the execution details internally.

Read more →

Test Automation Framework

A test automation framework is a structured set of guidelines, tools, and practices for organizing and executing automated tests. It defines how tests are written, how test data is managed, how results are reported, and how the test suite scales as the application grows. Without a framework, test automation devolves into a collection of ad-hoc scripts that are difficult to maintain, extend, and debug.

Five framework architectures are commonly used. Data-driven frameworks separate test logic from test data, storing inputs in CSV, Excel, or JSON files and iterating over them. Keyword-driven frameworks map test steps to reusable keywords ("login," "addToCart," "checkout") that abstract implementation details. BDD (Behavior-Driven Development) frameworks like Cucumber and SpecFlow use Gherkin syntax (Given/When/Then) to write tests in near-English that bridge business and engineering communication. Hybrid frameworks combine elements from multiple architectures. Page Object Model (POM) organizes selector logic into per-page classes, centralizing the maintenance burden.

Playwright and Cypress are the dominant modern frameworks for web E2E testing. Playwright offers multi-browser support, auto-waiting, trace viewers, and TypeScript-first design. Cypress offers a developer-friendly test runner with time-travel debugging and real-time reloading. Both have opinionated project structures that serve as de facto framework templates.
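
A minimal Page Object Model sketch in Playwright (page details illustrative): selector logic lives in one class, so a UI change is fixed in one place rather than in every test.

```ts
import { test, expect, type Page } from '@playwright/test';

class LoginPage {
  constructor(private page: Page) {}

  async goto() {
    await this.page.goto('https://app.example.com/login');
  }

  // All login selectors are centralized here.
  async login(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Log in' }).click();
  }
}

test('user lands on the dashboard after login', async ({ page }) => {
  const loginPage = new LoginPage(page);
  await loginPage.goto();
  await loginPage.login('user@example.com', 'hunter2');
  await expect(page).toHaveURL(/\/dashboard/);
});
```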

Read more →

Autonomous Testing

AI agents that discover, generate, execute, and maintain tests without human intervention. Growing +900% YoY on Google Trends.

Read more →

Test Automation Maintenance

The ongoing effort required to keep automated tests working as the application under test evolves. Industry data: 60-70% of test automation budgets go to maintenance.

Read more →

Testing Pyramid

A visual model (coined by Mike Cohn) for balancing test types: many unit tests at the base, fewer integration tests in the middle, and a small number of E2E tests at the top. The idea: E2E tests are slow, expensive, and flaky, so minimize them.

Read more →

Functional Testing

Testing that verifies an application's features work according to requirements. Answers the question: "Does this feature do what it's supposed to do?" Distinct from non-functional testing (performance, security, accessibility).

Read more →

Test Automation Metrics

Quantitative measures used to evaluate the effectiveness, efficiency, and ROI of test automation efforts. Key metrics include test coverage, pass rate, flake rate, mean time to feedback, maintenance cost per test, and defect escape rate.
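
Two of these metrics as simple calculations (data shapes illustrative):

```ts
interface SuiteRun {
  passed: number;
  failed: number;
}

// Pass rate: share of test executions that passed.
const passRate = (r: SuiteRun): number => r.passed / (r.passed + r.failed);

// Defect escape rate: bugs found in production as a share of all bugs found.
const defectEscapeRate = (foundInProd: number, foundTotal: number): number =>
  foundInProd / foundTotal;

console.log(passRate({ passed: 480, failed: 20 })); // 0.96
console.log(defectEscapeRate(3, 25));               // 0.12
```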

Read more →