
Semgrep: modern SAST in your pipeline

Updated: 2026-05-03

For a long time SAST meant a heavy, expensive tool with false positive rates that exhausted any team. Classic commercial scanners took hours to finish and when they did, the list of findings was so long nobody read it. That scenario has changed in recent years, and much of the credit goes to Semgrep, a tool that appeared as an open source project and has settled as one of the default options in modern pipelines.

This post summarizes how Semgrep is used in teams I’ve seen, what kinds of rules add value, and what mistakes repeat when deployment happens without care. For full SAST ecosystem context, compare with the analysis of CodeQL in GitHub Advanced Security.

Key takeaways

  • Semgrep thinks in rules written in a language similar to real code: low friction for encoding internal conventions.
  • Syntactic matching keeps it fast: a medium-sized repository scans in seconds in CI.
  • Excels at incorrect API usage, hardcoded secrets, internal conventions, and YAML config checks.
  • Does not replace deep interprocedural analysis (CodeQL, Checkmarx) or vulnerable dependency detection.
  • Failed adoption always starts the same way: enabling everything by default and not filtering the resulting noise.

Why Semgrep fits where other SAST doesn’t

The fundamental difference is that Semgrep thinks in rules, not a monolithic catalog. A Semgrep rule is a code pattern written in a language very similar to real code, with wildcards. You write the rule in ten minutes, test it in your own repository, and if it works, add it to the pipeline. That low friction completely changes the economics of static analysis: rules stop being incomprehensible vendor artifacts and become team code.

The second reason is speed. Semgrep analyzes by syntactic match, without building a complete control flow graph. For many vulnerability types that’s sufficient and delivers performance orders of magnitude better than classic SAST. A medium-sized repository scans in seconds in CI, not hours. That means running it on every pull request instead of as a nightly job nobody looks at.

The third reason is the public catalog. Semgrep’s registry accumulates thousands of rules maintained by the community and team. You can start with a per-language pack and add your own rules as recurring problems appear in your code.

What problem types it catches well

Semgrep excels at a specific set of vulnerabilities:

Incorrect usage of sensitive APIs. Calls to encryption functions without secure initialization, use of cryptographic libraries with weak algorithms, SQL construction without parameters, inclusion of user input in shell commands. These are the cases where Semgrep does a security linter’s job with very few false positives.
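
As an illustration, the shell-command case fits in a rule of a few lines (the id and message here are invented for the example):

```yaml
rules:
  - id: subprocess-shell-true
    # The ellipsis matches any other arguments in any position.
    pattern: subprocess.run(..., shell=True)
    message: shell=True routes the command through a shell; pass an argument list instead.
    severity: ERROR
    languages: [python]
```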

Hardcoded secrets. Although dedicated tools exist (gitleaks, trufflehog), Semgrep can detect credential patterns, AWS keys, and service tokens with very simple rules, combining this detection with others in the same pipeline without introducing a new tool.
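
A minimal sketch of such a rule, assuming the classic AKIA prefix of AWS access key IDs:

```yaml
rules:
  - id: hardcoded-aws-access-key
    # AWS access key IDs start with AKIA followed by 16 uppercase alphanumerics.
    pattern-regex: AKIA[0-9A-Z]{16}
    message: Possible hardcoded AWS access key ID; move it to a secrets manager.
    severity: ERROR
    languages: [generic]
```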

Internal convention violations. Many teams have implicit rules: “always use logger X, not print,” “all endpoints must pass through the auth decorator,” “don’t build URLs manually.” Those rules can be written in Semgrep in an afternoon and become part of CI. Their impact on code quality often exceeds that of any external rule pack, because they solve real team problems.
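
The “logger, not print” convention, for example, takes a handful of lines (the message refers to a hypothetical internal module):

```yaml
rules:
  - id: use-team-logger
    # Flag any bare print call, regardless of arguments.
    pattern: print(...)
    message: Use the internal logger module instead of print (team convention).
    severity: WARNING
    languages: [python]
```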

Configuration drift in YAML or JSON. Kubernetes policies without resource requests, CronJobs without backoffLimit, images without pinned digest. Semgrep’s engine understands structure, not just text, and allows precise rules over configuration files.
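
A sketch of the pinned-digest check, assuming Semgrep’s YAML support and a metavariable regex with a negative lookahead:

```yaml
rules:
  - id: image-without-pinned-digest
    patterns:
      - pattern: "image: $IMAGE"
      - metavariable-regex:
          metavariable: $IMAGE
          # Flag any image reference that lacks an @sha256: digest.
          regex: ^(?!.*@sha256:).*$
    message: Pin container images by digest, not by mutable tag.
    severity: WARNING
    languages: [yaml]
```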

What Semgrep doesn’t do as well

It’s important not to believe Semgrep replaces all other analyses:

  • Complex taint analyses with flow across many functions or modules. Tools with full interprocedural analysis (CodeQL, Checkmarx, Fortify) cover this niche better. Semgrep Pro has recently added cross-function taint tracking and covers more cases than before, but deep whole-program analysis remains the territory of specialized tools.
  • Third-party library vulnerability detection (CVEs). That is the job of SCA tools (Dependabot or Renovate for updates, Trivy or Snyk for scanning). Forcing Semgrep to do SCA leads to a bad experience.
  • Concurrency bugs, race conditions, native code memory problems, and pure logic errors. Semgrep is a pattern engine, not an exhaustive semantic analyzer.
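
For completeness: the open source engine does include a taint mode for flows within a single function, which covers the simpler end of this spectrum. A minimal sketch, assuming a Flask handler as source (ids and message invented):

```yaml
rules:
  - id: request-arg-to-subprocess
    mode: taint
    pattern-sources:
      # Untrusted input: query parameters of the incoming request.
      - pattern: flask.request.args.get(...)
    pattern-sinks:
      # Dangerous destination: a shell command.
      - pattern: subprocess.run(...)
    message: User input flows into a shell command within this function.
    severity: ERROR
    languages: [python]
```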

How to add it to a pipeline without turning it into noise

Typical adoption fails when everything is activated by default and two hundred findings appear on day one. Nobody reads them and the tool gets ignored. To avoid this, go step by step:

Step 1: scan locally first. Reviewing findings before connecting to CI lets you identify noisy rules, mark them disabled in the config, and start with a clean profile. It’s an afternoon of work that saves months of conflict.
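
One way to make the local scan habitual is a pre-commit hook; a sketch assuming the upstream Semgrep pre-commit integration (the config path is an assumption, and rev must be pinned to a real release tag):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: vX.Y.Z  # replace with a pinned release tag
    hooks:
      - id: semgrep
        args: ["--config", "semgrep/", "--error"]
```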

Step 2: introduce Semgrep in CI as a warning, not a blocker. In the first weeks, findings appear as pull request comments but don’t prevent merging. This lets the team learn to read them without pressure. Once the volume of new findings stabilizes, you can start blocking by high severity.
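
In GitHub Actions, for instance, the warning phase can be a step that reports findings but never fails the build (a sketch; the config paths are assumptions):

```yaml
# Advisory phase: surface findings, never block the merge
- name: Semgrep (advisory)
  run: semgrep scan --config p/default --config semgrep/
  continue-on-error: true
```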

Step 3: separate rules by purpose. Security ones in a mandatory pack, internal conventions in an optional pack, custom rules in their own versioned file in the repository. This separation helps the team understand what’s happening when something fires.
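
That separation can map directly onto CI steps, for example (pack names and paths are illustrative):

```yaml
# Mandatory security pack: --error makes findings fail the build
- name: Semgrep security (blocking)
  run: semgrep scan --config p/security-audit --config semgrep/security/ --error
# Internal conventions: visible, but never block the merge
- name: Semgrep conventions (advisory)
  run: semgrep scan --config semgrep/conventions/
  continue-on-error: true
```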

An example custom rule illustrating the pattern:

```yaml
rules:
  - id: block-bare-urllib
    pattern: urllib.request.urlopen($X)
    message: Use http_client.get() from the internal module; urlopen doesn't validate certs by default.
    severity: WARNING
    languages: [python]
```
A rule like this, written during the integration afternoon, covers a concrete team problem and is revisited when the architecture changes. That’s the difference between doing security as ceremony and doing it as code.

The step to Semgrep Pro and AppSec Platform

Semgrep operates on a dual model: the open source engine, plus the Pro service with advanced rules and interprocedural analysis. The enterprise offering includes the AppSec Platform, integrating SAST, secrets, and a basic SCA tier.

Open source covers 80% of a small or medium team’s needs. The step to Pro makes sense when the code has significant complex data flow (applications with many layers) and when the team has matured tool usage to the point of needing taint analysis. Starting with Pro without going through the custom rules cycle is expensive and delivers less value than it seems.

For a complete DevSecOps stack covering dependency and container image analysis, also check the comparison of Trivy and Grype for image scanning.

My read

Semgrep works because it inverts the power model of traditional SAST. Instead of a vendor deciding what to detect and the team consuming it, the team decides what to detect and codes it in the same language as their code. That shift in locus of control explains why custom rules have more impact than catalog rules: they reflect real team problems, not generic industry problems.

The failing adoption is the one that activates everything by default and expects Semgrep to do code governance work. The one that works spends an afternoon cleaning the baseline, two weeks observing without blocking, and from there adds rules that solve real problems.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.