GitHub Security Architecture

This page explains how the scanner inspects code repositories and checks for exposed keys without persisting the data it reads.

Mode 1: Whole Repo Scan Pipeline

The scanner walks a GitHub repository through the API, downloads its text files into memory, and matches them against industry-standard signature patterns for sensitive secrets. Nothing is written to local storage.

1. URL Resolution

The target URL is parsed to extract the owner and repository name. The scanner then checks that the repository exists and is publicly accessible.

URL: https://github.com/user/demo
Parsed: owner="user", repo="demo"
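The parsing step can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the function name `parse_github_url` and the decision to strip a trailing `.git` are assumptions.

```python
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str]:
    """Return (owner, repo) from a https://github.com/<owner>/<repo> URL."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub HTTPS URL: {url!r}")

    # Drop empty segments so trailing slashes do not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL is missing owner or repository: {url!r}")

    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # assumption: clone-style URLs are accepted too
        repo = repo[: -len(".git")]
    return owner, repo


print(parse_github_url("https://github.com/user/demo"))  # ('user', 'demo')
```

Checking that the repository exists and is public would need a separate API request, which this sketch leaves out.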

2. Recursive Git Trees

We request the repository's git tree from GitHub's public API with the recursive option. The response lists every file with its path, size, and blob SHA identifier.

API: /repos/user/demo/git/trees/main?recursive=1
Response: {tree: [{path: "app.py", sha: "abc12..."}]}
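A rough sketch of this step, assuming the standard shape of GitHub's git trees response (a `tree` array whose entries carry `path`, `type`, `sha`, and `size`). The helper names `tree_endpoint` and `list_blobs` are hypothetical, and the SHA values in the sample are placeholders.

```python
def tree_endpoint(owner: str, repo: str, ref: str = "main") -> str:
    """Build the API path for a recursive tree listing of one branch."""
    return f"/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def list_blobs(tree_response: dict) -> list[dict]:
    """Keep only file entries; directories appear with type "tree"."""
    return [
        {"path": entry["path"], "sha": entry["sha"], "size": entry.get("size", 0)}
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
    ]


# Hand-written sample response; no network request is made here.
sample = {
    "tree": [
        {"path": "src", "type": "tree", "sha": "placeholder-tree-sha"},
        {"path": "app.py", "type": "blob", "sha": "placeholder-blob-sha", "size": 120},
    ]
}
print(tree_endpoint("user", "demo"))
print(list_blobs(sample))
```

For very large repositories GitHub may mark the response as truncated, so a production scanner would likely need to check for that case.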

3. In-Memory Ingestion

Only files smaller than 500KB are processed. Each blob is downloaded and decoded directly in memory. Nothing is written to disk, so no local cache of the repository is left behind.

Blob query: /git/blobs/abc12...
Decoded: base64 -> raw code text
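The size check and base64 decoding might look like the sketch below. It assumes the blob response carries `size`, `encoding`, and `content` fields, as GitHub's blob API does. The name `decode_blob` and the choice to skip files that are not valid UTF-8 are assumptions.

```python
import base64
from typing import Optional

MAX_FILE_BYTES = 500 * 1024  # the 500KB limit described above


def decode_blob(blob: dict, max_bytes: int = MAX_FILE_BYTES) -> Optional[str]:
    """Decode a blob response to text in memory, or return None to skip it."""
    if blob.get("size", 0) >= max_bytes:
        return None  # too large to scan
    if blob.get("encoding") != "base64":
        return None  # unexpected encoding, skip rather than guess
    raw = base64.b64decode(blob["content"])  # embedded newlines are ignored
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # assumption: non-UTF-8 content is treated as binary


encoded = base64.b64encode(b'AWS_KEY = "example"\n').decode("ascii")
print(decode_blob({"size": 20, "encoding": "base64", "content": encoded}))
```

The decoded text lives only in a local variable, which matches the no-disk behaviour described above.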

4. Regex Pattern Matching

We scan the decoded text against the regular expression signatures loaded from SECRET_PATTERNS.txt, detecting leaked API keys, tokens, and private keys.

Regex check: "AWS_KEY = AKIA..."
Matches: AWS Access Key detected!
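A minimal sketch of the matching step. The contents of SECRET_PATTERNS.txt are not shown here, so the two patterns below are illustrative stand-ins (a commonly used AWS access key ID pattern and a PEM private key header), and `scan_text` is a hypothetical name. The sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Assumption: stand-ins for the signatures loaded from SECRET_PATTERNS.txt.
SECRET_PATTERNS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private Key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(path: str, text: str) -> list[dict]:
    """Return one finding per (line, pattern) match, with 1-based line numbers."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern_type, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append({"path": path, "line": line_no, "type": pattern_type})
    return findings


code = 'import os\nAWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'
print(scan_text("app.py", code))
# [{'path': 'app.py', 'line': 2, 'type': 'AWS Access Key'}]
```

Each finding carries the file path, line number, and pattern type, which are the result fields listed for the whole repo scan.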

Mode 2: API Key Search Pipeline

Check whether one of your API keys has been accidentally committed to public code anywhere on GitHub.

1. Length Validation

The key you enter must be at least 8 characters long. Shorter inputs are rejected because they produce vague queries and waste rate-limited requests.

Input: "sk_live_51M..."
Length: >= 8 characters (PASSED)
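The check is small enough to show in full. In this sketch the name `validate_key` and the whitespace trimming are assumptions. Only the 8-character minimum comes from the description above.

```python
MIN_KEY_LENGTH = 8


def validate_key(key: str) -> str:
    """Return the trimmed key, or raise if it is too short to search for."""
    key = key.strip()  # assumption: surrounding whitespace is not part of the key
    if len(key) < MIN_KEY_LENGTH:
        raise ValueError(f"key must be at least {MIN_KEY_LENGTH} characters")
    return key


print(len(validate_key("  sk_live_EXAMPLE  ")))  # 15
```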

2. GitHub Search Query

We send the query to the public GitHub Code Search API over HTTPS. The queried API key is never logged or cached, and it exists only for the duration of the request.

Query: /search/code?q={api_key}
Auth: Rate-limit protected headers
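One way to build the request is sketched below. The helper name `build_search_request` and the `token` parameter are hypothetical. The sketch assumes the search is sent with a GitHub access token in the Authorization header, which is how authenticated calls to the REST API are normally made.

```python
from urllib.parse import quote

SEARCH_URL = "https://api.github.com/search/code"


def build_search_request(api_key: str, token: str) -> tuple[str, dict]:
    """Return (url, headers) for a code search; nothing is sent or stored here."""
    # Percent-encode the key so characters such as '&' or spaces cannot
    # break out of the q= parameter.
    url = f"{SEARCH_URL}?q={quote(api_key, safe='')}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",  # assumption: token-based auth
    }
    return url, headers


url, headers = build_search_request("a b&c", "TOKEN")
print(url)  # https://api.github.com/search/code?q=a%20b%26c
```

Keeping the key in local variables only, and never passing it to a logger, is what keeps it ephemeral in this sketch.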

3. Exposure Extraction

GitHub returns the matching repositories, file paths, and URLs of the exposed files. The report masks your key so the full value is never displayed.

Masked: sk_li...51M
Results: Found 1 exposure in repo XYZ
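The masking and extraction could be sketched like this. The 5-character prefix and 3-character suffix are inferred from the masked example above, and the function names are hypothetical. The response fields used (`items`, `name`, `html_url`, `repository.full_name`) follow the usual shape of GitHub code search results, and the sample data is made up.

```python
def mask_secret(secret: str, head: int = 5, tail: int = 3) -> str:
    """Show only the first and last few characters of a secret."""
    if len(secret) <= head + tail:
        return "*" * len(secret)  # too short to reveal any part safely
    return f"{secret[:head]}...{secret[-tail:]}"


def extract_exposures(search_response: dict) -> list[dict]:
    """Reduce a code search response to repo name, file name, and HTML link."""
    return [
        {
            "repo": item["repository"]["full_name"],
            "file": item["name"],
            "link": item["html_url"],
        }
        for item in search_response.get("items", [])
    ]


sample = {
    "items": [
        {
            "name": "config.py",
            "html_url": "https://github.com/user/demo/blob/main/config.py",
            "repository": {"full_name": "user/demo"},
        }
    ]
}
print(mask_secret("sk_live_EXAMPLE_ONLY_51M"))  # sk_li...51M
print(extract_exposures(sample))
```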

Key Architecture Features

In-Memory Processing
Files downloaded from repos are examined in RAM and immediately garbage collected.
Signature Updates
Scans use the comprehensive patterns from SECRET_PATTERNS.txt.
Masked Reports
Discovered secrets are partially masked to prevent viewing sensitive credentials in full.
Size Constraints
Skips binary files and large data files (500KB or more) to keep scans fast.
Ephemeral Keys
Searched API keys are immediately purged from variables once output is sent.
Rate-Limit Tolerant
Includes safety throttles for GitHub API to prevent IP blacklisting.
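As an illustration of the rate-limit throttle, the sketch below reads GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and works out how long to pause. The function name `seconds_until_reset` is hypothetical, and the scanner's real throttling logic may differ.

```python
import time
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0 if quota remains)."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - now)


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}
print(seconds_until_reset(exhausted, now=940))  # 60.0
```

A caller would pass the result to `time.sleep` before sending the next request.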

Scanner Mode Comparison

Metric             | Whole Repo Scan                      | API Key Search
Objective          | Find any secrets inside one repo     | Find where a specific API key is leaked publicly
Target Scope       | Single specified repository          | Global public GitHub code index
Scanning Technique | Regex compilation on files           | Code search querying
Result Info        | File path, line number, pattern type | Repo name, file name, HTML link

Privacy & Security First

Securing your tokens during scans:

  • No Data Persistence: Scanned files and keys are held only in temporary memory while the scan runs and are never saved.
  • Read-Only Scan: The system has no write access, so your files and repos are never modified. Nothing is cloned.
  • Secure Connection: All communication with GitHub's official servers goes over HTTPS/TLS.
  • Anonymity: Scan repositories without registering an account.

Remediation Plan for Leaked Secrets

Revoke Credentials: Immediately revoke the exposed token or API key inside the provider's console.
Clean Git History: Deleting the secret in a new commit is NOT enough, because it remains readable in earlier commits. You must rewrite the git history using a tool such as git-filter-repo or BFG Repo-Cleaner.
Environment Variables: Store secrets in a .env file and add that file to .gitignore so it is never committed.
Set up Scan Alerts: Enable GitHub's secret scanning alerts in the repository's security settings.
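For the environment variable step, application code can read the secret at runtime instead of hard-coding it. This is a minimal sketch: `load_secret` and the variable name `DEMO_API_KEY` are hypothetical, and loading a .env file into the environment is left to whatever tooling the project already uses.

```python
import os


def load_secret(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Set here only so the example runs; in practice the value comes from the
# shell or a .env loader, never from source code.
os.environ["DEMO_API_KEY"] = "example-value"
print(load_secret("DEMO_API_KEY"))  # example-value
```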