GitHub Security Architecture
Understand the core security flow we use to scan code repositories and check for exposed keys without compromising system integrity.
Mode 1: Whole Repo Scan Pipeline
We fetch a GitHub repository's text files on demand and hold them only in isolated memory, matching them against industry-standard signature patterns for sensitive secrets. Nothing is written to local storage.
URL Resolution
The target URL is parsed to extract the repository owner and name. The scanner then checks that the repository exists and is publicly accessible.
Parsed: owner="user", repo="demo"
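A minimal Python sketch of the URL resolution step, assuming the scanner accepts `https://github.com/<owner>/<repo>` style URLs, optionally ending in `.git` or followed by extra path segments such as `/tree/main`. `parse_repo_url` is a hypothetical helper, not the scanner's actual code. The existence and visibility check would then be a GET request to `https://api.github.com/repos/{owner}/{repo}`, which answers 404 for repositories that do not exist or are not visible to the caller.

```python
import re
from urllib.parse import urlparse

# Loose check: letters, digits, '-', '_' and '.' (GitHub's own rules are stricter for owners).
_NAME_RE = re.compile(r"^[A-Za-z0-9_.-]+$")

def parse_repo_url(url: str) -> tuple[str, str]:
    """Hypothetical helper: extract (owner, repo) from a GitHub repository URL."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url!r}")
    # Ignore empty segments and anything after owner/repo (e.g. /tree/main).
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"URL does not name a repository: {url!r}")
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not (_NAME_RE.match(owner) and _NAME_RE.match(repo)):
        raise ValueError(f"invalid owner or repository name in {url!r}")
    return owner, repo

print(parse_repo_url("https://github.com/user/demo"))  # ('user', 'demo')
```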
Recursive Git Trees
We request the repository's Git tree from the GitHub API with recursion enabled. The response lists every file along with its path, size, and blob SHA.
Response: [{path: "app.py", sha: "abc12..."}]
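The tree request can be sketched as follows, assuming the branch name has already been read from the `default_branch` field of the repository metadata. `tree_url` and `blob_entries` are hypothetical helpers; the endpoint (`GET /repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=1`) and the `tree`, `type`, `size` and `truncated` response fields are part of GitHub's REST API. GitHub truncates very large recursive listings, so the sketch surfaces that flag instead of silently scanning a partial tree.

```python
import json

API_ROOT = "https://api.github.com"

def tree_url(owner: str, repo: str, branch: str) -> str:
    """Hypothetical helper: URL of the recursive Git tree for a branch."""
    # Any value for `recursive` makes GitHub include nested subtrees.
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"

def blob_entries(tree_json: str) -> tuple[list[dict], bool]:
    """Return the file (blob) entries and GitHub's `truncated` flag."""
    data = json.loads(tree_json)
    files = [
        {"path": e["path"], "sha": e["sha"], "size": e.get("size", 0)}
        for e in data.get("tree", [])
        if e.get("type") == "blob"  # skip directories ("tree") and submodules ("commit")
    ]
    # When truncated is true the listing is incomplete; a real scanner would
    # need to walk subtrees non-recursively instead.
    return files, bool(data.get("truncated", False))
```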
In-Memory Ingestion
Only files smaller than 500KB are processed. Each file's content is downloaded and decoded directly in memory and is never written to disk, so no local cache is left behind.
Decoded: base64 -> raw code text
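A sketch of the size filter and in-memory decoding, assuming each file is fetched through GitHub's blob endpoint (`GET /repos/{owner}/{repo}/git/blobs/{file_sha}`), which returns the content base64-encoded with embedded line breaks. Reading "500KB" as 500 * 1024 bytes is an assumption, and the NUL-byte test for binary files is a common heuristic rather than the scanner's documented rule. `should_fetch` and `decode_blob` are hypothetical names.

```python
import base64
import json
from typing import Optional

# Assumption: "500KB" is taken to mean 500 KiB.
MAX_FILE_BYTES = 500 * 1024

def should_fetch(entry: dict, max_bytes: int = MAX_FILE_BYTES) -> bool:
    """Decide from the tree entry's size whether a blob is worth downloading."""
    return entry.get("size", 0) < max_bytes

def decode_blob(blob_json: str) -> Optional[str]:
    """Decode a GitHub blob response in memory; return None for binary content."""
    data = json.loads(blob_json)
    if data.get("encoding") != "base64":
        return None
    # b64decode (validate=False by default) drops the line breaks GitHub inserts.
    raw = base64.b64decode(data["content"])
    if b"\x00" in raw:  # common heuristic: NUL bytes suggest a binary file
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8 text, treat as binary
```

Checking the size from the tree listing first means oversized files are never downloaded at all, which is what keeps the skip cheap.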
Regex Pattern Matching
The decoded text is scanned against the regular-expression signatures compiled from SECRET_PATTERNS.txt to detect leaked API keys, tokens, and private keys.
Matches: AWS Access Key detected!
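The matching step can be sketched like this. The format of SECRET_PATTERNS.txt is not documented here, so the sketch assumes one `name: regex` entry per line with `#` comments; `load_patterns`, `scan_text` and `Finding` are hypothetical names. The `AKIA[0-9A-Z]{16}` rule is a widely used pattern for AWS access key IDs, and `AKIAIOSFODNN7EXAMPLE` is the placeholder key from AWS's documentation.

```python
import re
from typing import NamedTuple

class Finding(NamedTuple):
    pattern: str  # rule name from the patterns file
    line: int     # 1-based line number
    match: str    # matched text (mask before displaying)

def load_patterns(spec: str) -> dict[str, re.Pattern]:
    """Compile rules written as 'name: regex', skipping blanks and # comments."""
    patterns = {}
    for raw in spec.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, regex = line.partition(":")
        if sep:
            patterns[name.strip()] = re.compile(regex.strip())
    return patterns

def scan_text(text: str, patterns: dict[str, re.Pattern]) -> list[Finding]:
    """Report every pattern match together with its line number."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, rx in patterns.items():
            for m in rx.finditer(line):
                findings.append(Finding(name, lineno, m.group(0)))
    return findings

rules = load_patterns("# sample rule\nAWS Access Key: AKIA[0-9A-Z]{16}\n")
print(scan_text("x = 1\nkey = 'AKIAIOSFODNN7EXAMPLE'\n", rules))  # one Finding, on line 2
```

Scanning line by line makes the reported line number trivial to compute; rules meant to match across line breaks would have to run against the whole text instead.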
Mode 2: API Key Search Pipeline
Check whether your API keys have been accidentally committed to public code indexed by GitHub's code search.
Length Validation
The entered key must be at least 8 characters long. Shorter inputs are rejected because they produce overly broad queries and waste rate-limited API calls.
Length: ≥8 characters (PASSED)
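The length check amounts to a single comparison. A minimal sketch with the hypothetical name `validate_key`; trimming surrounding whitespace before counting is an assumption, not documented behavior.

```python
MIN_KEY_LENGTH = 8

def validate_key(key: str) -> str:
    """Hypothetical check: reject keys too short to give a specific search."""
    candidate = key.strip()  # assumption: surrounding whitespace is ignored
    if len(candidate) < MIN_KEY_LENGTH:
        raise ValueError(f"key must be at least {MIN_KEY_LENGTH} characters long")
    return candidate
```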
GitHub Search Query
The key is sent as a query to the GitHub Code Search API over HTTPS. It is never logged or cached and exists only for the duration of the request.
Auth: Rate-limit protected headers
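A sketch of how the search request could be built with Python's standard library, assuming a personal access token is available, since GitHub's REST code search endpoint (`GET /search/code`) requires authentication. Wrapping the key in double quotes asks GitHub for the exact string. `build_code_search_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_code_search_request(key: str, token: str) -> Request:
    """Hypothetical helper: prepare (but do not send) a GitHub code search request."""
    # Double quotes ask GitHub to match the key as an exact string.
    query = urlencode({"q": f'"{key}"'})
    return Request(
        f"https://api.github.com/search/code?{query}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # code search requires authentication
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the JSON body's `items` array then describes each match.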
Exposure Extraction
GitHub returns the matching repositories, file paths, and a URL for each exposure. The report masks your key so that it is never displayed in full.
Results: Found 1 exposure in repo XYZ
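Extracting exposures and masking the key might look like this. The `total_count`, `items`, `path`, `html_url` and `repository.full_name` fields come from GitHub's code search response; `Exposure`, `parse_search_results`, `mask_key`, and the choice to show four characters at each end are illustrative assumptions.

```python
import json
from typing import NamedTuple

class Exposure(NamedTuple):
    repo: str  # owner/name of the repository containing the key
    path: str  # file path within that repository
    url: str   # link to the file on github.com

def parse_search_results(body: str) -> tuple[int, list[Exposure]]:
    """Turn a /search/code JSON response into (total_count, exposures)."""
    data = json.loads(body)
    exposures = [
        Exposure(item["repository"]["full_name"], item["path"], item["html_url"])
        for item in data.get("items", [])
    ]
    return data.get("total_count", 0), exposures

def mask_key(key: str, visible: int = 4) -> str:
    """Show only the first and last `visible` characters of a key."""
    if len(key) <= 2 * visible:
        return "*" * len(key)  # too short to reveal anything safely
    return key[:visible] + "*" * (len(key) - 2 * visible) + key[-visible:]

print(mask_key("abcd1234efgh5678"))  # abcd********5678
```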
Key Architecture Features
- Files downloaded from repositories are examined in RAM and released for garbage collection immediately after scanning.
- Scans use the full set of patterns defined in SECRET_PATTERNS.txt.
- Discovered secrets are partially masked so that credentials are never displayed in full.
- Binary files and files larger than 500KB are skipped to keep scans fast.
- Searched API keys are discarded from memory as soon as the results are returned.
- GitHub API requests are throttled to stay within rate limits and avoid having the scanner's IP address blocked.
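One way to implement the throttling is to honor GitHub's rate-limit response headers: `X-RateLimit-Remaining`, `X-RateLimit-Reset` (a UTC epoch timestamp) and, for secondary rate limits, `Retry-After`. The hypothetical helper `rate_limit_delay` below only computes how long to wait; the caller would `time.sleep()` for that long before the next request. It looks headers up by exact name, as in a plain dict; urllib's response headers are case-insensitive, so `resp.headers` works as well.

```python
import time
from typing import Mapping, Optional

def rate_limit_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Hypothetical helper: seconds to wait before the next GitHub API call."""
    now = time.time() if now is None else now
    # Secondary rate limits may specify the wait directly, in seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")  # UTC epoch seconds
    if remaining is not None and reset is not None and int(remaining) == 0:
        # Primary limit exhausted: wait until the window resets.
        return max(0.0, float(reset) - now)
    return 0.0
```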
Scanner Mode Comparison
| Metric | Whole Repo Scan | API Key Search |
|---|---|---|
| Objective | Find any secrets inside one repo | Find where a specific API key is leaked publicly |
| Target Scope | Single specified repository | Global public GitHub code index |
| Scanning Technique | Regex matching on file contents | Code search querying |
| Result Info | File path, line number, pattern type | Repository name, file name, HTML URL |
Privacy & Security First
Securing your tokens during scans:
- ✓ No Data Persistence: Scanned files and keys are processed in memory by temporary threads and are never saved.
- ✓ Read-Only Scan: The scanner requests no write access, so your files and repositories are never modified, and repositories are read through the API rather than cloned.
- ✓ Secure Connection: All communication with GitHub's official servers uses HTTPS (TLS).
- ✓ Anonymity: Scan repositories without registering an account.
Remediation Plan for Leaked Secrets
- Remove the secret from your Git history using git-filter-repo or BFG.
- Move secrets into .env configuration files and add those files to .gitignore.
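For the history-rewriting step, git-filter-repo's `--replace-text` option reads a file of expressions, one per line, where `text==>replacement` replaces every occurrence of `text` throughout the history. The hypothetical helper below builds that file's contents from a list of leaked secrets. It refuses secrets containing `==>` or starting with the `regex:` or `glob:` prefixes, because the file format gives those a special meaning.

```python
def build_replace_text(secrets: list[str]) -> str:
    """Hypothetical helper: expressions for `git filter-repo --replace-text`."""
    lines = []
    for secret in dict.fromkeys(secrets):  # de-duplicate, keep first-seen order
        # '==>' separates text from its replacement, and the regex:/glob:
        # prefixes change how a line is interpreted, so such secrets
        # cannot be written as a plain literal rule here.
        if "\n" in secret or "==>" in secret or secret.startswith(("regex:", "glob:")):
            raise ValueError("secret cannot be expressed as a literal rule")
        lines.append(f"{secret}==>***REMOVED***")
    return "\n".join(lines) + "\n"
```

Saving the output outside the repository (for example as `expressions.txt`) and running `git filter-repo --replace-text expressions.txt` in a fresh clone rewrites every commit; the rewritten branches then have to be force-pushed. Rewriting history does not invalidate a key that was already public, so the key itself should still be revoked.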