@sdirix
Forked from isasmendiagus/README.md
Last active December 10, 2024 10:11

SCANOSS Memory Scanning Example

This example demonstrates how to perform in-memory scanning using the SCANOSS SDK.

Pre-requisites

  • node
  • npm

Quick Start

  1. Download this Gist as a ZIP file using the "Download ZIP" button
  2. Extract the ZIP contents
  3. Install dependencies and run:
npm install
npm start

Code Explanation

This example demonstrates in-memory code scanning without writing the code to a file. The key method is scanContents, which scans a code string directly.

import { Scanner, ScannerComponent } from 'scanoss';
interface ScanOSSScanner{
scanContents: <T extends string>(options: { content: string; key: T}) => Promise<{[K in `/${T}`]: ScannerComponent[]} | null>;
}
export interface ScanOSSResultClean {
type: 'clean';
}
export interface ScanOSSResultMatch {
type: 'match';
matched: string; // e.g. "75%"
url: string; // e.g. a GitHub link
raw: unknown; // treat remaining results as a black box for now
}
export interface ScanOSSResultError {
type: 'error';
message: string;
}
export type ScanOSSResult = ScanOSSResultClean | ScanOSSResultMatch | ScanOSSResultError;
async function scanContent(content: string, apiKey?: string): Promise<ScanOSSResult> {
// Partial 'ScannerCfg' does not seem to work, so I need to hand over a full default object?
const scanner = new Scanner(/* {
API_KEY: apiKey || process.env.SCANOSS_API_KEY || undefined,
MAX_RESPONSES_IN_BUFFER: 1,
} as ScannerCfg*/);
// Adjusted type as the real ScanOSS Scanner types return the nested type instead. The type should be adjusted in the library.
const results = await (scanner as unknown as ScanOSSScanner).scanContents({
content,
key: 'content_scanning',
});
// I'm assuming that 'null' means error. Is this correct? What is returned in case limits are exceeded?
if (!results) {
return {
type: 'error',
message: 'ScanOSS request unsuccessful'
};
}
// Is the first result always the "best" (i.e. highest match) result?
// I only ever get one result, which is fine, but when to expect more?
const firstEntry = results['/content_scanning'][0];
// Is this the correct check for no match found?
if (firstEntry.id === 'none') {
return {
type: 'clean'
};
}
// Will 'matched' and 'url' always exist?
// Some of the other properties of 'ScannedComponent' seem to be optional although they are not typed as such, e.g. 'dependencies', 'copyrights', 'server.flags'
// Also some of the returned properties are not typed, e.g. 'url_stats'
return {
type: 'match',
matched: firstEntry.matched,
url: firstEntry.url,
raw: firstEntry
};
}
async function main() {
const result = await scanContent(
`// Compare words in given command against known command\n for (int j = 1; j <= limit; j++)\n {\n char *cword = ldb_extract_word(j, command);\n char *kword = ldb_extract_word(j, ldb_commands[i]);\n bool fulfilled = false;\n if (!strcmp(kword, "{hex}")) fulfilled = ldb_valid_hex(cword);\n else if (!strcmp(kword, "{ascii}")) fulfilled = ldb_valid_ascii(cword);\n else if (!strcmp(kword, cword)) fulfilled = true;\n free(cword);\n free(kword);\n\n if (!fulfilled) break;\n else if (j > hits)\n {\n closest = i;\n hits = j;\n *word_nr = hits;\n *command_nr = closest;\n }\n }\n if ((hits > 0) && (hits == known_words)) return true;\n }\n\n return false;`
);
// In the UI I currently highlight 'matched' and 'url' and pretty print the whole remaining result as JSON in a preformatted block
// Is there already a built-in mechanism to get a nicer textual overview? Ideally I would like to avoid printing a raw JSON in the UI with reasonable effort.
console.log(JSON.stringify(result, null, 2));
}
main();
{
"name": "scanoss-memory-example",
"version": "1.0.0",
"main": "index.js",
"scripts": {
"start": "tsc && node dist/index.js",
"test": "echo \"Error: no test specified\" && exit 1"
},
"keywords": [],
"author": "",
"license": "ISC",
"description": "SCANOSS memory scanning example",
"devDependencies": {
"@types/node": "^22.9.0",
"typescript": "^5.6.3"
},
"dependencies": {
"scanoss": "^0.15.2"
}
}
{
"compilerOptions": {
"target": "es2016",
"module": "commonjs",
"outDir": "./dist",
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"strict": true,
"skipLibCheck": true
},
"include": ["index.ts"],
"exclude": ["node_modules"]
}
@isasmendiagus

Hi Stefan, I'll add comments below to your questions:

1 - Partial 'ScannerCfg' does not seem to work, so I need to hand over a full default object?

// Partial config doesn't work because the Scanner class expects a complete configuration.
// You need to provide a full config object:
const cfg = new ScannerCfg({
    API_KEY: apiKey || process.env.SCANOSS_API_KEY,
    API_URL: "https://api.scanoss.com/api/scan/direct",
    MAX_RESPONSES_IN_BUFFER: 1
});
const scanner = new Scanner(cfg);

// Note: We have this as an enhancement request to support partial config objects
// in future versions to align better with JavaScript ecosystem patterns
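
Until partial configs are supported, one workaround is to merge the few values you care about over a complete set of defaults and pass the result on. The sketch below is only an illustration: `DemoCfg`, `DEMO_DEFAULTS` and `completeCfg` are hypothetical names, and the default for `MAX_RESPONSES_IN_BUFFER` is a placeholder rather than the library's actual value.

```typescript
// Hypothetical stand-in for the library's config; the real ScannerCfg has more fields.
interface DemoCfg {
  API_KEY: string | undefined;
  API_URL: string;
  MAX_RESPONSES_IN_BUFFER: number;
}

// Placeholder defaults for illustration only (the buffer size is made up).
const DEMO_DEFAULTS: DemoCfg = {
  API_KEY: undefined,
  API_URL: 'https://api.scanoss.com/api/scan/direct',
  MAX_RESPONSES_IN_BUFFER: 10,
};

// Returns a complete config: a copy of the defaults, with every override
// that is actually set (not undefined) applied on top.
function completeCfg<T extends object>(defaults: T, overrides: Partial<T>): T {
  const merged = { ...defaults };
  for (const key of Object.keys(overrides) as (keyof T)[]) {
    const value = overrides[key];
    if (value !== undefined) {
      merged[key] = value as T[keyof T];
    }
  }
  return merged;
}
```

Calling `completeCfg(DEMO_DEFAULTS, { MAX_RESPONSES_IN_BUFFER: 1 })` yields a full object in which only the buffer size differs from the defaults. Skipping `undefined` overrides means an unset API key does not wipe out a default.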

2 - Adjusted type as the real ScanOSS Scanner types return the nested type instead. The type should be adjusted in the library.

// The Scanner interface returns results in this structure:
{
  "/content_scanning": [
     { id: "none" | "snippet" | "file", ... }
   ]
}

// The available properties depend on the id type:
// - "none": basic metadata only (id and server)
// - "snippet": includes all properties
// - "file": includes all properties
//
// We'll improve the TypeScript definitions to better reflect this structure
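
Until the library's definitions are updated, the id-dependent structure described above could be modelled on the caller's side as a discriminated union. This is a minimal sketch, not the library's typing: `NoMatchEntry`, `MatchEntry`, `ScanEntry`, `isMatch` and `describeEntry` are hypothetical names, and only the fields discussed in this thread are typed.

```typescript
// Hypothetical caller-side types; the library's ScannerComponent is typed differently.
interface NoMatchEntry {
  id: 'none';
  server?: unknown; // basic metadata only, left untyped in this sketch
}

interface MatchEntry {
  id: 'snippet' | 'file';
  matched: string; // e.g. "75%"
  url: string;
}

type ScanEntry = NoMatchEntry | MatchEntry;

// Type guard: true when the entry describes an actual match.
function isMatch(entry: ScanEntry): entry is MatchEntry {
  return entry.id !== 'none';
}

// Produces a one-line description; 'matched' and 'url' are only read after narrowing.
function describeEntry(entry: ScanEntry): string {
  return isMatch(entry)
    ? `${entry.id} match (${entry.matched}): ${entry.url}`
    : 'no match';
}
```

With this shape the compiler rejects access to `matched` or `url` on an entry that has not been checked against `'none'` first.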

3 - I'm assuming that 'null' means error. Is this correct? What is returned in case limits are exceeded?

// The scanner will throw an error for issues like rate limits or invalid API keys:
try {
    const result = await scanner.scanContents(...)
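} catch (error) {
    // Handle rate limits, authentication errors, etc.
    // With "strict" enabled, 'error' is typed as unknown, so narrow it before reading 'message'
    const message = error instanceof Error ? error.message : String(error);
    console.error('Scan failed:', message);
}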
}
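
Since failures arrive as thrown errors, a caller that prefers result values (like the 'error' variant in the gist) could wrap the call once. The following is a sketch under that assumption: `ScanOutcome` and `safeScan` are hypothetical names, and the scan call is passed in as a function, so nothing here depends on the library's own types.

```typescript
// Hypothetical result type: either the scan's value or an error message.
type ScanOutcome<T> =
  | { type: 'ok'; value: T }
  | { type: 'error'; message: string };

// Runs any promise-returning scan call and converts a thrown error into a value.
async function safeScan<T>(scan: () => Promise<T>): Promise<ScanOutcome<T>> {
  try {
    return { type: 'ok', value: await scan() };
  } catch (error) {
    // Thrown values are not guaranteed to be Error instances, so narrow first.
    const message = error instanceof Error ? error.message : String(error);
    return { type: 'error', message };
  }
}
```

In the gist's code this would be used as `safeScan(() => scanner.scanContents({ content, key: 'content_scanning' }))`, after which the caller switches on `type` instead of using try/catch at every call site.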

4 - Is the first result always the "best" (i.e. highest match) result? I only ever get one result, which is fine, but when to expect more?

// The first result is always the highest confidence match.
// Multiple component scanning is planned for future releases. For now, you'll only get the single best match in the results.

5 - Is this the correct check for no match found?

// Yes, checking id === "none" is the correct way to detect no matches
// Example:
if (firstEntry.id === 'none') {
    // No matches found in the knowledge database
    return { type: 'clean' };
}

6 - Will 'matched' and 'url' always exist? Some of the other properties of 'ScannedComponent' seem to be optional although they are not typed as such, e.g. 'dependencies', 'copyrights', 'server.flags'. Also, some of the returned properties are not typed, e.g. 'url_stats'.

// For matches (id === 'snippet' || id === 'file'):
// - matched: always present (percentage of match)
// - url: always present (source location)
// - copyright: always present
//
// Other fields are optional and depend on the match type.
// Safe usage pattern:
if (firstEntry.id !== 'none') {
    // Safe to access matched/url/copyright
}

// The dependencies and vulnerabilities properties are not available on the free tier

7 - In the UI I currently highlight 'matched' and 'url' and pretty print the whole remaining result as JSON in a preformatted block. Is there already a built-in mechanism to get a nicer textual overview? Ideally I would like to avoid printing a raw JSON in the UI with reasonable effort.

It depends on what you want to show to the end user. For example, license information would be useful.
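
As one possible starting point, a short text summary could be assembled from the match percentage, the URL and the license names. This is a sketch rather than a built-in feature: `SummaryInput` and `summarize` are hypothetical names, and the assumption that a match carries a `licenses` array whose items have a `name` property should be checked against real scan output.

```typescript
// Hypothetical input shape: only the fields this sketch reads.
// 'licenses' with a 'name' property is an assumption about the scan output.
interface SummaryInput {
  matched: string;
  url: string;
  licenses?: { name: string }[];
}

// Builds a three-line plain-text summary, listing each license name once.
function summarize(entry: SummaryInput): string {
  const names = (entry.licenses ?? []).map((license) => license.name);
  const unique = Array.from(new Set(names));
  const licenseText = unique.length > 0 ? unique.join(', ') : 'not reported';
  return [
    `Match: ${entry.matched}`,
    `Source: ${entry.url}`,
    `Licenses: ${licenseText}`,
  ].join('\n');
}
```

The raw JSON could then stay behind an expandable "details" view while the summary is shown by default.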
