
Knowledge Bases & File Storage

Overview

Healthcare AI systems need access to relevant, trustworthy, and up-to-date information — clinical guidelines, imaging reports, patient summaries, and research papers.

ByteEngine provides:

  • Knowledge Bases — vector stores optimized for healthcare data (FHIR, PDFs, text, images)
  • File Storage — HIPAA-compliant storage for structured and unstructured medical files

Together, they form the foundation for RAG (Retrieval-Augmented Generation) in healthcare — enabling AI Workers to ground their reasoning in accurate, context-specific information.

1. What is a Knowledge Base?

A Knowledge Base (KB) in ByteEngine is a semantic, searchable repository where you store and query unstructured healthcare content such as:

  • Clinical documents (PDF, TXT, DOCX)
  • Research papers
  • SOAP notes
  • FHIR resource text fields (e.g., Observation.note)
  • Image captions or radiology reports

ByteEngine automatically:

  • Extracts and preprocesses the text
  • Generates embeddings using a domain-optimized model (e.g., BioLinkBERT, PubMedBERT)
  • Stores the data in a vector database
  • Enables semantic and contextual search for your AI Workers

Knowledge Base Architecture


Flow: [PDFs / FHIR / Text] → [AI Embedding Engine] → [Vector Store] → [AI Worker + Session → Context Retrieval]
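
To make the retrieval step in this flow concrete, here is a conceptual sketch that ranks stored chunks by cosine similarity to a query embedding. The three-dimensional vectors, the sample chunks, and the `rankChunks` helper are invented for illustration. ByteEngine's actual embedding model and vector store internals are not described in this guide, so treat this as a sketch of the general technique, not of the platform's implementation.

```javascript
// Conceptual sketch only: toy vectors stand in for real embeddings.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK chunks most similar to the query embedding.
function rankChunks(queryEmbedding, chunks, topK = 2) {
  return chunks
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical chunks with made-up embeddings.
const chunks = [
  { source: 'diabetes-guidelines.pdf', embedding: [0.9, 0.1, 0.0] },
  { source: 'cardiology-notes.txt', embedding: [0.0, 0.2, 0.9] },
  { source: 'clinical-research-2024.txt', embedding: [0.7, 0.6, 0.1] },
];

const ranked = rankChunks([1.0, 0.2, 0.0], chunks);
console.log(ranked.map((chunk) => chunk.source));
```

Real embeddings have hundreds of dimensions, and production vector stores use approximate nearest-neighbor indexes instead of a full scan, but the ranking principle is the same.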

2. Creating a Knowledge Base

Using the Console (No Code)

  1. Navigate to Knowledge Bases → Create New
  2. Name your KB (e.g., "Clinical Guidelines")
  3. Upload files (PDF, CSV, TXT, or FHIR export)
  4. Click "Ingest Data"

ByteEngine will preprocess and embed your files automatically.

UI Example: [Screenshot: Knowledge Base creation interface showing file upload and configuration options]

Using the API

curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases" \
  -H "Authorization: Bearer $BYTEENGINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Diabetes Knowledge Base",
    "description": "Clinical research and treatment guidelines for diabetes",
    "type": "text"
  }'

Upload Files to the Knowledge Base

curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases/{kb_id}/upload" \
  -H "Authorization: Bearer $BYTEENGINE_API_KEY" \
  -F "file=@diabetes-guidelines.pdf"

Using JavaScript SDK

import { EngineClient } from '@boolbyte/engine';

const client = new EngineClient({ apiKey: 'YOUR_API_KEY' });

// Create a knowledge base
const knowledgeBase = await client.knowledgeBase.createKnowledgeBase({
  name: 'Diabetes Knowledge Base',
  description: 'Clinical research and treatment guidelines for diabetes',
  type: 'text'
});

// Upload files to the knowledge base.
// diabetesGuidelinesFile is the PDF content, loaded by your application beforehand.
const uploadResult = await client.knowledgeBase.uploadFile(knowledgeBase.data.id, {
  file: diabetesGuidelinesFile,
  name: 'diabetes-guidelines.pdf'
});

console.log('Knowledge base created:', knowledgeBase.data.id);

3. Querying a Knowledge Base

Once your KB is ready, you can run semantic searches.

Example API Query

curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases/{kb_id}/search" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "What are the treatment options for Type 2 Diabetes?"}'
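
The same request can be issued from JavaScript without the SDK. The sketch below assumes only the endpoint and headers shown in the curl example; the helper names `buildSearchRequest` and `searchKnowledgeBase` and the ID `kb_123` are hypothetical. It relies on the global `fetch` available in Node 18+ and in browsers.

```javascript
const API_BASE = 'https://api.engine.boolbyte.com/api';

// Build the URL and fetch options for a knowledge base search request.
function buildSearchRequest(kbId, query, apiKey) {
  return {
    url: `${API_BASE}/knowledgebases/${encodeURIComponent(kbId)}/search`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ query }),
    },
  };
}

// Send the request and return the parsed JSON body.
async function searchKnowledgeBase(kbId, query, apiKey) {
  const { url, options } = buildSearchRequest(kbId, query, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}

// 'kb_123' and 'test-key' are placeholders, not real identifiers.
const request = buildSearchRequest(
  'kb_123',
  'What are the treatment options for Type 2 Diabetes?',
  'test-key'
);
console.log(request.url);
```

Separating request construction from the network call keeps the payload easy to inspect and unit test.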

Example Response

{
  "success": true,
  "data": {
    "matches": [
      {
        "score": 0.89,
        "source": "diabetes-guidelines.pdf",
        "snippet": "For Type 2 Diabetes, first-line therapy includes Metformin..."
      },
      {
        "score": 0.76,
        "source": "clinical-research-2024.txt",
        "snippet": "Studies show GLP-1 agonists reduce HbA1c by..."
      }
    ]
  }
}
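
A response in this shape can be filtered before it is shown to a clinician or passed on to a model. The sketch below keeps only matches at or above a score threshold and formats them as citation lines. The `formatMatches` helper and the 0.8 cutoff are illustrative assumptions; an appropriate threshold depends on your data.

```javascript
// Keep matches at or above minScore and format them as citation lines.
function formatMatches(response, minScore = 0.8) {
  if (!response.success || !response.data || !Array.isArray(response.data.matches)) {
    return [];
  }
  return response.data.matches
    .filter((match) => match.score >= minScore)
    .map((match) => `[${match.source}] ${match.snippet} (score ${match.score.toFixed(2)})`);
}

// Same shape as the example response above.
const sampleResponse = {
  success: true,
  data: {
    matches: [
      {
        score: 0.89,
        source: 'diabetes-guidelines.pdf',
        snippet: 'For Type 2 Diabetes, first-line therapy includes Metformin...',
      },
      {
        score: 0.76,
        source: 'clinical-research-2024.txt',
        snippet: 'Studies show GLP-1 agonists reduce HbA1c by...',
      },
    ],
  },
};

const citations = formatMatches(sampleResponse);
console.log(citations);
```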

4. Using Knowledge Bases in AI Workers

Knowledge Bases can be attached directly to Workers or Sessions, enabling RAG-powered reasoning.

YAML Example

worker:
  name: "diabetes-coach"
  model: "medgemma-27b"
  knowledge_bases:
    - "kb:diabetes-guidelines"
  context: "Use the diabetes KB to answer patient treatment queries."

Programmatic Example (JavaScript)

// Create a worker with knowledge base access
const worker = await client.worker.createWorker({
  name: 'diabetes-coach',
  defaultModelName: 'medgemma-27b',
  instructions: 'Use the diabetes knowledge base to answer patient treatment queries.',
  toolConfigs: {
    tools: [
      {
        toolName: 'knowledge_base',
        config: {
          knowledgeBaseId: 'diabetes-guidelines'
        }
      }
    ]
  }
});

// Run a task with knowledge base context
const session = await client.session.createSession({
  workerId: worker.data.id,
  metadata: { context: 'diabetes consultation' }
});

const task = await client.task.createTask(session.data.id, {
  instructions: 'Recommend medication for Type 2 diabetes based on current guidelines',
  model: 'medgemma-27b'
});

The Worker retrieves the most relevant text from your KB and includes it in the model's prompt automatically — no manual context injection needed.
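
Conceptually, this automatic step resembles the prompt assembly sketched below, where retrieved snippets are numbered and placed ahead of the question. The `buildGroundedPrompt` helper and the prompt wording are assumptions made for illustration; the template ByteEngine actually uses is not documented here.

```javascript
// Assemble a prompt that places retrieved snippets ahead of the question.
function buildGroundedPrompt(question, matches) {
  const context = matches
    .map((match, index) => `[${index + 1}] (${match.source}) ${match.snippet}`)
    .join('\n');
  return [
    'Answer using only the context below. Cite sources by number.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// Hypothetical retrieval results in the shape returned by a KB search.
const retrieved = [
  {
    source: 'diabetes-guidelines.pdf',
    snippet: 'For Type 2 Diabetes, first-line therapy includes Metformin...',
  },
  {
    source: 'clinical-research-2024.txt',
    snippet: 'Studies show GLP-1 agonists reduce HbA1c by...',
  },
];

const prompt = buildGroundedPrompt(
  'Recommend medication for Type 2 diabetes based on current guidelines',
  retrieved
);
console.log(prompt);
```

Numbering the snippets gives the model a way to cite its sources, which makes answers easier for a clinician to verify.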

5. File Storage Overview

File Storage is ByteEngine's secure, encrypted storage layer for all healthcare-related files — clinical reports, images, CSV exports, or DICOM files.

Every file you upload is:

  • Encrypted at rest (AES-256)
  • Scanned for PHI (Protected Health Information)
  • Indexed for AI and search
  • Linked to your FHIR resources where applicable

File Storage Architecture


Flow: [Upload File] → [Encryption] → [Metadata Index] → [Secure Access URL]

6. Uploading Files

Using the Console

  1. Go to File Storage → Upload
  2. Choose your file or drag-and-drop
  3. Optionally attach metadata (e.g., patient ID, file type)
  4. Click Upload

UI Example: [Screenshot: File upload interface showing drag-and-drop and metadata options]

Using the API

curl -X POST "https://api.engine.boolbyte.com/api/storage" \
  -H "Authorization: Bearer $BYTEENGINE_API_KEY" \
  -F "file=@lab-report.pdf" \
  -F "metadata={\"patient_id\":\"12345\",\"type\":\"LabReport\"}"

Example Response

{
  "success": true,
  "data": {
    "id": "file_abc123",
    "url": "https://storage.engine.boolbyte.com/file_abc123",
    "metadata": {
      "patient_id": "12345",
      "type": "LabReport"
    },
    "status": "stored",
    "createdAt": "2024-01-15T10:00:00.000Z"
  }
}

Using JavaScript SDK

// Upload a file.
// labReportFile is the file content, loaded by your application beforehand.
const file = await client.storage.uploadFile({
  file: labReportFile,
  metadata: {
    patient_id: '12345',
    type: 'LabReport',
    category: 'laboratory'
  }
});

console.log('File uploaded:', file.data.id);

7. Retrieving Files

Files can be retrieved securely using access tokens or API calls.

curl -X GET "https://api.engine.boolbyte.com/api/storage/file_abc123" \
  -H "Authorization: Bearer $BYTEENGINE_API_KEY"

Files can also be linked to FHIR DocumentReference resources for interoperability.

8. Linking Files to FHIR Resources

Example: attach a PDF to a patient's medical record.

curl -X POST "https://api.engine.boolbyte.com/api/fhir/DocumentReference" \
  -H "Authorization: Bearer $BYTEENGINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resourceType": "DocumentReference",
    "status": "current",
    "subject": {"reference": "Patient/123"},
    "content": [{
      "attachment": {
        "url": "https://storage.engine.boolbyte.com/file_abc123",
        "title": "Lab Results PDF"
      }
    }]
  }'
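
When you create many of these links, a small builder keeps the payload consistent. The sketch below mirrors the request body above and sets `status`, which FHIR R4 requires on every DocumentReference. The `buildDocumentReference` helper and the default `contentType` are illustrative assumptions, not part of the ByteEngine SDK.

```javascript
// Build a FHIR R4 DocumentReference that points at a stored file.
function buildDocumentReference(patientId, fileUrl, title, contentType = 'application/pdf') {
  return {
    resourceType: 'DocumentReference',
    status: 'current', // required element in FHIR R4
    subject: { reference: `Patient/${patientId}` },
    content: [
      {
        attachment: { contentType, url: fileUrl, title },
      },
    ],
  };
}

const documentReference = buildDocumentReference(
  '123',
  'https://storage.engine.boolbyte.com/file_abc123',
  'Lab Results PDF'
);
console.log(JSON.stringify(documentReference, null, 2));
```

The resulting object can be serialized with `JSON.stringify` and sent as the body of the POST request shown above.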

9. File-Based Triggers

You can configure subscriptions or workflows to trigger on new file uploads.

trigger:
  type: "file.uploaded"
  filter: "type == 'LabReport'"
  workflow: "notify-lab-team"

Real-world use case:

When a new lab report is uploaded, automatically run an AI Worker to summarize it and send the summary to the physician's dashboard.
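
To illustrate how such a trigger might be evaluated, the sketch below matches an upload event against a trigger definition. It handles only filters of the form `field == 'value'`; the full filter language is not documented here. The event shape and the `matchesTrigger` helper are assumptions made for this example.

```javascript
// Parse a filter of the form: field == 'value'
function parseEqualityFilter(filter) {
  const match = /^\s*(\w+)\s*==\s*'([^']*)'\s*$/.exec(filter);
  if (!match) {
    throw new Error(`Unsupported filter expression: ${filter}`);
  }
  return { field: match[1], value: match[2] };
}

// Return true when the event type matches and the metadata satisfies the filter.
function matchesTrigger(trigger, event) {
  if (trigger.type !== event.type) return false;
  if (!trigger.filter) return true;
  const { field, value } = parseEqualityFilter(trigger.filter);
  const actual = event.metadata ? event.metadata[field] : undefined;
  return actual !== undefined && String(actual) === value;
}

const trigger = {
  type: 'file.uploaded',
  filter: "type == 'LabReport'",
  workflow: 'notify-lab-team',
};

// Hypothetical event payloads.
const labUpload = {
  type: 'file.uploaded',
  metadata: { patient_id: '12345', type: 'LabReport' },
};
const imagingUpload = {
  type: 'file.uploaded',
  metadata: { patient_id: '12345', type: 'Imaging' },
};

console.log(matchesTrigger(trigger, labUpload));
console.log(matchesTrigger(trigger, imagingUpload));
```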

10. RAG (Retrieval-Augmented Generation) in Practice

ByteEngine combines Knowledge Bases, File Storage, and Workers so that AI responses are grounded in retrieved context instead of relying on the model alone.

Example:

A "Clinical Summarizer" Worker that answers clinician queries based on uploaded patient files and guidelines.

Workflow Example

workflow:
  name: "clinical-summarizer"
  steps:
    - worker: "summarize-documents"
      input:
        kb: "patient-files"
        question: "{{workflow.input.query}}"

Output:

"Based on the uploaded lab report and SOAP notes, the patient's HbA1c trend indicates potential Type 2 Diabetes progression."

11. Best Practices

Area | Recommendation
File Naming | Use descriptive names with identifiers (e.g., Patient_123_LabReport_2024.pdf)
Knowledge Bases | Separate KBs by clinical domain for precision (e.g., Cardiology, Radiology)
Storage Security | Enable file access expiry or signed URLs for sharing
Compliance | Use EU/US data residency options for sensitive uploads
RAG Optimization | Limit retrieved context chunks to under 2 KB each for better LLM performance
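
If you assemble context yourself, the chunk-size recommendation can be enforced client-side. The sketch below truncates each snippet so that its UTF-8 size stays under a byte limit without splitting a multi-byte character. The helper names are hypothetical, and whether ByteEngine also applies a limit server-side is not stated in this guide.

```javascript
const encoder = new TextEncoder();

// Truncate text so its UTF-8 size stays strictly under limitBytes,
// iterating by code point so multi-byte characters are never split.
function truncateToBytes(text, limitBytes) {
  let bytes = 0;
  let result = '';
  for (const char of text) {
    const size = encoder.encode(char).length;
    if (bytes + size >= limitBytes) break;
    bytes += size;
    result += char;
  }
  return result;
}

// Apply the limit to every retrieved chunk (2048 bytes = 2 KB).
function limitChunks(chunks, limitBytes = 2048) {
  return chunks.map((chunk) => ({
    ...chunk,
    snippet: truncateToBytes(chunk.snippet, limitBytes),
  }));
}

const limited = limitChunks([
  { source: 'long-report.txt', snippet: 'a'.repeat(3000) },
  { source: 'short-note.txt', snippet: 'HbA1c 7.2%' },
]);
console.log(limited.map((chunk) => encoder.encode(chunk.snippet).length));
```

Truncating at a sentence or paragraph boundary usually preserves more meaning than a hard byte cut, so treat this as a safety net after sensible chunking.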

12. Example: AI-Powered Research Assistant

Goal: Create a Worker that helps clinicians find the latest diabetes research.

Steps:

  1. Create a Knowledge Base → upload research PDFs
  2. Create a Worker → attach that KB
  3. Ask questions in natural language

// Create research assistant worker
const researchWorker = await client.worker.createWorker({
  name: 'research-assistant',
  defaultModelName: 'medgemma-27b',
  instructions: 'Help clinicians find the latest diabetes research and treatment guidelines.',
  toolConfigs: {
    tools: [
      {
        toolName: 'knowledge_base',
        config: {
          knowledgeBaseId: 'diabetes-research'
        }
      }
    ]
  }
});

// Query the research assistant
const session = await client.session.createSession({
  workerId: researchWorker.data.id,
  metadata: { context: 'research query' }
});

const task = await client.task.createTask(session.data.id, {
  instructions: 'What are the new GLP-1 therapy guidelines?',
  model: 'medgemma-27b'
});

Output:

"According to ADA 2024 guidelines, GLP-1 receptor agonists are recommended as first-line for patients with cardiovascular risk factors."

13. Coming Soon: Hybrid Knowledge Graphs

ByteEngine will soon support FHIR + unstructured knowledge graph linking, allowing automatic relationships between structured EHR data and unstructured notes, e.g.:

[Patient] → [Observation: HbA1c] → [Lab Report PDF] → [KnowledgeBase: Diabetes Guidelines]

Real-World Implementation Examples

Clinical Documentation System

// Complete clinical documentation workflow
const clinicalDocs = {
  // 1. Upload patient documents
  uploadDocument: async (patientId, documentFile) => {
    const file = await client.storage.uploadFile({
      file: documentFile,
      metadata: {
        patient_id: patientId,
        type: 'ClinicalDocument',
        category: 'progress_notes'
      }
    });

    // 2. Create FHIR DocumentReference
    await client.dataStore.initializeFhirStoreClient('main-fhir-server');
    const fhirClient = client.dataStore.getFhirStoreClient();

    await fhirClient.create({
      resource: {
        resourceType: 'DocumentReference',
        status: 'current',
        subject: { reference: `Patient/${patientId}` },
        content: [{
          attachment: {
            url: file.data.url,
            title: documentFile.name
          }
        }]
      }
    });

    return file.data;
  },

  // 3. Add to knowledge base for AI access
  addToKnowledgeBase: async (fileId, knowledgeBaseId) => {
    return await client.knowledgeBase.uploadFile(knowledgeBaseId, {
      fileId: fileId,
      name: 'clinical-document'
    });
  }
};

Research Paper Analysis

// AI-powered research analysis system
const researchAnalysis = {
  // Upload research papers
  uploadResearch: async (paperFile) => {
    const file = await client.storage.uploadFile({
      file: paperFile,
      metadata: {
        type: 'ResearchPaper',
        category: 'diabetes_research'
      }
    });

    // Add to research knowledge base
    await client.knowledgeBase.uploadFile('diabetes-research-kb', {
      fileId: file.data.id,
      name: paperFile.name
    });

    return file.data;
  },

  // Query research with AI
  queryResearch: async (question) => {
    const session = await client.session.createSession({
      workerId: 'research-analyst',
      metadata: { context: 'research analysis' }
    });

    const task = await client.task.createTask(session.data.id, {
      instructions: `Based on the research papers in the knowledge base, answer: ${question}`,
      model: 'medgemma-27b'
    });

    return task.data;
  }
};

Next Steps