Knowledge Bases & File Storage
Overview
Healthcare AI systems need access to relevant, trustworthy, and up-to-date information — clinical guidelines, imaging reports, patient summaries, and research papers.
ByteEngine provides:
- Knowledge Bases — vector stores optimized for healthcare data (FHIR, PDFs, text, images)
- File Storage — HIPAA-compliant storage for structured and unstructured medical files
Together, they form the foundation for RAG (Retrieval-Augmented Generation) in healthcare — enabling AI Workers to ground their reasoning in accurate, context-specific information.
1. What is a Knowledge Base?
A Knowledge Base (KB) in ByteEngine is a semantic, searchable repository where you store and query unstructured healthcare content such as:
- Clinical documents (PDF, TXT, DOCX)
- Research papers
- SOAP notes
- FHIR resource text fields (e.g., Observation.note)
- Image captions or radiology reports
ByteEngine automatically:
- Extracts and preprocesses the text
- Generates embeddings using a domain-optimized model (e.g., BioLinkBERT, PubMedBERT)
- Stores the data in a vector database
- Enables semantic and contextual search for your AI Workers
Knowledge Base Architecture
Flow: [PDFs / FHIR / Text] → [AI Embedding Engine] → [Vector Store] → [AI Worker + Session → Context Retrieval]
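The flow above can be illustrated with a toy in-memory version. This is only a sketch of the general embed, store, and retrieve pattern: `embed`, `VOCAB`, and `InMemoryVectorStore` are hypothetical names, and the bag-of-words vectorizer stands in for a real clinical embedding model. It does not describe ByteEngine's internal implementation.

```javascript
// Toy illustration of the ingest -> embed -> store -> retrieve flow.
// embed() is a stand-in bag-of-words vectorizer, NOT a clinical embedding model.
const VOCAB = ['metformin', 'diabetes', 'insulin', 'fracture', 'radiology', 'hba1c'];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  // One dimension per vocabulary term, holding that term's count in the text
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // no shared vocabulary at all
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  constructor() {
    this.entries = [];
  }

  add(source, text) {
    this.entries.push({ source, text, vector: embed(text) });
  }

  // Rank every stored entry against the query and keep the best topK
  search(query, topK = 3) {
    const queryVector = embed(query);
    return this.entries
      .map((e) => ({
        source: e.source,
        snippet: e.text,
        score: cosineSimilarity(queryVector, e.vector),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.add('diabetes-guidelines.pdf', 'Metformin is first-line therapy for Type 2 diabetes.');
store.add('radiology-report.txt', 'Radiology shows a distal radius fracture.');

const results = store.search('Which drug treats diabetes? Is it metformin?', 1);
console.log(results[0].source);
```

A production system would replace `embed` with a learned embedding model and the array scan with an approximate nearest-neighbor index, but the ranking idea is the same.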
2. Creating a Knowledge Base
Using the Console (No Code)
- Navigate to Knowledge Bases → Create New
- Name your KB (e.g., "Clinical Guidelines")
- Upload files (PDF, CSV, TXT, or FHIR export)
- Click "Ingest Data"
ByteEngine will preprocess and embed your files automatically.
UI Example: [Screenshot: Knowledge Base creation interface showing file upload and configuration options]
Using the API
curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Diabetes Knowledge Base",
"description": "Clinical research and treatment guidelines for diabetes",
"type": "text"
}'
Upload Files to the Knowledge Base
curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases/{kb_id}/upload" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-F "file=@diabetes-guidelines.pdf"
Using JavaScript SDK
import { EngineClient } from '@boolbyte/engine';
// Read the API key from the environment rather than hardcoding it
const client = new EngineClient({ apiKey: process.env.BYTEENGINE_API_KEY });
// Create a knowledge base
const knowledgeBase = await client.knowledgeBase.createKnowledgeBase({
name: 'Diabetes Knowledge Base',
description: 'Clinical research and treatment guidelines for diabetes',
type: 'text'
});
// Upload files to the knowledge base
// (diabetesGuidelinesFile is assumed to be a file object loaded elsewhere in your code)
const uploadResult = await client.knowledgeBase.uploadFile(knowledgeBase.data.id, {
file: diabetesGuidelinesFile,
name: 'diabetes-guidelines.pdf'
});
console.log('Knowledge base created:', knowledgeBase.data.id);
3. Querying a Knowledge Base
Once your KB is ready, you can run semantic searches.
Example API Query
curl -X POST "https://api.engine.boolbyte.com/api/knowledgebases/{kb_id}/search" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"query": "What are the treatment options for Type 2 Diabetes?"}'
Example Response
{
"success": true,
"data": {
"matches": [
{
"score": 0.89,
"source": "diabetes-guidelines.pdf",
"snippet": "For Type 2 Diabetes, first-line therapy includes Metformin..."
},
{
"score": 0.76,
"source": "clinical-research-2024.txt",
"snippet": "Studies show GLP-1 agonists reduce HbA1c by..."
}
]
}
}
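Client code will often want to keep only the stronger matches before using them. The sketch below works on the response shape shown above; `selectMatches`, `minScore`, and `maxResults` are hypothetical names chosen for illustration, not ByteEngine parameters, and the 0.8 threshold is arbitrary.

```javascript
// Filter and rank matches from a search response of the shape shown above.
// minScore and maxResults are illustrative knobs, not ByteEngine settings.
function selectMatches(response, { minScore = 0.8, maxResults = 5 } = {}) {
  if (!response || response.success !== true) {
    throw new Error('Search request did not succeed');
  }
  const matches = (response.data && response.data.matches) || [];
  return matches
    .filter((m) => typeof m.score === 'number' && m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}

const exampleResponse = {
  success: true,
  data: {
    matches: [
      { score: 0.76, source: 'clinical-research-2024.txt', snippet: 'Studies show GLP-1 agonists reduce HbA1c by...' },
      { score: 0.89, source: 'diabetes-guidelines.pdf', snippet: 'For Type 2 Diabetes, first-line therapy includes Metformin...' },
    ],
  },
};

const strongMatches = selectMatches(exampleResponse, { minScore: 0.8 });
console.log(strongMatches.map((m) => `${m.source} (${m.score})`));
```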
4. Using Knowledge Bases in AI Workers
Knowledge Bases can be attached directly to Workers or Sessions, enabling RAG-powered reasoning.
YAML Example
worker:
name: "diabetes-coach"
model: "medgemma-27b"
knowledge_bases:
- "kb:diabetes-guidelines"
context: "Use the diabetes KB to answer patient treatment queries."
Programmatic Example (JavaScript)
// Create a worker with knowledge base access
const worker = await client.worker.createWorker({
name: 'diabetes-coach',
defaultModelName: 'medgemma-27b',
instructions: 'Use the diabetes knowledge base to answer patient treatment queries.',
toolConfigs: {
tools: [
{
toolName: 'knowledge_base',
config: {
knowledgeBaseId: 'diabetes-guidelines'
}
}
]
}
});
// Run a task with knowledge base context
const session = await client.session.createSession({
workerId: worker.data.id,
metadata: { context: 'diabetes consultation' }
});
const task = await client.task.createTask(session.data.id, {
instructions: 'Recommend medication for Type 2 diabetes based on current guidelines',
model: 'medgemma-27b'
});
The Worker retrieves the most relevant text from your KB and includes it in the model's prompt automatically — no manual context injection needed.
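To make the idea of context injection concrete, here is a rough sketch of what assembling a grounded prompt from retrieved snippets could look like. The Worker does this for you; `buildGroundedPrompt` and the prompt wording are assumptions made for illustration and do not reflect the actual prompt ByteEngine builds.

```javascript
// Conceptual sketch: combine retrieved snippets and the user question into one prompt.
// The template text here is invented for illustration.
function buildGroundedPrompt(question, matches) {
  const context = matches
    .map((m, i) => `[${i + 1}] (${m.source}) ${m.snippet}`)
    .join('\n');
  return [
    'Answer using only the context below. Cite sources by number.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

const retrieved = [
  { source: 'diabetes-guidelines.pdf', snippet: 'For Type 2 Diabetes, first-line therapy includes Metformin...' },
  { source: 'clinical-research-2024.txt', snippet: 'Studies show GLP-1 agonists reduce HbA1c by...' },
];

const prompt = buildGroundedPrompt('Recommend medication for Type 2 diabetes', retrieved);
console.log(prompt);
```

Numbering the snippets gives the model a simple way to cite which source supported each statement.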
5. File Storage Overview
File Storage is ByteEngine's secure, encrypted storage layer for all healthcare-related files — clinical reports, images, CSV exports, or DICOM files.
Every file you upload is:
- Encrypted at rest (AES-256)
- Scanned for PHI (Protected Health Information)
- Indexed for AI and search
- Linked to your FHIR resources where applicable
File Storage Architecture
Flow: [Upload File] → [Encryption] → [Metadata Index] → [Secure Access URL]
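For readers unfamiliar with encryption at rest, the following sketch shows AES-256 in GCM mode using Node.js's built-in `crypto` module. It illustrates the general technique only: the choice of GCM, the helper names `encryptFile` and `decryptFile`, and the in-memory key are assumptions, and nothing here describes how ByteEngine manages keys or stores ciphertext.

```javascript
// Generic AES-256-GCM round trip with Node's built-in crypto module.
// CommonJS require is used so the snippet runs as a plain Node script.
const crypto = require('node:crypto');

function encryptFile(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, must be unique per encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // The auth tag lets decryption detect any tampering with the ciphertext
  return { iv, ciphertext, authTag: cipher.getAuthTag() };
}

function decryptFile({ iv, ciphertext, authTag }, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag);
  // final() throws if the tag does not match the ciphertext
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const key = crypto.randomBytes(32); // 256-bit key; a real system would use a key manager
const encrypted = encryptFile(Buffer.from('HbA1c: 7.2%', 'utf8'), key);
const decrypted = decryptFile(encrypted, key);
console.log(decrypted.toString('utf8'));
```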
6. Uploading Files
Using the Console
- Go to File Storage → Upload
- Choose your file or drag-and-drop
- Optionally attach metadata (e.g., patient ID, file type)
- Click Upload
UI Example: [Screenshot: File upload interface showing drag-and-drop and metadata options]
Using the API
curl -X POST "https://api.engine.boolbyte.com/api/storage" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-F "file=@lab-report.pdf" \
-F "metadata={\"patient_id\":\"12345\",\"type\":\"LabReport\"}"
Example Response
{
"success": true,
"data": {
"id": "file_abc123",
"url": "https://storage.engine.boolbyte.com/file_abc123",
"metadata": {
"patient_id": "12345",
"type": "LabReport"
},
"status": "stored",
"createdAt": "2024-01-15T10:00:00.000Z"
}
}
Using JavaScript SDK
// Upload a file
const file = await client.storage.uploadFile({
file: labReportFile,
metadata: {
patient_id: '12345',
type: 'LabReport',
category: 'laboratory'
}
});
console.log('File uploaded:', file.data.id);
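If you are not using the SDK, the multipart body from the curl example can also be assembled in code. The sketch below assumes Node.js 18 or later (global `FormData` and `Blob`), assumes the form fields are named `file` and `metadata` as in the curl example, and uses a hypothetical helper name `buildUploadForm`. The network call is left commented out so the snippet runs offline.

```javascript
// Build the multipart form for a storage upload (Node.js 18+ globals assumed).
function buildUploadForm(bytes, filename, metadata) {
  const form = new FormData();
  form.append('file', new Blob([bytes], { type: 'application/pdf' }), filename);
  // JSON.stringify handles quoting, so no manual escaping is needed as in curl
  form.append('metadata', JSON.stringify(metadata));
  return form;
}

const form = buildUploadForm(Buffer.from('%PDF-1.4 example bytes'), 'lab-report.pdf', {
  patient_id: '12345',
  type: 'LabReport',
});

// Sending the request is left commented out so this sketch runs without network access:
// const res = await fetch('https://api.engine.boolbyte.com/api/storage', {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${process.env.BYTEENGINE_API_KEY}` },
//   body: form,
// });

console.log(form.get('metadata'));
```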
7. Retrieving Files
Files can be retrieved securely using access tokens or API calls.
curl -X GET "https://api.engine.boolbyte.com/api/storage/file_abc123" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY"
Files can also be linked to FHIR DocumentReference resources for interoperability.
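When a file needs to be shared for a limited time, one common pattern is a signed URL with an expiry. The sketch below shows the generic HMAC-based technique with Node's `crypto` module; `signUrl`, `verifySignedUrl`, the `expires` and `signature` query parameters, and the shared secret are all hypothetical and are not ByteEngine's actual signing scheme.

```javascript
// Generic expiring signed URL using HMAC-SHA256 (not ByteEngine's real scheme).
const crypto = require('node:crypto');

function signUrl(baseUrl, secret, expiresAtSeconds) {
  const payload = `${baseUrl}?expires=${expiresAtSeconds}`;
  const signature = crypto.createHmac('sha256', secret).update(payload).digest('hex');
  return `${payload}&signature=${signature}`;
}

function verifySignedUrl(signedUrl, secret, nowSeconds) {
  const url = new URL(signedUrl);
  const expires = Number(url.searchParams.get('expires'));
  const signature = url.searchParams.get('signature') || '';
  // Recompute the signature over the same payload the signer used
  const payload = `${url.origin}${url.pathname}?expires=${expires}`;
  const expected = crypto.createHmac('sha256', secret).update(payload).digest('hex');
  const given = Buffer.from(signature, 'hex');
  const wanted = Buffer.from(expected, 'hex');
  // Constant-time comparison avoids leaking how many characters matched
  const valid = given.length === wanted.length && crypto.timingSafeEqual(given, wanted);
  return valid && nowSeconds < expires;
}

const signed = signUrl('https://storage.engine.boolbyte.com/file_abc123', 'demo-secret', 2000);
console.log(verifySignedUrl(signed, 'demo-secret', 1000));
```

Because the expiry is part of the signed payload, a recipient cannot extend access by editing the `expires` value.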
8. Linking Files to FHIR Resources
Example: attach a PDF to a patient's medical record.
curl -X POST "https://api.engine.boolbyte.com/api/fhir/DocumentReference" \
-H "Authorization: Bearer $BYTEENGINE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"resourceType": "DocumentReference",
"status": "current",
"subject": {"reference": "Patient/123"},
"content": [{
"attachment": {
"url": "https://storage.engine.boolbyte.com/file_abc123",
"title": "Lab Results PDF"
}
}]
}'
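The same resource can be built in code from a stored file's details. This is a minimal sketch: `buildDocumentReference` is a hypothetical helper, and it sets `status` because FHIR R4 requires that element on every DocumentReference.

```javascript
// Build a FHIR R4 DocumentReference that points at a stored file.
function buildDocumentReference({ patientId, fileUrl, title, contentType }) {
  if (!patientId || !fileUrl) {
    throw new Error('patientId and fileUrl are required');
  }
  return {
    resourceType: 'DocumentReference',
    status: 'current', // required element in FHIR R4
    subject: { reference: `Patient/${patientId}` },
    content: [{ attachment: { url: fileUrl, title, contentType } }],
  };
}

const documentReference = buildDocumentReference({
  patientId: '123',
  fileUrl: 'https://storage.engine.boolbyte.com/file_abc123',
  title: 'Lab Results PDF',
  contentType: 'application/pdf',
});
console.log(JSON.stringify(documentReference, null, 2));
```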
9. File-Based Triggers
You can configure subscriptions or workflows to trigger on new file uploads.
trigger:
type: "file.uploaded"
filter: "type == 'LabReport'"
workflow: "notify-lab-team"
Real-world use case:
When a new lab report is uploaded, automatically run an AI Worker to summarize it and send the summary to the physician's dashboard.
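The matching logic behind such a trigger can be pictured as follows. This is a conceptual sketch, not ByteEngine's trigger engine: the event shape, the `triggers` array, and `workflowsFor` are assumptions, and the filter expression `type == 'LabReport'` is rewritten by hand as a JavaScript predicate.

```javascript
// Conceptual trigger matching: event type plus a metadata filter select workflows.
const triggers = [
  {
    type: 'file.uploaded',
    // Hand-written equivalent of the filter "type == 'LabReport'"
    match: (file) => file.metadata?.type === 'LabReport',
    workflow: 'notify-lab-team',
  },
];

function workflowsFor(event) {
  return triggers
    .filter((t) => t.type === event.type && t.match(event.file))
    .map((t) => t.workflow);
}

const labUpload = {
  type: 'file.uploaded',
  file: { id: 'file_abc123', metadata: { patient_id: '12345', type: 'LabReport' } },
};
const imageUpload = {
  type: 'file.uploaded',
  file: { id: 'file_def456', metadata: { type: 'Image' } },
};

console.log(workflowsFor(labUpload));
console.log(workflowsFor(imageUpload));
```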
10. RAG (Retrieval-Augmented Generation) in Practice
With Knowledge Bases and File Storage in place, a Worker can ground its answers in retrieved documents instead of relying only on the model's general knowledge.
Example:
A "Clinical Summarizer" Worker that answers clinician queries based on uploaded patient files and guidelines.
Workflow Example
workflow:
name: "clinical-summarizer"
steps:
- worker: "summarize-documents"
input:
kb: "patient-files"
question: "{{workflow.input.query}}"
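The `{{workflow.input.query}}` placeholder in the workflow above is filled in from the workflow's input at run time. A rough sketch of how such placeholders can be resolved is shown below; `resolveTemplate` is a hypothetical helper and ByteEngine's own template engine may behave differently.

```javascript
// Resolve {{dotted.path}} placeholders against a nested scope object.
function resolveTemplate(template, scope) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder, path) => {
    // Walk the dotted path one key at a time, stopping at missing values
    const value = path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), scope);
    if (value === undefined) {
      throw new Error(`Unresolved placeholder: ${placeholder}`);
    }
    return String(value);
  });
}

const resolvedQuestion = resolveTemplate('{{workflow.input.query}}', {
  workflow: { input: { query: 'Summarize the HbA1c trend' } },
});
console.log(resolvedQuestion);
```

Failing loudly on an unresolved placeholder is a deliberate choice here, since silently passing `{{...}}` text to a model is hard to debug.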
Example workflow output:
"Based on the uploaded lab report and SOAP notes, the patient's HbA1c trend indicates potential Type 2 Diabetes progression."
11. Best Practices
| Area | Recommendation |
|---|---|
| File Naming | Use descriptive names with identifiers (e.g., Patient_123_LabReport_2024.pdf) |
| Knowledge Bases | Separate KBs by clinical domain for precision (e.g., Cardiology, Radiology) |
| Storage Security | Enable file access expiry or signed URLs for sharing |
| Compliance | Use EU/US data residency options for sensitive uploads |
| RAG Optimization | Limit each retrieved context chunk to under 2 KB so prompts stay focused and fit the model's context window |
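As an illustration of the chunk-size recommendation, the sketch below splits text on sentence boundaries and keeps each chunk under 2 KB. `chunkText` and `MAX_CHUNK_BYTES` are hypothetical names, the sentence splitting is deliberately naive, and a single sentence longer than the limit would still be emitted whole.

```javascript
// Split text into sentence-aligned chunks that stay under a byte limit.
const MAX_CHUNK_BYTES = 2048;

function chunkText(text, maxBytes = MAX_CHUNK_BYTES) {
  // Naive sentence split: text up to ., ! or ?, plus any trailing whitespace
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // Start a new chunk if adding this sentence would reach the limit
    if (current && Buffer.byteLength(current + sentence, 'utf8') >= maxBytes) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

const sampleText = 'Metformin lowers HbA1c. '.repeat(200);
const chunks = chunkText(sampleText);
console.log(chunks.length, chunks.map((c) => Buffer.byteLength(c, 'utf8')));
```

Measuring bytes rather than characters matters for clinical text with non-ASCII symbols, where one character can occupy several bytes.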
12. Example: AI-Powered Research Assistant
Goal: Create a Worker that helps clinicians find the latest diabetes research.
Steps:
- Create a Knowledge Base → upload research PDFs
- Create a Worker → attach that KB
- Ask questions in natural language
// Create research assistant worker
const researchWorker = await client.worker.createWorker({
name: 'research-assistant',
defaultModelName: 'medgemma-27b',
instructions: 'Help clinicians find the latest diabetes research and treatment guidelines.',
toolConfigs: {
tools: [
{
toolName: 'knowledge_base',
config: {
knowledgeBaseId: 'diabetes-research'
}
}
]
}
});
// Query the research assistant
const session = await client.session.createSession({
workerId: researchWorker.data.id,
metadata: { context: 'research query' }
});
const task = await client.task.createTask(session.data.id, {
instructions: 'What are the new GLP-1 therapy guidelines?',
model: 'medgemma-27b'
});
Output:
"According to ADA 2024 guidelines, GLP-1 receptor agonists are recommended as first-line for patients with cardiovascular risk factors."
13. Coming Soon: Hybrid Knowledge Graphs
ByteEngine plans to support knowledge graph linking between FHIR data and unstructured content, automatically relating structured EHR data to unstructured notes, e.g.:
[Patient] → [Observation: HbA1c] → [Lab Report PDF] → [KnowledgeBase: Diabetes Guidelines]
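Since this feature is not yet available, the following is purely a thought experiment about how such a chain of links might be represented and traversed. The node identifiers, the `edges` map, and `reachableFrom` are invented for illustration and say nothing about the eventual ByteEngine API.

```javascript
// Hypothetical adjacency-list graph linking structured and unstructured resources.
const edges = new Map([
  ['Patient/123', ['Observation/hba1c-1']],
  ['Observation/hba1c-1', ['File/file_abc123']],
  ['File/file_abc123', ['KnowledgeBase/diabetes-guidelines']],
]);

// Breadth-first traversal: everything reachable from a starting node, in visit order
function reachableFrom(start) {
  const seen = new Set([start]);
  const queue = [start];
  const order = [];
  while (queue.length > 0) {
    const node = queue.shift();
    order.push(node);
    for (const next of edges.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

console.log(reachableFrom('Patient/123').join(' -> '));
```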
Real-World Implementation Examples
Clinical Documentation System
// Complete clinical documentation workflow
const clinicalDocs = {
// 1. Upload patient documents
uploadDocument: async (patientId, documentFile) => {
const file = await client.storage.uploadFile({
file: documentFile,
metadata: {
patient_id: patientId,
type: 'ClinicalDocument',
category: 'progress_notes'
}
});
// 2. Create FHIR DocumentReference
await client.dataStore.initializeFhirStoreClient('main-fhir-server');
const fhirClient = client.dataStore.getFhirStoreClient();
await fhirClient.create({
resource: {
resourceType: 'DocumentReference',
status: 'current',
subject: { reference: `Patient/${patientId}` },
content: [{
attachment: {
url: file.data.url,
title: documentFile.name
}
}]
}
});
return file.data;
},
// 3. Add to knowledge base for AI access
addToKnowledgeBase: async (fileId, knowledgeBaseId) => {
return await client.knowledgeBase.uploadFile(knowledgeBaseId, {
fileId: fileId,
name: 'clinical-document'
});
}
};
Research Paper Analysis
// AI-powered research analysis system
const researchAnalysis = {
// Upload research papers
uploadResearch: async (paperFile) => {
const file = await client.storage.uploadFile({
file: paperFile,
metadata: {
type: 'ResearchPaper',
category: 'diabetes_research'
}
});
// Add to research knowledge base
await client.knowledgeBase.uploadFile('diabetes-research-kb', {
fileId: file.data.id,
name: paperFile.name
});
return file.data;
},
// Query research with AI
queryResearch: async (question) => {
const session = await client.session.createSession({
workerId: 'research-analyst', // ID of an existing Worker that has the research KB attached
metadata: { context: 'research analysis' }
});
const task = await client.task.createTask(session.data.id, {
instructions: `Based on the research papers in the knowledge base, answer: ${question}`,
model: 'medgemma-27b'
});
return task.data;
}
};
Next Steps
- Learn about AI Workers - Create intelligent healthcare agents
- Build RAG-enabled Workflows - Automate knowledge-based processes
- View REST API for File Uploads - Complete API documentation
- Explore Full Example on GitHub - Real-world implementations
- Quick Start Guide - Get started with ByteEngine