Source Management

Endpoints for managing data sources within knowledge graphs (KGs). You can list KGs and their sources, view per-source statistics, update or re-ingest a source, delete a source together with its derived artifacts (chunks, entities, relations, vector embeddings, visualizations, and KG export files), and monitor how many tracked sources are active or failed. All operations go through SQLHandler and the Qdrant vector database.

List All Knowledge Graphs with Sources

Get an overview of all KGs with source counts and artifact information.

GET /api/v1/sources/kgs

curl -X GET "https://your-api.com/api/v1/sources/kgs?include_artifacts=true" \
-H "X-API-Key: your-api-key"

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| include_artifacts | boolean | Include visualization and storage-file information (default: true) |

Response

{
  "kgs": [
    {
      "kg_name": "KG_Universal",
      "sources_count": 15,
      "collections": ["chunks", "entities", "relations"],
      "chunks_count": 2341,
      "entities_count": 1205,
      "relations_count": 2876,
      "has_visualizations": true,
      "has_storage_files": true,
      "visualization_count": 4,
      "storage_file_types": ["gexf", "graphml", "json"]
    }
  ],
  "total_count": 1
}
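A minimal sketch of calling this endpoint from JavaScript, assuming a runtime with the standard `URL` and `fetch` globals (Node 18+ or a modern browser). The names `buildKgsUrl`, `kgsMissingVisualizations`, and `listKgs` are hypothetical helpers, not part of an official client; only the response fields shown above are used.

```javascript
// Hypothetical helpers for GET /api/v1/sources/kgs (not an official client).
const BASE_URL = "https://your-api.com";

// Build the request URL, sending include_artifacts explicitly.
function buildKgsUrl(baseUrl, includeArtifacts = true) {
  const url = new URL("/api/v1/sources/kgs", baseUrl);
  url.searchParams.set("include_artifacts", String(includeArtifacts));
  return url.toString();
}

// Names of KGs that have at least one source but no visualizations yet.
function kgsMissingVisualizations(payload) {
  return payload.kgs
    .filter((kg) => kg.sources_count > 0 && !kg.has_visualizations)
    .map((kg) => kg.kg_name);
}

// Fetch the KG overview; the API key is supplied by the caller.
async function listKgs(apiKey, includeArtifacts = true) {
  const response = await fetch(buildKgsUrl(BASE_URL, includeArtifacts), {
    headers: { "X-API-Key": apiKey },
  });
  if (!response.ok) {
    throw new Error(`GET /api/v1/sources/kgs failed with HTTP ${response.status}`);
  }
  return response.json();
}
```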

List Sources for a Knowledge Graph

Get all data sources for a specific knowledge graph.

GET /api/v1/sources/kg/{kg_name}

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| kg_name | string | Knowledge graph name |

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| include_stats | boolean | Include per-source statistics (default: true) |
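The response body for this endpoint is not documented on this page, so the sketch below only builds the request and returns the parsed JSON unchanged. `buildKgSourcesUrl` and `listSources` are hypothetical names; percent-encoding `kg_name` is a precaution for names that contain spaces or other reserved characters.

```javascript
// Hypothetical helpers for GET /api/v1/sources/kg/{kg_name}.

// Build the request URL; encodeURIComponent keeps names with spaces or slashes intact.
function buildKgSourcesUrl(baseUrl, kgName, includeStats = true) {
  const url = new URL(`/api/v1/sources/kg/${encodeURIComponent(kgName)}`, baseUrl);
  url.searchParams.set("include_stats", String(includeStats));
  return url.toString();
}

// Fetch the sources for one KG and return the parsed body as-is,
// because its exact shape is not documented here.
async function listSources(baseUrl, apiKey, kgName, includeStats = true) {
  const response = await fetch(buildKgSourcesUrl(baseUrl, kgName, includeStats), {
    headers: { "X-API-Key": apiKey },
  });
  if (!response.ok) {
    throw new Error(`Listing sources for ${kgName} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `listSources("https://your-api.com", apiKey, "KG_Universal")` requests `/api/v1/sources/kg/KG_Universal?include_stats=true`.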

Get Source Statistics

Retrieve detailed statistics for a specific source.

GET /api/v1/sources/{source_id}/stats

curl -X GET "https://your-api.com/api/v1/sources/src_123/stats?include_vector_stats=true" \
-H "X-API-Key: your-api-key"

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| include_vector_stats | boolean | Include vector database statistics (default: true) |

Response

{
  "source_id": "src_123",
  "source_name": "WHO Diabetes Fact Sheet",
  "kg_name": "KG_Universal",
  "chunks_count": 89,
  "entities_count": 245,
  "relations_count": 512,
  "vector_points_count": 89,
  "entity_type_distribution": {
    "Disease": 45,
    "Symptom": 67,
    "Treatment": 89,
    "Risk_Factor": 44
  },
  "relation_type_distribution": {
    "CAUSES": 123,
    "TREATS": 156,
    "SYMPTOM_OF": 145,
    "RISK_FOR": 88
  },
  "avg_chunk_size": 512,
  "total_text_length": 45678,
  "created_at": "2024-01-15T10:00:00Z",
  "last_accessed": "2024-01-15T14:30:00Z"
}
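The distributions in the stats response are plain name-to-count maps, so simple client-side summaries are easy to derive. A sketch with hypothetical helpers `topTypes` and `densityPerChunk`, using only the fields shown in the sample response:

```javascript
// Hypothetical helpers operating on the stats response shown above.

// The n most frequent types as [type, count] pairs, largest count first
// (ties broken alphabetically so the order is stable).
function topTypes(distribution, n = 3) {
  return Object.entries(distribution)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Entities and relations per chunk, useful for comparing sources of different sizes.
function densityPerChunk(stats) {
  if (stats.chunks_count === 0) return { entities: 0, relations: 0 };
  return {
    entities: stats.entities_count / stats.chunks_count,
    relations: stats.relations_count / stats.chunks_count,
  };
}
```

Applied to the sample response, `topTypes(stats.entity_type_distribution, 2)` returns `[["Treatment", 89], ["Symptom", 67]]`, and `densityPerChunk(stats)` gives roughly 2.75 entities and 5.75 relations per chunk.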

Delete a Source

DELETE /api/v1/sources/{kg_name}/{source_id}

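Delete a source and all data derived from it (this implements the "uncheck PDF" action). By default, the call removes the source's chunks, entities, relations, and vector embeddings. Visualizations and KG export files are removed only when this is the last source in the KG and the corresponding flag (delete_visualizations or delete_kg_files) is set.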

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| kg_name | string | Knowledge graph name |
| source_id | string | Source identifier |

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| delete_from_vector_db | boolean | true | Delete the source's vector embeddings from Qdrant |
| delete_visualizations | boolean | true | Delete visualizations if this is the last source in the KG |
| delete_kg_files | boolean | false | Delete KG export files if this is the last source in the KG |
| force | boolean | false | Force deletion, bypassing safety checks |

Response

{
  "success": true,
  "source_id": "src_123",
  "kg_name": "KG_Universal",
  "chunks_deleted": 89,
  "entities_deleted": 245,
  "relations_deleted": 512,
  "vector_points_deleted": 89,
  "visualizations_deleted": false,
  "kg_files_deleted": false,
  "message": "Source 'WHO Diabetes Fact Sheet' deleted successfully from KG_Universal",
  "timestamp": "2024-01-15T14:30:00Z"
}
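A sketch of building the delete URL with every documented flag sent explicitly, so the request does not depend on server-side defaults. `buildDeleteSourceUrl` and `summarizeDeletion` are hypothetical helpers; the defaults object simply mirrors the table above, and `force` stays `false` unless the caller overrides it.

```javascript
// Hypothetical helpers for DELETE /api/v1/sources/{kg_name}/{source_id}.
// Defaults mirror the documented query-parameter defaults.
const DELETE_DEFAULTS = {
  delete_from_vector_db: true,
  delete_visualizations: true,
  delete_kg_files: false,
  force: false,
};

// Build the URL; only documented flags are sent, each one explicitly.
function buildDeleteSourceUrl(baseUrl, kgName, sourceId, options = {}) {
  const path = `/api/v1/sources/${encodeURIComponent(kgName)}/${encodeURIComponent(sourceId)}`;
  const url = new URL(path, baseUrl);
  for (const key of Object.keys(DELETE_DEFAULTS)) {
    const value = key in options ? options[key] : DELETE_DEFAULTS[key];
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// One-line summary built from the documented response fields.
function summarizeDeletion(result) {
  return (
    `${result.source_id}: ${result.chunks_deleted} chunks, ` +
    `${result.entities_deleted} entities, ${result.relations_deleted} relations, ` +
    `${result.vector_points_deleted} vector points removed`
  );
}
```

With the sample response above, `summarizeDeletion` returns `"src_123: 89 chunks, 245 entities, 512 relations, 89 vector points removed"`.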

Update a Source

PUT /api/v1/sources/{kg_name}/{source_id}

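Update a source's metadata, or re-ingest it. The service detects whether the source content has changed and starts re-ingestion only if it has, unless force_reprocess is true. The path parameters are the same as for DELETE.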

Request Body

| Field | Type | Description |
|---|---|---|
| force_reprocess | boolean | Re-ingest even if the content is unchanged (default: false) |
| update_metadata | object | Additional metadata to merge into the existing metadata |

Response

{
  "success": true,
  "source_id": "src_123",
  "kg_name": "KG_Universal",
  "changes_detected": true,
  "job_id": "job_789",
  "chunks_updated": 12,
  "entities_updated": 34,
  "relations_updated": 56,
  "message": "Source content changed. Re-ingestion started.",
  "timestamp": "2024-01-15T14:30:00Z"
}
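A sketch of the PUT request and of interpreting its response. It assumes the endpoint accepts a JSON body containing the two documented fields; `buildUpdateRequest` and `nextStep` are hypothetical helpers. This page does not document how to poll a `job_id`, so the sketch only reports it.

```javascript
// Hypothetical helpers for PUT /api/v1/sources/{kg_name}/{source_id}.

// Build fetch() options; only the documented body fields are sent.
function buildUpdateRequest(apiKey, { forceReprocess = false, metadata } = {}) {
  const body = { force_reprocess: forceReprocess };
  if (metadata !== undefined) body.update_metadata = metadata;
  return {
    method: "PUT",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Summarize what happened, using the documented response fields.
function nextStep(result) {
  if (!result.success) return "failed";
  if (result.changes_detected && result.job_id) return `re-ingesting (job ${result.job_id})`;
  return "unchanged";
}
```

With the sample response above, `nextStep` returns `"re-ingesting (job job_789)"`.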

Check Source Management Health

Verify the source management system is operational.

GET /api/v1/sources/health

curl -X GET "https://your-api.com/api/v1/sources/health" \
-H "X-API-Key: your-api-key"

Response

{
  "status": "healthy",
  "database_connection": true,
  "vector_db_connection": true,
  "tracked_sources_count": 42,
  "active_sources_count": 40,
  "failed_sources_count": 2,
  "timestamp": "2024-01-15T14:30:00Z"
}
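A sketch of turning the health response into a go/no-go decision before bulk operations. `assessHealth` is a hypothetical helper. The only `status` value shown above is `"healthy"`, so anything else is treated as a blocker; failed sources are reported but do not block.

```javascript
// Hypothetical go/no-go check based on the health response fields shown above.
function assessHealth(health) {
  const blockers = [];
  if (health.status !== "healthy") blockers.push(`status: ${health.status}`);
  if (!health.database_connection) blockers.push("database unreachable");
  if (!health.vector_db_connection) blockers.push("vector DB unreachable");
  return {
    ok: blockers.length === 0,
    blockers,
    // Failed sources do not block bulk work, but they are worth reviewing.
    failedSources: health.failed_sources_count ?? 0,
  };
}
```

For the sample response, `assessHealth` reports `ok: true` with two failed sources to review.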

Best Practices

  • Regular cleanup: Periodically review and delete unused sources
  • Monitor status: Track source ingestion status for data quality
  • Use statistics: Keep include_stats and include_vector_stats enabled to see what each source contributes
  • Batch operations: Use bulk endpoints for multiple sources

Frontend Integration

React - Source Management Component

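import { useState, useEffect, useCallback } from 'react';

const API_BASE = 'https://your-api.com/api/v1/sources';
// Note: a key read from process.env here is bundled into client-side code.
const API_KEY = process.env.API_KEY;

function SourceManager({ kgName }) {
  const [sources, setSources] = useState([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);

  const loadSources = useCallback(async () => {
    setLoading(true);
    setError(null);
    try {
      const response = await fetch(
        `${API_BASE}/kg/${encodeURIComponent(kgName)}`,
        { headers: { 'X-API-Key': API_KEY } }
      );
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const data = await response.json();
      // Assumes the list response wraps the items in a `sources` array.
      setSources(data.sources ?? []);
    } catch (err) {
      setError(err.message);
    } finally {
      setLoading(false);
    }
  }, [kgName]);

  useEffect(() => {
    loadSources();
  }, [loadSources]);

  const handleDelete = async (sourceId) => {
    if (!window.confirm('Delete this source?')) return;
    // No force=true: keep the server-side safety checks in place.
    const response = await fetch(
      `${API_BASE}/${encodeURIComponent(kgName)}/${encodeURIComponent(sourceId)}`,
      { method: 'DELETE', headers: { 'X-API-Key': API_KEY } }
    );
    if (!response.ok) {
      setError(`Delete failed: HTTP ${response.status}`);
      return;
    }
    loadSources(); // Refresh the list
  };

  if (loading) return <div>Loading...</div>;
  if (error) return <div>Error: {error}</div>;

  // Assumes list items use the same field names as the stats response
  // (source_id, source_name, chunks_count, entities_count).
  return (
    <div>
      <h2>Sources in {kgName}</h2>
      <table>
        <thead>
          <tr>
            <th>Name</th>
            <th>Chunks</th>
            <th>Entities</th>
            <th>Actions</th>
          </tr>
        </thead>
        <tbody>
          {sources.map((source) => (
            <tr key={source.source_id}>
              <td>{source.source_name}</td>
              <td>{source.chunks_count}</td>
              <td>{source.entities_count}</td>
              <td>
                <button onClick={() => handleDelete(source.source_id)}>
                  Delete
                </button>
              </td>
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}

export default SourceManager;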

Error Responses

Source Not Found

{
  "detail": "Source 'src_123' not found in knowledge graph 'KG_Universal'",
  "status_code": 404
}

Deletion Failed

{
  "detail": "Cannot delete source: database connection failed",
  "status_code": 500
}
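Both error bodies share the same `{ detail, status_code }` shape, so a client can surface `detail` directly. A sketch with a hypothetical `SourceApiError` class and `toSourceApiError` / `throwIfError` helpers, assuming an error response may occasionally arrive without a JSON body:

```javascript
// Hypothetical error type wrapping the documented { detail, status_code } body.
class SourceApiError extends Error {
  constructor(status, detail) {
    super(detail);
    this.name = "SourceApiError";
    this.status = status;
  }
}

// Build the error from an HTTP status and a parsed body (which may be null).
function toSourceApiError(status, body) {
  const detail = body && typeof body.detail === "string" ? body.detail : `HTTP ${status}`;
  return new SourceApiError(status, detail);
}

// Pass successful responses through; throw a SourceApiError otherwise.
async function throwIfError(response) {
  if (response.ok) return response;
  let body = null;
  try {
    body = await response.json();
  } catch {
    // Empty or non-JSON body: fall back to the generic "HTTP <status>" message.
  }
  throw toSourceApiError(response.status, body);
}
```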

Best Practices

  • Confirmation: Always confirm before deleting sources
  • Statistics: Check statistics before deletion to understand impact
  • Batch operations: Use batch processing for multiple sources
  • Force flag: Use force=true only when necessary
  • Health checks: Monitor system health before bulk operations
  • Metadata: Keep metadata updated for better source tracking
  • Re-ingestion: Use update endpoint for content changes, not delete+create
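Several of these practices can be combined into one flow: a health check, a statistics review, then a delete without `force`. A minimal sketch, assuming the endpoints behave as documented on this page; `safeDeleteSource` and its `confirm` callback are hypothetical, and `fetchFn` is injectable so the flow can be exercised against a stub.

```javascript
// Hypothetical workflow: health check -> stats review -> delete (no force flag).
async function safeDeleteSource({ baseUrl, apiKey, kgName, sourceId, confirm, fetchFn = fetch }) {
  const headers = { "X-API-Key": apiKey };
  const get = async (path) => {
    const res = await fetchFn(new URL(path, baseUrl).toString(), { headers });
    if (!res.ok) throw new Error(`GET ${path} failed with HTTP ${res.status}`);
    return res.json();
  };

  // 1. Refuse to proceed unless the service reports itself healthy.
  const health = await get("/api/v1/sources/health");
  if (health.status !== "healthy") throw new Error(`Service not healthy: ${health.status}`);

  // 2. Fetch statistics so the caller can judge the impact before deleting.
  const stats = await get(`/api/v1/sources/${encodeURIComponent(sourceId)}/stats`);
  if (!(await confirm(stats))) return { deleted: false, stats };

  // 3. Delete using the documented defaults (force=false).
  const path = `/api/v1/sources/${encodeURIComponent(kgName)}/${encodeURIComponent(sourceId)}`;
  const res = await fetchFn(new URL(path, baseUrl).toString(), { method: "DELETE", headers });
  if (!res.ok) throw new Error(`DELETE ${path} failed with HTTP ${res.status}`);
  return { deleted: true, stats, result: await res.json() };
}
```

In a browser, `confirm` could show `stats.chunks_count` and `stats.entities_count` to the user before calling `window.confirm`.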