Overview
SANDI Solr is a comprehensive search and indexing platform built on Apache Solr. It provides separate APIs for searching documents and indexing content with support for multiple document formats, scheduled processing, and advanced search features including semantic search, reranking, and RAG (Retrieval-Augmented Generation).
The system consists of two main components:
- Search API - Handles search queries and returns results
- Index API - Manages document indexing, scheduling, and administration
Search API
The Search API accepts queries as a JSON body or as URL parameters. It supports filtering, grouping, faceting, sorting, and highlighting, plus optional reranking, RAG answers, and "Did You Mean" suggestions.
Base URL
Endpoints
POST /search
Performs a search query using JSON request body.
Request Body Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| requestId | string | Yes | Unique identifier for the request |
| clientId | string | Yes | Client identifier for authentication |
| searchQuery | string | No | Main search query text |
| pageSize | integer | No | Number of results per page (default: 10, max: 10000) |
| pageNumber | integer | No | Page number to retrieve (default: 1) |
| filterQuery | string | No | Additional filter query |
| resultFields | string | No | Comma-separated list of fields to return |
| groupFields | string | No | Fields to group results by |
| facetFields | string | No | Fields to generate facets for |
| sortFields | string | No | Fields to sort results by |
| highlightFields | string | No | Fields to highlight in results |
| highlightTags | string | No | Custom highlight tags |
| precision | string | No | Search precision level: "high", "medium", "low" |
| group | boolean | No | Enable result grouping |
| facet | boolean | No | Enable facet generation |
| highlight | boolean | No | Enable result highlighting |
| exact | boolean | No | Enable exact matching |
| legacy | boolean | No | Use legacy search mode |
| synonyms | boolean | No | Enable synonym expansion |
| dym | boolean | No | Enable "Did You Mean" suggestions |
| rerank | boolean | No | Enable result reranking |
| rag | boolean | No | Enable RAG (Retrieval-Augmented Generation) |
| collapse | boolean | No | Enable result collapsing |
Example Request:
{
"requestId": "2c38e64d-19ce-4db2-acde-8df40edbf447",
"clientId": "TREC_001",
"pageSize": 10,
"pageNumber": 1,
"searchQuery": "Case for \"Samsung Galaxy\" with mirror",
"filterQuery": "",
"resultFields": "_id,id,title,content,score,rscore,_chunks",
"group": false,
"groupFields": "id",
"facet": false,
"facetFields": "id",
"sortFields": "",
"precision": "medium",
"legacy": false,
"rerank": true,
"rag": true,
"synonyms": false,
"dym": true,
"collapse": false,
"exact": false
}
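The constraints in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of SANDI Solr itself: the helper name `build_search_request` is hypothetical, and the lower bounds on `pageSize` and `pageNumber` are assumptions (the table only documents the defaults and the maximum).

```python
import json
import uuid

MAX_PAGE_SIZE = 10000  # documented maximum for pageSize
PRECISION_LEVELS = {"high", "medium", "low"}


def build_search_request(client_id, search_query, page_size=10, page_number=1,
                         precision="medium", request_id=None, **options):
    """Build a POST /search body (hypothetical helper), checking documented limits."""
    if not client_id:
        raise ValueError("clientId is required")
    # Assumption: page sizes below 1 are rejected; only the maximum is documented.
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    # Assumption: pages are numbered from 1, matching the documented default.
    if page_number < 1:
        raise ValueError("pageNumber must be 1 or greater")
    if precision not in PRECISION_LEVELS:
        raise ValueError(f"precision must be one of {sorted(PRECISION_LEVELS)}")
    body = {
        "requestId": request_id or str(uuid.uuid4()),  # generate one if not given
        "clientId": client_id,
        "searchQuery": search_query,
        "pageSize": page_size,
        "pageNumber": page_number,
        "precision": precision,
    }
    body.update(options)  # boolean flags such as rerank=True, rag=True, dym=True
    return body


payload = build_search_request(
    "TREC_001", 'Case for "Samsung Galaxy" with mirror', rerank=True, rag=True
)
print(json.dumps(payload, indent=2))
```

Generating the `requestId` with `uuid.uuid4()` is one way to satisfy the uniqueness requirement; any scheme that yields unique strings would work as well.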
GET /search
Performs a search query using URL parameters. All parameters from the POST endpoint can be passed as URL parameters.
Example Request:
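A minimal sketch of building such a URL in Python. The base URL `http://localhost:8080` is a placeholder, not the documented value, and the lowercase `true`/`false` spelling for boolean parameters is an assumption; adjust both to match the deployment.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the actual value depends on the deployment.
BASE_URL = "http://localhost:8080"


def build_search_url(params, base_url=BASE_URL):
    """Encode search parameters as a GET /search query string."""
    encoded = {}
    for key, value in params.items():
        if value is None or value == "":
            continue  # skip unset parameters
        if isinstance(value, bool):
            # Assumption: the API expects lowercase true/false in URLs.
            value = "true" if value else "false"
        encoded[key] = value
    return f"{base_url}/search?{urlencode(encoded)}"


url = build_search_url({
    "requestId": "2c38e64d-19ce-4db2-acde-8df40edbf447",
    "clientId": "TREC_001",
    "searchQuery": 'Case for "Samsung Galaxy" with mirror',
    "pageSize": 10,
    "rerank": True,
})
print(url)
```

`urlencode` percent-encodes the quotes in the query text and turns spaces into `+`, so the search string survives the round trip intact.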
Response Format
Successful Response:
{
"requestId": "2c38e64d-19ce-4db2-acde-8df40edbf447",
"status": "SUCCESS",
"message": null,
"dymQuery": "case for samsung galaxy with mirror",
"ragAnswer": "Based on the search results, here are some cases for Samsung Galaxy phones with mirror features...",
"foundResults": 1247,
"start": 0,
"took": 156,
"results": [
{
"_id": "doc123",
"id": "product_456",
"title": "Samsung Galaxy S24 Mirror Case",
"content": "Premium mirror case for Samsung Galaxy...",
"score": 0.95,
"rscore": 0.87,
"_chunks": ["chunk1", "chunk2"]
}
]
}
Error Response:
{
"requestId": "abc123",
"status": "ERROR",
"message": "Client not found"
}
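Both response shapes carry a `status` field, so a client can branch on it before touching the results. The sketch below assumes the response has already been decoded into a Python dict; `SearchError` and `parse_search_response` are hypothetical names, and the page count is derived from `foundResults` on the client side rather than returned by the API.

```python
import math


class SearchError(Exception):
    """Raised when the API reports a status other than "SUCCESS"."""


def parse_search_response(response, page_size=10):
    """Return (results, total_pages), or raise SearchError on failure."""
    if response.get("status") != "SUCCESS":
        raise SearchError(response.get("message") or "Unknown error")
    found = response.get("foundResults", 0)
    total_pages = math.ceil(found / page_size)  # pages needed to show every hit
    return response.get("results", []), total_pages


ok = {
    "requestId": "2c38e64d-19ce-4db2-acde-8df40edbf447",
    "status": "SUCCESS",
    "foundResults": 1247,
    "results": [{"id": "product_456", "score": 0.95, "rscore": 0.87}],
}
results, pages = parse_search_response(ok, page_size=10)
print(len(results), pages)  # 1 125

try:
    parse_search_response({"requestId": "abc123", "status": "ERROR",
                           "message": "Client not found"})
except SearchError as err:
    print(err)  # Client not found
```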
Search Features
Precision Levels
- high: Most accurate results, slower performance
- medium: Balanced accuracy and performance (default)
- low: Fast results, lower accuracy
Advanced Features
- Reranking: Improves result relevance using ML models
- RAG: Generates answers based on search results
- DYM: Provides query suggestions for typos/misspellings
- Semantic Search: Uses embeddings for contextual matching
Index API
The Index API manages document ingestion, processing, and scheduling of indexing jobs.
Base URL
Indexing Interface
POST /index
Indexes documents directly via API.
Request Body:
{
"requestId": "req123",
"clientId": "CLIENT_001",
"data": [
{
"id": "doc1",
"title": "Document Title",
"content": "Document content...",
"metadata": {
"category": "news",
"date": "2025-01-20"
}
}
]
}
POST /index/json
Indexes JSON documents with flexible schema.
Request Body:
{
"requestId": "req124",
"clientId": "CLIENT_001",
"data": [
{
"title": "Product Review",
"description": "Excellent product...",
"rating": 5,
"tags": ["electronics", "mobile"]
}
]
}
Scheduler Interface
The scheduler manages automated indexing jobs with support for various document sources and formats.
Job Types
| Job Type | Description | Source | File Extensions |
|---|---|---|---|
| JSON | Single JSON document per file | File system or URL | .json |
| JSONL | JSON Lines format (one JSON per line) | File system or URL | .jsonl |
| TXT | Plain text documents | File system or URL | .txt |
| TXTL | Text Lines format (one document per line) | File system or URL | .txtl |
| EXCEL | Excel spreadsheet documents | File system or URL | .xlsx, .xls |
| SITE | Website crawling | URL | Various web formats |
| SITEMAP | XML sitemap processing | File system or URL | .xml |
| JSONMAP | JSON-based URL mapping | File system or URL | .json |
POST /schedule/index
Schedules an indexing job.
Request Body:
{
"requestId": "schedule123",
"clientId": "CLIENT_001",
"jobType": "JSONL",
"directory": "/sandi/documents/data/",
"fileExtensions": ".jsonl,.json",
"forceReindexing": true,
"scheduledTime": "2025-01-21T10:00:00",
"cron": "0 0 2 * * ?",
"jobId": "daily-import-001"
}
Parameters:
- jobType: One of the supported job types (see table above)
- directory: Source directory or URL
- fileExtensions: Comma-separated file extensions to process
- forceReindexing: Whether to reindex existing documents
- scheduledTime: When to start the job (ISO format)
- cron: Optional cron expression for recurring jobs
- jobId: Optional custom job identifier
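The parameters above can be assembled and checked in code before the job is submitted. This is a rough sketch with a hypothetical helper, `build_schedule_request`; it validates `jobType` against the Job Types table and omits the two parameters marked optional when they are not supplied. Treating `scheduledTime` as required is an assumption.

```python
from datetime import datetime

# Job types listed in the Job Types table.
JOB_TYPES = {"JSON", "JSONL", "TXT", "TXTL", "EXCEL", "SITE", "SITEMAP", "JSONMAP"}


def build_schedule_request(request_id, client_id, job_type, directory,
                           file_extensions, scheduled_time,
                           force_reindexing=False, cron=None, job_id=None):
    """Build a POST /schedule/index body (hypothetical helper)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"Invalid job type: {job_type}")
    body = {
        "requestId": request_id,
        "clientId": client_id,
        "jobType": job_type,
        "directory": directory,
        "fileExtensions": ",".join(file_extensions),  # comma-separated string
        "forceReindexing": force_reindexing,
        # ISO format without fractional seconds, as in the example request.
        "scheduledTime": scheduled_time.isoformat(timespec="seconds"),
    }
    if cron is not None:
        body["cron"] = cron
    if job_id is not None:
        body["jobId"] = job_id
    return body


request = build_schedule_request(
    "schedule123", "CLIENT_001", "JSONL", "/sandi/documents/data/",
    [".jsonl", ".json"], datetime(2025, 1, 21, 10, 0),
    force_reindexing=True, cron="0 0 2 * * ?", job_id="daily-import-001",
)
print(request["scheduledTime"])  # 2025-01-21T10:00:00
```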
Job Management
Job Status Types
- SCHEDULED: Waiting to run
- RUNNING: Currently executing
- COMPLETED: Successfully finished
- FAILED: Encountered errors
- CANCELLED: Manually cancelled
Cron Expression Examples
- "0 0 2 * * ?" - Daily at 2:00 AM
- "0 30 1 * * MON" - Every Monday at 1:30 AM
- "0 0 */6 * * ?" - Every 6 hours
- "0 15 10 * * ?" - Daily at 10:15 AM
Available Endpoints:
- GET /schedule/jobs - Returns all indexing jobs
- GET /schedule/jobs/{jobId} - Returns specific job details
- GET /schedule/jobs/status/{status} - Returns jobs by status
- POST /schedule/jobs/{jobId}/cancel - Cancels a job
- GET /schedule/stats - Returns job statistics
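A client that needs to wait for a job can poll GET /schedule/jobs/{jobId} until the job reaches one of the final statuses. The sketch below is illustrative only. The shape of the job-details response is not documented here, so the HTTP call is left to a caller-supplied `fetch_status` function that is assumed to return the status string. `wait_for_job` and its parameters are hypothetical names.

```python
import time

# Statuses after which a job will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(fetch_status, job_id, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal status or max_polls is used up."""
    for attempt in range(max_polls):
        status = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


# Usage with a stand-in fetcher that replays canned statuses (no network).
canned = iter(["SCHEDULED", "RUNNING", "RUNNING", "COMPLETED"])
final = wait_for_job(lambda job_id: next(canned), "daily-import-001",
                     sleep=lambda seconds: None)
print(final)  # COMPLETED
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running server. In real use, `fetch_status` would wrap the HTTP request and extract the status from whatever structure the endpoint returns.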
Admin Interface
The admin interface provides management capabilities for clients, collections, and configurations.
Base URL
Client Management
- GET /admin/clients - Lists all clients
- GET /admin/clients/{clientId} - Gets specific client details
- POST /admin/clients - Creates or updates a client
- DELETE /admin/clients/{clientId} - Deletes a client
Create Client Request:
{
"clientId": "NEW_CLIENT",
"name": "Client Name",
"collection": "client_collection",
"active": true,
"createCollection": true,
"configuration": "default_config"
}
Collection Management
- GET /admin/collections - Lists all Solr collections
- GET /admin/collections/{name} - Gets collection details
- POST /admin/collections - Creates a new collection
- DELETE /admin/collections/{name} - Deletes a collection
Configuration Management
- DELETE /admin/configuration/{name} - Deletes a configuration
Error Handling
Common HTTP Status Codes
- 200 OK: Request successful
- 400 Bad Request: Invalid request parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server error
Error Response Format
{
"requestId": "req123",
"status": "ERROR",
"message": "Detailed error description"
}
Common Errors
- Client not found: Invalid clientId
- Client is inactive: Client exists but is disabled
- Invalid page size: pageSize exceeds maximum (10000)
- RequestId is required: Missing requestId parameter
- Invalid job type: Unsupported jobType in scheduler
Best Practices
Search API
- Always include a unique requestId for tracking
- Use appropriate pageSize values (10-100 for UI, larger for batch)
- Enable rerank for better relevance on important queries
- Use precision: "medium" for balanced performance
- Implement proper error handling for client validation
Index API
- Batch documents when possible (up to 1000 per request)
- Use appropriate job types for your data format
- Schedule heavy indexing during off-peak hours
- Monitor job status and handle failures gracefully
- Use forceReindexing: false for incremental updates
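The batching advice above can be sketched as a generator that splits a document list into POST /index bodies of at most 1000 documents each. The helper name `batch_index_requests` is hypothetical, and issuing a fresh `requestId` per batch is an assumption about how requests should be tracked.

```python
import uuid

MAX_BATCH = 1000  # per-request batch size suggested in the best practices


def batch_index_requests(client_id, documents, batch_size=MAX_BATCH):
    """Yield POST /index bodies, each carrying at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield {
            "requestId": str(uuid.uuid4()),  # assumption: one id per batch
            "clientId": client_id,
            "data": documents[start:start + batch_size],
        }


docs = [{"id": f"doc{i}", "title": f"Document {i}", "content": "..."}
        for i in range(2500)]
bodies = list(batch_index_requests("CLIENT_001", docs))
print([len(body["data"]) for body in bodies])  # [1000, 1000, 500]
```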
Performance
- Cache frequently accessed search results
- Use filters for better query performance
- Limit result fields to only what's needed
- Consider pagination for large result sets
- Monitor job queue length and processing times
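As one way to cache frequently accessed search results on the client side, the sketch below keeps responses in memory for a short time. It is an illustration under assumptions, not a feature of the API: `SearchCache` is a hypothetical class, the cache key is the request body with `requestId` removed (since that field changes on every call), and only responses with status "SUCCESS" are stored.

```python
import json
import time


class SearchCache:
    """Small in-memory TTL cache keyed by the request body minus requestId."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, response)

    @staticmethod
    def _key(body):
        stable = {k: v for k, v in body.items() if k != "requestId"}
        return json.dumps(stable, sort_keys=True)

    def get_or_fetch(self, body, fetch):
        """Return a fresh cached response, or call fetch(body) and store it."""
        key = self._key(body)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        response = fetch(body)
        if response.get("status") == "SUCCESS":  # do not cache errors
            self._entries[key] = (now, response)
        return response


calls = []


def fake_fetch(body):
    """Stand-in for the real HTTP call; records each invocation."""
    calls.append(body["requestId"])
    return {"status": "SUCCESS", "results": []}


cache = SearchCache(ttl_seconds=60.0)
cache.get_or_fetch({"requestId": "a", "clientId": "TREC_001",
                    "searchQuery": "mirror case"}, fake_fetch)
cache.get_or_fetch({"requestId": "b", "clientId": "TREC_001",
                    "searchQuery": "mirror case"}, fake_fetch)
print(len(calls))  # 1
```

A cached response still carries the `requestId` of the call that produced it, so callers that log or correlate by `requestId` may want to overwrite that field on a cache hit.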