Creating and Managing Knowledge Databases
A knowledge database in Alan can be created and managed in the settings under "Knowledge Databases".
There, you'll find an overview of all existing knowledge databases that you have created or that you can edit.
Clicking on a knowledge database opens its details, where you can view and edit further information such as its description and the included files.
Creating
To create a new knowledge database, click on the "New" button and select the type of knowledge database you want to create. To upload documents, select the "file-based" option.
Define a title and description for the database. These two fields will later help you select a suitable knowledge database for chatting and contextualize Alan's source references.
You can also select files of different file types for the knowledge database. Once the knowledge database has been created, these files are uploaded and indexed to make the knowledge they contain available to Alan.
INFO
Note that not all file types are equally suitable for use in Alan. An overview of supported file formats and hints on their suitability can be found here.
Finally, click on "Save" to create the knowledge database.
Editing
After creating a knowledge database, you can continue to edit and refine it at any time.
For example, you can add new files via the upload field or remove existing files via the "X" symbol next to the respective file name.
After creating the knowledge database, you can also share it.
INFO
Files that you upload to shared knowledge databases are available to all users who have access to that knowledge database. Please respect your organization's data privacy regulations.
Indexing Status
You can view the overall indexing progress via the status display of the knowledge database.
The indexing status of a specific file can be tracked in the file overview.
Status | Description |
---|---|
To upload | File is marked for upload and will be uploaded after saving |
Uploading | File is being uploaded |
Indexing | File is being indexed |
Indexed | File is ready for chatting |
Error | An error occurred while indexing the file; please re-upload it |
Deleting | File is being deleted |
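
The statuses above can be read as a small state model. The following is a minimal sketch, not Alan's API: the `FileStatus` enum and the `summarize` helper are hypothetical names, and the rule for deriving an overall state from per-file statuses is an assumption made for illustration.

```python
from enum import Enum


class FileStatus(Enum):
    """Per-file statuses as listed in the table above."""
    TO_UPLOAD = "To upload"
    UPLOADING = "Uploading"
    INDEXING = "Indexing"
    INDEXED = "Indexed"
    ERROR = "Error"
    DELETING = "Deleting"


def summarize(statuses):
    """Derive a coarse overall state from per-file statuses (assumed rule)."""
    if any(s is FileStatus.ERROR for s in statuses):
        return "needs attention"  # at least one file should be re-uploaded
    if statuses and all(s is FileStatus.INDEXED for s in statuses):
        return "ready"            # every file is ready for chatting
    return "in progress"          # uploads, indexing, or deletions pending
```

For example, `summarize([FileStatus.INDEXED, FileStatus.UPLOADING])` returns `"in progress"`, because one file is not yet indexed.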
Indexing Pipeline
The indexing pipeline is one of the core components of Alan. It analyzes uploaded files, extracts the information they contain, and prepares it for use in knowledge databases. File contents are automatically split into coherent, semantically consistent blocks, taking into account page breaks, text blocks, paragraphs, lists, tables, headings, and other structural elements. Various machine learning techniques and models are used to maximize the quality of the extracted information.
Extracted and prepared blocks are referred to as "chunks". These chunks are embedded into a high-dimensional vector space by an embedding model and stored in, among other places, a vector database, in a form that Alan can interpret and retrieve. The size and content of the chunks are chosen to suit both the embedding model and the large language model (LLM) currently in use.
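
To illustrate the general idea of embedding chunks and retrieving them by similarity, here is a toy sketch. It does not reflect Alan's actual models or storage: the word-count `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and all function names are assumptions.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, vocabulary, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query, vocabulary)
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, embed(c, vocabulary)),
        reverse=True,
    )
    return ranked[:top_k]


chunks = [
    "invoices are archived for ten years",
    "vacation requests need manager approval",
]
vocabulary = sorted({w for c in chunks for w in c.split()})
best = retrieve("who gives approval for vacation", chunks, vocabulary)
```

Here `best` contains the vacation chunk, since it shares more words with the query. A real embedding model captures meaning rather than word overlap, but the retrieval step follows the same pattern: embed the query, compare it with the stored chunk vectors, and return the closest matches.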
By default, chunking works intelligently and reliably. For custom use cases, however, you can use JSONL files to control chunking yourself. Details on this are explained here.
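
JSONL (JSON Lines) is a plain-text format with one JSON object per line. The sketch below shows how such a file could be produced with Python's standard library. The field names `title` and `text` are hypothetical placeholders, not Alan's schema; the fields Alan actually expects are described in the details referenced above.

```python
import json

# Hypothetical chunk records; the field names are placeholders, not Alan's schema.
chunks = [
    {"title": "Travel policy", "text": "Train travel is preferred for trips under 400 km."},
    {"title": "Travel policy", "text": "Flights require approval from the team lead."},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(data):
    """Parse JSON Lines text back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in data.splitlines() if line.strip()]


payload = to_jsonl(chunks)
```

The resulting `payload` string can be written to a file, for example with `open("chunks.jsonl", "w", encoding="utf-8")`. Because each line is parsed independently, every record must fit on a single line; `json.dumps` ensures this by escaping any newlines inside string values.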
Deleting
To delete a knowledge database in Alan, click on "Delete knowledge database".
Note that you can only delete knowledge databases that you have created yourself, and that deleted knowledge databases cannot be restored.