How Index Manager file discovery works
This technical document explains in detail how Index Manager tracks file system changes and the processes that are involved during service startup and under normal operation.
The mechanisms that power Index Manager indexing
Four mechanisms are at work to make sure all changes to the file system are recorded.
During service startup, these are the Full Recursive Scan and the Folder List Scan. Under normal operation, File System Notifications and Crawlers keep the index up to date. All four are described in detail below.
What happens at service startup
When the Index Manager service starts up, two things take priority:
- Any and all changes that have happened while the service was stopped need to be discovered
- Startup should be as fast as possible
To this end, two operations are initiated at startup:
1. Full recursive scan
- Traverses the folder hierarchy recursively from the document folder root and finds all valid files
- Only done when the folder list is empty (Folder and File lists are emptied on Rebuild and Rescan operations)
- May take more than an hour for very large archives
2. Folder list scan
- Discovers changes that occurred while the service was stopped
- Scans all folders whose modified date differs from the known date, as well as folders with a modified date less than 30 minutes old
- Compares the content of each folder with the known files to discover new, changed, and deleted files
- Shares that are temporarily offline are not scanned, and none of their files are removed from the index
- Normally takes 2-10 minutes for very large archives
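The startup logic above can be sketched in Python. This is only an illustration under stated assumptions, not Index Manager's actual implementation: the function names, the in-memory dictionaries standing in for the folder and file lists, and the use of os.stat modification times are all hypothetical. It treats the full scan as the path taken when the folder list is empty and the folder list scan as the path taken otherwise.

```python
import os
import time

# Folders modified within the last 30 minutes are always rescanned.
RECENT_WINDOW_SECONDS = 30 * 60


def full_recursive_scan(root):
    """Walk the whole tree and return {folder_path: mtime} for every folder."""
    folder_list = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        folder_list[dirpath] = os.stat(dirpath).st_mtime
    return folder_list


def folders_needing_scan(folder_list, now=None):
    """Return folders whose mtime differs from the known one or is recent."""
    now = time.time() if now is None else now
    to_scan = []
    for path, known_mtime in folder_list.items():
        try:
            current_mtime = os.stat(path).st_mtime
        except OSError:
            # Folder or share is unreachable: skip it and remove nothing.
            continue
        is_recent = now - current_mtime < RECENT_WINDOW_SECONDS
        if current_mtime != known_mtime or is_recent:
            to_scan.append(path)
    return to_scan


def diff_folder(known_files, current_files):
    """Compare {name: mtime} maps and return (new, changed, deleted) sets."""
    known, current = set(known_files), set(current_files)
    new = current - known
    deleted = known - current
    changed = {n for n in known & current if known_files[n] != current_files[n]}
    return new, changed, deleted


def startup_scan(root, folder_list):
    """Run a full scan only when the folder list is empty."""
    if not folder_list:
        return "full", full_recursive_scan(root)
    return "folder_list", folders_needing_scan(folder_list)
```

The try/except in folders_needing_scan mirrors the rule that offline shares are skipped without removing anything; a real implementation would need to distinguish an offline share from a folder that was genuinely deleted.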
Index Manager activity during normal operation
During normal operations, there are two main priorities:
- Changes in the file system must be handled as fast as possible
- As a backup, background scanning should pick up any change that has been missed
Two mechanisms are used to keep the index updated during operation: File system notifications and Crawlers.
File system notifications
- Provide instant information about changes in the file system
- Use the Windows API function ReadDirectoryChangesW
- Work only with the SMB protocol
- Watch the whole folder tree from the document folder root
- All notifications are logged when Debug logging is enabled (be mindful of huge log files and high load)
- Notifications are started before any other operation in order to catch any change during the startup phase
Notifications and NAS
- Notifications generally work well with NAS solutions
- NAS based on older Linux and Samba versions may need a firmware update
- Samba may not support notifications from clients accessing the NAS using other protocols (such as AFP). Usually, the only way to find out is to test.
- It's generally very difficult to tell from data sheets if a storage device supports file system notifications. SMB protocol support is the best indication.
Notifications and test probes
- Every 2 minutes, Index Manager writes a file with a .probe extension to the root of every document folder.
- If the notification for the probe file does not arrive, the notification system is reset and a folder crawler is started.
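The probe cycle can be sketched as follows. This is a minimal illustration, not the product's code: the probe file name, the timeout, the cleanup of the probe file, and the three callback hooks (notification_seen, reset_notifications, start_folder_crawler) are all assumptions introduced for the example. Only the .probe extension and the 2-minute interval come from the description above.

```python
import os
import time

PROBE_INTERVAL_SECONDS = 120  # the caller is expected to schedule run_probe at this interval


def run_probe(folder_root, notification_seen, reset_notifications,
              start_folder_crawler, timeout_seconds=10.0, poll_seconds=0.1):
    """Write a .probe file and check that a notification for it arrives.

    notification_seen(path) -> bool, reset_notifications(root) and
    start_folder_crawler(root) are hypothetical hooks supplied by the caller.
    Returns True if the probe was noticed, False if the fallback ran.
    """
    # The file name is an assumption; only the extension is documented.
    probe_path = os.path.join(folder_root, "indexmanager.probe")
    with open(probe_path, "w") as handle:
        handle.write(str(time.time()))
    try:
        deadline = time.monotonic() + timeout_seconds
        while time.monotonic() < deadline:
            if notification_seen(probe_path):
                return True
            time.sleep(poll_seconds)
        # No notification arrived: reset the watcher and fall back to crawling.
        reset_notifications(folder_root)
        start_folder_crawler(folder_root)
        return False
    finally:
        # Cleaning up the probe file is an assumption of this sketch.
        os.remove(probe_path)
```

Passing the hooks in as parameters keeps the sketch independent of any particular notification API and makes the fallback path easy to exercise in isolation.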
Errors and warnings in the logs related to notifications
- Error: “Automatic update could not be set”
A previously working notification could not be set. Caused by a share going offline or a malfunctioning storage device.
- Warning: “Automatic updates has been reset”
A probe file failed to trigger a notification.
- Warning: “Automatic update unsupported. Background scanning only”
Notifications are not supported on this storage device.
- Warning: “Automatic update supported with small buffer”
Notifications are working, but with a small buffer that can overflow, leading to a loss of notifications. This depends on the storage device.
- Warning: “Update buffers are nearly full”
The buffer is almost full and may overflow, leading to a loss of notifications.
- Warning: “Next update offset beyond buffer”
The storage device is generating incorrect notifications.
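To make the buffer-related warnings more concrete, the sketch below decodes a buffer in the layout that ReadDirectoryChangesW fills in: a chain of FILE_NOTIFY_INFORMATION records, each starting with three 32-bit fields (NextEntryOffset, Action, FileNameLength in bytes) followed by a UTF-16 file name. How Index Manager itself validates the buffer is not documented here; the function name and the error texts are assumptions, and the check merely illustrates the kind of inconsistency a warning such as "Next update offset beyond buffer" might describe.

```python
import struct

# NextEntryOffset, Action, FileNameLength (in bytes), all little-endian DWORDs.
HEADER = struct.Struct("<III")

# FILE_ACTION_* values defined by the Windows API.
ACTIONS = {
    1: "added",
    2: "removed",
    3: "modified",
    4: "renamed_old_name",
    5: "renamed_new_name",
}


def parse_notifications(buffer):
    """Decode FILE_NOTIFY_INFORMATION records into (action, name) tuples.

    Raises ValueError when a record points outside the buffer.
    """
    events = []
    offset = 0
    while True:
        if offset + HEADER.size > len(buffer):
            raise ValueError("next update offset beyond buffer")
        next_offset, action, name_length = HEADER.unpack_from(buffer, offset)
        name_start = offset + HEADER.size
        name_end = name_start + name_length
        if name_end > len(buffer):
            raise ValueError("file name extends beyond buffer")
        name = buffer[name_start:name_end].decode("utf-16-le")
        events.append((ACTIONS.get(action, "unknown"), name))
        if next_offset == 0:
            # A zero offset marks the last record in the chain.
            return events
        offset += next_offset
```

A device that reports an offset larger than the data it actually returned would trip the first check, which is one plausible reading of "incorrect notifications".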
Crawlers
There are two different crawlers at work during normal operation:
- Background scanning crawler
- Triggered folder crawler
Background scanning crawler
- Runs for all archives that have "Background scanning" enabled.
- Works in two modes: Normal and Forced. Forced mode is the same as the "Scan unchanged folders" option in the Operations Center Settings app.
- In Normal mode, only folders with a modification date different from the known date are scanned.
- In Forced mode, all folders are scanned.
- Forced mode is automatically triggered per document folder in some situations:
  - First scan of existing archives after startup.
  - Notifications have been reset after a failed probe.
  - A share has been offline and comes back online.
- There is one crawler for all archives. It constantly loops around all enabled document folders with low priority. A full loop may take hours for large configurations.
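The difference between Normal and Forced mode, and the per-folder triggers for Forced mode, can be sketched like this. The data structures, trigger names, and function names are hypothetical and chosen only for illustration; the real crawler works against the file system and the index rather than in-memory dictionaries.

```python
# Hypothetical labels for the three situations that force a full pass.
FORCED_TRIGGERS = {
    "first_scan_after_startup",
    "notifications_reset",
    "share_back_online",
}


def select_folders(current_mtimes, known_mtimes, forced=False):
    """Return the folders to scan in one pass over a document folder.

    Normal mode: only folders whose modification date differs from the
    known date. Forced mode: every folder.
    """
    if forced:
        return sorted(current_mtimes)
    return sorted(
        path for path, mtime in current_mtimes.items()
        if known_mtimes.get(path) != mtime
    )


def crawl_pass(document_folders, pending_triggers, force_setting=False):
    """Loop once over all document folders, yielding (root, folders_to_scan).

    document_folders: {root: (current_mtimes, known_mtimes)}
    pending_triggers: {root: set of trigger names}; cleared per root after
    that root has been handled, so the next pass returns to Normal mode.
    """
    for root, (current, known) in document_folders.items():
        triggers = pending_triggers.get(root, set())
        forced = force_setting or bool(triggers & FORCED_TRIGGERS)
        yield root, select_folders(current, known, forced)
        pending_triggers[root] = set()
```

For example, a document folder whose notifications were just reset gets every folder scanned on the next pass, while an untouched document folder only has its changed folders scanned.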
Triggered folder crawler
- When a new folder is discovered (by notifications), the triggered folder crawler starts scanning the folder.
- When folders are renamed or removed, this crawler is sometimes the only way to discover the affected files. Notifications are only sent for the renamed folder, not for the files contained in it.
- It also works for deleted folders.
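Because a folder-level notification says nothing about the files inside the folder, the crawler has to work out the affected files itself. The sketch below shows one way this could look, assuming the index is a plain set of file paths; the function name and the action labels "added", "renamed", and "removed" are hypothetical.

```python
import os


def handle_folder_event(index, action, path, new_path=None):
    """Return an updated index (set of file paths) after a folder event.

    action is "added", "renamed" or "removed"; these labels are assumptions
    made for this sketch.
    """
    def files_under(folder):
        # Crawl the folder on disk to find every file it contains.
        return {
            os.path.join(dirpath, name)
            for dirpath, _dirs, files in os.walk(folder)
            for name in files
        }

    def without_folder(folder):
        # Drop every indexed file that lived under the given folder.
        prefix = folder.rstrip(os.sep) + os.sep
        return {p for p in index if not p.startswith(prefix)}

    if action == "added":
        return index | files_under(path)
    if action == "renamed":
        return without_folder(path) | files_under(new_path)
    if action == "removed":
        return without_folder(path)
    raise ValueError(f"unknown action: {action}")
```

For a rename, the old paths are dropped by prefix and the new location is crawled, since no per-file notifications arrive for the contained files.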
Service startup and operation diagram