Nextcloud with Elastic

I would not really consider this an addition to my Nextcloud stack, as it was not suggested by the Imapbox documentation. I had heard about it a few times, but never deemed it necessary. In the case of Nextcloud, however, it lets you search inside certain documents, such as PDFs or ODTs, so it can come in handy.
Basic Elasticsearch docker setup
You need to have Nextcloud running (duh..) and you will need to run an Elasticsearch container. I chose the bitnami container as it has nice documentation right on Docker Hub. It seems to be updated often and does not lag behind the official Docker image. I am sure that the official container would also work.
The following docker run command gives you a running Elasticsearch 7 container with the necessary ingest-attachment plug-in and persistent storage.
docker run \
  -d \
  --name='elasticsearch' \
  --net='bridge' \
  -e TZ="Europe/Budapest" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="ClearSky" \
  -e HOST_CONTAINERNAME="elasticsearch" \
  -e 'ELASTICSEARCH_PLUGINS'='ingest-attachment' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:9200]/' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/d8sychain/unraid-ca-templates/master/images/elasticsearch.png' \
  -p '9200:9200/tcp' \
  -p '9300:9300/tcp' \
  -v '/mnt/cache/appdata/elasticsearch':'/bitnami/elasticsearch/data':'rw' \
  'bitnami/elasticsearch:7'
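To see whether the container is really up and the plug-in got loaded, you can ask Elasticsearch over HTTP. This is only a minimal sketch: the helper names `es_url` and `es_check` and the address `192.168.1.10` are placeholders of mine, while `/` and `/_cat/plugins` are standard Elasticsearch endpoints. It assumes the container answers on plain HTTP without authentication.

```shell
# Hypothetical helper: builds the base URL from a host and an optional port.
es_url() {
    printf 'http://%s:%s' "$1" "${2:-9200}"
}

# Prints the node banner (it includes the version number) and then the list
# of installed plug-ins. Fails if the server cannot be reached.
es_check() {
    base=$(es_url "$1" "$2")
    curl -fsS "$base/" && curl -fsS "$base/_cat/plugins?v"
}

# Usage (replace the address with your server's IP):
#   es_check 192.168.1.10
```

With the version 7 container from above, ingest-attachment should show up in the plug-in list.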
Version 7 is the only one currently supported (fulltextsearch_elasticsearch - GitHub). You can find mentions of v8 also working (fulltextsearch_elasticsearch/issue#240 - GitHub), but I haven't tried that yet. Version 8 will have to be supported quite soon, as v7 is approaching EOL.
The required plug-ins are listed on the wiki page (fulltextsearch_elasticsearch/wiki - GitHub), but it is only the ingest-attachment mentioned above.
No other setup is necessary. Some people suggest running Nextcloud and Elasticsearch in the same docker network, but that turned out to be unnecessary. You can access the exposed Elasticsearch ports through your server's IP address.
Nextcloud side setup
On the Nextcloud side, you will need at least two new apps installed.
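If you prefer the command line over the app store, the apps can also be installed with occ. A small sketch: the three app IDs below are my assumption of what the full text search setup consists of (the framework, the Elasticsearch platform and the Files provider), so check them against your Nextcloud version. `install_commands` is just a hypothetical helper that prints the commands.

```shell
# App IDs as I understand them (an assumption, verify in your app store).
APPS="fulltextsearch fulltextsearch_elasticsearch files_fulltextsearch"

# Prints one "occ app:install" command per app, so you can review them
# before running them inside the Nextcloud container.
install_commands() {
    for app in $APPS; do
        printf 'occ app:install %s\n' "$app"
    done
}

install_commands
```

The Files provider is what should give you the Files section in the settings.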
Now in your Nextcloud administration go to Full text search and fill in the blanks.
- General
  - Search Platform: Elasticsearch
  - Navigation Icon: Checked
- Elastic Search
  - Address of the Servlet: http://[ServerIP]:9200
  - Index: elasticsearch-cluster
  - Analyzer tokenizer: standard
- Files
  - Here you can set up how deep you want to go
The index name defaults to elasticsearch-cluster, as bitnami's Elasticsearch documentation mentions.
What is Analyzer Tokenizer? Sounds cool, eh? I am not gonna lie, I had to look it up. There is a nice summary by Mallikarjuna J S in this article on Medium - What is tokenizer, analyzer and filter in Elasticsearch?. Standard is the way to go, at least for starters.
In Files I went mostly with the defaults. I chose to go through Local Files and to Extract PDF and Extract Office. For external files I chose to Extract Path and Content. I kept the file size limit at 20 MB.
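The same Elasticsearch settings can apparently be pushed from the command line as JSON instead of clicking through the admin page. Take this as a hedged sketch: the `fulltextsearch_elasticsearch:configure` command and the `elastic_host` and `elastic_index` keys are what I recall from the app's wiki, and `es_platform_json` and the IP address are placeholders of mine.

```shell
# Hypothetical helper: prints the JSON for the Elasticsearch platform settings.
# $1 = server IP or host name, $2 = index name
es_platform_json() {
    printf '{"elastic_host":"http://%s:9200","elastic_index":"%s"}\n' "$1" "$2"
}

# Example with a placeholder IP and the index name used in this post.
es_platform_json 192.168.1.10 elasticsearch-cluster
```

Inside the Nextcloud container this would then be used as `occ fulltextsearch_elasticsearch:configure "$(es_platform_json 192.168.1.10 elasticsearch-cluster)"`.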
First fire
Before building the index, it is worth checking that the setup is consistent and ready to go.
occ fulltextsearch:test
If the test ends with all green OKs, then you are ready to go.
Now we have to build the initial index. That can be done by running the occ command in the Nextcloud container.
occ fulltextsearch:index
This can take a lot of time. I had roughly 1.6 million files on my server and it took more than two hours. I had this command running in a server-side Putty container, so I went with start and forget.
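If you do not want to open a shell in the container first, the occ commands can be sent in from the host with docker exec. A minimal sketch under assumptions that are not from my setup notes: the container is called `nextcloud` and follows the official image layout, where occ is run with php as the `www-data` user. Other images wrap occ differently, so adjust `nc_occ` (a name I made up) to yours.

```shell
# Assumptions: container name "nextcloud", official image layout.
NC_CONTAINER="${NC_CONTAINER:-nextcloud}"
DOCKER="${DOCKER:-docker}"   # overridable, e.g. DOCKER=echo for a dry run

# Runs an occ command inside the Nextcloud container.
nc_occ() {
    "$DOCKER" exec -u www-data "$NC_CONTAINER" php occ "$@"
}

# Usage:
#   nc_occ fulltextsearch:test
#   nc_occ fulltextsearch:index
```

Setting `DOCKER=echo` before calling `nc_occ` only prints the command, which is handy for checking what would be run.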
Going through your index
You can go through your Elasticsearch index without Nextcloud. You can use the Elasticvue add-on to browse the index directly - Elasticvue/Firefox, Elasticvue/Chrome.
If you went with the defaults on the Elasticsearch container, you just fill in the URI when Elasticvue first starts and that is it. You click Test Connection and Connect and you are in.
Using all of this
In the search tab of Elasticvue you will see all the indexed data. Now you can look up some ODT file, for example, and you will see all of its contents indexed. Pick a very specific word from it and search for it in your Nextcloud instance. You should be able to find it.
Your messages from Imapbox will now also get added to the index automatically, so you can search your old backed-up emails with this Nextcloud add-on.
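The same check works without the browser add-on, by querying the index with curl. A small sketch: `es_search_url` is a hypothetical helper, the IP address and the search word are placeholders, and the index name is the one from the settings above. The `_search?q=` form is Elasticsearch's simple URI search.

```shell
# Hypothetical helper: URL for a simple query-string search against one index.
# $1 = server IP, $2 = index name, $3 = search term (must be URL-safe, no spaces)
es_search_url() {
    printf 'http://%s:9200/%s/_search?q=%s&pretty\n' "$1" "$2" "$3"
}

# Example with placeholder values.
es_search_url 192.168.1.10 elasticsearch-cluster invoice
```

Run it as `curl -fsS "$(es_search_url 192.168.1.10 elasticsearch-cluster invoice)"`. To list the indices first, `curl -fsS "http://192.168.1.10:9200/_cat/indices?v"` should do.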
May 2023: Elasticsearch 8 upgrade
Back up your Elasticsearch files before starting the upgrade. The docker run command needs a few updates. The main changes are:
- The ingest-attachment plug-in is now part of Elasticsearch, so we can drop it - https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html
- We need to point to version 8
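For the backup, a plain tar archive of the data directory is one option. A minimal sketch: `backup_dir` is a helper name I made up and `/mnt/user/backups` is a hypothetical target, while the source path is the one mounted into the container above. Stop the container first so the files do not change while they are being archived.

```shell
# backup_dir SRC DEST_DIR: archives the directory SRC into DEST_DIR as a
# timestamped .tar.gz and prints the path of the archive.
backup_dir() {
    src=$1
    dest=$2
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$dest" &&
        tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
        printf '%s\n' "$archive"
}

# Usage (the backup location is a placeholder):
#   docker stop elasticsearch
#   backup_dir /mnt/cache/appdata/elasticsearch /mnt/user/backups
```

With the archive in place, the updated command follows.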
docker run \
  -d \
  --name='elasticsearch' \
  --net='bridge' \
  -e TZ="Europe/Budapest" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="ClearSky" \
  -e HOST_CONTAINERNAME="elasticsearch" \
  -e 'ELASTICSEARCH_PLUGINS'='' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:9200]/' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/d8sychain/unraid-ca-templates/master/images/elasticsearch.png' \
  -p '9200:9200/tcp' \
  -p '9300:9300/tcp' \
  -v '/mnt/cache/appdata/elasticsearch':'/bitnami/elasticsearch/data':'rw' \
  'bitnami/elasticsearch:8'
The container started fine for me. You can test it again with occ fulltextsearch:test, and if you do not get any results when searching in Nextcloud, I suggest creating the index again.
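To confirm that the new container really runs version 8, you can read the version number from the banner Elasticsearch returns on its root URL. A small sketch: `es_version` is a hypothetical helper and the IP address is a placeholder. The reset command in the comments is how I understand the index gets recreated, and it may ask for confirmation.

```shell
# Reads the JSON banner from stdin and prints the value of "number",
# which is the Elasticsearch version.
es_version() {
    sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}

# Usage (placeholder IP):
#   curl -fsS http://192.168.1.10:9200/ | es_version
#
# If search stays empty, recreate the index inside the Nextcloud container:
#   occ fulltextsearch:reset
#   occ fulltextsearch:index
```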