Install and configure SolrCloud
This is a work in progress!
To get started with SolrCloud, download a Solr release that supports SolrCloud. The latest releases of Solr can be downloaded from the official Solr resource. You will likely need an older release rather than the latest one; older releases of Solr are available for download as well.
This guide was written using Solr 5.2.1 on Windows. On Linux-based systems, use forward slashes in paths and the Linux equivalents of the file management commands.
Install Solr the usual way: unpack the Solr distribution archive onto a local drive. For the sake of simplicity, `E:\app\solr` is used as the `<SolrRoot>` folder in this article.
Prepare Solr configuration for Sitecore indexes
Every Solr collection is tied to a Solr configuration that is uploaded into ZooKeeper, which orchestrates file management for SolrCloud instances. The Solr distribution comes with a ZooKeeper instance built in, so when running a SolrCloud instance locally there is no need to configure ZooKeeper itself.
Standalone configuration of ZooKeeper (Zk) is described later in this article.
Before collections for Sitecore indexes can be created, the Solr configuration, including the Solr index schema, must be uploaded into ZooKeeper. The default Sitecore index schema can be generated from the UI of the Sitecore application. Refer to Configuring Solr for use with Sitecore 8 for more details.
Here are the basic steps to prepare a Solr configuration for Sitecore indexes.
- Duplicate the `<SolrRoot>\server\solr\configsets\basic_configs` folder and give it a configuration name (e.g. `sitecore_configs`).

  The next two steps are optional and should be used if one needs to configure language-specific stopwords so that content is parsed properly for those languages. These steps use `sitecore_core_index` as an example, since the `Core` database uses several different languages by default. By default, Sitecore uses the `text_general` field type for all language-specific fields in the `schema.xml` file, and that field type uses the common `stopwords.txt` file located in the `conf` folder of the Solr configuration directory (e.g. `sitecore_configs`).
- [Optional] Add the following files to the `sitecore_configs\conf\lang` folder:
  - `stoptags_ja.txt`
  - `stopwords_da.txt`
  - `stopwords_de.txt`
  - `stopwords_ja.txt`
  - `userdict_ja.txt`
  - `stopwords_en.txt` (this file should already be there)

  These files can be copied from the `<SolrRoot>\server\solr\configsets\data_driven_schema_configs\conf\lang` folder.

  All these files are used by Solr to parse language-specific content properly. The Sitecore `Core` database by default uses the `DE`, `DA`, `JA` and `EN` languages, which can be configured to use language-specific stopwords.

  For solutions that have other languages defined, the corresponding files must be added to the `lang` folder and the language field types configured in the `schema.xml` file.
- [Optional] Configure language-specific dynamic fields and field types. Add field types for the `DA`, `DE` and `JA` languages to the `schema.xml` file generated through the Sitecore application.
  - Add a field type for every language that needs its own type (for example, `text_da` for `DA`) and reference the proper `stopwords` file in it.
    > Field type definitions can be copied from the `<SolrRoot>\server\solr\configsets\data_driven_schema_configs\conf\managed-schema` file.
  - Ensure that language-specific dynamic fields use the proper field types. For example:
    ```XML
    <dynamicField name="*_t_en" type="text_general" indexed="true" stored="true" />
    <dynamicField name="*_t_da" type="text_da" indexed="true" stored="true" />
    <dynamicField name="*_t_de" type="text_de" indexed="true" stored="true" />
    <dynamicField name="*_t_ja" type="text_ja" indexed="true" stored="true" />
    ```
- Replace the `schema.xml` file in the `sitecore_configs\conf` folder with the one generated through Sitecore or modified in steps 2 and 3.
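The file-copying steps above can also be scripted. The following is a minimal Python sketch, assuming the folder layout described in this article. The function name `prepare_configset` and its parameters are illustrative and not part of Solr or Sitecore.

```python
import shutil
from pathlib import Path

# Language files named in the optional step of this article.
LANG_FILES = [
    "stoptags_ja.txt",
    "stopwords_da.txt",
    "stopwords_de.txt",
    "stopwords_ja.txt",
    "userdict_ja.txt",
    "stopwords_en.txt",
]


def prepare_configset(solr_root, name="sitecore_configs", schema=None):
    """Duplicate basic_configs under a new name and add the language files.

    solr_root -- the <SolrRoot> folder
    name      -- name of the new configuration folder
    schema    -- optional path to a schema.xml generated through Sitecore
    """
    configsets = Path(solr_root) / "server" / "solr" / "configsets"
    target = configsets / name

    # Duplicate the basic_configs folder (fails if the target already exists).
    shutil.copytree(configsets / "basic_configs", target)

    # Copy the language files that are not in the duplicated folder yet.
    lang_src = configsets / "data_driven_schema_configs" / "conf" / "lang"
    lang_dst = target / "conf" / "lang"
    lang_dst.mkdir(parents=True, exist_ok=True)
    for file_name in LANG_FILES:
        source = lang_src / file_name
        if source.exists() and not (lang_dst / file_name).exists():
            shutil.copy2(source, lang_dst / file_name)

    # Replace schema.xml with the generated (or manually modified) one.
    if schema is not None:
        shutil.copy2(schema, target / "conf" / "schema.xml")
    return target
```

Calling `prepare_configset(r"E:\app\solr", schema=r"<path to generated schema.xml>")` would produce the `sitecore_configs` folder. The field type and dynamic field changes in `schema.xml` still have to be made by hand.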
Run SolrCloud instance locally
The easiest way to stand up a SolrCloud instance is to run it locally. To do that, open any command-line interface, navigate to the Solr root folder and run the following command:
bin\solr -e cloud
This launches an interactive command-line process that configures SolrCloud. Simply pressing `<Enter>` at each step stands up 2 SolrCloud nodes running on different ports on the local machine. There will be a default collection called `gettingstarted` that is split into 2 shards with the replication factor set to 2.
By default, the local instance of SolrCloud is placed into the `<SolrRoot>\example\cloud` folder. Examine that folder to see how SolrCloud distributes the collection(s) among its nodes.
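To get a feel for what to expect in that folder, the number of cores follows from the shard count and the replication factor: every shard is stored `replicationFactor` times. The sketch below lists one name per replica. The `<collection>_shard<N>_replica<M>` naming pattern is an assumption about how Solr 5.x names cores, so treat the exact names as illustrative.

```python
def replica_cores(collection, shards, replication_factor):
    """Return one (assumed) core name per replica of every shard."""
    return [
        f"{collection}_shard{shard}_replica{replica}"
        for shard in range(1, shards + 1)
        for replica in range(1, replication_factor + 1)
    ]


# The default collection: 2 shards, replication factor 2.
cores = replica_cores("gettingstarted", shards=2, replication_factor=2)
print(len(cores))  # 2 shards x 2 replicas = 4 cores
for core in cores:
    print(core)
```

With 2 local nodes, these 4 cores are spread over the node folders under `example\cloud`.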
Upload index configuration into ZooKeeper
Solr 5.2.1 provides a ZooKeeper command-line interface (a.k.a. ZkCli) to work with the ZooKeeper file system. The CLI is located in the `<SolrRoot>\server\scripts\cloud-scripts` folder.
Run the following command to upload Solr configuration into ZooKeeper:
zkcli -zkhost localhost:9973 -cmd upconfig -confdir E:\app\solr-5.2.1\server\solr\configsets\sitecore_configs\conf -confname scbasic
Where
- `localhost:9973` is the server and port Zk runs on, which is local in this case.
- `-cmd upconfig` is the command to upload a configuration into Zk. Run `zkcli --help` to see all Zk commands.
- `-confdir <dirPath>` should point to the `conf` directory that holds the Solr configuration (e.g. `<SolrRoot>\server\solr\configsets\sitecore_configs\conf`).
- `-confname <configurationName>` specifies the name the uploaded configuration will have in Zk (`scbasic` in this example).

Once the configuration is uploaded, navigate to the `http://localhost:<port>/solr/#/~cloud?view=tree` URL and expand the `configs` node to see the available configurations.
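When the upload has to be repeated for several configurations, it can help to assemble the command in a script. This is a small sketch that only builds the argument list from the parameters described above; the helper name `upconfig_command` is hypothetical, and the script does not run ZkCli itself.

```python
def upconfig_command(zk_host, conf_dir, conf_name, zkcli="zkcli"):
    """Build the ZkCli argument list for uploading a configuration."""
    return [
        zkcli,
        "-zkhost", zk_host,        # server and port Zk runs on
        "-cmd", "upconfig",        # upload a configuration into Zk
        "-confdir", str(conf_dir), # folder that holds solrconfig.xml
        "-confname", conf_name,    # name of the configuration in Zk
    ]


command = upconfig_command(
    "localhost:9973",
    r"E:\app\solr-5.2.1\server\solr\configsets\sitecore_configs\conf",
    "scbasic",
)
print(" ".join(command))
```

The printed line matches the command shown earlier and can be pasted into a command prompt opened in the `cloud-scripts` folder.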
Create collection
Each index collection must either be linked to an existing configuration or provide a path to a configuration folder that will be uploaded into Zk and linked to the collection. Here are a few examples of how one can create a collection:
- Run this command to create a collection based on existing configuration
bin\solr create -c scindex -n scbasic -shards 2 -replicationFactor 2 -p 8973
  Where
  - `create` is the command that instructs Solr to create a collection
  - `-c` is the collection name parameter
  - `-n` is the configuration name parameter. This example uses the `scbasic` configuration created in the previous section
  - `-shards` is the number of shards for the collection
  - `-replicationFactor` is the number of replicas for each shard of the collection
  - `-p` is the port that the SolrCloud instance runs on
- Run this command to upload configuration into Zk and create a collection based on it
bin\solr create -c scitems2 -d E:\app\solr-5.2.1\server\solr\configsets\scbasic_configs -n scitems2 -shards 1 -replicationFactor 1 -p 8973
  Where
  - `-d` is the path to the folder that holds the configuration directory
  - the rest of the parameters are the same as in step 1
Note: when uploading an index configuration into Zk using ZkCli, the path in the `-confdir` parameter must point to the folder that holds the `solrconfig.xml` file (i.e. `sitecore_configs\conf`).
When uploading an index configuration using the `bin\solr create -d <path>` command, the path should point either to the directory that contains the `solrconfig.xml` file (e.g. `sitecore_configs\conf`) or to its parent folder, which contains the `conf` directory that holds the `solrconfig.xml` file (i.e. `sitecore_configs`).
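The path rule in this note can be checked before running either command. The sketch below illustrates the rule as described here and is not how Solr itself resolves the path; `resolve_conf_dir` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_conf_dir(path):
    """Return the directory that actually contains solrconfig.xml.

    Accepts either the conf directory itself or its parent folder,
    mirroring what `bin\\solr create -d <path>` accepts per the note above.
    ZkCli's -confdir needs the returned directory, not the parent.
    """
    path = Path(path)
    for candidate in (path, path / "conf"):
        if (candidate / "solrconfig.xml").is_file():
            return candidate
    raise FileNotFoundError(
        f"solrconfig.xml not found in {path} or {path / 'conf'}"
    )
```

Passing the result of `resolve_conf_dir(...)` to `-confdir` works for both layouts, because the function always returns the folder that holds `solrconfig.xml`.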
Run distributed SolrCloud instance
A distributed SolrCloud environment consists of a ZooKeeper (Zk) ensemble and a SolrCloud cluster. Zk is used to orchestrate file management among all SolrCloud nodes. The SolrCloud nodes hold all collections and manage indexing and searching operations. Both Zk and SolrCloud need redundancy to provide higher fault tolerance.
It’s not uncommon to share machines to run SolrCloud nodes and Zk instances side by side.
This section describes how to configure a Zk ensemble and stand up a SolrCloud cluster.
Configuring ZooKeeper ensemble
Download ZooKeeper from the Zk distribution resource.
ZooKeeper 3.4.6 was used for this guide.
Unpack the Zk distribution into a folder. For the sake of simplicity, `E:\app\zk-3.4.6` is used as the `<ZkRoot>` folder in this article.
Follow these steps to stand up Zk ensemble:
- Create a `zkData` folder that Zk will use to run its operations (e.g. `e:\app\zkData`).

  If Zk instances share machines with SolrCloud nodes, it’s highly recommended to place the Zk and SolrCloud data folders on separate disks for the best performance.
- Rename `zoo_sample.cfg` (located in `<ZkRoot>\conf`) to `zoo.cfg`, then modify these configuration settings in the file:
  - `dataDir=e:/app/zkData`
  - Add `server` entries that list all servers that run Zk instances. For example:
    - `server.1=<server1.IP>:2888:3888`
    - `server.2=<server2.IP>:2888:3888`
    - `server.3=<server3.IP>:2888:3888`

  Feel free to keep the rest of the settings at their default values or tweak them as needed.
- Create a `myid` file (no extension) in the `zkData` folder (i.e. `e:\app\zkData`) and set the server id number in that file. For example, on `server1` the file should contain `1`, on `server2` it should contain `2`, and so on.
- [Optional] Use the NSSM tool to configure Zk as a Windows service. Here are the steps to set up ZooKeeper as a Windows service:
  - Download the NSSM tool
  - Run `nssm install` in the Windows command prompt
  - Configure the service to use `<ZkRoot>\bin\zkServer.cmd`
- Run the following command to start the Zk instance manually:
  `<ZkRoot>\bin\zkServer.cmd`
  On Linux-based systems it will be `bin/zkServer.sh start`
- Repeat steps 1-5 on all machines that run Zk instances.
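Since the `server` entries are identical on every machine and only the `myid` value differs, both can be derived from one list of servers. Here is a minimal sketch under that assumption; the IP addresses are made up for illustration, and the ports are the ones used in the example entries above.

```python
def server_entries(server_ips, peer_port=2888, election_port=3888):
    """Build the zoo.cfg server lines, numbering servers from 1."""
    return [
        f"server.{number}={ip}:{peer_port}:{election_port}"
        for number, ip in enumerate(server_ips, start=1)
    ]


def myid_values(server_ips):
    """Map each server to the number its myid file must contain."""
    return {ip: str(number) for number, ip in enumerate(server_ips, start=1)}


# Hypothetical addresses of the three Zk machines.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

print("\n".join(server_entries(ips)))  # same lines go into zoo.cfg on every machine
print(myid_values(ips))                # one distinct id per machine
```

The number after `server.` in `zoo.cfg` must match the content of the `myid` file on the corresponding machine, which is why both are generated from the same ordering.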
Configuring SolrCloud with ZooKeeper ensemble
Once the Zk ensemble is configured and running, one can add SolrCloud nodes to it. To start a SolrCloud instance and link it to the existing Zk ensemble, execute the following command:
bin\solr -c -f -z "<zkHost1>:<zkPort>,<zkHost2>:<zkPort>,<zkHost3>:<zkPort>" -m 1g -d <solrNodeHome> -p 8983
Where
- `-c` starts Solr in SolrCloud mode
- `-f` runs the SolrCloud process in the foreground. By default it starts in the background
- `-z "<zkHost1>:<zkPort>,<zkHost2>:<zkPort>,<zkHost3>:<zkPort>"` specifies the ZooKeeper connection string
  - `<zkHost>` is the IP address of each Zk instance
  - `<zkPort>` is the port number for each Zk instance
- `-m` is the memory cap for the SolrCloud instance. For example:
  - `-m 500m` allocates 500MB
  - `-m 1g` allocates 1GB
- `-p` is the port number for the SolrCloud instance
Run the command on every server that hosts a SolrCloud node so that it joins the cluster.
Once everything is up and running, one can start uploading configurations into Zk and creating Solr collections.
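The ZooKeeper connection string is the part of this command that is easiest to mistype, so here is a short sketch that assembles it from a host list. The host addresses are hypothetical, and port 2181 is assumed to be the `clientPort` configured in `zoo.cfg`; adjust both to match the actual ensemble.

```python
def zk_connection_string(zk_hosts, zk_port):
    """Join Zk hosts into the comma-separated string expected by -z."""
    return ",".join(f"{host}:{zk_port}" for host in zk_hosts)


def solr_start_command(zk_hosts, zk_port, memory="1g", port=8983):
    """Build the argument list for starting one SolrCloud node."""
    return [
        "bin\\solr",
        "-c",                                        # SolrCloud mode
        "-z", zk_connection_string(zk_hosts, zk_port),
        "-m", memory,                                # memory cap
        "-p", str(port),                             # port of this node
    ]


# Hypothetical Zk ensemble addresses; 2181 is an assumed client port.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(zk_connection_string(hosts, 2181))
print(" ".join(solr_start_command(hosts, 2181)))
```

Every SolrCloud node should receive the same connection string, listing all Zk instances, so that it can still reach the ensemble when one Zk machine is down.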