Saturday, 1 June 2013

Configuring Solr 4.0 to index data from a MySQL database


In a previous post, I documented the steps in setting up Solr 4.0 on a Tomcat 7.0.28 instance. In this post, I'll describe the steps involved in configuring the same Solr instance to index data from a MySQL database.

1.Set up another Solr core.

A Solr core holds the configuration details for connecting to a data store as well as the indexed data of the data store.
The simplest way to set up a core is to copy the collection_1 core and re-name the copy 'test_core'
To make Solr aware of this new core, bring up the Solr Admin console at http://localhost:8080/apache-solr-4.0.0/#/
Select the core-admin option and add core and supply the parameters for name, instanceDir, dataDir, config and schema.

As a result of adding this core, the solr.xml file in the root of the solr_home directory will have the following entry

2. Set up the required libraries

Since to index data from the MySQL database, we will be calling a DataImporter, we will need the Solr libraries. The lib directory is available in your downloaded unzipped version of Solr at c:\apache-solr-4.0.0\dist
Copy all jar files within this directory and transfer them to your solr_home. The solr_home directory should look similar to the screenshot below



I could have transferred these jars to my Tomcat/lib and have solr use them from there but I was worried about conflicts with existing jars.
Finally, tell Solr about this lib directory by modifying the solr.xml file and adding the sharedLib="lib" element.
The solr.xml looks like this.


3. Configure your test_core to talk to the database.

To enable this, you'll need to follow the instructions given on http://wiki.apache.org/solr/DIHQuickStart
To summarise, essentially you will :
(a) Modify the solrconfig.xml and add the defintion for a DataImportHandler


  
    data-config.xml
  


(b) Create a data-config.xml file in the same directory as your solrconfig.xml and specify the database connection parameters


  
  
    
      
      
      
    
  

Ensure that the name values in the element have corresponding matches in the schema.xml
For example : The value 'description' should exist in the schema.xml

Finally, we are all set up to import records from our MySQL test schema.

4. Select the test_core in the Solr Admin console and click on the DataImport option which should be last option available in the menu.
Clicking on the DataImport option will bring up the DataImporter Interface, select the Clean and Commit checkboxes and click on Execute Import.

When the indexing is complete, you will see a message giving the number of documents added or deleted.
To check if the indexes have been created as expected and we are getting some sensible results back, perform a Search using 'Query' option
available under the 'test_core'

In my case, I searched for the term 'Hyundai' as I knew of one record having that particular value in the column that has been indexed.
And we get 1 record returned.



No comments: