Solr Multithreaded concurrent atomic updates problem:
Solr has a few limitations for data ingestion, one of which is that it does not provide a row-level lock on a document.
I faced this problem while uploading data in bulk to Solr 5 in a multithreaded environment, and I solved it with a client-side lock in SolrJ.
When concurrent threads try to make an atomic update to a multivalued field of the same document at the same time, some threads' changes get overwritten. This happens because the previous thread's update takes some time to get indexed.

Data ingestion scenario:
There are two tables in an RDBMS that I need to denormalize into Solr. The steps I was following for an atomic/partial document update:
1- Fetch the existing document.
2- Update the single-valued fields if required and add/set the new values on the multivalued fields.
3- Write the final document back to Solr.
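
To make the failure concrete, the three steps above can be replayed against a toy in-memory index. This is only an illustrative sketch, not SolrJ code: `ToyIndex` and all of its methods are hypothetical names, and the assumption is simply that a written document becomes visible to later fetches only after a commit, which stands in for the indexing delay described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for an index (NOT SolrJ; all names are hypothetical).
// A written document becomes visible to fetch() only after commit().
final class ToyIndex {
    private final Map<String, List<String>> visible = new HashMap<>();
    private final Map<String, List<String>> pending = new HashMap<>();

    // Step 1: fetch a copy of the currently visible multivalued field.
    List<String> fetch(String id) {
        List<String> values = visible.get(id);
        return values == null ? new ArrayList<String>() : new ArrayList<>(values);
    }

    // Step 3: write the whole field back; readers do not see it until commit().
    void write(String id, List<String> values) {
        pending.put(id, new ArrayList<>(values));
    }

    // Makes all pending writes visible, like an update finishing indexing.
    void commit() {
        visible.putAll(pending);
        pending.clear();
    }

    // Steps 1 to 3 for a single new value.
    void fetchModifyWrite(String id, String newValue) {
        List<String> values = fetch(id); // step 1
        values.add(newValue);            // step 2
        write(id, values);               // step 3
    }

    // The second update starts before the first is visible: the first value is lost.
    static List<String> lostUpdate() {
        ToyIndex index = new ToyIndex();
        index.fetchModifyWrite("1", "address-A");
        index.fetchModifyWrite("1", "address-B");
        index.commit();
        return index.fetch("1");
    }

    // The second update waits until the first is visible: both values survive.
    static List<String> waitedUpdate() {
        ToyIndex index = new ToyIndex();
        index.fetchModifyWrite("1", "address-A");
        index.commit();
        index.fetchModifyWrite("1", "address-B");
        index.commit();
        return index.fetch("1");
    }

    public static void main(String[] args) {
        System.out.println(lostUpdate());   // prints [address-B]
        System.out.println(waitedUpdate()); // prints [address-A, address-B]
    }
}
```

In this model the only difference between the two runs is whether the second fetch happens after the first write has become visible, which is the condition the client-side lock is meant to enforce.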

e.g.
collection fields-
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="address" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>

I followed the steps below to create a client-side lock that resolves this problem:

  • Store the most recent updates in a least-recently-used set (LRUSet).
  • Set a limit on the maximum number of elements in the LRUSet.
  • If the document for a new update is present in the LRUSet, check whether that document has been indexed successfully.
  • If the document is indexed, send the atomic update (set/add) to Solr; otherwise make the current thread wait until the document is indexed in Solr successfully.
  • Add or replace the entry in the LRUSet.
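
The steps above could be sketched in plain Java roughly as follows. This is a minimal sketch under stated assumptions, not the exact code used for the fix: `RecentUpdateGuard`, the `isIndexed` check, the `sendUpdate` callback and the polling interval are all hypothetical names. In real use the check would be replaced by a query against Solr and the callback by the SolrJ atomic update call.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical client-side guard; it does not call Solr itself.
final class RecentUpdateGuard {
    private final Set<String> recentIds;       // the bounded "LRUSet"
    private final Predicate<String> isIndexed; // assumed check: is the last update for this id indexed?
    private final long pollMillis;             // assumed wait between checks

    RecentUpdateGuard(final int maxEntries, Predicate<String> isIndexed, long pollMillis) {
        // An access-ordered LinkedHashMap drops its least recently used key
        // once the size limit is exceeded.
        Map<String, Boolean> lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
        this.recentIds = Collections.newSetFromMap(lru);
        this.isIndexed = isIndexed;
        this.pollMillis = pollMillis;
    }

    // Sends one update. If the id was updated recently, the calling thread
    // first waits until the previous update is reported as indexed.
    synchronized void update(String id, Runnable sendUpdate) {
        if (recentIds.contains(id)) {
            while (!isIndexed.test(id)) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for " + id, e);
                }
            }
        }
        sendUpdate.run();   // the atomic set/add would go here
        recentIds.add(id);  // adds the id, or refreshes its position if present
    }

    synchronized boolean isTracked(String id) {
        return recentIds.contains(id);
    }

    public static void main(String[] args) {
        // Demo with a check that always reports "indexed" and an update that only prints.
        RecentUpdateGuard guard = new RecentUpdateGuard(1000, id -> true, 50L);
        guard.update("1", () -> System.out.println("sending first update for doc 1"));
        guard.update("1", () -> System.out.println("sending second update for doc 1"));
    }
}
```

One design choice to be aware of: the single `synchronized` method keeps the sketch short but serializes all updates, including updates to unrelated documents, while one thread is waiting. A per-document lock would allow more parallelism at the cost of more bookkeeping.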

Please post a comment if you have any questions.