Solr Multithreaded concurrent atomic updates problem:
Solr has a few limitations for data ingestion, one of which is that it doesn't provide a row-level lock on a document.
I faced this problem while uploading data in bulk to Solr 5 in a multithreaded environment, and I solved it with a client-side lock in SolrJ.
When concurrent threads try to make an atomic update to a multivalued field of the same document at the same time, some threads' changes get overwritten. This happens because the previous thread's update takes some time to get indexed.
Data ingestion scenario:
There are two tables in an RDBMS that I need to denormalize into Solr. These are the steps I was following for an atomic/partial document update:
1- Fetch the existing document.
2- Update the single-valued fields if required and add/set the new values on the multivalued fields.
3- Write the final document back to Solr.
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="address" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
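To make the race concrete, here is a minimal sketch that replays the fetch/modify/write steps for two workers in a fixed order. It assumes a plain in-memory map as a stand-in for Solr; `FakeIndex` and `LostUpdateDemo` are hypothetical names for illustration and are not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a Solr core: document id -> values of the
// multivalued "address" field. Hypothetical helper, not a SolrJ class.
class FakeIndex {
    private final Map<String, List<String>> docs = new HashMap<>();

    // Step 1: fetch a copy of the stored values (empty if the id is unknown).
    List<String> fetch(String id) {
        List<String> stored = docs.get(id);
        return stored == null ? new ArrayList<String>() : new ArrayList<>(stored);
    }

    // Step 3: write the whole document back, replacing what was stored.
    void write(String id, List<String> addresses) {
        docs.put(id, new ArrayList<>(addresses));
    }
}

class LostUpdateDemo {
    // Replays the bad interleaving: both workers fetch before either writes.
    static List<String> run() {
        FakeIndex index = new FakeIndex();
        index.write("1", Arrays.asList("addr-0"));

        List<String> seenByA = index.fetch("1"); // worker A, step 1
        List<String> seenByB = index.fetch("1"); // worker B, step 1 (same stale view)

        seenByA.add("addr-A");                   // worker A, step 2
        index.write("1", seenByA);               // worker A, step 3

        seenByB.add("addr-B");                   // worker B, step 2
        index.write("1", seenByB);               // worker B, step 3: drops addr-A

        return index.fetch("1");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [addr-0, addr-B]
    }
}
```

Worker B never saw `addr-A`, so its write silently replaces it. This is the same effect I saw on the `address` field when a second thread read the document before the first thread's update was visible.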
I followed the steps below to create a client-side lock that resolves this problem:
- Store the most recent updates in a least-recently-used set, LRUSet.
- Set a maximum number of elements for LRUSet.
- If a new update is already present in LRUSet, check whether that document has been indexed successfully.
- If the document is indexed, send the atomic update (set/add) to Solr; otherwise make the current thread wait until the document is successfully indexed in Solr.
- Add or replace the entry for the new update in LRUSet.
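The steps above can be sketched roughly as follows. This is only an outline under a few assumptions, not my exact production code: the `VisibilityCheck` callback, the `token` that identifies an update, the polling interval, and the class name `LruUpdateGuard` are all hypothetical, and the actual SolrJ request is represented by a plain `Runnable`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback: returns true once the update identified by `token`
// for document `id` is visible in the index (for example, checked by a query).
interface VisibilityCheck {
    boolean isIndexed(String id, long token);
}

// Client-side guard following the steps above; all names are illustrative.
class LruUpdateGuard {
    private final Map<String, Long> recent; // id -> token of the last update sent
    private final long pollMillis;

    LruUpdateGuard(final int maxEntries, long pollMillis) {
        this.pollMillis = pollMillis;
        // Access-ordered LinkedHashMap that evicts the least recently used id
        // once the size limit is exceeded.
        this.recent = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Sends one update for `id`. The method is synchronized so the
    // check / wait / send / record sequence cannot interleave between threads.
    synchronized void update(String id, long token, VisibilityCheck check, Runnable sendUpdate) {
        Long pending = recent.get(id);
        // If this id was updated recently, wait until that update is visible.
        while (pending != null && !check.isIndexed(id, pending)) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + id, e);
            }
        }
        sendUpdate.run();      // the atomic add/set request would be sent here
        recent.put(id, token); // add or replace the entry for this id
    }

    // True if `id` is still held in the bounded set of recent updates.
    synchronized boolean isTracked(String id) {
        return recent.containsKey(id);
    }
}
```

Each ingestion thread would call `guard.update(...)` instead of sending the update directly. Making the whole method `synchronized` is the simplest way to keep the sketch correct, but it serializes every caller; a per-id lock would allow updates to different documents to proceed in parallel.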
Please post a comment if you have any questions.