Synchronous read model updates

Coordinator
Nov 19, 2011 at 9:42 PM

In the Sponsorship application, we really need a synchronous update of the read models. I.e., when a user clicks on "Add sponsor" and then views a list of sponsors, that sponsor must be on the list. (Anything else, IMO, is not going to work for this application.) To achieve this, the sample UI in the current code uses "Thread.Sleep (500)", but I think this is a really big wart.

So, how can we achieve a synchronous read model update? I.e., how can we make sure that _bus.Send (command) won't return until the read model (including all RavenDB indexes) has been updated?

Coordinator
Nov 20, 2011 at 8:57 AM

I'm not really sure, but I think the problem here is the way RavenDB works. As far as I know, when you query data that changed moments ago, there is a good chance that the index has not yet caught up with the changes.
Indexing seems to be done in a separate asynchronous process.

Coordinator
Nov 20, 2011 at 9:04 AM

Maybe the WaitForNonStaleResults() method is the solution:

http://ravendb.net/documentation/client-api/querying

Coordinator
Nov 20, 2011 at 5:12 PM

The docs for WaitForNonStaleResults() look promising (although it seems to be recommended to use WaitForNonStaleResultsAsOfNow instead), and it's much better than Thread.Sleep (500).

(IMO, this is one place where the RavenDB philosophy doesn't really align ideally with a CRUD-style application.)

Coordinator
Nov 20, 2011 at 6:58 PM

What are the arguments against Thread.Sleep(500) ?

Most applications work with stale data anyway. Even with a synchronous DB call you have the network delay.

If you REALLY want to do it in a synchronous way, you can update the read DB in the same transaction as the command processing. The "disadvantage" is that the EventStore has to be in the same DB as the read side.

Coordinator
Nov 20, 2011 at 7:00 PM

If waiting for 500 ms is too long, use Thread.Sleep(200) -> show a "please wait while we process your data" screen for 200 ms

Or: use another UI design to show the data.

Coordinator
Nov 20, 2011 at 7:53 PM

What you say is true for asynchronous command execution and event dispatching, but aren't we using a synchronous model?

Coordinator
Nov 20, 2011 at 7:56 PM

Sorry, didn't see the comment before your last one.

Coordinator
Nov 21, 2011 at 8:42 AM

About applications working with stale data: the problem is only the data that the user has entered. When the application allows the user to enter data and afterwards displays a list of data items, it must display the data the user has just entered; otherwise, the user will believe his changes to be lost. (That's only my opinion, of course, but I believe most users would not understand why the data can't be displayed.)

Ways to approach this are:

  • Waiting until the user's data has been processed in the asynchronous index updates before displaying the list of data items. (E.g., by using Thread.Sleep, any other Wait call, or by building the UI in a way that ensures the DB has enough time; see below.)
  • "Faking" the results in the UI, including any changes made by the user no matter what the repository says.
  • Switching to a synchronous update.

About the waiting approach: Why do I think Thread.Sleep, with any hard-coded waiting time, is really just a hack in this situation?

  • It's not deterministic. Nobody can tell in advance for exactly how long the application needs to wait. When the server is fast, 200 ms (or any hard-coded value, 500 ms even more so) will slow down the app unnecessarily. When the server is slow, 200 ms might not be enough. When it's not enough, we (= Martin) will get reports of the application not working correctly.
  • It's not intentional code: we don't want to wait for 200 milliseconds. We want to wait until we can display the effects of the command data we just sent to the bus.
  • There are better ways to solve the problem using waiting: for example, we can use WaitForNonStaleResultsAsOfNow. However, I wouldn't want to build this into every read model query displaying the list because usually I don't really care about stale data from other users; I just care about the data from this user. Maybe we could call WaitForNonStaleResultsAsOfNow from the event handlers? If the event handlers are executed synchronously to the application logic (is that currently so?), that would cause each subsequent query against the read model to include the changed data.
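To make the difference between sleeping for a fixed time and waiting for the index itself concrete, here is a minimal Python sketch. ToyIndex is a made-up stand-in for an asynchronously updated index, not RavenDB's client API; the method names (store, query, query_non_stale) and the timeout value are assumptions for illustration only. The idea is that the wait ends as soon as the index has caught up, and fails loudly if it never does.

```python
import threading
import time


class ToyIndex:
    """Toy stand-in for an asynchronously updated index (NOT RavenDB's API)."""

    def __init__(self, indexing_delay):
        self._indexing_delay = indexing_delay
        self._documents = []   # written immediately
        self._indexed = []     # catches up in the background
        self._caught_up = threading.Condition()

    def store(self, document):
        with self._caught_up:
            self._documents.append(document)
        # Indexing happens on another thread, after the write has returned.
        threading.Thread(target=self._index, args=(document,)).start()

    def _index(self, document):
        time.sleep(self._indexing_delay)   # simulated indexing work
        with self._caught_up:
            self._indexed.append(document)
            self._caught_up.notify_all()

    def is_stale(self):
        with self._caught_up:
            return len(self._indexed) < len(self._documents)

    def query(self):
        """Return whatever the index currently holds, stale or not."""
        with self._caught_up:
            return list(self._indexed)

    def query_non_stale(self, timeout):
        """Block until the index has caught up; raise after `timeout` seconds."""
        with self._caught_up:
            if not self._caught_up.wait_for(lambda: not self.is_stale(), timeout):
                raise TimeoutError("index is still stale")
            return list(self._indexed)


index = ToyIndex(indexing_delay=0.05)
index.store("Sponsor A")
# Waits only as long as indexing actually takes, not a hard-coded 200 or 500 ms.
print(index.query_non_stale(timeout=2.0))
```

Unlike a hard-coded sleep, the code says what it is waiting for, and a slow server produces a visible timeout instead of a silently incomplete list.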

About using another UI approach to solve the waiting - we haven't designed the UI yet, so yes, maybe we could get around the problem by designing the UI in a way that gives RavenDB enough time to update its indexes. However, I wouldn't want to design the UI in a way that slows down the user artificially (i.e., build in an intermediate screen or never navigate the user from an entry form to a list form) just for technical reasons. In my opinion, efficiency of user interactions (e.g., elimination of clicks, presenting as much information in context as possible, etc.) should have the highest priority. (And also, the UI design approach is not deterministic either because we can't predict how long RavenDB is going to take.)

About faking the read results in the view: I believe that this is the most difficult solution. We'd need to perform some sort of merge against the read model queries, and unless there's already an infrastructure for this available, it's probably out of scope for now.
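For reference, the core of such a merge could look roughly like the Python sketch below. Everything in it is hypothetical (the merge_pending helper, the dictionary rows, the "id" key); it only overlays the user's own pending changes on top of possibly stale query results.

```python
def merge_pending(read_model_rows, pending_rows, key):
    """Overlay the user's not-yet-indexed rows onto (possibly stale) query results."""
    merged = {key(row): row for row in read_model_rows}
    for row in pending_rows:
        merged[key(row)] = row   # the user's own change wins over the stale row
    return list(merged.values())


stale_rows = [{"id": 1, "name": "Acme"}]
pending_rows = [{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex"}]
print(merge_pending(stale_rows, pending_rows, key=lambda row: row["id"]))
```

The sketch ignores deletions, sorting, paging, and the question of when a pending row can be dropped again, which is where most of the real effort would go.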

About switching to synchronous updates: In general, I feel we need to work around an issue caused by a feature we don't need. RavenDB's approach is to be as lazy as possible with its index updates because this is the best approach for responsiveness and scalability of applications with a very large number of concurrent users. For such applications, asynchronously updated indexes work great, much better than, e.g., a SQL Server-style index that is synchronously updated. However, we don't need to write a (highly) scalable application with short request times. We need to write a responsive data management application for a very limited number of users (multiply 4-5 by 100; that still shouldn't be a performance problem with a synchronous approach). In my opinion, the ideal way would be to switch RavenDB to a synchronous index update for this application if possible. Is this possible?

@Jörg: You're writing about updating the read DB in the same transaction as the command processing. How would we change the application to do that? And would that really solve the problem? Aren't RavenDB's index updates always asynchronous?

Coordinator
Nov 21, 2011 at 9:12 AM

Faking the UI: not necessary for this application. It adds complexity.

Updating synchronously: I'm still not sure whether you would like it only because we are all familiar with it. Problem: RavenDB's index updates are still asynchronous (-> you can change that: http://www.gamlor.info/wordpress/2011/07/ravendb-queries-and-indexes/ )

I think this is more a mental problem than a technical one. RavenDB updates the read index in just a few milliseconds. So Thread.Sleep(200) is far more than RavenDB will ever need, even on very, very slow machines.

Why I'm pro Thread.Sleep() -> one line of code, easy to use and everybody will understand it -> + you can show a short waiting screen
-> it's enough for this small app
-> not over engineered -> I know I'm the guy who introduced CQRS ;-)

If you want to do it a little bit more complicated: write a RavenDB event listener and send a notification to the UI when the RavenDB index has been updated.

The RavenDB team has measured the index update time:
>> "Completed in 61 seconds, just a tad over 250 documents / second. Considering everything that is involved in indexing, just 4 ms per index seems pretty good."
http://ravendb.net/documentation/performance/indexing

Coordinator
Nov 21, 2011 at 9:25 AM
Edited Nov 21, 2011 at 9:25 AM

Jörg: Yes, you can use Thread.Sleep, and yes, it will probably work. It's still a hack IMO, for the reasons outlined above.

(Look at it this way: instead of Thread.Sleep you could also calculate a few thousand digits of Pi. That would also work, but you probably wouldn't do it since it's completely non-intentional code, and it's not guaranteed to work (with faster processors, etc.). That's also why I feel Thread.Sleep isn't a good thing...)

I really like the "ConsistencyOptions.QueryYourWrites" option discussed in the blog post you linked to. This is exactly what we need: query our own writes. It's also one line. And it communicates what we want to do; everyone reading it will understand why it is there.

BTW, the blog post makes exactly the same argument as I do:

Let’s look at an example. We build a simple blog / news site. Now let’s think about the consistency here. On the public website stale results shouldn’t be an issue. Why? Because you as a visitor can’t tell the difference between the ‘super-latest’ articles and the one published a few seconds ago. When an article shows up a few seconds later it makes no difference to you. Of course we also have an ‘administration’ backend. For the website administrator which is editing articles the story is different. He has just edited an article and wants to see his changes immediately. If he would encounter a stale content he would probably think that his changes are lost. What’s the conclusion? Well we use the ‘QueryYourWrites’ consistency for administration-backend, while we allow stale results on the public website.
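The distinction the quote draws (wait only for your own writes, tolerate staleness otherwise) can be modelled in a few lines of Python. This is a toy model, not RavenDB's implementation or API: ToyStore, Session, the sequence numbers, and the query_your_writes flag are all names I made up for the sketch.

```python
import queue
import threading
import time


class ToyStore:
    """Toy document store whose index lags behind writes (NOT RavenDB's API)."""

    def __init__(self, indexing_delay):
        self._indexing_delay = indexing_delay
        self._pending = queue.Queue()
        self._changed = threading.Condition()
        self._last_written = 0    # sequence number of the newest write
        self._indexed_up_to = 0   # sequence number the index has reached
        self._index = []
        threading.Thread(target=self._run_indexer, daemon=True).start()

    def write(self, document):
        with self._changed:
            self._last_written += 1
            self._pending.put((self._last_written, document))
            return self._last_written

    def _run_indexer(self):
        while True:
            seq, document = self._pending.get()
            time.sleep(self._indexing_delay)   # simulated indexing work
            with self._changed:
                self._index.append(document)
                self._indexed_up_to = seq
                self._changed.notify_all()

    def query(self, wait_for_seq=0, timeout=2.0):
        """Return indexed documents once the index has reached `wait_for_seq`."""
        with self._changed:
            reached = lambda: self._indexed_up_to >= wait_for_seq
            if not self._changed.wait_for(reached, timeout):
                raise TimeoutError("index has not reached the requested write")
            return list(self._index)


class Session:
    """Remembers this user's last write so queries can wait for exactly that."""

    def __init__(self, store, query_your_writes):
        self._store = store
        self._query_your_writes = query_your_writes
        self._last_own_write = 0

    def store(self, document):
        self._last_own_write = self._store.write(document)

    def query(self):
        wait_for = self._last_own_write if self._query_your_writes else 0
        return self._store.query(wait_for_seq=wait_for)


store = ToyStore(indexing_delay=0.05)
admin = Session(store, query_your_writes=True)     # administration backend
visitor = Session(store, query_your_writes=False)  # public website
admin.store("Edited article")
print(visitor.query())   # returns immediately, may not contain the edit yet
print(admin.query())     # blocks until the admin's own edit is indexed
```

A session that has written nothing never waits, so read-only users pay no cost for the stronger consistency.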

Coordinator
Nov 21, 2011 at 9:48 AM

When updating the read side in the same transaction as the write side (= synchronous), "ConsistencyOptions.QueryYourWrites" will definitely be a good option, as you mentioned above.

Why I don't want to synchronously update the read and write side in my production systems: I lose options.
I want to have the option to use the database that fits best for the feature/module... and I don't want to put the EventStore in the same DB as the data for the read side. Even when using a relational DB such as SQL Server, I normally put the EventStore in a separate DB. Sometimes there are other event handlers that don't update a database but write a report to the file system, generate CSV files, ...
And sometimes there are other systems that you have to integrate with, and I don't want to use two-phase commits.

But: to get started with CQRS it's easier to use the synchronous way until you make the mental switch.

Coordinator
Nov 21, 2011 at 11:20 AM

I think you just have to weigh your (potential and actual) requirements against each other. In this case, asynchronous updates make it harder for us to fulfill the actual requirement of "show me my edits after I've made them". Synchronous updates make it harder for us to do something else that we'll probably not need. So this app seems to be a good use case for synchronous updates.

But there's still one thing unclear to me: why do synchronous updates require the EventStore and the read side to be updated within the same transaction? For me, "synchronous" just means that the command handler blocks until the event handlers have run (in my mental model, IRepository.Save just executes all event handlers on the calling thread), and the read queries block until the respective index has processed all changes ("QueryYourWrites"). I'd think that for this, it's not necessary that command handler and event handlers use the same transaction. In this scenario, we don't lose the possibility of having the event store be in a different DB than the read side.
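A minimal Python sketch of that mental model, assuming a hypothetical Repository whose save() runs every event handler inline. The class names and the tuple-shaped events are illustrative only and are not taken from the project's code. Note that the event store and the read model are separate objects with no shared transaction.

```python
class EventStore:
    """Stand-in for the write side; could live in its own database."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)


class SponsorListReadModel:
    """Stand-in for the read side; a separate store from the EventStore."""

    def __init__(self):
        self.sponsors = []

    def handle(self, event):
        kind, name = event
        if kind == "SponsorAdded":
            self.sponsors.append(name)


class Repository:
    """Hypothetical repository: save() appends events, then runs handlers inline."""

    def __init__(self, event_store, handlers):
        self._event_store = event_store
        self._handlers = handlers

    def save(self, events):
        for event in events:
            self._event_store.append(event)
            for handler in self._handlers:
                # Runs on the calling thread, so save() returns only after
                # every handler has processed the event.
                handler(event)


event_store = EventStore()
read_model = SponsorListReadModel()
repository = Repository(event_store, handlers=[read_model.handle])

repository.save([("SponsorAdded", "Acme")])
print(read_model.sponsors)   # already updated when save() has returned
```

What this sketch does not give you is atomicity: if a handler fails after the event has been appended, the two stores disagree until the event is replayed.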

What am I missing?

Coordinator
Nov 21, 2011 at 2:30 PM

+1 for ConsistencyOptions.QueryYourWrites

Coordinator
Nov 24, 2011 at 9:36 AM

For this to work, the event store and the read repository need to use the same DocumentStore (I think). How would one configure the DocumentStore used by the event store?

Coordinator
Nov 24, 2011 at 9:37 AM

Sorry, that was stupid - only the event handlers need to use the same event store. Never mind.