I'm writing a simple Tornado REST webserver which, when queried, will return results from a cached dictionary.
The cached dictionary may need to be referenced a couple of times within one client call. Consequently, I'm trying to tackle the issue of refreshing the cached dictionary in a thread-safe manner.
The cached dictionary will hold around 1000 items.
On each client call, the returned data will:
- always include results from cached_dict['ALL'],
- and include additional results from cached_dict[<PASSED BY REST REF>].
Each cached item is a tuple.
So that's two references to the cache. However, there's a danger the cache could be refreshed/switched between the two lookups.
Initial approach - locking
The cached dictionary will be refreshed periodically. I plan to launch a timer thread which, when it expires, will query the database, populate a temporary dictionary, and then in one atomic instruction re-assign the cached dictionary to the newly populated one. A lock will be held during the re-assignment, with the same lock surrounding the two references to the cached dictionary in the client's call.
This should work, but I'm concerned about the performance of checking the lock on every client call when the cache is only refreshed every ten minutes or so. Is locking that expensive? As far as I can see, the general thinking is: if you can find a way of not using locks, use that.
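For reference, here's a minimal sketch of what I mean (names like build_dict_from_database and lookup are just placeholders, not real code from my server):

```python
import threading

cache_lock = threading.Lock()
cached_dict = {}

def build_dict_from_database():
    # Placeholder for the real database query; returns sample data here.
    return {'ALL': [('a', 1), ('b', 2)], 'x': [('x', 3)]}

def refresh_cache():
    global cached_dict
    # Build the replacement dictionary outside the lock, so readers
    # are only blocked for the duration of the re-assignment.
    new_dict = build_dict_from_database()
    with cache_lock:
        cached_dict = new_dict

def lookup(ref):
    # Hold the lock across both reads so both see the same snapshot,
    # even if a refresh happens concurrently.
    with cache_lock:
        results = list(cached_dict['ALL'])
        results.extend(cached_dict.get(ref, []))
    return results
```

The point of building new_dict outside the lock is that the critical section shrinks to a single reference assignment plus the two reads.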
Second approach - reorganising the cache structure
Alternatively, I could reorganise the cache so that each dictionary entry contains the combined list of 'ALL' items plus the <PASSED BY REST REF> items. Each client call would then only require a single lookup on the cache, which would be atomic, so no explicit locking would be required.
However, I'd now be duplicating the 'ALL' list in every entry of the cache dictionary. That said, if each item is a tuple, then despite the same tuple X being added to every element of the cache, I believe each entry would just hold a reference to the same object? I.e. no additional memory would be used? Is that correct? I.e.:
- Code: Select all
all_items = [...]  # list of 600 tuples
for entry in database_entries:
    cache[entry.name] = list(all_items)
    cache[entry.name].extend(additional_items_specific_to_this_entry)
Now, does the cache hold references to each item of all_items, so there's very little memory overhead, or is there a new instance of every item of all_items for each entry of the cache?
If references are being used (i.e. little memory overhead), then this may be the better approach.
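My understanding (please correct me if wrong) is that list(all_items) makes a shallow copy: the new list object is distinct, but each position refers to the very same tuple objects. A quick interpreter check along these lines:

```python
# Build a list of 600 tuples, then take two shallow copies of it.
all_items = [('item', i) for i in range(600)]

copy_a = list(all_items)
copy_b = list(all_items)

# The list objects themselves are distinct...
print(copy_a is copy_b)            # False
# ...but each element is a reference to the same tuple object,
# so the tuples are not duplicated in memory.
print(copy_a[0] is all_items[0])   # True
print(copy_b[0] is all_items[0])   # True
```

So the per-entry overhead would be one list of ~600 pointers, not 600 duplicated tuples.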
I'd be very interested in your thoughts, to help me understand the best way to approach this.
All the best.