
Improved caching in website #1143

@viroulep

Here is a dedicated issue to continue the discussion started in #1137:

@viroulep


I had a quick look at the cache directory on prod:

~/worldcubeassociation.org/WcaOnRails @production> du -h -d 0 tmp/cache/fragments/
4.8G tmp/cache/fragments/

I don't feel comfortable just shipping the change knowing we may multiply this number.
I couldn't find a maximum for the cached content size in the configuration files, nor could I find a point at which the cache expires.
I think we should modify the production environment to set a maximum size on the cache store.
Assuming the eviction policy used is LRU, that would still ensure that "trending" parts of the website stay in the cache.

Reading the Rails doc about caches, I noticed we could also have some cache stored in RAM, but I couldn't find information about how to easily set up a multilevel cache (similar to L1/L2/... caches in CPUs). If this is actually doable, having a small memory cache in front of the filesystem would be a nice optimization.
A quick gem search returns this; maybe that's something to look into (not part of this PR though :p)
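The L1/L2 idea can be sketched in plain Ruby (the class and method names here are hypothetical, not part of the Rails API): a small in-memory hash is consulted before falling back to a slower backing store.

```ruby
# Sketch of a two-level read-through cache: a bounded in-memory "L1" hash
# in front of any Hash-like backing store (e.g. something file-based).
class TwoLevelCache
  def initialize(backing, l1_max: 100)
    @backing = backing   # slow "L2" store; here any Hash-like object
    @l1 = {}             # fast in-memory "L1"
    @l1_max = l1_max
  end

  def fetch(key)
    return @l1[key] if @l1.key?(key)        # L1 hit: no slow lookup
    value = @backing[key]                   # L2 lookup
    value = yield if value.nil? && block_given?
    @backing[key] = value
    @l1.shift if @l1.size >= @l1_max        # crude eviction: drop oldest L1 entry
    @l1[key] = value
    value
  end
end
```

A real implementation would also have to deal with invalidation across both levels, which is exactly what makes multilevel caching harder than it looks.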


@jfly


I couldn't find a maximum for the cached content size in the configuration files nor could I find a time where the cache expires.

The rails caching guide has the following to say about ActiveSupport::Cache::FileStore:

As the cache will grow until the disk is full, it is recommended to periodically clear out old entries.

Dang! Good catch. This is definitely something we need to fix. Is it even possible to specify a max size on our ActiveSupport::Cache::FileStore? My google-fu is not being very effective at answering this.


@timhabermaas


Dang! Good catch. This is definitely something we need to fix. Is it even possible to specify a max size on our ActiveSupport::Cache::FileStore?

I don't think so. You probably need to use more sophisticated solutions like Memcached or Redis.
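For reference, if we went the Memcached route, the Rails side is a one-line change in the environment config (a sketch, assuming a memcached instance on localhost and the dalli gem in the Gemfile; the size limit and LRU eviction live in memcached's own configuration, not in Rails):

```ruby
# config/environments/production.rb — sketch
config.cache_store = :mem_cache_store, "localhost:11211"

# memcached itself is started with a hard memory cap, e.g. in /etc/memcached.conf:
#   -m 512   # cap the cache at 512 MB; least-recently-used entries get evicted
```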


@larspetrus


You can always just delete everything from the cache directory.

So are these cache files accumulated from when the Rails app first was deployed?

I don't see an expiry date in the code. 1-2 weeks would maybe be the right level. Once everyone who went has looked at a competition, there should be very little traffic.

That's assuming the FileStore removes expired files.


@jfly


So are these cache files accumulated from when the Rails app first was deployed?

We spin up a new server semi-regularly, so this is only since the server was last deployed, which looks like 24 days:

~ @kaladin> ssh wca uptime
 02:53:11 up 24 days,  8:11,  0 users,  load average: 0.09, 0.10, 0.13

I don't see an expiry date in the code. 1-2 weeks would maybe be the right level

How do you set an expiry date in the code?


@larspetrus


Wow. That's a lot more data than I expected.

I think you just pass expires_in: 10.days to the cache() function, but the pages I'm looking at aren't super clear, and I have to leave now.
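That matches the fragment-caching helper's signature: the options hash is forwarded to the cache store. A sketch of what it would look like in one of our views (`competition` stands in for whatever record the fragment is keyed on):

```erb
<%# Fragment expires 10 days after being written %>
<% cache competition, expires_in: 10.days do %>
  ...
<% end %>
```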


@timhabermaas


expires_in does indeed delete the cached file (rather than just ignore it): handle_expired_entry calls delete_entry on the file store.

So, we could add expires_in: x to the cache calls, but the expired cache pages need to be hit in order to be deleted. Thanks to the auto-expiring nature [1] of the cache keys, this won't happen [2], so the file size gains are effectively zero. Besides that, most of the cached content (probably > 90%) wouldn't actually need an expiration date, since e.g. old competitions are completely static.

Assuming the size of the cache directory is a problem [3], here's what we could do: There's Cache#cleanup which specifically cleans up expired entries. This could be used in some cron-like fashion to clean up old unused cache keys. We will then need to live with response time spikes every 10 days or so, but that's probably fine.
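A sketch of what that cron-like cleanup could look like (the task name and schedule are made up; `Rails.cache.cleanup` is the real `ActiveSupport::Cache::Store#cleanup`, which for the FileStore walks the cache directory and deletes entries whose expiry has passed):

```ruby
# lib/tasks/cache.rake — hypothetical task name
namespace :cache do
  desc "Delete expired fragment cache entries"
  task cleanup: :environment do
    Rails.cache.cleanup
  end
end

# Crontab entry, roughly every 10 days:
#   0 4 */10 * * cd /path/to/WcaOnRails && bin/rake cache:cleanup
```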

The LRU strategy @viroulep suggested would be a much nicer solution, though, because it would keep frequently visited pages cached. Sadly, after digging more into the file store, I still think there's no way to achieve this without in-memory stores. We don't happen to run Redis already, do we?

[1] We could change the cache keys to [competition.id, view], expires_in: 10.days and this problem would go away. But this will lead to stale data, and we'd then have to arbitrarily decide on the expiration date (too high = "why's the competition not yet posted, I uploaded it two days ago!", too low = unnecessary performance hit for all competition pages). Not a fan.
[2] E.g. we won't ever hit the old cache of competitions whose results have been updated, since the results_updated_at timestamp is part of the cache key.
[3] The directory should grow linearly with the number of competitions (and now locales). I haven't done any number crunching, but this might work out fine for quite some time depending on the available disk size?
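To illustrate the LRU behaviour being discussed (a plain-Ruby sketch, not something the FileStore offers; Memcached implements this natively): Ruby hashes preserve insertion order, so moving a key to the back on every access keeps the least-recently-used entry at the front, ready to be evicted.

```ruby
# Minimal LRU cache sketch: reads and writes both count as "use".
class LRUCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)   # move to back (most recently used)
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size  # evict least recently used
  end
end
```

This is exactly why a "trending" competition page would stay cached under LRU: every visit refreshes its position, while pages nobody looks at drift toward eviction.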

