How one can cache Dataset #425
Comments
Yeah, it's a bit unfortunate. GDAL doesn't allow you to read from a So I think your options are to either:
I should probably ask on the mailing list for clarification, though. |
Starting with GDAL 3.6.0, if the GDAL_NUM_THREADS config option is set, reading a window of interest that intersects multiple tiles at once in a TIFF/COG file will use multithreaded decompression (cf. https://github.com/OSGeo/gdal/blob/v3.6.0/NEWS.md), and in GDAL 3.7.0 this was further improved to trigger parallel network requests. |
I don't think multi-threaded decoding helps in this case (a tile server), since each request will read a single block if everything is set up properly. But we can't have everything just yet :-). |
Not sure if this could be considered canonical or even acceptable (YMMV), but we have a production tile server written in Axum +

use crate::raster::GdalPath;
use crate::Error;
use gdal::Dataset;
use moka::sync::Cache;
use once_cell::sync::Lazy;
use std::ops::Deref;
use std::sync::{Arc, Mutex};
use std::time::Duration;

pub(crate) struct DatasetCache(Cache<GdalPath, Arc<Mutex<Dataset>>>);

static INSTANCE: Lazy<DatasetCache> = Lazy::new(DatasetCache::new);

impl DatasetCache {
    fn new() -> Self {
        Self(
            Cache::builder()
                .time_to_idle(Duration::from_secs(3600))
                .max_capacity(5)
                .build(),
        )
    }

    pub(crate) fn dataset_for(path: &GdalPath) -> crate::Result<Arc<Mutex<Dataset>>> {
        let ds = INSTANCE.0.try_get_with(path.clone(), || {
            let ds: Result<Dataset> = path.open();
            ds.map(|d| Arc::new(Mutex::new(d)))
                .map_err(|e| e.to_string())
        });
        ds.map_err(|e| Error::Unexpected(e.deref().clone()))
    }
} |
Isn't the problem that There are shared datasets in GDAL, but we haven't implemented them since they cannot simply be used with all the stuff currently implemented for a dataset. We have done the thread + channel thing that @lnicola mentioned 😆 . EDIT: Was wrong, they are |
Yeah, IIRC shared datasets are actually the opposite of the "open the file multiple times" trick. Instead, you (probably) get a mutex around each access, but end up with better cache utilization.
You can stick them in an |
No, you don't. You just get the same dataset if you call GDALOpenShared() from the same thread that opened the initial one; otherwise you'll get a different instance. |
Oh, right. Well that's an argument for |
You can't call There would need to be a second type of |
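For reference, the "thread + channel" approach mentioned above can be sketched roughly as follows. This is only an illustration using the standard library: `FakeDataset`, `ReadRequest` and `DatasetHandle` are hypothetical names, and `FakeDataset` stands in for `gdal::Dataset` so the example compiles without GDAL. The idea is that a single worker thread owns the dataset for its whole lifetime, and request handlers only hold a cloneable handle that sends read requests over a channel.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for `gdal::Dataset`; a real worker would open
/// the GDAL dataset here instead.
struct FakeDataset {
    path: String,
}

impl FakeDataset {
    fn open(path: &str) -> Self {
        FakeDataset { path: path.to_string() }
    }

    /// Pretends to read a block; returns a description instead of pixels.
    fn read_block(&self, x: usize, y: usize) -> String {
        format!("{}:{}:{}", self.path, x, y)
    }
}

/// A read request together with the channel on which to send the reply.
struct ReadRequest {
    x: usize,
    y: usize,
    reply: mpsc::Sender<String>,
}

/// Cloneable handle that request handlers can share freely.
#[derive(Clone)]
struct DatasetHandle {
    tx: mpsc::Sender<ReadRequest>,
}

impl DatasetHandle {
    /// Spawns a worker thread that opens and owns the dataset.
    fn spawn(path: &str) -> Self {
        let (tx, rx) = mpsc::channel::<ReadRequest>();
        let path = path.to_string();
        thread::spawn(move || {
            // The dataset never leaves this thread.
            let ds = FakeDataset::open(&path);
            // The loop ends once every handle (sender) has been dropped.
            for req in rx {
                let _ = req.reply.send(ds.read_block(req.x, req.y));
            }
        });
        DatasetHandle { tx }
    }

    /// Sends a request to the worker and blocks until it answers.
    /// Returns `None` if the worker thread is gone.
    fn read_block(&self, x: usize, y: usize) -> Option<String> {
        let (reply, rx) = mpsc::channel();
        self.tx.send(ReadRequest { x, y, reply }).ok()?;
        rx.recv().ok()
    }
}
```

A handler would call something like `DatasetHandle::spawn("a.tif")` once and then `handle.read_block(1, 2)` per request. All reads on one dataset are serialized through the worker, so this trades parallelism for keeping the dataset open on a single thread.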
Hello, team,
I have a server that serves slippy tiles, implemented as an HTTP server using gdal-rs. The actual rasters are partitioned into many Cloud Optimized GeoTIFF (COG) files with overviews. At a high level, I extract tile information from a request that looks like
/:prefix/:layer/:z/:x/:y
and map it to an overview and an offset to read from the COG. My COG files are stored in S3 and I use vsis3. At the beginning of a request I open the Dataset; at the end it is implicitly closed when it is dropped. Interestingly, if I query the same slippy tile twice, only the first request has high latency; the second one is much faster (is it because of the VSI cache?).
Does it make sense in such a scenario to cache the C descriptor of the Dataset and reuse it? Or should VSI_CACHE_SIZE together with GDAL_CACHEMAX be enough?
Thank you.
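For readers following along, the request-to-tile mapping described in the question might look roughly like the sketch below. It is not the poster's actual code: `TileRequest`, `parse_tile_path`, `tile_bounds_3857` and `overview_index` are hypothetical names, and the overview choice assumes the file's full resolution corresponds to a known "native" zoom level with overviews that each halve the resolution.

```rust
/// Parsed `/:prefix/:layer/:z/:x/:y` request path (hypothetical type).
#[derive(Debug, PartialEq)]
struct TileRequest {
    prefix: String,
    layer: String,
    z: u32,
    x: u32,
    y: u32,
}

/// Splits the path into its five segments and validates the tile indices.
fn parse_tile_path(path: &str) -> Option<TileRequest> {
    let parts: Vec<&str> = path.trim_matches('/').split('/').collect();
    if parts.len() != 5 {
        return None;
    }
    let z: u32 = parts[2].parse().ok()?;
    let x: u32 = parts[3].parse().ok()?;
    let y: u32 = parts[4].parse().ok()?;
    // Reject zoom levels that would overflow the shift below.
    if z > 30 {
        return None;
    }
    // x and y must lie inside the 2^z by 2^z tile grid.
    let n = 1u32 << z;
    if x >= n || y >= n {
        return None;
    }
    Some(TileRequest {
        prefix: parts[0].to_string(),
        layer: parts[1].to_string(),
        z,
        x,
        y,
    })
}

/// Half the width of the Web Mercator (EPSG:3857) world extent, in metres.
const HALF_WORLD: f64 = 20037508.342789244;

/// Returns (min_x, min_y, max_x, max_y) of an XYZ tile in EPSG:3857.
fn tile_bounds_3857(z: u32, x: u32, y: u32) -> (f64, f64, f64, f64) {
    let size = 2.0 * HALF_WORLD / (1u64 << z) as f64;
    let min_x = -HALF_WORLD + x as f64 * size;
    let max_y = HALF_WORLD - y as f64 * size;
    (min_x, max_y - size, min_x + size, max_y)
}

/// Assumption: the full-resolution band matches `native_zoom`, and
/// overview `i` (0-based) has its resolution halved `i + 1` times.
/// `None` means "read the full-resolution band".
fn overview_index(native_zoom: u32, z: u32) -> Option<u32> {
    if z >= native_zoom {
        None
    } else {
        Some(native_zoom - z - 1)
    }
}
```

The bounds returned by `tile_bounds_3857` would then be converted to a pixel window using the dataset's geotransform, which is where the actual GDAL read comes in.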