Description
I often save big parquet files with `BatchedWriter`, forcing each row group to contain only a single `node_id`. When reading those files I use the `ParquetAsyncReader` to get the row group metadata, so that I know which `node_id`s are in the file. For reference, I do this.

As it is now I can't (or don't see how to) use the same `async_reader` to read the data from each row group. Instead I have to do a `LazyFrame::scan_parquet` with a filter.
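For concreteness, that workaround looks roughly like the following (a sketch; `ScanArgsParquet`'s defaults differ between polars versions, and `node_id` is assumed here to be an integer column):

```rust
use polars::prelude::*;

/// Sketch of the current workaround: scan the whole file lazily and filter
/// on `node_id`, relying on row-group statistics to skip the other groups.
fn read_one_node(path: &str, node_id: i64) -> PolarsResult<DataFrame> {
    LazyFrame::scan_parquet(path, ScanArgsParquet::default())?
        .filter(col("node_id").eq(lit(node_id)))
        .collect()
}
```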
The `async_reader` already has a `RowGroupFetcher`, which has `fetch_row_groups`, but they're not public so can't be used directly.
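For reference, the metadata pass from the first paragraph is roughly the following. This is only a sketch: the `from_uri` and `get_metadata` calls, their signatures, and the module path are assumptions and have shifted between polars releases.

```rust
use polars::prelude::PolarsResult;
// The async reader's module path moves between releases
// (polars_io::parquet vs polars_io::parquet::read); adjust as needed.
use polars_io::parquet::ParquetAsyncReader;

/// Sketch: fetch only the footer metadata and enumerate the row groups.
/// Each row group was written for a single node_id, so its column
/// statistics identify which node_id it holds.
async fn list_row_groups(uri: &str) -> PolarsResult<()> {
    // `from_uri` has taken different extra arguments (cloud options,
    // cached metadata) across versions; the `None`s are placeholders.
    let mut reader = ParquetAsyncReader::from_uri(uri, None, None).await?;
    let metadata = reader.get_metadata().await?; // assumed accessor
    for (i, rg) in metadata.row_groups.iter().enumerate() {
        println!("row group {i}: {} rows", rg.num_rows());
    }
    Ok(())
}
```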
This would (I think) also alleviate the need to use `bigidx`, since none of my row groups are that big.
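For context, the writing side mentioned at the top is essentially one `write_batch` call per `node_id` partition. Again a sketch: `partition_by`'s `include_key` argument and the exact `batched`/`schema` signatures vary across polars versions.

```rust
use polars::prelude::*;
use std::fs::File;

/// Sketch: write one row group per node_id by handing each partition
/// to the batched parquet writer as its own batch.
fn write_by_node_id(df: &DataFrame, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::create(path)?;
    let mut writer = ParquetWriter::new(file).batched(&df.schema())?;
    for part in df.partition_by(["node_id"], true)? {
        // Each batch becomes its own row group, provided the partition
        // stays under the writer's row-group size limit.
        writer.write_batch(&part)?;
    }
    writer.finish()?;
    Ok(())
}
```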