prep release
cboettig committed Sep 6, 2023
1 parent beb62c4 commit 97a6c85
Showing 6 changed files with 46 additions and 11 deletions.
7 changes: 6 additions & 1 deletion NEWS.md
@@ -1,6 +1,11 @@
# duckdbfs 0.0.2

* spatial data query support! See README.md
* duckdbfs now has spatial data query support! Users can apply spatial
operations such as `st_distance()` and `st_area()` and request return
values as `sf` objects. Network-based access is supported as well; see README.md.

* Added `write_dataset()`, which can write (potentially partitioned) parquet
to local directories or remote (S3) buckets.

* The S3 interface supports `arrow`-compatible URI notation:
- Alternate endpoints can now be passed like so
8 changes: 6 additions & 2 deletions R/write_dataset.R
@@ -11,8 +11,11 @@
#' @param ... additional arguments to [duckdb_s3_config()]
#' @examplesIf interactive()
#' write_dataset(mtcars, tempfile())
#'
#' @return Returns the path, invisibly.
#' @export
#' @examplesIf interactive()
#' write_dataset(mtcars, tempdir())
#'
write_dataset <- function(dataset,
path,
conn = cached_connection(),
@@ -72,7 +75,8 @@ write_dataset <- function(dataset,
paste0("(", options, ")"), ";")


DBI::dbSendQuery(conn, query)
status <- DBI::dbSendQuery(conn, query)
invisible(path)
}

is_not_remote <- function(x) {
11 changes: 10 additions & 1 deletion README.Rmd
@@ -127,9 +127,18 @@ spatial_ex |>

For more details including a complete list of the dozens of spatial operations currently supported and notes on performance and current limitations, see the [duckdb spatial docs](https://github.com/duckdblabs/duckdb_spatial)
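
As a quick illustration, here is a minimal sketch of such a query. It assumes `spatial_ex` is the lazy table from the example above, that its geometry column is named `geometry`, and that results can be collected into an `sf` object with `to_sf()`; treat the column name as an assumption and adapt to your data:

```{r eval=FALSE}
# Sketch only: the `geometry` column name is an assumption
spatial_ex |>
  mutate(area = st_area(geometry),                        # per-feature area
         dist = st_distance(geometry, st_point(0, 0))) |> # distance to the origin
  to_sf()                                                 # collect as an sf object
```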

## Writing datasets

Like `arrow::write_dataset()`, `duckdbfs::write_dataset()` can write partitioned parquet files to local disks or directly to an S3 bucket. Partitioned writes should take advantage of threading. Partition variables can be specified explicitly, or any `dplyr` grouping variables will be used by default:

```{r message=FALSE}
mtcars |> group_by(cyl, gear) |> write_dataset(tempfile())
```
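
Partition columns can also be given explicitly rather than via `group_by()`. A minimal sketch, assuming the `partitioning` argument accepts a character vector of column names (check `?write_dataset` to confirm):

```{r eval=FALSE}
# Sketch: the `partitioning` argument shown here is an assumption
write_dataset(mtcars, tempfile(), partitioning = c("cyl", "gear"))
```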


## Local files

Of course, `open_dataset()` can also be used with local files. Remember that parquet format is not required, we can read csv files (including multiple and hive-partitioned csv files).
Of course, `open_dataset()` and `write_dataset()` can also be used with local files. Remember that parquet format is not required; we can read csv files (including multiple and hive-partitioned csv files).

```{r}
write.csv(mtcars, "mtcars.csv", row.names=FALSE)
22 changes: 17 additions & 5 deletions README.md
@@ -61,8 +61,8 @@ explicitly request `duckdb` join the two schemas. Leave this as default,
``` r
ds <- open_dataset(urls, unify_schemas = TRUE)
ds
#> # Source: table<csnyzuoxobayjyy> [3 x 4]
#> # Database: DuckDB 0.8.1 [unknown@Linux 5.17.15-76051715-generic:R 4.3.1/:memory:]
#> # Source: table<kkkmtknecathhep> [3 x 4]
#> # Database: DuckDB 0.8.1 [unknown@Linux 6.4.6-76060406-generic:R 4.3.1/:memory:]
#> i j x k
#> <int> <int> <chr> <int>
#> 1 42 84 1 NA
@@ -182,11 +182,23 @@ operations currently supported and notes on performance and current
limitations, see the [duckdb spatial
docs](https://github.com/duckdblabs/duckdb_spatial)

## Writing datasets

Like `arrow::write_dataset()`, `duckdbfs::write_dataset()` can write
partitioned parquet files to local disks or directly to an S3 bucket.
Partitioned writes should take advantage of threading. Partition
variables can be specified explicitly, or any `dplyr` grouping variables
will be used by default:

``` r
mtcars |> group_by(cyl, gear) |> write_dataset(tempfile())
```
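
The partitioned output can then be read back lazily with `open_dataset()`; a minimal sketch, assuming a directory path is accepted directly and hive-style partitions are detected on read:

``` r
path <- tempfile()
mtcars |> group_by(cyl, gear) |> write_dataset(path)
# hive-partitioned parquet written above; open it lazily rather than
# reading everything into memory
open_dataset(path)
```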

## Local files

Of course, `open_dataset()` can also be used with local files. Remember
that parquet format is not required, we can read csv files (including
multiple and hive-partitioned csv files).
Of course, `open_dataset()` and `write_dataset()` can also be used with
local files. Remember that parquet format is not required; we can read
csv files (including multiple and hive-partitioned csv files).

``` r
write.csv(mtcars, "mtcars.csv", row.names=FALSE)
3 changes: 1 addition & 2 deletions cran-comments.md
@@ -1,5 +1,4 @@
## R CMD check results

0 errors | 0 warnings | 1 note
0 errors | 0 warnings | 0 notes

* Addresses concerns raised by CRAN on initial submission.
6 changes: 6 additions & 0 deletions man/write_dataset.Rd

Some generated files are not rendered by default.
