prep release
cboettig committed Sep 6, 2023
1 parent beb62c4 commit 97a6c85
Showing 6 changed files with 46 additions and 11 deletions.
7 changes: 6 additions & 1 deletion NEWS.md
@@ -1,6 +1,11 @@
# duckdbfs 0.0.2

* spatial data query support! See README.md
* duckdbfs now has spatial data query support! Users can apply spatial
operations such as `st_distance()` and `st_area()` and request return
values as `sf` objects. Network-based access is supported as well; see README.md.

* Added `write_dataset()`, which can write (potentially partitioned) parquet
to local directories or remote (S3) buckets.

* The S3 interface supports `arrow`-compatible URI notation:
- Alternate endpoints can now be passed like so
8 changes: 6 additions & 2 deletions R/write_dataset.R
@@ -11,8 +11,11 @@
#' @param ... additional arguments to [duckdb_s3_config()]
#' @examplesIf interactive()
#' write_dataset(mtcars, tempfile())
#'
#' @return Returns the path, invisibly.
#' @export
#' @examplesIf interactive()
#' write_dataset(mtcars, tempdir())
#'
write_dataset <- function(dataset,
path,
conn = cached_connection(),
@@ -72,7 +75,8 @@ write_dataset <- function(dataset,
paste0("(", options, ")"), ";")


DBI::dbSendQuery(conn, query)
status <- DBI::dbSendQuery(conn, query)
invisible(path)
}

is_not_remote <- function(x) {
11 changes: 10 additions & 1 deletion README.Rmd
@@ -127,9 +127,18 @@ spatial_ex |>

For more details including a complete list of the dozens of spatial operations currently supported and notes on performance and current limitations, see the [duckdb spatial docs](https://github.com/duckdblabs/duckdb_spatial)
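
As a quick illustration, here is a minimal sketch of such a query. It assumes `spatial_ex` is the lazy table from the example above, that its geometry column is named `geometry`, and that results can be collected into an `sf` object with `to_sf()`; treat the column name as an assumption and adapt to your data:

```{r eval=FALSE}
# Sketch only: the `geometry` column name is an assumption
spatial_ex |>
  mutate(area = st_area(geometry),                        # per-feature area
         dist = st_distance(geometry, st_point(0, 0))) |> # distance to the origin
  to_sf()                                                 # collect as an sf object
```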

## Writing datasets

Like `arrow::write_dataset()`, `duckdbfs::write_dataset()` can write partitioned parquet files to local disks or directly to an S3 bucket. Partitioned writes should take advantage of threading. Partition variables can be specified explicitly, or any `dplyr` grouping variables will be used by default:

```{r message=FALSE}
mtcars |> group_by(cyl, gear) |> write_dataset(tempfile())
```
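
Partition columns can also be given explicitly rather than via `group_by()`. A minimal sketch, assuming the `partitioning` argument accepts a character vector of column names (check `?write_dataset` to confirm):

```{r eval=FALSE}
# Sketch: the `partitioning` argument shown here is an assumption
write_dataset(mtcars, tempfile(), partitioning = c("cyl", "gear"))
```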


## Local files

Of course, `open_dataset()` can also be used with local files. Remember that parquet format is not required, we can read csv files (including multiple and hive-partitioned csv files).
Of course, `open_dataset()` and `write_dataset()` can also be used with local files. Remember that parquet format is not required; we can read csv files (including multiple and hive-partitioned csv files).

```{r}
write.csv(mtcars, "mtcars.csv", row.names=FALSE)
22 changes: 17 additions & 5 deletions README.md
@@ -61,8 +61,8 @@ explicitly request `duckdb` join the two schemas. Leave this as default,
``` r
ds <- open_dataset(urls, unify_schemas = TRUE)
ds
#> # Source: table<csnyzuoxobayjyy> [3 x 4]
#> # Database: DuckDB 0.8.1 [unknown@Linux 5.17.15-76051715-generic:R 4.3.1/:memory:]
#> # Source: table<kkkmtknecathhep> [3 x 4]
#> # Database: DuckDB 0.8.1 [unknown@Linux 6.4.6-76060406-generic:R 4.3.1/:memory:]
#> i j x k
#> <int> <int> <chr> <int>
#> 1 42 84 1 NA
@@ -182,11 +182,23 @@ operations currently supported and notes on performance and current
limitations, see the [duckdb spatial
docs](https://github.com/duckdblabs/duckdb_spatial)

## Writing datasets

Like `arrow::write_dataset()`, `duckdbfs::write_dataset()` can write
partitioned parquet files to local disks or directly to an S3 bucket.
Partitioned writes should take advantage of threading. Partition
variables can be specified explicitly, or any `dplyr` grouping variables
will be used by default:

``` r
mtcars |> group_by(cyl, gear) |> write_dataset(tempfile())
```
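
The partitioned output can then be read back lazily with `open_dataset()`; a minimal sketch, assuming a directory path is accepted directly and hive-style partitions are detected on read:

``` r
path <- tempfile()
mtcars |> group_by(cyl, gear) |> write_dataset(path)
# hive-partitioned parquet written above; open it lazily rather than
# reading everything into memory
open_dataset(path)
```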

## Local files

Of course, `open_dataset()` can also be used with local files. Remember
that parquet format is not required, we can read csv files (including
multiple and hive-partitioned csv files).
Of course, `open_dataset()` and `write_dataset()` can also be used with
local files. Remember that parquet format is not required; we can read
csv files (including multiple and hive-partitioned csv files).

``` r
write.csv(mtcars, "mtcars.csv", row.names=FALSE)
3 changes: 1 addition & 2 deletions cran-comments.md
@@ -1,5 +1,4 @@
## R CMD check results

0 errors | 0 warnings | 1 note
0 errors | 0 warnings | 0 notes

* Addresses concerns raised by CRAN on initial submission.
6 changes: 6 additions & 0 deletions man/write_dataset.Rd

Some generated files are not rendered by default.
