This repository has been archived by the owner on May 15, 2020. It is now read-only.

Document Upload Service

This is a basic document upload service. It was written as part of a job interview that unfortunately did not pan out.

The service allows uploading, listing (primitive search), downloading and deleting documents. The following endpoints are defined by the API:

| Endpoint | Verb | Description |
|---|---|---|
| api/v1/documents | GET | Lists all documents (paginated) with basic search capabilities |
| api/v1/documents/id | GET | Gets a document by its identifier |
| api/v1/documents/id/download | GET | Downloads a document |
| api/v1/documents | DELETE | Deletes a document by its identifier |
| api/v1/documents | POST | Uploads a new document into the system |
| swagger | N/A | Swagger endpoint for testing |

Architecture and Features

The project is split into the following components:

| Component | Description |
|---|---|
| DocumentUpload.Core | Shared interfaces, enums, classes and utilities |
| DocumentUpload.Services | Business-layer implementation of services |
| DocumentUpload.API.Contracts | API Contracts/models (versioned) |
| DocumentUpload.API | Main application / API |
| DocumentUpload.SQL | SQL Server Data Project (DACPAC) |
| DocumentUpload.API.Tests | API Endpoint Tests / Utility Tests |
| DocumentUpload.Services.Tests | Business-layer tests |

The database schema is basic: one table stores document information and a second table stores the content as VARBINARY(MAX), so queries against the Documents table are not impacted by the storage of binary data. This split also makes it easier to support versioned document contents and provides the opportunity to store data in alternate locations (file based, Azure Blob storage, etc).

Access to the database is performed using a repository-type pattern with Dapper. A generic repository (i.e. IRepository<T>) was not used, partly due to the simplistic nature of the project and partly because the content is stored separately (Contents is not part of the DocumentDetails model but is instead retrieved by a GetContent method on the IDocumentRepository interface). A generic repository also fits better when the backing store can offer an IQueryable, which is not provided here.
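
The split between document details and content can be illustrated with a small in-memory stand-in. This is a minimal sketch in Python rather than the project's C#, and everything in it (`InMemoryDocumentRepository`, the field names, the method names) is hypothetical; only the idea that content is fetched through a separate `GetContent`-style call comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DocumentDetails:
    """Metadata only; the binary content is deliberately not a field here."""
    id: int
    title: str
    file_name: str
    owner: str


class InMemoryDocumentRepository:
    """Hypothetical stand-in for IDocumentRepository.

    Two dicts mirror the two tables: one for details, one for content.
    """

    def __init__(self) -> None:
        self._details: Dict[int, DocumentDetails] = {}
        self._content: Dict[int, bytes] = {}
        self._next_id = 1

    def insert(self, title: str, file_name: str, owner: str, content: bytes) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._details[doc_id] = DocumentDetails(doc_id, title, file_name, owner)
        self._content[doc_id] = content
        return doc_id

    def get(self, doc_id: int) -> Optional[DocumentDetails]:
        return self._details.get(doc_id)

    def get_all(self, page: Optional[int] = None,
                size: Optional[int] = None) -> List[DocumentDetails]:
        rows = sorted(self._details.values(), key=lambda d: d.id)
        if page is None or size is None:
            return rows  # paging is optional at the repository level
        start = (page - 1) * size
        return rows[start:start + size]

    def get_content(self, doc_id: int) -> Optional[bytes]:
        # Content is only read when explicitly requested (e.g. on download).
        return self._content.get(doc_id)

    def delete(self, doc_id: int) -> bool:
        self._content.pop(doc_id, None)
        return self._details.pop(doc_id, None) is not None
```

Listing and lookup never touch the content store in this sketch, which is the property the two-table schema is meant to provide.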

The DocumentRepository allows optionally specifying Page/Size, so direct usage of the repository can still query for all results; forcing paging on the repository itself would have been an unfair/illogical limitation. To prevent users from requesting the entire listing of documents (which could grow to millions!), the GET[All] endpoint of the API enforces a maximum page size (currently hard-coded). The endpoint accepts optional pageSize and pageNumber query parameters. The total number of available documents is returned in the response via the X-Pagination-Count header, with first/last/next/previous URIs provided via the Link header.
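
The paging behaviour described above could look roughly like the following. It is a sketch in Python, not the project's code: the `MAX_PAGE_SIZE` and `DEFAULT_PAGE_SIZE` values, the function names, and the exact `rel` strings are assumptions, while the `pageSize`/`pageNumber` parameters and the two header names are taken from the text.

```python
from math import ceil
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

MAX_PAGE_SIZE = 50      # assumed value; the README only says it is hard-coded
DEFAULT_PAGE_SIZE = 20  # assumed default when pageSize is omitted


def clamp_paging(page_number: Optional[int],
                 page_size: Optional[int]) -> Tuple[int, int]:
    """Apply defaults and enforce the maximum page size."""
    page = max(1, page_number or 1)
    size = min(MAX_PAGE_SIZE, max(1, page_size or DEFAULT_PAGE_SIZE))
    return page, size


def pagination_headers(base_url: str, page: int, size: int,
                       total: int) -> Dict[str, str]:
    """Build the X-Pagination-Count and Link response headers."""
    last = max(1, ceil(total / size))

    def link(target: int, rel: str) -> str:
        query = urlencode({"pageNumber": target, "pageSize": size})
        return f'<{base_url}?{query}>; rel="{rel}"'

    links = [link(1, "first"), link(last, "last")]
    if page < last:
        links.append(link(page + 1, "next"))
    if page > 1:
        links.append(link(page - 1, "previous"))
    return {"X-Pagination-Count": str(total), "Link": ", ".join(links)}
```

For example, a request for `pageSize=1000` would be clamped to the maximum, and a client can walk the result set by following the `next` link until it is no longer present.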

When files are uploaded, the user must specify a Title for the document. The title must be unique (for the user) and is not related to the file name; duplicate file names are allowed. The user may specify an optional Description for their document. If no description is specified, the implementation delegates to an IDescriptionGenerator (retrieved based on document type from an IDescriptionGeneratorFactory) which will inspect the file and provide a short description (image dimensions, first line of text, etc). During upload, the IFileValidator is used to verify the file falls within the current constraints of the system (see below).
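
The description fallback can be sketched as a small factory keyed by document type. This is an illustrative Python sketch, not the project's C# implementation; the function and class names, the 100-character cap, and the fallback wording are all hypothetical. Only the text case (first line of text) is shown, since image inspection would need an imaging library.

```python
from typing import Callable, Dict, Optional

# A generator takes the raw file content and returns a short description.
DescriptionGenerator = Callable[[bytes], str]


def describe_text(content: bytes) -> str:
    """Use the first non-empty line of the text as the description."""
    for line in content.decode("utf-8", errors="replace").splitlines():
        if line.strip():
            return line.strip()[:100]  # assumed length cap
    return "Empty text document"


def describe_unknown(content: bytes) -> str:
    """Fallback when no type-specific generator is registered."""
    return f"Binary document ({len(content)} bytes)"


class DescriptionGeneratorFactory:
    """Hypothetical counterpart of IDescriptionGeneratorFactory."""

    def __init__(self) -> None:
        self._generators: Dict[str, DescriptionGenerator] = {"text": describe_text}

    def get(self, document_type: str) -> DescriptionGenerator:
        return self._generators.get(document_type, describe_unknown)


def resolve_description(user_description: Optional[str], document_type: str,
                        content: bytes,
                        factory: DescriptionGeneratorFactory) -> str:
    """Prefer the user's description; otherwise generate one from the file."""
    if user_description and user_description.strip():
        return user_description.strip()
    return factory.get(document_type)(content)
```

Keeping the lookup in a factory means a new document type only needs a new generator registration, without touching the upload path.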

The IFileTypeInfoProvider interface describes a set of services that can provide meta-information about a specified file/document. When files are downloaded, this interface is queried for the appropriate MIME type/content-type headers. When uploading, an implementation of this interface translates a file name into the appropriate DocumentType (image, text, PDF, unknown).
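
As a rough illustration of the two lookups, the sketch below maps a file name to a document type and a content type from one extension table. It is written in Python and is not the project's code; the table contents, the function names, and the `application/octet-stream` fallback are assumptions.

```python
import os
from enum import Enum
from typing import Dict, Tuple


class DocumentType(Enum):
    IMAGE = "image"
    TEXT = "text"
    PDF = "pdf"
    UNKNOWN = "unknown"


# Assumed extension table; the real set of supported extensions is configurable.
_FILE_TYPES: Dict[str, Tuple[DocumentType, str]] = {
    ".png": (DocumentType.IMAGE, "image/png"),
    ".jpg": (DocumentType.IMAGE, "image/jpeg"),
    ".jpeg": (DocumentType.IMAGE, "image/jpeg"),
    ".txt": (DocumentType.TEXT, "text/plain"),
    ".pdf": (DocumentType.PDF, "application/pdf"),
}

_FALLBACK = (DocumentType.UNKNOWN, "application/octet-stream")


def _lookup(file_name: str) -> Tuple[DocumentType, str]:
    extension = os.path.splitext(file_name)[1].lower()
    return _FILE_TYPES.get(extension, _FALLBACK)


def document_type_for(file_name: str) -> DocumentType:
    """Used on upload: classify the document by its file extension."""
    return _lookup(file_name)[0]


def content_type_for(file_name: str) -> str:
    """Used on download: pick the Content-Type header value."""
    return _lookup(file_name)[1]
```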

Limitations

The system currently only allows a specific set of file extensions to be uploaded, and files must be smaller than a certain size. These settings are configurable via the appsettings.json file. Because the check is based on the file extension, it can be side-stepped by renaming the local file (todo-list item).
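
A minimal sketch of such an extension-and-size check follows, in Python rather than the project's C#. The `UploadLimits` type, its field names, and the error messages are hypothetical, and the actual appsettings.json keys are not shown in this README, so the limits are passed in as a plain object.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class UploadLimits:
    """Hypothetical container for the configurable limits."""
    allowed_extensions: FrozenSet[str]
    size_limit_bytes: int


def validate_upload(file_name: str, size_bytes: int,
                    limits: UploadLimits) -> List[str]:
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors: List[str] = []
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in limits.allowed_extensions:
        errors.append(f"Extension '{extension}' is not allowed")
    # Files must be strictly smaller than the configured limit.
    if size_bytes >= limits.size_limit_bytes:
        errors.append(f"File must be smaller than {limits.size_limit_bytes} bytes")
    return errors
```

Note that `validate_upload("tool.exe", ...)` is rejected while the same file renamed to `tool.txt` is accepted, which is exactly the side-step mentioned above.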

Document titles must be unique per user; uploading a second version of the file currently requires the user to specify an alternate title. Duplicate titles are not currently guarded against directly; instead a SqlException with Error Number 2627 (unique constraint violation) is caught, translating the insertion failure to a return value of 0 (Insert normally returns the new identity). This was chosen due to the simplistic nature of the task. The project can/should be updated to query for duplicates first; alternatively we can remove the limitation around duplicates (TODO list).
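
The catch-and-return-0 approach, and the query-first alternative, can be sketched with an in-memory SQLite database. This is an assumption-laden illustration in Python: SQLite stands in for SQL Server, `sqlite3.IntegrityError` plays the role of the SqlException with error number 2627, and the table layout and function names are hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumed minimal schema: titles are unique per owner.
    conn.execute(
        "CREATE TABLE Documents ("
        " Id INTEGER PRIMARY KEY,"
        " Owner TEXT NOT NULL,"
        " Title TEXT NOT NULL,"
        " UNIQUE (Owner, Title))"
    )


def insert_document(conn: sqlite3.Connection, owner: str, title: str) -> int:
    """Return the new row id, or 0 when the (owner, title) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute(
                "INSERT INTO Documents (Owner, Title) VALUES (?, ?)",
                (owner, title),
            )
        return cursor.lastrowid
    except sqlite3.IntegrityError:
        # Unique constraint violated: translate the failure to 0.
        return 0


def title_exists(conn: sqlite3.Connection, owner: str, title: str) -> bool:
    """The query-first alternative mentioned as a possible improvement."""
    row = conn.execute(
        "SELECT 1 FROM Documents WHERE Owner = ? AND Title = ?",
        (owner, title),
    ).fetchone()
    return row is not None
```

Even with a query-first check, keeping the unique constraint and the exception handler is a reasonable safety net, because two concurrent uploads can both pass the check before either insert runs.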

TODO

  • More Tests!
  • Support File Editing
    • Versioning of files
  • User/Authorization-based access
    • Row-Level Security on tables
    • Get, GetAll, Delete and Download limited by user-access
    • Owner should be determined by Authorization, not manually part of the payload
  • Improved File Handling
    • File Compression
    • Stream Large files
  • Better duplicate handling
    • Remove restriction or query for duplicates first
  • Use AutoMapper instead of manual mapping
  • Support more file types
    • Dynamically add new types
    • Support classification-determination by type
  • Alternate-storage mechanisms
    • Azure blob storage, etc.
  • Composable Validation
    • Rewrite validator to return a Validation-type object
    • Support different types of validation based on document type
    • Validate by file header/magic bytes rather than extension-based
  • Tagging/Categories
  • More Search options
    • Title
    • Category (future)
    • Tags (future)
  • Better SQL Generation
    • Switch to Dapper.Contrib?
    • Provide an IQueryable<T> and query-translation