Generate XLSX from schemapack (GSI-676) #144

lkuchenb · 2024-03-15T16:37:32Z

This PR refactors the XLSX generator script to use schemapack instead of LinkML. Other than that, the behaviour changes as follows:

The script no longer generates multiple output files but only one (full) submission XLSX.
The order of the columns is now ID, relations, content properties. Previously, the LinkML spec offered full control over the column ordering.
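
To illustrate the new fixed column order, here is a minimal sketch. The `ClassDef` type and its field names are hypothetical stand-ins and not the schemapack API; only the ordering rule (ID, then relations, then content properties) comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Hypothetical, simplified stand-in for one class of a schemapack."""

    id_property: str
    relations: list[str] = field(default_factory=list)
    content_properties: list[str] = field(default_factory=list)


def column_order(class_def: ClassDef) -> list[str]:
    """Return column names as: ID, then relations, then content properties."""
    return [
        class_def.id_property,
        *class_def.relations,
        *class_def.content_properties,
    ]


# Illustrative values only; they are not taken from the real submission schema.
sample = ClassDef(
    id_property="alias",
    relations=["study"],
    content_properties=["name", "description"],
)
print(column_order(sample))
```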

sbilge

Should we adhere to the practice of including docstrings for every function and class, or is it acceptable to omit them when the function is self-explanatory?

In addition, there are some inline comments and suggestions. I've also added the submission.schemapack.yaml file to the repository to resolve the failing checks, except for the LinkML-specific schema-linter. We may consider removing the schema-linter, as it's no longer relevant, or replacing it with a schemapack-specific linter. For now, I've implemented a temporary solution so that the linter only runs on the LinkML schema, so keeping it shouldn't pose any issues for this PR.

requirements.txt

sbilge · 2024-04-09T13:12:38Z

scripts/generate_xlsx.py

 from openpyxl.utils import get_column_letter
-from pydantic import BaseModel, root_validator
-import yaml
+from pydantic import BaseModel


Suggested change

from pydantic import BaseModel

from pydantic import BaseModel, Field

sbilge · 2024-04-09T13:14:27Z

scripts/generate_xlsx.py

+    """A XLSX generator config"""
+
+    output_filename: str
+    styles: Optional[dict[str, WorksheetStyle]] = {}


Suggested change

styles: Optional[dict[str, WorksheetStyle]] = {}

styles: dict[str, WorksheetStyle] = Field(default_factory=dict)

I agree that the Optional is obsolete, but is it necessary to replace the plain default with a Field() and a default factory?
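
For reference, a minimal sketch of the suggested `Field(default_factory=dict)` variant. The class name `GeneratorConfig`, the simplified `WorksheetStyle` fields, and the colour value are assumptions for illustration. As far as I know, pydantic also copies a plain mutable default such as `{}` for each instance, so the factory is mainly a matter of explicitness rather than a bug fix.

```python
from typing import Optional

from pydantic import BaseModel, Field


class WorksheetStyle(BaseModel):
    """Simplified stand-in; the real class may have other fields."""

    header_color: Optional[str] = None
    content_color: Optional[str] = None


class GeneratorConfig(BaseModel):
    """A XLSX generator config (class name is hypothetical)."""

    output_filename: str
    # Each instance gets its own fresh dict from the factory.
    styles: dict[str, WorksheetStyle] = Field(default_factory=dict)


first = GeneratorConfig(output_filename="a.xlsx")
second = GeneratorConfig(output_filename="b.xlsx")

# Mutating one instance's styles does not leak into the other.
first.styles["Study"] = WorksheetStyle(header_color="FFD966")
print(second.styles)
```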

sbilge · 2024-04-09T13:47:11Z

scripts/generate_xlsx.py

+    header_color: Optional[str] = None
+    content_color: Optional[str] = None


Header and content color were previously defined as attributes of the WorksheetStyle class. Maybe this would be better:

Suggested change

header_color: Optional[str] = None

content_color: Optional[str] = None

style: Optional[WorksheetStyle] = None
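
A minimal sketch of what the nested-style variant could look like in use. The `Sheet` class name, the `header_color_of` helper, and the fallback colour are hypothetical; only `WorksheetStyle` with its `header_color` and `content_color` attributes is taken from the discussion.

```python
from typing import Optional

from pydantic import BaseModel


class WorksheetStyle(BaseModel):
    header_color: Optional[str] = None
    content_color: Optional[str] = None


class Sheet(BaseModel):
    """Hypothetical sheet model that carries one optional style object."""

    name: str
    style: Optional[WorksheetStyle] = None


def header_color_of(sheet: Sheet, fallback: str = "FFFFFF") -> str:
    """Resolve the header color, falling back when no style is configured."""
    if sheet.style is None or sheet.style.header_color is None:
        return fallback
    return sheet.style.header_color


plain = Sheet(name="Study")
styled = Sheet(name="Sample", style=WorksheetStyle(header_color="FFD966"))
print(header_color_of(plain), header_color_of(styled))
```

The trade-off is one extra `None` check per lookup in exchange for keeping all style attributes in a single place.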

sbilge · 2024-04-09T14:00:17Z

scripts/generate_xlsx.py

+class Column:
+    name: str
+    description: Optional[str]
+    type: str


The column type could be an enum, which would give us actual control over the vocabulary. Something like:

    class ValueType(Enum):
        ARRAY = "array"
        ENUM = "enum"
        STRING = "string"
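
To show how such an enum would constrain the vocabulary, here is a small sketch. It assumes `Column` is a dataclass and introduces a hypothetical `parse_value_type` helper; neither is confirmed by the diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ValueType(Enum):
    ARRAY = "array"
    ENUM = "enum"
    STRING = "string"


@dataclass
class Column:
    name: str
    description: Optional[str]
    type: ValueType


def parse_value_type(raw: str) -> ValueType:
    """Look up a member by its value; unknown strings raise ValueError."""
    return ValueType(raw)


col = Column(name="alias", description=None, type=parse_value_type("string"))

# A type outside the vocabulary is rejected instead of passing through silently.
try:
    parse_value_type("integer")
except ValueError as error:
    rejected = str(error)
    print(rejected)
```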

sbilge · 2024-04-09T14:01:48Z

scripts/generate_xlsx.py

+def _get_element_type_from_schema(schema: dict) -> str:
+    """Get the type of the elements in an array from a JSON schema."""
+    element_type = schema.get("type", "object")
+    if element_type == "array":


If you accept the previous enum idea:

Suggested change

if element_type == "array":

if element_type == ValueType.ARRAY.value:
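
As a possible alternative, the enum could also subclass `str`, in which case members compare equal to plain strings and the `.value` access becomes unnecessary. This is a sketch under that assumption; `is_array_schema` is a hypothetical helper and not the body of `_get_element_type_from_schema`.

```python
from enum import Enum


class ValueType(str, Enum):
    """Variant of the proposed enum whose members are also str instances."""

    ARRAY = "array"
    ENUM = "enum"
    STRING = "string"


def is_array_schema(schema: dict) -> bool:
    """Hypothetical helper: check whether a JSON schema describes an array."""
    element_type = schema.get("type", "object")
    # Thanks to the str mixin, a plain string compares equal to the member.
    return element_type == ValueType.ARRAY


print(is_array_schema({"type": "array"}), is_array_schema({"type": "string"}))
```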

sbilge · 2024-04-09T14:48:50Z

scripts/generate_xlsx.py

+
+        cols = sheet.columns
+        rows = []
+        rows.append([Cell(wb_sheet, value=col.name) for col in cols])


I get type errors from all the Cell-related operations. Shall we suppress them with # type: ignore? I do not know if there is a way to suppress all the Cell-related type errors throughout this script.
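
One possible way to avoid scattering ignores is to funnel all `Cell` construction through a single helper, so that at most one `# type: ignore` is needed. This is only a sketch: `make_cell` is a hypothetical name, and since the exact errors are not shown here, the ignore comment may turn out to be unnecessary once the parameters are typed loosely.

```python
from typing import Any

from openpyxl import Workbook
from openpyxl.cell import Cell


def make_cell(worksheet: Any, value: Any) -> Cell:
    """Build a Cell in one place so a single ignore covers every call site."""
    return Cell(worksheet, value=value)  # type: ignore


workbook = Workbook()
wb_sheet = workbook.active

# Header row built from hypothetical column names.
header = [make_cell(wb_sheet, name) for name in ["alias", "study"]]
wb_sheet.append(header)
print(wb_sheet["A1"].value, wb_sheet["B1"].value)
```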

Co-authored-by: sbilge <[email protected]>

lkuchenb added 4 commits March 15, 2024 16:25

Update requirements

215bf54

Refactor XLSX generator to read a schemapack

b0eff24

Add transpiler information

424149d

Add shebang

40c46ca

lkuchenb requested a review from sbilge March 21, 2024 16:39

lkuchenb marked this pull request as ready for review March 21, 2024 16:39

lkuchenb and others added 3 commits April 9, 2024 08:52

Update to schemapack 2.0.0a3

f0ce613

(not-curated-yet) schemapack added

1b812b3

fixing failed checks

7cce2f3

sbilge requested changes Apr 9, 2024

sbilge and others added 5 commits April 9, 2024 15:30

minor for linkml linter workflow

51f89d3

Remove unnecessary enumerate

03e1d09

Update scripts/generate_xlsx.py

acfd478

Co-authored-by: sbilge <[email protected]>

Update scripts/generate_xlsx.py

703528b

Co-authored-by: sbilge <[email protected]>

Minor change

eeb9e82

sbilge force-pushed the refactor_schemapack branch from 441a5e5 to a4afdf7 Compare May 14, 2024 12:48

Review comments

5ca1381
