- Preprocessing
- Decomposition
- Model learning
- Model interpolation
- Server configuration file
- Starting the server
- Connecting the client
- Exploration
The first step in using dSpaceX for exploration is to preprocess datasets for loading.
Preprocessing script usage is described in Data Preparation.
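For instance, one common preprocessing artifact is the pairwise distance matrix that the server configuration references (see below). Here is a minimal sketch of producing one, assuming the samples are available as a flattened NumPy array; the actual scripts described in Data Preparation may organize this differently.

```python
# Sketch only: compute a Euclidean distance matrix like the one referenced by
# the example config below (CantileverBeam_distance_matrix.csv). The real
# preprocessing scripts (see Data Preparation) may differ.
import numpy as np
from scipy.spatial.distance import pdist, squareform

samples = np.load("samples.npy")  # hypothetical (N, D) array of flattened samples
dists = squareform(pdist(samples, metric="euclidean"))
np.savetxt("CantileverBeam_distance_matrix.csv", dists, delimiter=",")
```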
At this point the dataset can be loaded into the client and various partitionings can be explored.
Appropriate decomposition of the data is important before generating models: if the data isn't sensibly partitioned, the "garbage in, garbage out" rule of machine learning takes over.
To explore Morse-Smale decompositions, load the preliminary preprocessed model in the dSpaceX GUI, select a dataset and field, and open the Partitioning menu as shown here. The options include the number of nearest neighbors in the dataset to consider when evaluating proximity. Experiment with the parameters and click Recompute to explore the results, shown below the controls as a graph of selectable persistence levels. Each persistence level consists of a number of "crystals," shown on the right side of the window. Selecting a crystal displays its member samples in a drawer below.
Once the desired dataset partitioning is found, click Export to write the partitions file used for learning models.
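The exact layout of the exported file is documented in Data Preparation. As a rough illustration, the sketch below simply counts its rows, assuming (per the comment in the example configuration below) that each line corresponds to one persistence level; the filename is borrowed from that example.

```python
# Sketch: peek at an exported crystal-partitions file. Per the example config
# below, each line corresponds to one persistence level; the per-line layout
# is an assumption here -- see Data Preparation for the authoritative format.
import csv

with open("CantileverBeam_CrystalPartitions_maxStress.csv") as f:
    levels = [row for row in csv.reader(f)]

print(f"{len(levels)} persistence levels")
for i, row in enumerate(levels):
    print(f"level {i}: {len(row)} entries")
```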
Once a dataset has been partitioned, models can be learned using the data associated with each partition. These models can then be interpolated to produce new elements of a dataset. Examples of scripts used to learn models are described in Data Preparation.
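To make the idea concrete, here is a minimal sketch of per-partition learning: an independent PCA model fit to each crystal's samples with scikit-learn. The sample array and the crystal-to-sample mapping are hypothetical stand-ins; the real learning scripts are the ones described in Data Preparation.

```python
# Sketch: fit one PCA model per crystal. `crystal_members` maps a crystal id
# to the indices of its member samples -- a hypothetical stand-in for the
# partitions exported above.
import numpy as np
from sklearn.decomposition import PCA

samples = np.load("samples.npy")                   # hypothetical (N, D) array
crystal_members = {0: [0, 1, 2, 5], 1: [3, 4, 6]}  # hypothetical partition

models = {}
for crystal, idx in crystal_members.items():
    model = PCA(n_components=2)
    model.fit(samples[idx])
    models[crystal] = model
```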
After a model has been generated, new latent space coordinates can be fed to the model to generate new data members. This can be performed dynamically by the server, or the data can be pre-generated by the processing scripts. Interpolation of models such as PCA or ShapeOdds has already been incorporated into the server. Other models can be interpolated using external Python scripts callable by the server, and still others can simply be interpolated offline, with the results loaded for exploration as shown in the following image.
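As a concrete illustration of latent-space interpolation (continuing the hypothetical PCA sketch above), picking a new coordinate between two existing samples and inverting the transform yields a new data member. This shows the principle only, not the server's internal implementation.

```python
# Sketch: synthesize a new sample from a fitted PCA model by interpolating in
# latent space (reuses the hypothetical `models` and `samples` from above).
import numpy as np

model = models[0]
z_a = model.transform(samples[[0]])[0]  # latent coords of two existing samples
z_b = model.transform(samples[[1]])[0]
z_new = 0.5 * (z_a + z_b)               # midpoint in latent space
new_sample = model.inverse_transform(z_new[None, :])[0]
```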
Dynamic use of Python scripts by the server is now supported. Please see
External Python Model Modules for details. Custom
module names can be specified in the config.yaml
as described below.
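As a rough idea of what such a module looks like: a renderer module must contain a MeshRenderer class (the only requirement stated in the example configuration below). The constructor and method in this skeleton are purely hypothetical; consult External Python Model Modules for the actual interface.

```python
# Hypothetical skeleton of a custom renderer module (e.g., data.thumbnails).
# Only the requirement that the module contain a MeshRenderer class comes from
# the example config below; the constructor and method shown are assumptions.
class MeshRenderer:
    def __init__(self, config=None):
        self.config = config

    def render(self, vertices, faces):
        """Return a thumbnail image for the given mesh (assumed signature)."""
        raise NotImplementedError
```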
Offline model interpolation is a desired feature; see GitHub issue #187.
The dSpaceX server reads datasets consisting of images (samples), design
parameters (parameters), and quantities of interest (QoIs). These must be
organized into a single directory with a config.yaml
that specifies the name
of the dataset, its number of samples, and the names, locations, and formats of
its images, parameters, QoIs, distance matrices, embeddings (e.g., a tsne
layout), and probabilistic models. The currently supported formats are csv,
json, and yaml (comma-separated values, JavaScript object notation, and "YAML
ain't markup language"), and png images. Here is an example of the yaml
configuration:
```yaml
name: CantileverBeam
samples:
  count: 1000
parameters:
  format: csv
  file: CantileverBeam_design_parameters.csv
qois:
  format: csv
  file: CantileverBeam_QoIs.csv
thumbnails:
  format: png
  files: images/?.png
  offset: 1         # base-1 image names (0th name is 1; if offset by 1000, names would start at 1000)
  padZeroes: false  # padded image names (min chars needed must represent offset + num_files)
  channels: 3       # num channels in each shape (e.g., 1-greyscale, 3-RGB, 4-RGBA)
distances:
  format: csv
  file: CantileverBeam_distance_matrix.csv
  metric: euclidean
embeddings:
  - name: tsne
    format: csv
    file: CantileverBeam_tsne_layout.csv
  - name: ShapeOdds
    format: csv
    file: shapeodds_global_embedding.csv
  - name: Shared GP
    format: csv
    file: shared_gp_global_embedding.csv
models:
  - fieldname: Max Stress
    type: shapeodds                    # shapeodds, pca, sharedgp, etc.
    root: shapeodds_models_maxStress   # directory of models for this field
    persistences: persistence-?        # persistence files
    crystals: crystal-?                # in each persistence dir are its crystals
    padZeroes: false                   # for both persistence and crystal dirs/files
    partitions: CantileverBeam_CrystalPartitions_maxStress.csv  # has 20 lines of varying length and 20 persistence levels
    first_partition: 0   # if depth != -1 && num_persistences > 20, this is the first directory number
    mesh: false          # a mesh model generates corresponding sets of points (each set shares the same triangle associations)
    rotate: false        # the shape produced by this model needs to be rotated 90 degrees clockwise to match samples (old ShapeWorks models need this)
    ms:                  # Morse-Smale parameters used to compute partitions
      knn: 15            # k-nearest neighbors
      sigma: 0.25
      smooth: 15.0
      depth: 20          # num persistence levels; -1 means compute them all
      noise: true        # add mild noise to the field to ensure inequality
      curvepoints: 50    # vis only? Not sure if this matters for crystal partitions
      normalize: false   # vis only? Not sure if this matters for crystal partitions
    interpolations:      # precomputed interps
      - i1:
          params:              # model interpolation parameters used
            sigma: 0.15        # Gaussian width
            num_interps: 50    # precomputed interps per crystal
      - i2:
          params:
            sigma: 0.01
            num_interps: 500
  - fieldname: Max Stress
    type: pca
    root: ms_partitions/test_max_stress_pca_model
    persistences: persistence-?
    crystals: crystal-?
    padZeroes: false
    partitions: ms_partitions.csv
    first_partition: 6
    rowmajor: true
    ms:
      knn: 15
      sigma: 0.25
      smooth: 15
      depth: 20
      noise: true
      curvepoints: 50
      normalize: true
  - fieldname: Angle
    type: pca
    root: pca_models/pca_model_param_Angle
    persistences: persistence-?
    crystals: crystal-?
    padZeroes: false
    partitions: crystal_partitions/cantilever_crystal_partitions_Angle.csv
    rowmajor: true
    ms:
      knn: 15
      sigma: 0.25
      smooth: 15.0
      depth: -1
      noise: true
      curvepoints: 50
      normalize: false
    interpolations:
      - i1:
          params:
            sigma: 0.15
            num_interps: 50
      - i2:
          params:
            sigma: 0.01
            num_interps: 500
  - fieldname: avg_field
    type: pca
    mesh: true
    python_evaluator: None
    python_renderer: data.thumbnails   # module must have MeshRenderer class
    root: ms_partitions/avg_field_pca_model
    persistences: persistence-?
    crystals: crystal-?
    padZeroes: false
    partitions: ms_partitions.csv
    first_partition: 21
    ms:
      knn: 15
      sigma: 0.25
      smooth: 15
      depth: 20
      noise: true
      curvepoints: 50
      normalize: true
  - fieldname: Angle
    type: custom   # a new model type (dynamic interpolation requires external evaluators and renderers, and/or precomputed interpolations should be provided)
    root: custom_models/custom_model_param_Angle
    persistences: persistence-?
    crystals: crystal-?
    padZeroes: false
    partitions: crystal_partitions/cantilever_crystal_partitions_Angle.csv
    ms:
      knn: 15
      sigma: 0.25
      smooth: 15.0
      depth: -1
      noise: true
      curvepoints: 50
      normalize: false
    interpolations:
      - i1:
          params:
            sigma: 0.15
            num_interps: 50
      - i2:
          params:
            sigma: 0.01
            num_interps: 500
```
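Before handing this file to the server, it can be useful to verify that it parses and names the expected top-level sections. A small sanity-check sketch using PyYAML (a convenience, not part of dSpaceX):

```python
# Quick sanity check that config.yaml parses and has the expected sections.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("name", "samples", "parameters", "qois", "models"):
    assert key in cfg, f"missing section: {key}"
print(f"{cfg['name']}: {cfg['samples']['count']} samples, {len(cfg['models'])} models")
```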
See Running the Server for instructions on starting the server.
See Running the Client for details on starting the web client.
It's time to explore. See Using dSpaceX for guidance on using the application.