Capturing Multicellular System Designs Using Synthetic Biology Open Language (SBOL)

The increasing popularity of synthetic biology has yielded a wealth of biological systems that have been designed, implemented, and characterized to various degrees. 1,2 These systems span a wide range of functionalities, from fundamental genetic circuits such as oscillators 3 and toggle switches, 4 to applied devices such as biosensors 5 and microbial factories. 6
To date, most synthetic biology systems have been homogeneous in nature, meaning that every cell in the population is intended to have both the same genotype and the same phenotype. While this approach has yielded promising results, the complexity and optimization of such systems can become limited for certain applications. 7 One reason for these limitations is that the larger genetic circuits required by more complex systems can place cells under metabolic strain, resulting in suboptimal performance. Another reason is that elements in large and complex designs tend to be drawn from many diverse sources, which have different optimal environments. 8 Therefore, the host cell chosen to express these circuits tends to be a compromise that limits overall performance. Finally, many applications inherently involve cells with multiple distinct phenotypes, such as organoid models 9 and microbiome engineering. 10
To tackle the issues mentioned above, there has been an increased interest in multicellular biological designs that involve more than one distinct population of cells and interactions between populations. In such systems, the overall design is split across the multiple cell populations. −16 This approach can reduce the metabolic burden on individual cells, which now only have to perform a fragment of the overall system. 17 In addition, a different host cell can be chosen for each element of the design, so that each element runs in the host best suited to it. This could be extended further to create designs that involve both engineered and nonengineered cells. 18
Splitting designs between cell populations can also assist with the concept of modular design, where modules with specific functions can be designed, shared, and easily reused in other system designs. This modularity can also be achieved by splitting designs across different plasmids, which are then implemented in the same host. 19 However, as this can already be easily captured using the Synthetic Biology Open Language (SBOL), it is not focused on here.
One of the major hallmarks of synthetic biology is standardization, which aims to increase the reproducibility of engineered biological systems and facilitate reuse by other researchers. To fully realize this aim, it is important that information about the design, implementation, and characterization of engineered systems can be easily shared with and understood by other members of the synthetic biology community. SBOL 20 has been developed by a community of synthetic biologists to capture information about engineered systems in a standardized format. In a similar fashion to the way that file formats such as the GenBank flat file format (GBF) were developed to capture information about natural biological systems, 21 SBOL enables the design-build-test cycle to be standardized by storing the information required at each stage. This information can detail designs, build plans, implementation details, and experimental information and results. SBOL aids the sharing of information between researchers and laboratories and promotes more reliable documentation. 22,23
Previously, however, this standard has only been used to represent homogeneous designs and has not explicitly been used to represent strain or other aspects of host context. Here, it is shown how the host context of a design can be represented using SBOL, and how this can be further applied to represent multicellular system designs.

■ METHODS
In this section, the relevant portions of the SBOL data model are reviewed and it is shown how they may be used to represent host context and multicellular systems. These representational practices are based on SBOL version 2.3.0. 24 Note that, in this discussion, the word "class" is used to refer to types of entities in the SBOL data model.
Defining Parts, Devices, and Systems with the ComponentDefinition and ModuleDefinition Classes. The current SBOL data model has two main classes used to capture biological designs: ComponentDefinition and ModuleDefinition. The ComponentDefinition class is usually used to store information about physical structures, such as DNA and proteins, whereas the ModuleDefinition class is used to group biological entities together in a design in order to define the functional interactions between such entities.
The designs captured can range in complexity, from the representation of single parts (such as promoters, coding sequences, proteins), to devices composed of multiple parts (for instance, an expression construct), to complex systems comprising many devices (for example, a genetic biosensor). In the case of devices and systems, each of the individual parts must be described by a separate ComponentDefinition or ModuleDefinition. The use of that part is then described within a ModuleDefinition class, with the part being referenced using the FunctionalComponent class.
The ModuleDefinition class may contain interactions between biological entities in the design (for example, a coding sequence (CDS) encoding a protein, which, in turn, represses a promoter, or a small molecule inhibiting a protein), whereas instances of ComponentDefinition may not. These relationships are formally captured using two other SBOL classes, Interaction and Participation: the Interaction class specifies the type of interaction (e.g., genetic production) and contains instances of Participation giving the role played by each interacting object (for example, a CDS as genetic template and a protein as product). In addition, the ComponentDefinition class has both type and role properties, but ModuleDefinition has only role properties. The type property in SBOL is used to describe the category within which a biological entity falls (for example, DNA molecule, small molecule, protein), and the role property describes the intended biological function for an entity or design. For example, a metabolic pathway might have roles of "metabolic process" and "small molecule biosynthetic process" from the Gene Ontology (GO), 25 and a biosensor might have a role of "response to chemical", also from GO.
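The relationship between Interaction and Participation can be sketched with plain Python dataclasses. This is a minimal illustration of the data model only, not the API of any SBOL library; the class layout and the names `example_cds` and `example_protein` are assumptions made for the example, while the SBO identifiers are the ones quoted in this paper for genetic production, template, and product.

```python
from dataclasses import dataclass, field

# SBO terms as quoted in this paper (Systems Biology Ontology).
SBO_GENETIC_PRODUCTION = "SBO:0000589"
SBO_TEMPLATE = "SBO:0000645"
SBO_PRODUCT = "SBO:0000011"


@dataclass
class Participation:
    participant: str  # name of the FunctionalComponent taking part
    role: str         # SBO term for the role it plays


@dataclass
class Interaction:
    type: str  # SBO term for the kind of interaction
    participations: list = field(default_factory=list)


# Hypothetical example: a CDS is the template and a protein is the product.
production = Interaction(
    type=SBO_GENETIC_PRODUCTION,
    participations=[
        Participation("example_cds", SBO_TEMPLATE),
        Participation("example_protein", SBO_PRODUCT),
    ],
)

# Map each participant to the role it plays in the interaction.
roles = {p.participant: p.role for p in production.participations}
```

Keeping the role on the Participation rather than on the component itself lets the same component play different roles in different interactions.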
Ontologies in SBOL. An ontology can be thought of as a set of formal descriptions for specific terms and their relationships. In synthetic biology, ontologies allow a standardized language to be used when describing biological systems. Many separate ontologies are used in SBOL to better describe entities within a system, and how those entities interact. The SBOL-OWL ontology defines the relationships between classes in the SBOL data model and terms from other ontologies. 26 For example, the role property used by instances of the ComponentDefinition class is recommended to be defined using terms taken from the Sequence Ontology (SO). 27 Examples of terms used in this property are "Promoter" (SO:0000167), "Ribosome Entry Site" (SO:0000139), and "CDS" (SO:0000316). Another key ontology is the Systems Biology Ontology (SBO), 28 which is used for defining types for Interaction instances, such as "Genetic Production" (SBO:0000589), and roles for Participation instances, such as "Template" (SBO:0000645) and "Product" (SBO:0000011). Other ontologies commonly used in SBOL include the Gene Ontology (GO) 25,29 and Chemical Entities of Biological Interest (CHEBI). 30
Representing Cells in SBOL. When attempting to reproduce a previously designed multicellular system, it is necessary to know some fundamental information about the cells used. The most crucial information consists of the strains that were used, any plasmids transformed into the cells, and the expected functionality. By providing precise taxonomic information, it can be ensured that the correct strains are used when other researchers attempt to re-create a system, hence increasing reproducibility. Finally, it is useful to record and share the exact functionality of a design to ensure that future users select the correct system for their desired application, and to allow for informed modification of the system.
Using the classes and ontologies described in the section above, a recommended approach for representing cells in a biological system using SBOL has been developed. This approach captures (i) taxonomy, (ii) interactions occurring within the cell, and (iii) components inside the cell (for example, DNA and small chemical molecules).
The approach uses an instance of the ModuleDefinition class to represent a system that involves a specified type of cell (Figure 1). Usage of the cell type is represented by an instance of the FunctionalComponent class inside the ModuleDefinition, whose definition is a ComponentDefinition instance that is used to capture information about the species and strain of the cell in the design. This ComponentDefinition has a type of "cell" from the Gene Ontology (GO:0005623) and a role of "physical compartment" (SBO:0000290). Taxonomic information is captured by annotating the class instance with a URI that leads to a description of the strain. As a best practice, and where possible, the organism's species and strain should be defined by providing a link to the relevant entry in the National Center for Biotechnology Information (NCBI) taxonomy database. This standardized approach would allow for easier automated retrieval of information about the organism. While a link to an NCBI entry would be preferable, there are instances where this may not be possible (for example, when using a novel strain that is not yet recorded in NCBI). In these cases, it is suggested that a different database, which does contain the organism, is used. If the organism is not in any database, then a description of the organism should be provided.
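The order of preference for taxonomic information described above (NCBI entry, then another database, then a free-text description) can be written as a small helper. This is a sketch under stated assumptions: the function name, the returned dictionary layout, and the `example.org` URI are hypothetical and are not part of SBOL or of any SBOL library.

```python
from typing import Optional


def taxonomy_annotation(ncbi_uri: Optional[str] = None,
                        other_db_uri: Optional[str] = None,
                        description: Optional[str] = None) -> dict:
    """Pick the most standardized strain reference that is available."""
    if ncbi_uri:
        return {"source": "ncbi", "value": ncbi_uri}
    if other_db_uri:
        return {"source": "other_database", "value": other_db_uri}
    if description:
        return {"source": "description", "value": description}
    raise ValueError("a strain reference or description is required")


# Placeholder URI for illustration only; it is not a real taxonomy entry.
novel = taxonomy_annotation(
    other_db_uri="https://example.org/strains/novel-1",
    description="novel isolate, not yet recorded in NCBI",
)
```

Here the database URI wins over the description because no NCBI link was supplied.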
Other relevant entities, such as inducer molecules or plasmid DNA, are also captured using instances of the FunctionalComponent class. Interactions that occur within the system are captured using the Interaction and Participation classes, and interactions that occur within the cell are specified by including a Participation for the cell with a role of "physical compartment". An additional Interaction class instance can also be used to explicitly define which entities are only present within the cell and, therefore, not available to the rest of the system. This interaction has a type of "containment" (SBO:0000469) and has at least two participants: the cell, which has a role of "physical compartment", and one or more contained entities, which have roles of "contained" (SBO:0000064) (Figure S1B in the Supporting Information).
Note that when a cell is included in an SBOL design, it is actually representing a "pool" of cells of that type. This is similar to how, for example, in SBOL, genetic production of a protein from a plasmid is interpreted as production of a pool of some number of proteins from a collection of some number of copies of the plasmid. Thus, for example, a containment interaction such as in Figure S1B may be interpreted as stating: "in this system, cells of type X contain plasmids of type Y".
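The containment pattern can be illustrated with a small, library-independent sketch. The dataclasses below only mimic the shape of the SBOL classes, and the names `cell_X` and `plasmid_Y` are hypothetical placeholders echoing the "cells of type X contain plasmids of type Y" reading; the ontology identifiers are the ones given in the text.

```python
from dataclasses import dataclass, field

# Ontology terms as quoted in this paper.
GO_CELL = "GO:0005623"
SBO_PHYSICAL_COMPARTMENT = "SBO:0000290"
SBO_CONTAINMENT = "SBO:0000469"
SBO_CONTAINED = "SBO:0000064"


@dataclass
class Participation:
    participant: str  # name of a FunctionalComponent in the system
    role: str


@dataclass
class Interaction:
    type: str
    participations: list = field(default_factory=list)


def contained_entities(interactions, cell_name):
    """List entities that containment interactions place inside cell_name."""
    found = []
    for interaction in interactions:
        if interaction.type != SBO_CONTAINMENT:
            continue
        roles = {p.participant: p.role for p in interaction.participations}
        if roles.get(cell_name) == SBO_PHYSICAL_COMPARTMENT:
            found += [n for n, r in roles.items() if r == SBO_CONTAINED]
    return found


# The cell's definition carries a type of "cell" and a compartment role.
cell_definition = {"type": GO_CELL, "role": SBO_PHYSICAL_COMPARTMENT}

containment = Interaction(
    SBO_CONTAINMENT,
    [
        Participation("cell_X", SBO_PHYSICAL_COMPARTMENT),
        Participation("plasmid_Y", SBO_CONTAINED),
    ],
)
inside = contained_entities([containment], "cell_X")
```

A query such as `contained_entities` is one way a tool might recover which entities are private to a cell pool and which are available to the rest of the system.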
Representing Designs with Multiple Cells. Once cells have been individually defined, they can be included in a design for a multicellular system. In systems involving more than one cell, it is important to capture the relative amounts of each cell type, since this can have a large effect on the system's behavior. In addition, it is important to define how each cell type interacts with other cells in the system, as these intercellular interactions are usually the basis for a multicellular system's functionality. Intercellular interactions normally occur by the same type of molecule being involved in processes of different cells. For example, two cell types in the system may require the same molecule for metabolic pathways to facilitate cell growth and, hence, are competing for resources, or one cell may produce a molecule that interacts with genetic circuits in a second cell, which is the basis for intercellular communication.
Given the above representation for a single class of cells, the same approach can be used to represent designs that incorporate more than one cell type. At the simplest level, one can simply have one FunctionalComponent for each cell type and appropriate Interaction and Participation instances to specify which aspects of the systems are associated with each cell type.
One can also compose a multicellular system by using the Module class to link together ModuleDefinition instances that each define a design for a system containing a single type of cell. Figure 2 shows an example of this approach. Here, each Module instance has its definition pointing to the ModuleDefinition class that is used to represent each single system containing a cell type. In order to capture links between the same entities present in multiple parts of the same design, the Module classes contain instances of the MapsTo class. Here, a MapsTo class with a refinement value of "merge" is used to link FunctionalComponent classes that represent a cell type in the multicellular system to the FunctionalComponent class used to represent the same cell in the lower-level cell system design. Instances of the MapsTo class are also used to capture the fact that noncell entities in the multicellular system are identical to those same entities when used in the cell system design, such as a small molecule produced by one cell population and utilized by another population.
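The composition pattern can be sketched as follows. This is an illustrative model using plain dataclasses rather than any SBOL library; the field names `local`, `remote`, and `refinement` follow the MapsTo description above, while the module, cell, and molecule names are hypothetical and loosely mirror the two-cell example of Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class MapsTo:
    local: str   # FunctionalComponent in the multicellular design
    remote: str  # FunctionalComponent in the lower-level cell system
    refinement: str = "merge"


@dataclass
class Module:
    name: str
    definition: str  # name of the ModuleDefinition for one cell system
    maps_tos: list = field(default_factory=list)


def shared_entities(modules):
    """Return top-level components that are mapped into more than one Module."""
    seen = {}
    for module in modules:
        for maps_to in module.maps_tos:
            seen.setdefault(maps_to.local, []).append(module.name)
    return {local: names for local, names in seen.items() if len(names) > 1}


modules = [
    Module("cell_system_1", "CellSystem1Definition",
           [MapsTo("cell_1", "cell"), MapsTo("molecule_A", "molecule_A")]),
    Module("cell_system_2", "CellSystem2Definition",
           [MapsTo("cell_2", "cell"), MapsTo("molecule_A", "molecule_A")]),
]
shared = shared_entities(modules)
```

Each cell maps into exactly one Module, whereas the molecule maps into both, which is what marks it as a candidate for an intercellular interaction.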
Finally, it is recommended that the proportion of cell types in a multicellular system be captured using the Measure class. 31 The Measure class has value, unit, and type properties, which allow specification both of a parameter and of how to interpret it in the context of the biological system. 32 For example, Figure 3 shows how an instance of the Measure class can be used to annotate the Module instance in the multicellular design, which represents a cell system. The Measure instance can capture the proportion of cells using any relevant units, including percentage, cell count, mass, or culture volume. Note that the Measure class could also be used to help represent structured multicellular systems by describing the location of cells in space.

■ RESULTS AND DISCUSSION
Having presented a method for using SBOL to represent multicellular designs, in this section, it is applied to two recent complex designs, one a sensor system distributed across three types of cells, the other an inducible cell-sorting system with two types of cells.
Example System: A Modular, Multicellular Biosensor. Figure 4 shows an example of a multicellular system: the Modular, Multicellular Biosensor (MMB) described by the Newcastle team for the 2017 International Genetically Engineered Machine (iGEM) competition. 33 The MMB consists of three cell types: (i) a detector cell that converts the presence or absence of a specific stimulus into a genetic signal; (ii) a processor cell that modifies the signal from the detector cell in some way (for instance, amplifies it); and (iii) a reporter cell that converts the genetic signal to a response, such as color change or regulation of a metabolic pathway.
The three cell types in the MMB exhibit unidirectional communication, in which the detector cell passes a signal to the processor cell, and the processor cell passes a signal to the reporter cell. This communication is enabled using two orthogonal quorum sensing (QS) mechanisms. The LasIR QS mechanism is used to pass the signal from the detector cell to the processor cell. When the stimulus is present, the detector cell produces the acylhomoserine lactone (AHL) C12-HSL (homoserine lactone), which diffuses out of the detector cell and activates gene expression in the processor cell. The RhlIR QS mechanism is used to pass the signal from the processor cell to the reporter cell in a similar way, except that the processor cell produces AHL C4-HSL to activate gene expression in the reporter cell. 34
Figure 5 illustrates key aspects of the SBOL representation of a variant of the detector cell in the MMB, in this case designed to detect IPTG (isopropyl β-D-1-thiogalactopyranoside). A full XML file for this design can be found in the Supporting Information. An instance of the ModuleDefinition class is used to represent the system in which the IPTG Detector cell is implemented, and a separate ComponentDefinition instance is used to capture taxonomic information about the cell; in this case, that it is an Escherichia coli DH5α strain. This ComponentDefinition is used to define an instance of the FunctionalComponent class, which represents the cell within the system. The important molecules in the design are captured using instances of the FunctionalComponent class, and transformation of the cell with a plasmid can be captured in the same way. However, the containment of the plasmid within the cell must also be captured explicitly using a "containment" interaction. This Interaction instance has the host cell and plasmid DNA as participants with roles of "physical compartment" and "contained", respectively. Other molecules that are produced by the cell and are not transported out into the extracellular space can also be defined in this way.
It is possible to explicitly capture the movement of molecules into or out of the cell, if desired, by using the Interaction class to define specific transport mechanisms. This approach can provide additional information that may be important, such as whether transport is passive or relies on additional cellular machinery. In this case, however, we do not add this additional information, because it is not anticipated to be of significance for the MMB design.
Figure 6 depicts how a design containing the IPTG Detector Module and Processor Module from the MMB can be captured using SBOL and the best practices described in the Methods section. An XML file describing this system can also be found in the Supporting Information. In this design, there are two cell populations; both populations consist of the same bacterial strain (E. coli DH5α) but are transformed with different plasmids (either the IPTG Detector Plasmid or the Blank Processor Plasmid). The design contains two ModuleDefinitions to capture information about each cell type. Each of these ModuleDefinitions includes one of the cell types as a FunctionalComponent that is defined by a ComponentDefinition linking to the NCBI entry for E. coli DH5α, which conveys that both cell populations are composed of identical bacterial strains. The ModuleDefinitions representing the cell systems also contain other important entities: the Detector or Processor plasmid, an inducer molecule (IPTG for the Detector system and C12-HSL for the Processor system), and a molecule that is produced by the cell (C12-HSL for the Detector system and C4-HSL for the Processor system). Each of these entities is included as a FunctionalComponent, which is defined by a ComponentDefinition. The small molecule C12-HSL is involved in both cell systems, and therefore the FunctionalComponents in both systems are defined by the same ComponentDefinition, which directly conveys that this molecule pool is identical.
Each ModuleDefinition representing a cell system also contains an Interaction, which defines the function of that cell system. This design captures that, within the Detector cell, the small molecule IPTG stimulates something on the Detector plasmid to produce C12-HSL, and, within the Processor cell, C12-HSL stimulates the production of C4-HSL. More details could be included at this point (such as exact mechanisms for how IPTG stimulates the production of C12-HSL or the diffusion of the small molecules across the cell walls), but, for the sake of clarity, this functionality is abstracted here. Also note that, ordinarily, it would be recommended that an additional Interaction class be included to capture that the plasmids are contained within each cell (as depicted in Figure 1), but again this is omitted in Figure 6 for the sake of clarity.
Each cell system can now be included in a new ModuleDefinition as a Module to convey that they are members of a multicellular design. The other important biological entities, such as the small molecules, can also be included in the multicellular design as FunctionalComponents, along with the cell populations. MapsTo classes are used to explicitly link identical entities between the cell system designs and the multicellular design. In this way, interactions between the cell systems become apparent. In this example, the small molecule C12-HSL is produced by the Detector cell population and then stimulates production of C4-HSL in the Processor cell population, therefore conveying unidirectional communication from the Detector cells to the Processor cells.
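The way a communication link becomes apparent from a shared molecule can be sketched in a few lines. This is an illustrative reading of the Detector and Processor example rather than a feature of SBOL itself: the function name and the dictionary layout are assumptions, and the role labels "product" and "stimulator" are written as plain strings instead of ontology terms.

```python
def derive_signals(systems):
    """Find (sender, molecule, receiver) links from per-system molecule roles.

    A link exists when one cell system produces a molecule that acts as a
    stimulator in a different cell system.
    """
    links = []
    for sender, sender_roles in systems.items():
        for molecule, role in sender_roles.items():
            if role != "product":
                continue
            for receiver, receiver_roles in systems.items():
                if receiver != sender and receiver_roles.get(molecule) == "stimulator":
                    links.append((sender, molecule, receiver))
    return links


# Roles of the small molecules in each cell system, as described in the text.
systems = {
    "Detector": {"IPTG": "stimulator", "C12-HSL": "product"},
    "Processor": {"C12-HSL": "stimulator", "C4-HSL": "product"},
}
signals = derive_signals(systems)
```

Only C12-HSL yields a link here, because IPTG has no producer and C4-HSL has no consumer among the two cell systems in this design.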
Finally, the proportion of each cell population is captured by annotating the Modules which represent each cell system in the multicellular design with the Measure class. In the example in Figure 6, each cell population is annotated as comprising 33.33% of the entire cell population. Since this does not add up to 100%, it can be inferred that another cell population may be required, or that there are unknown cell types in the design.
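The inference about a missing population can be made mechanical. The sketch below is a minimal illustration, not library code: the `Measure` dataclass reuses the value, unit, and type property names mentioned in the Methods, while the unit string `"percent"` and the type label are placeholders rather than ontology terms.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    value: float
    unit: str  # placeholder unit label, not an ontology term
    type: str  # placeholder description of what is being measured


def unassigned_percentage(measures, tolerance=1e-6):
    """Return the share of the population not covered by the given Measures."""
    if any(m.unit != "percent" for m in measures):
        raise ValueError("all proportions must be expressed in percent")
    total = sum(m.value for m in measures)
    if total > 100.0 + tolerance:
        raise ValueError("proportions exceed 100 percent")
    return 100.0 - total


# The two annotated populations from the Detector and Processor example.
proportions = [
    Measure(33.33, "percent", "proportion of cell population"),
    Measure(33.33, "percent", "proportion of cell population"),
]
remainder = unassigned_percentage(proportions)
```

A nonzero remainder signals that the design either expects a further population (here, plausibly the reporter cells of the full MMB) or leaves part of the culture unspecified.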
An Inducible Cell-Sorting System. Another prototypical class of systems involving multiple types of cells is pattern-formation systems based on cell sorting. Here, we consider a recent work in this area on programmable cell sorting, 35,36 in which the pattern formed is controlled predictably by mixing cells with high cadherin expression and cells with low cadherin expression. If the cadherins used are all the same, then cell motility will result in high-adhesion cells gradually sorting into clusters with low-adhesion cells on the outside of each cluster. The shape formed in this manner is controlled by the fraction of high-adhesion cells: above a critical threshold, they form a "sorted ball", consisting of a single large cluster with a surface of low-adhesion cells. At lower fractions, the high-adhesion cells instead form "polka dots", with small clusters embedded in a unified background of low-adhesion cells. Controlling cadherin expression (e.g., by adding a synthetic expression cassette with an inducible promoter) can further allow sorting behavior to be selected dynamically.
Figure 7 shows an example of how such an inducible cell-sorting system can be represented in SBOL, following the recommendations given above. In this case, the two strains of cells are both Chinese Hamster Ovary (CHO) cells, which are natively low in cadherin expression and clump only weakly. One of the two strains, however, has been transformed with the addition of a synthetic Doxycycline-inducible cadherin expression cassette. Within the ModuleDefinition describing this system, each cell strain's representation is based on a FunctionalComponent, both using the same definition of a CHO cell ComponentDefinition. However, the inducible CHO (iCHO) cells are enhanced with an Interaction type of containment that sets them as the physical compartment that contains the FunctionalComponent instantiations of both the cadherin cassette and the cadherin that is its output. The production of cadherin from this cassette is represented by a second Interaction (additional details of the structure of the cassette and its induction by Doxycycline are omitted for space purposes). The actual cell-to-cell adhesion relationships that implement the sorting behavior are included in the ModuleDefinition as more Interactions, each representing one of the three adhesion relations in the system: CHO cells with CHO cells, CHO cells with iCHO cells, and iCHO cells with iCHO cells. Finally, the parameters and dynamics of this system may be represented by attaching Measures to the FunctionalComponents and a Model to the ModuleDefinition (not shown).
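The three adhesion relations are simply the unordered pairs, with repetition, of the two cell types, which can be enumerated as below. This is a small sketch for illustration; the dictionary layout and the `"adhesion"` label are placeholders and do not correspond to a specific ontology term or library call.

```python
from itertools import combinations_with_replacement

cell_types = ["CHO", "iCHO"]

# Unordered pairs with repetition: one adhesion relation per pair.
adhesion_pairs = list(combinations_with_replacement(cell_types, 2))

# One Interaction-like record per pair; "adhesion" is a placeholder label.
adhesion_interactions = [
    {"type": "adhesion", "participants": list(pair)} for pair in adhesion_pairs
]
```

For n cell types this yields n(n + 1)/2 relations, so the number of adhesion Interactions grows quadratically as more cell types are added to a sorting design.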

■ DISCUSSION
While the SBOL data model has previously been used to capture information about genetic constructs and intracellular interactions, it has not been widely used to describe and share information about multicellular systems. This paper describes a set of best practices for how multicellular system designs can be captured in a standard way using SBOL. Examples have been provided to illustrate specific concepts and demonstrate feasibility, and valid illustrative SBOL documents are available (see the Supporting Information). The SBOL documents provided were created using the currently available Python SBOL libraries.
The best practices described focus on ensuring that there is sufficient flexibility to describe a wide variety of multicellular designs, and incorporate the concept of modular design, which is an important principle in synthetic biology. In addition, to ensure that the approach described here is as backward compatible as possible, terminology from ontologies already widely used in the current version of SBOL is used, and no new classes or features are required. It is also recommended that, where possible, existing resources such as the NCBI database be used to reference in-depth information, to avoid replication of information.
The best practices described here have been included with the latest release of SBOL version 2 (2.3.0). Note that SBOL version 3 is now available; however, most libraries and tools are not compatible with this latest version and the underlying data model is subject to change. Therefore, the approaches described here for capturing information about multicellular systems using SBOL 2 should be relevant for the coming years.
XML files for SBOL Files 1−5 (ZIP)
UML diagrams for the systems described in the paper (SBOL File 1); SBOL file representing the system depicted in Figure 1 (SBOL File 2); SBOL file representing the system depicted in Figure 2 (SBOL File 3); SBOL file representing the system depicted in Figure 5 (SBOL File 4); SBOL file representing the system depicted in Figure 6 (SBOL File 5); and SBOL file representing the system depicted in Figure 7 (SBOL File 6) (PDF)
Special Issue Paper: Invited contribution from the 11th International Workshop on Bio-Design Automation.

Figure 1. A cell-encoded depiction using SBOL Visual. A UML diagram of the system shown here is represented in Figure S1 in the Supporting Information. Using the best practices described here, each cell should be contained within its own system. Entities, such as the two molecules in this diagram, can be represented within the system and the SBOL Interaction class can be used to convey that they are contained within the cell. Additional interactions, not shown here for clarity, can also be used to represent behavior such as active transport of molecules into the cell, or binding of entities to the cell's surface. Taxonomic information about the cell in the system is captured using an instance of the ComponentDefinition class, which is referenced from the data structure representing the cell in the cell system. This allows for a distinction between an organism in general, and an actual cell in a system.

Figure 2. Multicellular system representation using SBOL Visual. A UML diagram of the system shown here is represented in Figure S2 in the Supporting Information. In this diagram, a multicellular system composed of two different cells of different organism types is shown. The two cell types are represented by cell system 1 and 2 and are depicted similarly to the cell in Figure 1. Here, the two cells contain molecule A, which is imported into the cell from the extracellular environment. In SBOL, the multicellular system itself is represented by an instance of the ModuleDefinition class. This ModuleDefinition contains elements representing the two cells and Molecule A, which are referenced from the original cell systems.

Figure 3. Diagram depicting how to capture cell ratios using SBOL. A UML diagram of the system shown here is represented in Figure S3 in the Supporting Information. This diagram shows a multicellular system composed of two undefined cells: Cell 1 and Cell 2. Cell 1 comprises 30% of all cells in the system, and Cell 2 comprises the other 70%.

Figure 4. Schematic of a generic modular, multicellular biosensor. The Modular, Multicellular Biosensor (MMB) Framework described by Newcastle iGEM 2017 consists of three modules: a detector, a signal processor, and a reporter. These three modules are expressed on separate plasmids and transformed into Escherichia coli cells. A co-culture of these three cell types is then created to form a functional biosensor. The signal propagates from detector cells to processor cells to reporter cells, using AHL (acylhomoserine lactone)-based quorum sensing mechanisms.

Figure 5. SBOL Visual depiction of the IPTG detector cell. A UML diagram of the system shown here is represented in Figure S4 in the Supporting Information. This diagram depicts how the IPTG Detector Cell from the MMB in Figure 4 can be represented using SBOL. The cell's species is E. coli DH5α and it contains two small molecules (IPTG and C12-HSL), and the IPTG Detector Plasmid. In SBOL, these entities are defined using ComponentDefinition classes and implemented within the cell as FunctionalComponent classes. The small molecules travel from the extracellular environment into the cell, whereas the plasmid is contained only within the cell. This information is stored in SBOL using the direction property of the FunctionalComponent class. An instance of the Interaction class could also be used to explicitly state this behavior, along with any other information.

Figure 6. SBOL Visual diagram showing intercellular interactions using SBOL. A UML diagram of the system shown here is represented in Figure S5 in the Supporting Information. This diagram depicts how the IPTG Detector Cell and Blank Processor Cell variants of the MMB interact. The two cell types are represented using the same principles described by Figure 5. Both cells contain three entities (captured in SBOL as FunctionalComponent objects). One of these entities, the small molecule C12-HSL, is present in both cell systems: in the Detector Cell, it has a role of "product", and in the Processor Cell, it has a role of "stimulator". When both cell types are combined into a multicellular system (represented in SBOL as a ModuleDefinition), the sharing of this molecule is captured as a shared FunctionalComponent. This feature can be used to capture intercellular interactions in a nonexplicit way. In this case, the interaction between the Detector Cells and Processor Cells can be derived as follows: The Detector Cell produces C12-HSL, which stimulates the Processor Cell to produce the small molecule C4-HSL.

Figure 7. Visual representation of a cell sorting system using SBOL. A UML diagram of the system shown here is represented by Figure S6 in the Supporting Information. Two cell types are captured in this system, both of which are CHO cells. The natural CHO cells (blue) clump together weakly (shown by the association glyph). In SBOL, this clumping is represented using the Interaction class. The iCHO cells (purple) have been transfected with a cadherin cassette encoding cadherin, which enhances cell clumping. This allows the iCHO cells to clump together at a greater rate. The CHO and iCHO cells can also associate to form a CHO−iCHO cell complex.