Available:*
Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status |
---|---|---|---|---|---|
Searching... | 30000010195150 | QD455.3.E4 O46 2009 | Open Access Book | Book | Searching... |
On Order
Summary
Summary
Optimize Your Chemical Database
Design and Use of Relational Databases in Chemistry helps programmers and users improve their ability to search and manipulate chemical structures and information, especially when using chemical database "cartridges". It illustrates how the organizational, data integrity, and extensibility properties of relational databases are best utilized when working with chemical information.
The author facilitates an understanding of existing relational database schemas and shows how to design new schemas that contain tables of data and chemical structures. By using database extension cartridges, he provides methods to properly store and search chemical structures. He explains how to download and install a fully functioning database using free, open-source chemical extension cartridges within PostgreSQL. The author also discusses how to access a database on a computer network using both new and existing applications.
Through examples of good database design, this book shows you that relational databases are the best way to store, search, and operate on chemical information.
Table of Contents
Preface | p. xi |
Acknowledgments | p. xiii |
Biography | p. xv |
Chapter 1 Introduction | p. 1 |
Chapter 2 Relational Database Fundamentals | p. 5 |
2.1 Introduction | p. 5 |
2.2 Tables, Rows, and Columns | p. 5 |
2.3 External and Internal Representations of Data | p. 7 |
2.4 Advantages over Spreadsheets | p. 8 |
2.4.1 Size and Speed | p. 8 |
2.4.2 Multiple Users | p. 8 |
2.5 Relationships among Tables | p. 9 |
2.5.1 One-to-Many Relationships | p. 9 |
2.5.2 One-to-One Relationships | p. 11 |
2.5.3 Many-to-Many Relationships | p. 12 |
2.6 Entity Relationship Diagrams | p. 12 |
2.7 Uniqueness | p. 14 |
2.8 Sequences | p. 14 |
2.9 Keys | p. 15 |
2.9.1 Primary Keys | p. 15 |
2.9.2 Foreign Keys | p. 15 |
2.10 Constraints | p. 16 |
2.11 Indexes | p. 16 |
2.12 Joining Tables | p. 16 |
2.13 Normal Forms | p. 17 |
2.13.1 First Normal Form | p. 17 |
2.13.2 Second Normal Form | p. 18 |
2.13.3 Third Normal Form | p. 19 |
2.13.4 Summary of Normal Forms | p. 20 |
References | p. 20 |
Chapter 3 Structured Query Language (SQL) | p. 21 |
3.1 Introduction | p. 21 |
3.2 Databases, Schemas, Tables, Rows, and Columns | p. 21 |
3.3 Create | p. 22 |
3.4 Insert | p. 23 |
3.5 Select | p. 24 |
3.6 Update and Delete | p. 25 |
3.7 SQL Functions | p. 26 |
3.7.1 Regular Functions | p. 26 |
3.7.2 Aggregate Functions | p. 27 |
3.8 Domains, Triggers, and Views | p. 28 |
3.9 Unions, Intersections, and Differences | p. 29 |
References | p. 30 |
Chapter 4 Relational Database Management Systems | p. 31 |
4.1 Introduction | p. 31 |
4.2 Standard SQL | p. 32 |
4.3 A Sampling of Differences | p. 32 |
4.4 Server and Client | p. 33 |
4.5 Compatibility | p. 35 |
References | p. 35 |
Chapter 5 Client and Web Applications | p. 37 |
5.1 Introduction | p. 37 |
5.2 Command Line Programs | p. 37 |
5.3 Web-Based Applications | p. 38 |
5.4 Client Applications | p. 39 |
5.5 SQL Interfaces in Various Languages | p. 41 |
5.5.1 Perl | p. 43 |
5.5.2 Python | p. 44 |
5.5.3 PHP | p. 44 |
5.5.4 Java | p. 45 |
References | p. 46 |
Chapter 6 Data Storage, Searching, and Manipulation | p. 47 |
6.1 Introduction | p. 47 |
6.2 General Schema Design Decisions | p. 47 |
6.3 Sample Schema for Tracking Chemical Samples | p. 49 |
6.4 Schemas for PubChem Data | p. 53 |
6.4.1 BioAssay Data | p. 54 |
6.4.2 Substances | p. 56 |
6.4.3 Compounds | p. 58 |
6.5 Data Constraints and Data Integrity | p. 60 |
6.6 Developing Complex SQL | p. 63 |
6.7 Subselect Statements | p. 66 |
6.8 Views | p. 67 |
References | p. 70 |
Chapter 7 Computer Representations of Molecular Structures | p. 71 |
7.1 Introduction | p. 71 |
7.2 SMILES Representation of Molecular Structure | p. 72 |
7.3 Extensions to SQL for Chemical Structures | p. 72 |
7.4 SMARTS Representation of Molecular Searches | p. 74 |
7.5 SMILES and SMARTS Quirks | p. 76 |
7.5.1 Hydrogen Atoms | p. 76 |
7.5.2 Aromaticity | p. 77 |
7.5.3 Tautomers | p. 77 |
7.5.4 Valence | p. 80 |
7.5.5 Chirality | p. 80 |
7.5.6 Isotopes | p. 81 |
7.5.7 Salts and Mixtures | p. 81 |
7.5.8 InChI and Canonical SMILES | p. 82 |
7.6 SMILES and Inorganic Structures | p. 82 |
7.7 Other SMILES Extensions | p. 82 |
7.8 Input and Output of Molecular Structures | p. 83 |
7.9 Useful SQL Extensions | p. 85 |
7.10 SMILES as an SQL Data Type | p. 86 |
7.10.1 Domains | p. 86 |
7.10.2 Triggers | p. 87 |
7.11 Summary | p. 88 |
References | p. 88 |
Chapter 8 Molecular Fragments and Fingerprints | p. 91 |
8.1 Introduction | p. 91 |
8.2 Fragments | p. 91 |
8.2.1 Fragment Keys | p. 92 |
8.2.2 MACCS Keys and Other Fragment Keys | p. 95 |
8.3 Fingerprints | p. 95 |
8.4 Similarity Measures | p. 96 |
8.5 Computing Fragment-Based Properties | p. 96 |
References | p. 98 |
Chapter 9 Reactions and Transformations | p. 99 |
9.1 Introduction | p. 99 |
9.2 Reaction SMILES | p. 99 |
9.3 Transformations | p. 100 |
9.3.1 Unimolecular Transformations | p. 101 |
9.3.2 Multi-Component Transformations | p. 104 |
9.4 Canonical Reaction SMILES | p. 106 |
References | p. 107 |
Chapter 10 PostgreSQL Extensions | p. 109 |
10.1 Introduction | p. 109 |
10.2 Composite Data Types | p. 109 |
10.3 Composite Data Type for Experimental Values | p. 111 |
10.4 Array Data Types for Two- and Three-Dimensional Coordinates | p. 115 |
10.5 Functions in Other Languages | p. 117 |
10.5.1 Plpgsql | p. 117 |
10.5.2 Plperl, Plpython, Pltcl | p. 118 |
10.5.3 Core Chemical Functions | p. 119 |
10.5.4 C Language Functions | p. 120 |
10.6 Object RDBMS | p. 121 |
References | p. 121 |
Chapter 11 Three-Dimensional Molecular Structure Tables | p. 123 |
11.1 Introduction | p. 123 |
11.2 Using Tables Instead of Files | p. 123 |
11.3 Molfile and Other Common File Formats | p. 124 |
11.4 Processing SDF Files | p. 125 |
11.5 Using Tables Instead of Files in Client Programs | p. 131 |
11.6 File Import, Export, and Conversions | p. 132 |
11.7 Functions Using Three-Dimensional Atomic Coordinates | p. 133 |
11.8 Conformations | p. 135 |
11.9 Other Representations of Three-Dimensional Molecular Structure | p. 136 |
References | p. 136 |
Chapter 12 More on Client and Web Interfaces to RDBMS | p. 137 |
12.1 Introduction | p. 137 |
12.2 Store All Possible Data in the RDBMS | p. 139 |
12.3 Advanced SQL Techniques | p. 140 |
12.3.1 Placeholders in SQL Statements | p. 141 |
12.3.2 Bind Values in SQL Statements | p. 142 |
12.4 Web Applications | p. 143 |
12.5 R Programs | p. 147 |
12.5.1 Hierarchical Clustering | p. 147 |
12.5.2 Linear Models | p. 148 |
References | p. 153 |
Chapter 13 Applications | p. 155 |
13.1 Introduction | p. 155 |
13.2 Compound Registration | p. 155 |
13.3 Experimental Chemical and Biological Data Integration | p. 162 |
13.4 Data from External Sources | p. 164 |
13.5 Utilities | p. 167 |
13.5.1 molgrep | p. 168 |
13.5.2 molcat | p. 168 |
13.5.3 molview | p. 169 |
13.5.4 molarb | p. 170 |
13.5.5 molrandom | p. 170 |
13.5.6 molnear | p. 171 |
13.5.7 molsame | p. 171 |
References | p. 172 |
Appendix | p. 173 |
A.1 Introduction | p. 173 |
A.2 Symbols and Bonds from Simplified Molecular Input Line Entry System (SMILES) | p. 173 |
A.3 Normalizing Data | p. 175 |
A.4 SQL Functions | p. 176 |
A.4.1 Public166keys | p. 176 |
A.4.2 Orsum | p. 176 |
A.4.3 Tanimoto | p. 176 |
A.4.4 Euclid | p. 177 |
A.4.5 Hamming | p. 177 |
A.4.6 Nbits_set | p. 177 |
A.4.7 Amw | p. 177 |
A.4.8 Tpsa | p. 181 |
A.5 Tables Used in Functions | p. 182 |
A.5.1 Amw | p. 183 |
A.5.2 Tpsa | p. 183 |
A.5.3 Public166keys | p. 183 |
A.6 Core Function Implementation for PostgreSQL | p. 188 |
A.6.1 PerlMol/plperlu | p. 188 |
A.6.2 FROWNS/plpythonu | p. 191 |
A.6.3 OpenBabel/python | p. 197 |
A.7 C Language PostgreSQL Functions | p. 203 |
A.8 Database Utilities Dbutils | p. 205 |
A.9 Loading Files into Simple Tables | p. 206 |
A.9.1 Smiloader | p. 207 |
A.9.2 Sdfloader | p. 208 |
References | p. 210 |
Index | p. 211 |