Design and use of relational databases in chemistry

Title:

Personal Author:

O'Donnell, T. J.

Publication Information:

Boca Raton, FL : CRC Press, 2009

Physical Description:

xii, 229 p. : ill. ; 25 cm.

ISBN:

9781420064421

Subject Term:

Chemistry -- Databases

Chemistry -- Data processing

Relational databases

Available:*

Library	Item Barcode	Call Number	Material Type	Item Category 1	Status
Searching... PSZ JB	30000010195150	QD455.3.E4 O46 2009	Open Access Book	Book	Searching... Unknown

Design and Use of Relational Databases in Chemistry helps programmers and users improve their ability to search and manipulate chemical structures and information, especially when using chemical database "cartridges". It illustrates how the organizational, data integrity, and extensibility properties of relational databases are best utilized when working with chemical information.

The author facilitates an understanding of existing relational database schemas and shows how to design new schemas that contain tables of data and chemical structures. By using database extension cartridges, he provides methods to properly store and search chemical structures. He explains how to download and install a fully functioning database using free, open-source chemical extension cartridges within PostgreSQL. The author also discusses how to access a database on a computer network using both new and existing applications.

Through examples of good database design, this book shows you that relational databases are the best way to store, search, and operate on chemical information.

Preface	p. xi
Acknowledgments	p. xiii
Biography	p. xv
Chapter 1 Introduction	p. 1
Chapter 2 Relational Database Fundamentals	p. 5
2.1 Introduction	p. 5
2.2 Tables, Rows, and Columns	p. 5
2.3 External and Internal Representations of Data	p. 7
2.4 Advantages over Spreadsheets	p. 8
2.4.1 Size and Speed	p. 8
2.4.2 Multiple Users	p. 8
2.5 Relationships among Tables	p. 9
2.5.1 One-to-Many Relationships	p. 9
2.5.2 One-to-One Relationships	p. 11
2.5.3 Many-to-Many Relationships	p. 12
2.6 Entity Relationship Diagrams	p. 12
2.7 Uniqueness	p. 14
2.8 Sequences	p. 14
2.9 Keys	p. 15
2.9.1 Primary Keys	p. 15
2.9.2 Foreign Keys	p. 15
2.10 Constraints	p. 16
2.11 Indexes	p. 16
2.12 Joining Tables	p. 16
2.13 Normal Forms	p. 17
2.13.1 First Normal Form	p. 17
2.13.2 Second Normal Form	p. 18
2.13.3 Third Normal Form	p. 19
2.13.4 Summary of Normal Forms	p. 20
References	p. 20
Chapter 3 Structured Query Language (SQL)	p. 21
3.1 Introduction	p. 21
3.2 Databases, Schemas, Tables, Rows, and Columns	p. 21
3.3 Create	p. 22
3.4 Insert	p. 23
3.5 Select	p. 24
3.6 Update and Delete	p. 25
3.7 SQL Functions	p. 26
3.7.1 Regular Functions	p. 26
3.7.2 Aggregate Functions	p. 27
3.8 Domains, Triggers, and Views	p. 28
3.9 Unions, Intersections, and Differences	p. 29
References	p. 30
Chapter 4 Relational Database Management Systems	p. 31
4.1 Introduction	p. 31
4.2 Standard SQL	p. 32
4.3 A Sampling of Differences	p. 32
4.4 Server and Client	p. 33
4.5 Compatibility	p. 35
References	p. 35
Chapter 5 Client and Web Applications	p. 37
5.1 Introduction	p. 37
5.2 Command Line Programs	p. 37
5.3 Web-Based Applications	p. 38
5.4 Client Applications	p. 39
5.5 SQL Interfaces in Various Languages	p. 41
5.5.1 Perl	p. 43
5.5.2 Python	p. 44
5.5.3 PHP	p. 44
5.5.4 Java	p. 45
References	p. 46
Chapter 6 Data Storage, Searching, and Manipulation	p. 47
6.1 Introduction	p. 47
6.2 General Schema Design Decisions	p. 47
6.3 Sample Schema for Tracking Chemical Samples	p. 49
6.4 Schemas for PubChem Data	p. 53
6.4.1 BioAssay Data	p. 54
6.4.2 Substances	p. 56
6.4.3 Compounds	p. 58
6.5 Data Constraints and Data Integrity	p. 60
6.6 Developing Complex SQL	p. 63
6.7 Subselect Statements	p. 66
6.8 Views	p. 67
References	p. 70
Chapter 7 Computer Representations of Molecular Structures	p. 71
7.1 Introduction	p. 71
7.2 SMILES Representation of Molecular Structure	p. 72
7.3 Extensions to SQL for Chemical Structures	p. 72
7.4 SMARTS Representation of Molecular Searches	p. 74
7.5 SMILES and SMARTS Quirks	p. 76
7.5.1 Hydrogen Atoms	p. 76
7.5.2 Aromaticity	p. 77
7.5.3 Tautomers	p. 77
7.5.4 Valence	p. 80
7.5.5 Chirality	p. 80
7.5.6 Isotopes	p. 81
7.5.7 Salts and Mixtures	p. 81
7.5.8 InChI and Canonical SMILES	p. 82
7.6 SMILES and Inorganic Structures	p. 82
7.7 Other SMILES Extensions	p. 82
7.8 Input and Output of Molecular Structures	p. 83
7.9 Useful SQL Extensions	p. 85
7.10 SMILES as an SQL Data Type	p. 86
7.10.1 Domains	p. 86
7.10.2 Triggers	p. 87
7.11 Summary	p. 88
References	p. 88
Chapter 8 Molecular Fragments and Fingerprints	p. 91
8.1 Introduction	p. 91
8.2 Fragments	p. 91
8.2.1 Fragment Keys	p. 92
8.2.2 MACCS Keys and Other Fragment Keys	p. 95
8.3 Fingerprints	p. 95
8.4 Similarity Measures	p. 96
8.5 Computing Fragment-Based Properties	p. 96
References	p. 98
Chapter 9 Reactions and Transformations	p. 99
9.1 Introduction	p. 99
9.2 Reaction SMILES	p. 99
9.3 Transformations	p. 100
9.3.1 Unimolecular Transformations	p. 101
9.3.2 Multi-Component Transformations	p. 104
9.4 Canonical Reaction SMILES	p. 106
References	p. 107
Chapter 10 PostgreSQL Extensions	p. 109
10.1 Introduction	p. 109
10.2 Composite Data Types	p. 109
10.3 Composite Data Type for Experimental Values	p. 111
10.4 Array Data Types for Two- and Three-Dimensional Coordinates	p. 115
10.5 Functions in Other Languages	p. 117
10.5.1 Plpgsql	p. 117
10.5.2 Plperl, Plpython, Pltcl	p. 118
10.5.3 Core Chemical Functions	p. 119
10.5.4 C Language Functions	p. 120
10.6 Object RDBMS	p. 121
References	p. 121
Chapter 11 Three-Dimensional Molecular Structure Tables	p. 123
11.1 Introduction	p. 123
11.2 Using Tables Instead of Files	p. 123
11.3 Molfile and Other Common File Formats	p. 124
11.4 Processing SDF Files	p. 125
11.5 Using Tables Instead of Files in Client Programs	p. 131
11.6 File Import, Export, and Conversions	p. 132
11.7 Functions Using Three-Dimensional Atomic Coordinates	p. 133
11.8 Conformations	p. 135
11.9 Other Representations of Three-Dimensional Molecular Structure	p. 136
References	p. 136
Chapter 12 More on Client and Web Interfaces to RDBMS	p. 137
12.1 Introduction	p. 137
12.2 Store All Possible Data in the RDBMS	p. 139
12.3 Advanced SQL Techniques	p. 140
12.3.1 Placeholders in SQL Statements	p. 141
12.3.2 Bind Values in SQL Statements	p. 142
12.4 Web Applications	p. 143
12.5 R Programs	p. 147
12.5.1 Hierarchical Clustering	p. 147
12.5.2 Linear Models	p. 148
References	p. 153
Chapter 13 Applications	p. 155
13.1 Introduction	p. 155
13.2 Compound Registration	p. 155
13.3 Experimental Chemical and Biological Data Integration	p. 162
13.4 Data from External Sources	p. 164
13.5 Utilities	p. 167
13.5.1 molgrep	p. 168
13.5.2 molcat	p. 168
13.5.3 molview	p. 169
13.5.4 molarb	p. 170
13.5.5 molrandom	p. 170
13.5.6 molnear	p. 171
13.5.7 molsame	p. 171
References	p. 172
Appendix	p. 173
A.1 Introduction	p. 173
A.2 Symbols and Bonds from Simplified Molecular Input Line Entry System (SMILES)	p. 173
A.3 Normalizing Data	p. 175
A.4 SQL Functions	p. 176
A.4.1 Public166keys	p. 176
A.4.2 Orsum	p. 176
A.4.3 Tanimoto	p. 176
A.4.4 Euclid	p. 177
A.4.5 Hamming	p. 177
A.4.6 Nbits_set	p. 177
A.4.7 Amw	p. 177
A.4.8 Tpsa	p. 181
A.5 Tables Used in Functions	p. 182
A.5.1 Amw	p. 183
A.5.2 Tpsa	p. 183
A.5.3 Public166keys	p. 183
A.6 Core Function Implementation for PostgreSQL	p. 188
A.6.1 PerlMol/plperlu	p. 188
A.6.2 FROWNS/plpythonu	p. 191
A.6.3 OpenBabel/python	p. 197
A.7 C Language PostgreSQL Functions	p. 203
A.8 Database Utilities Dbutils	p. 205
A.9 Loading Files into Simple Tables	p. 206
A.9.1 Smiloader	p. 207
A.9.2 Sdfloader	p. 208
References	p. 210
Index	p. 211

Available:*

On Order

Summary

Summary

Table of Contents