From cf046924be6f0d2e68c19b74bc2de92acf59c89a Mon Sep 17 00:00:00 2001 From: "Karl O. Pinc" Date: Sun, 28 May 2023 17:21:39 -0500 Subject: [PATCH] Add to docs on introduction, architecture, and an appendix --- doc/src/appendices.m4 | 84 ++++++++++++++++++ doc/src/architecture.m4 | 31 +++++++ doc/src/architecture/databases.m4 | 58 +++++++++++++ doc/src/architecture/schemas.m4 | 85 ++++++++++++++++++ .../permissions.m4 => architecture/users.m4} | 10 ++- doc/src/epilog.inc.m4 | 2 + doc/src/index.m4 | 3 +- doc/src/intro.m4 | 5 +- doc/src/intro/about_db.m4 | 87 ++++++++++++++++++- doc/src/intro/about_doc.m4 | 47 ++++++++-- 10 files changed, 398 insertions(+), 14 deletions(-) create mode 100644 doc/src/appendices.m4 create mode 100644 doc/src/architecture.m4 create mode 100644 doc/src/architecture/databases.m4 create mode 100644 doc/src/architecture/schemas.m4 rename doc/src/{intro/permissions.m4 => architecture/users.m4} (88%) diff --git a/doc/src/appendices.m4 b/doc/src/appendices.m4 new file mode 100644 index 0000000..8878b52 --- /dev/null +++ b/doc/src/appendices.m4 @@ -0,0 +1,84 @@ +.. Copyright (C) 2023 The Meme Factory, Inc. www.karlpinc.com + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as + published by the Free Software Foundation, either version 3 of the + License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see . + +.. M4 setup +include(constants.m4) +include(macros.m4) +sdb_rst_quotes(`on') +sdb_generated_rst() + + +APPENDIXES +========== + + +.. contents:: + :depth: 2 + + +Technologies Used +----------------- + +These technologies are used by SokweDB. The desire is to keep the +number of technologies to a minimum to keep development simple. + +Operating System Components and Services +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* The Microsoft Azure cloud +* The Linux Operating System Kernel +* The Ubuntu Linux Distribution +* The Internet/Web/Web Browsers and related technology +* The PostgreSQL database engine +* The Postfix Mail Transfer Agent +* The Nginx webserver +* The gitweb source code repository web interface +* The Letsencrypt.org security certificate toolset and services +* The MediaWiki wiki engine +* The php-fpm PHP interpreter (depreciated) + +Although the operating system level components have been chosen with +care, they are more-or-less interchangeable with similar, stock, +components. Each may be swapped out when this is found convenient. +The exception is the PostgreSQL database engine. SokweDB depends upon +specific PostgreSQL characteristics and features. + + +Development Tools +^^^^^^^^^^^^^^^^^ + +* The SQL database query and construction language +* The PL/pgSQL PostgreSQL database extension language +* The Python3 programming language +* The Pyramid_ web development framework +* The M4_ macro programming language +* The PHP programming language (deprecated) +* The git revision control system +* The bash shell scripting language +* The make build system tool + + +Documentation Tools +^^^^^^^^^^^^^^^^^^^ + +* The `ReStructured Text`_ (RST) markup lanugage +* The Sphinx_ RST processor +* The Inkscape SVG vector graphics editor + +.. _M4: https://en.wikipedia.org/wiki/M4_(computer_language) +.. _Pyramid: https://trypyramid.com/ +.. _Sphinx: https://www.sphinx-doc.org/ +.. _ReStructured Text: + https://docutils.sourceforge.io/docs/ref/rst/introduction.html diff --git a/doc/src/architecture.m4 b/doc/src/architecture.m4 new file mode 100644 index 0000000..d475b1a --- /dev/null +++ b/doc/src/architecture.m4 @@ -0,0 +1,31 @@ +.. Copyright (C) 2023 The Meme Factory, Inc. www.karlpinc.com + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as + published by the Free Software Foundation, either version 3 of the + License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see . + +.. M4 setup +include(constants.m4) +include(macros.m4) +sdb_rst_quotes(`on') +sdb_generated_rst() + + +System Architecture +=================== + +.. toctree:: + :maxdepth: 3 + + architecture/databases.rst + architecture/users.rst + architecture/schemas.rst diff --git a/doc/src/architecture/databases.m4 b/doc/src/architecture/databases.m4 new file mode 100644 index 0000000..e925b47 --- /dev/null +++ b/doc/src/architecture/databases.m4 @@ -0,0 +1,58 @@ +.. Copyright (C) 2023 The Meme Factory, Inc. www.karlpinc.com + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as + published by the Free Software Foundation, either version 3 of the + License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see . + +.. M4 setup +include(constants.m4) +include(macros.m4) +sdb_rst_quotes(`on') +sdb_generated_rst() + +.. _databases: + +Databases +--------- + +SokweDB utilizes at least 3 databases, each for a different purpose. +There may also be other databases available.\ [#f8]_ + +sokwedb +^^^^^^^ + +The ``sokwedb`` database contains the final, "official", data. All +research takes place in this database. + +sokwedb_test +^^^^^^^^^^^^ + +The ``sokwedb_test`` database is used, by everyone, for testing. +Typically, this database contains a copy of the ``sokwedb`` data. It +may be desirable to upload new data into the ``sokwedb_test`` +database before uploading into the ``sokwedb`` database. This allows +the data to be cleaned and examinations made before upload into +production. + +sokwedb_copy +^^^^^^^^^^^^ + +The ``sokwedb_copy`` database is a 2nd database for testing and +development. It might be used by the system's developers to try new +features, or be used by anyone to test something that seems +particularly dangerous and might interfere with normal operations if +tested in the ``sokwedb_test`` database. + +.. rubric:: Footnotes + +.. [#f8] Particularly during periods of heavy software development, + there may be a separate database dedicated to each developer. diff --git a/doc/src/architecture/schemas.m4 b/doc/src/architecture/schemas.m4 new file mode 100644 index 0000000..42430f6 --- /dev/null +++ b/doc/src/architecture/schemas.m4 @@ -0,0 +1,85 @@ +.. Copyright (C) 2023 The Meme Factory, Inc. www.karlpinc.com + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU Affero General Public License as + published by the Free Software Foundation, either version 3 of the + License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Affero General Public License for more details. + + You should have received a copy of the GNU Affero General Public License + along with this program. If not, see . + +.. M4 setup +include(constants.m4) +include(macros.m4) +sdb_rst_quotes(`on') +sdb_generated_rst() + +.. _schemas: + +Schemas +------- + +Each SokweDB database contains a number of :ref:`schemas `. +Some of these schemas will not be documented +herein; it is likely that some schemas will be created to hold shared +data, data not part of SokweDB itself but related to it. + +The sokwedb schema +^^^^^^^^^^^^^^^^^^ + +The ``sokwedb`` schema contains the data collected in the field. It +is the primary schema of interest to the researcher. + +The codes schema +^^^^^^^^^^^^^^^^ + +The ``codes`` schema contains those tables which control the data +vocabulary defining the codes able to be recorded in the database. +Because the codes defined in this schema are often used and well-known +the schema's tables are not often of interest. This schema exists so +that the ``sokwedb`` schema is not cluttered with un-interesting +tables. + +Most of the tables in this schema contain one row per defined +code. The codes are usually kept in a column named ``Value``. There +is also a ``Description`` column, which describes the coded value. + +A few of the tables in this schema are more complex, and contain more +than 2 columns. These are often tables which contain 'meta +information' involving the mechanics of the data collection process. +Things like lists of researchers, observers, or equipment used in the +data collection process. + + +The upload schema +^^^^^^^^^^^^^^^^^ + +The ``upload`` schema contains the views used during the data upload +process. These are, usually, of interest only to those who upload +data into the database. + + +The per-user private schemas +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Individuals are given their own schemas in which they can do whatever +they wish. So each regular account/login/username has an associated +schema with a name the same as that of the account. + +.. caution:: + It is usually bad practice to grant another person access to + something in a private schema. It is often better to create + another, shared, schema. In this way individuals, and their + accounts and private schemas, can come and go without affecting the + work of the institution. + +Because of the schema search order the schema name must be used to +qualify anything created in the user's schema. E.g., to create the +table ``foo`` in the user ``mylogin``'s schema:: + + CREATE TABLE mylogin.foo (somecolumn INTEGER); diff --git a/doc/src/intro/permissions.m4 b/doc/src/architecture/users.m4 similarity index 88% rename from doc/src/intro/permissions.m4 rename to doc/src/architecture/users.m4 index 09dc3d3..25414d4 100644 --- a/doc/src/intro/permissions.m4 +++ b/doc/src/architecture/users.m4 @@ -20,8 +20,14 @@ sdb_rst_quotes(`on') sdb_generated_rst() -Database Permissions --------------------- +Users and Database Permissions +------------------------------ + +Each person should have their own login/username, which should not be +shared. + +The database associates each login with specific permissions to +objects (tables, etc.) within the database. To access the data in the database permission must be granted. This is done per user login. diff --git a/doc/src/epilog.inc.m4 b/doc/src/epilog.inc.m4 index fdf9f82..e762dfe 100644 --- a/doc/src/epilog.inc.m4 +++ b/doc/src/epilog.inc.m4 @@ -25,6 +25,8 @@ sdb_generated_rst()dnl .. URLs of external pages. +.. _PostgreSQL: https://postgresql.org + .. _UUID: https://en.wikipedia.org/wiki/Universally_unique_identifier .. _technical debt: https://en.wiktionary.org/wiki/technical_debt diff --git a/doc/src/index.m4 b/doc/src/index.m4 index b6f020b..933a5fe 100644 --- a/doc/src/index.m4 +++ b/doc/src/index.m4 @@ -30,11 +30,12 @@ Technical Documentation :maxdepth: sdb_if_htmlout(3, 5) intro.rst + architecture.rst er_diagrams.rst + appendices.rst .. Add these back later.... schemas.rst - appendices.rst Indices and tables diff --git a/doc/src/intro.m4 b/doc/src/intro.m4 index 3a40190..43c8f5b 100644 --- a/doc/src/intro.m4 +++ b/doc/src/intro.m4 @@ -22,12 +22,11 @@ sdb_generated_rst() Introduction ============ -This documentation describes the design and capabilities of the -SokweDB database and related systems. +This document describes the design and capabilities of the SokweDB +database and related systems. .. toctree:: :maxdepth: 3 intro/about_doc.rst intro/about_db.rst - intro/permissions.rst diff --git a/doc/src/intro/about_db.m4 b/doc/src/intro/about_db.m4 index a1667c4..f732539 100644 --- a/doc/src/intro/about_db.m4 +++ b/doc/src/intro/about_db.m4 @@ -19,9 +19,62 @@ include(macros.m4) sdb_rst_quotes(`on') sdb_generated_rst() +System Design +------------- -About the Database ------------------- +The system design emphasizes first, data integrity, and second, low +long-term cost. The database engine chosen, PostgreSQL_\, supports +concurrent multi-user data entry and retrieval, which minimizes the +amount of inter-user coordination required and enhances usability. + +These design goals lead to the following design elements. + +Data integrity is enforced within the database. This allows any +program to be used to interact with the database and update database +content. Costs are kept down because generic `Open Source`_ user +interfaces may be used to interact with the database. No matter the +tool used, the integrity of the data is maintained. + +A web-based user interface, including a wiki which allows rapid +web-page development, maximizes availability. The primary method of +interacting with the database is SQL, the industry standard relational +query language. The amount of SQL which must be learned can be, +depending on the amount of development effort spent, reduced to an +absolute minimum through the use of :ref:`views ` -- in short, +pre-packaged queries. + +Data is kept secure though industry-standard practices. These include +the encryption of communications, the association of accounts with +individuals, the secure authentication required for account access, and +the use of in-database access controls to limit the permissions of +user accounts. Because individual people are granted direct, but +controlled, access to the database itself there is no "middleware" +which, when bypassed, has unlimited access to the data. + +A minimal number of bespoke programs limits the amount of code +development required, and, even more significantly, limits the +long-term maintenance costs. The Open Source licensing of the SokweDB +system minimizes cost by sharing long-term development of those +portions of the system used by more than one institution. + +Costs are kept down by minimizing the amount of user-interaction +available through bespoke programs. Interacting with a person, +particularly reporting errors which arise, requires a lot of +programming. Therefore the system is designed around bulk input and +output. + +Individual accounts are given their own, private, workspaces +(:ref:`schemas `). This separates private from shared data, +which allows for better long-term data maintenance. + +About Databases +^^^^^^^^^^^^^^^ + +In PostgreSQL a database is a stand-alone data store. Queries can +easily interact with and combine all data kept within a single +database. Access to data outside a database, from within the +database, is possible but requires additional work that depends upon +the data source. Databases store data in tables. Related singleton datum, such as a single name, a single birthdate, a mother, are kept together in a @@ -40,7 +93,37 @@ the "birthdate"column. .. _views: +About Views +^^^^^^^^^^^ + Views appear to be tables but they are not. Views are virtual, when queried they deliver the results of a query run against the database's actual tables. When setup to do so, changing the data in a view can change data in the database's underlying tables. + + +.. _about_schemas: + +About Schemas +^^^^^^^^^^^^^ + +Schemas partition databases.\ [#f7]_ \ [#f2]_ They work like +directories or folders do in filesystems, but can be only one level +deep. A schema cannot contain another schema. + +Schemas organize database content. One purpose is to allow a user to +focus on the content of some schema(s) and ignore what is in other +schemas. + +.. rubric:: Footnotes + +.. [#f7] The term "schema" is overloaded. A separate meaning defines + a schema as the tables, columns and relationships between + tables that exist within a database. So a schema can denote + the design of a particular database. + +.. [#f2] A PostgreSQL schema can be thought of as a MySQL database, or + vice versa. + + +.. _Open Source: https://en.wikipedia.org/wiki/Open_source diff --git a/doc/src/intro/about_doc.m4 b/doc/src/intro/about_doc.m4 index f2d3fe1..316cc7a 100644 --- a/doc/src/intro/about_doc.m4 +++ b/doc/src/intro/about_doc.m4 @@ -22,16 +22,41 @@ sdb_generated_rst() About This Document ------------------- +This document describes the SokweDB data management system, it's +features and capabilities. +This includes the system design +principals, a description of the database's tables, and the intended +usage of related programs. +It does not describe the procedures used +to enter data into the system or extract data from it. +Nor does it +describe the details of how to operate all related programs, or +maintain the underlying software. + +Conventions +^^^^^^^^^^^ + +In other words, this document describes the system's capabilities. +How to use the system on a day-to-day basis, which features are used +in which ways, and similar, are to be found elsewhere.\ [#f1]_ + +This section describes the conventions used within this document, +which also speaks to the conventions of the overall database design. + +There are a number of conventions regarding character case. The database is case insensitive when it comes to the names used to -identify content -- table names and column names and so forth.\ -[#f6]_ Within this document schema names are written with an +identify content -- table names and column names and so forth.\ [#f6]_ +Within this document schema names are written with an initial capital letter, table and view names are written in all upper case, and column names are written with initial capital letters. -Unless otherwise noted all columns must contain a value. Note that an +Unless otherwise noted all columns must contain a value. +Note that an empty string, the series of characters consisting of zero characters, -is a value. The empty string\ [#f4]_ is a string and so it is -reasonable to compare it with other non-empty strings. The empty +is a value. +The empty string\ [#f4]_ is a string and so it is +reasonable to compare it with other non-empty strings. +The empty string is distinct from ``NULL``\ [#f5]_, which means "no information", and is used when there is no value/no data at all. @@ -40,13 +65,23 @@ mentioned.\ [#f15]_ Care must be taken when updating data unless there is an explicit guarantee of data consistently. This document uses a particular vocabulary, which informs the table -names within the database. The selection of the vocabulary is based on +names within the database. +The selection of the vocabulary is based on the terminology in use at the JGI and may need some study for those not familiar with it. +A Guide for the Reader +^^^^^^^^^^^^^^^^^^^^^^ + +This is a reference document, and as such is not expected to be read +from front-to-back. + .. rubric:: Footnotes +.. [#f1] The wiki, found at https://sokwe.janegoodall.org/, is a good + place to document day-to-day procedures and practices. + .. [#f6] To be precise, *unless special steps are taken* the database, and SQL, is case-insensitive when it comes to names. -- 2.34.1