Mathesar is a tool (still in the process of development, but already highly usable) for exploring a database using a spreadsheet-like interface. Pretty much everybody uses spreadsheets, which are a useful tool, if overused. Databases are more mysterious, although in fact databases are the bedrock of almost all modern computing. Databases have some key differences from spreadsheets, although there are also some (sperficial) similarities:
In a database, the data is separate from the use of it. A library might have a database of all its (e)books, magazines, and other media, which anybody can search. Find books on lunar geography published since 2000? You can search the database and find them. Similarly an educational institution will have a database of all its students, which staff members can search.
Unlike a spreadsheet, a database can contain any sort of information: text, images, media files; any digitized or electronic item can be stored in a database.
A database will store its information in sets of tables, which are similar to spreadsheets only in that they are tabular. A database record (analogous to a row in a spreadsheet), will consist of a number of fields (analogous to spreadsheet columns), each of which will contain a specified type of data element: text, data, numerical information, image etc. Unlike a spreadsheet, it's impossible to enter the wrong type of information into a field.
In a spreadsheet, all actions are carried out in the spreadsheet itself; thus a column can be over-ridden with (for example) with the sum of values from other columns. And these new values can be used for further calculations. This is impossible in a database. For that reason, spreadsheets are prone to errors, and errors are very hard to find.
Amending and editing a database usually requires permissions that an ordinary user will not have. Some databases have different levels of permissions. A university, for example, may allow students to see, but not edit, their individual records. A class teacher may have access to all the students in their class, and may have some minor editing capacity (such as entering marks and grades). And so on, right up to the database administrators, who have ultimate control.
Databases can be much much bigger than spreadsheets, not only in the number of records, but also in file sizes. Databases maintained by Amazon (of its customers) and of the Library of Congress have hundreds of millions of records, and multiple terabytes of data.
In comparison, a spreadsheet may only have a limited size: a few thousand rows maybe.
Database management systems are thus highly optimized for speed of searching and of filtering. There are many such systems, many based on SQL ("Structured Query Language"); some commercial, some open-source. All have different advantages, depending on use case. Mathesar uses Postgres (more formally, PostgreSQL), which is one of the world's most powerful such systems. There have been some requests from the user community for support for other systems (MySQL, MariaDB etc), but the developers have more urgent matters.
Because of the size and importance of databases, their study remains an active area of research, both in computer science, and in information management.
To use mathesar to view and explore the ATCM database, go to:
https://mathesar.numbersandshapes.net
Log in with:
guest
g.u.e.s.t.
To explore the papers, there should be several databases shown, choose "ATCM Papers".
You can open either of the tables and browse through them.
Explorations
a. Suppose we want to find out which papers refer to origami.
Select "Explorations"
Then "Open Data Explorer"
From the drop down menu in "Select Table" (in the upper left), select "atcm_papers"
We can now select the columns we wish to explore, say "Title" and "Abstract": just go to the list of columns, and "Add" should appear when you move the cursor on it. These columns should appear in the main central area.
Now choose "Transform results" - above the list of columns.
Choose "Filter", and column "atcm_papers_Title"
Change "equals" in the pull down menu to "contains"
in the text box, enter "origami"
if you want to find out the authors, just add "Authors" from the list of columns.
b. How many authors have come from Malaysia?
Choose "Start over".
Select the "atcm_authors" table
choose "Author" and "Country" columns
for transformations, choose first "Filter" on atcm_authors_Country
, using
equals
, and enter "Malaysia" in the text box.
Next, add another transformation, "Summarization" on atcm_authors_Author
and
choose List
as the summary method. This will give a list of all the
individual authors. You can see the number below.
It's clear that there are some authors who are misrepresented by two (or more) spellings of their names: hyphens or spaces, or neither. These can all be fixed.
Explorations with small output. If the output is less than 10 records, a quick way (for example, using 2a from above):
Open up "atcm_papers" and click on the little search button to the right of its name, at the top.
You now have "Open a Record from atcm_papers".
in the space a the opt of the "Title" column, enter "origami". As you enter the word, you'll see that the number of records decreases with every letter you enter: this automatically assumes you are using the "contains" operator.
However, if there are more records than 10, only the first 10 will be shown. Try entering "Geogebra" in the Title box, for example.
Then enter "Malaysia" in the "Location" box.
This is a quick and dirty method; for proper analysis you'd need to filter and transform as described above.
Explore these databases as you like, and report errors (there will be many) to Alasdair McAndrew at amca01@gmail.com