The Lipad Blog

Tutorial: Opening the Lipad Dataset in PostgreSQL

If you are familiar with basic SQL but have never used PostgreSQL, the following tutorial will help you get started querying the Lipad dataset. It covers installation of PostgreSQL on a Windows machine, importation of the Lipad dataset, and basic querying in pgAdmin3 (the graphical interface for PostgreSQL).

A. Installation

1) Download the PostgreSQL 9.6.1 visual installer, available at https://www.bigsql.org/postgresql/installers.jsp

2) Run the installer. First, the Setup Wizard will ask you to select the installation directory; leave this as default and click Next to proceed.

install1

3) On the Select Components screen, un-check BigSQL Manager II and check pgAdmin3 LTS as shown in the diagram.

install2

4) The next screen selects the location for database storage; leave this as default and proceed.

5) The Password screen asks you to provide a password for the default database user, "postgres". Select a password you will remember and make a note of it as you will need it at a later step in this tutorial.

install3

6) At the Summary screen, click Next to proceed. Wait for the installer to finish, then click Finished when it has completed.

install4

B. Database Setup

7) Start pgAdmin3. On a Windows machine, it should be located at Start Menu > Programs > PostgreSQL > pgAdmin3 LTS by BigSQL.

install5

8) Click on the "plug" icon at the top left of the pgAdmin3 window to begin setting up the connection to our local database. This will open a New Server Registration window.

9) In the New Server Registration window, enter the following information:

Name: localhost

Host: localhost

Port: 5432

Username: postgres

Password: (enter the password you created earlier during installation)

install6

10) When finished, click OK. Look at Object Browser pane of the pgAdmin3 window. You will now have a connection saved called "localhost" in the Object Browser, under Servers. Click the plus sign to expand "localhost" and view its databases. By default, there is one database created for our default user, "postgres"--it is used for maintenance and should not be modified.

11) Next, we will create a new database to hold the Lipad dataset. Right click on Databases and select New Database from the dropdown menu. Fill in the following information in the New Database... window:

Name: lipad

Owner: postgres

When finished, click OK and wait a few moments as the database is created.

install7

12) Click on the plus sign beside Databases and you should see that the new database "lipad" has been created. It will appear with a red X on its icon to signify that you are not currently connected to (working with) this database. Click on "lipad" to connect to the database.

C. Loading The Dataset

13) We are now ready to load the dataset into our newly-created "lipad" database. Download the latest database dump, basehansard-1.0.2.sql.bz2 from the Data page. You may need to right click and select Save Link As... from the menu.

14) Before using the dump file, it must be uncompressed. The free program 7zip, available at http://www.7-zip.org/download.html, is a great utility for this purpose. For a quick tutorial on using 7zip, click here.

Use 7zip to uncompress the file into basehansard-1.0.2.sql and make a note of where this new file is located. It may take some time to uncompress.

15) You will need to use a terminal window to load in the database. On Windows, type "cmd" in the Start Menu search and click on cmd.exe to start a terminal window.

16) You will see a command prompt ending with >. At the prompt, type the following command in the window, replacing the path to the file with the path where your basehansard-1.0.2.sql file is located. For example, I have saved the file at D:\Downloads\basehansard-1.0.2.sql

psql lipad < D:\Downloads\basehansard-1.0.2.sql postgres

install15

17) Hit the Enter key to start the loading process. Words like "ALTER TABLE" and "COPY" will fill the window as the load progresses.

Please note: The restore process can take a few minutes to a few hours depending on your computer speed. The terminal window must be left open until the process completes.

18) When the loading process is complete, the terminal window will return to a prompt as shown in the diagram. You can now close the terminal.

install10

D. Using the Database

19) Return to the pgAdmin3 window.

First, verify you are still connected to the "lipad" database. There should be no red X next to its name; if there is, click on its name to connect (as in step 12).

20) You can now see the database's tables by clicking the plus signs to expand in the following order: lipad > Schemas > Public > Tables.

The main Lipad dataset is contained in the table dilipadsite_basehansard. Other supplementary data tables include dilipadsite_constituency, dilipadsite_member, dilipadsite_party, and dilipadsite_position.

The remainder of tables are related to website functions. They are empty and can be ignored.

21) First, we will verify that the import has been successful. Right click on the main data table, dilipadsite_basehansard, and follow the dropdown menu to View Data > View Top 100 Rows. This will show you the first 100 rows of the table in a new window.

install11

22) To perform SQL queries from the visual interface, click on the "SQL magnifying glass" at the top of the main pgAdmin3 window. A new Query window will open and a blank query tab is ready for you to type your query.

install12

Click the green arrow in the toolbar to run the query and view output visually in this window.

install13

If you wish to save the results of the query to a .csv file, click on the green arrow with the disk icon. Use the following settings to output a .csv file readable in Excel:

install14

E. More Helpful Hints

Returning to Work: When starting a new session in pgAdmin3, you will need to connect to the database to work with it. Start by right clicking the "localhost" entry under Servers in the Object Browser, then select Connect. You may need to enter the password you created earlier. Next, connect to the "lipad" database by clicking on it as normal. You can now begin to query the database.

Updating the Dataset: We update the Lipad database regularly with additions and error corrections. To update your local database with a newer copy, delete the existing "lipad" database by right clicking on its name and selecting Delete/Drop..., then clicking Yes to confirm the deletion. Then, simply repeat from step 11 to create a new database and load a newly-downloaded copy of the data file.

< Return to Blog

Tanya Whyte

PhD Student, University of Toronto