Using DB Browser for SQLite

DB Browser for SQLite (it’s also called SQLite Browser for short) is an excellent tool for practicing SQL without having to get connected to a real live server. This post will walk through how to install, open, and use SQLite Browser.

Install SQLite Browser

Go to the SQLite Browser website and choose the download for whichever operating system you are using. Open the file and follow installation instructions.

Step 1: Get Set Up

If you want to follow along, download each of these csv files: ad_info.csv, facebook_info.csv, and ad_results.csv. Save them somewhere that you’ll be able to find them (like in a folder dedicated for this example, or your Desktop). We’ll be using them as the tables in our database. Open SQLite Browser the same way you would any other program! You should see a window that looks like this:

opening screen for sqlite browswer

Step 2: Make a Database

To do anything in SQLite Browser, you need to be working within a database. That means every time you start SQLite Browser, you need to either create a new database, or open an existing one. For this example, we’ll create a new one using the New Database button in the top-left corner. SQLite Browser will ask you to save your database – do this just like you would any other file. You can call it whatever you’d like, but for this example we’ll name our database “marketing-db”. Make sure you save it in a folder where you’ll be able to easily find it again, like the Desktop. Then click Save.

Step 3: Import our First Table

Once you click save, this window should appear:

opening screen for sqlite browswer

This window is for typing in your data by hand. That’s super tedious and we won’t be doing that, so close this window. Instead, we’re going to go to File > Import > “Table from CSV file…” A file explorer should pop up (just like it would if you were opening any other document). In that window, browse to where you saved the .csv files. Let’s import the ad_info table first. Double click the ad_info.csv file to open it (or select the file and choose “Open”). You should then see this window:

It should have automatically populated the “Table name” box with ad_info. If it didn’t, or you want to change the table name, go ahead and type ad_info. If you’re doing this on your own for a different project, you can rename the table to whatever you’d like in this box. But for this example, let’s keep the name ad_info so it’s easier to follow. Also, make sure “Column names in first line” is checked.

Finally, take a glance at the preview of the import. For our example, you shouldn’t have to touch any of the other options. But sometimes, depending on how your .csv file is saved, DB Browser may try to squish all of your columns into one. In that case, you usually need to change the Field separator option. But again, you shouldn’t have to touch it for this example.

Step 4: Add the Other Tables

Repeat this process for the facebook_info and ad_results tables. When you’re done, the main screen (Database Structure tab) should look like this:

Step 5: Understand the Interface

Let’s take a second to understand the buttons in SQLite Browser. There are four tabs at the top: Database Structure, Browse Data, Edit Pragmas, and Execute SQL.

opening screen for sqlite browswer

The Database Structure tab is like your schema. It tells you about the tables that are in your database, and the columns in each table. We’ll be coming back to this in the next step to modify our tables.

The Browse Data tab allows you to do just that – browse your data. You can check out the data in the tables, and use the drop-down to switch between tables:

The Edit Pragmas tab allows you to set more advanced options. It’s unlikely you’ll need to use this tab often.

Finally, the Execute SQL tab is where you will actually write SQL queries and run them! Which is what we will do in the next step.

Run Your First Query

Let’s run a query to see how it works! Copy and paste this query into the top box in the Execute SQL tab:


SELECT *
FROM ad_info

If you aren’t familiar with SQL, don’t worry too much about this query/syntax right now. What it’s doing is selecting all of the columns from the ad_info table. Hit the triangle button to run the query. The whole thing should look like this:

Cool right! We are able to run SQL queries without actually setting up a real live server. Also take note that SQLite Browser gives us some information about the query in the box below the preview of the results. We can see that there are 149 rows total in our result, and how long it took SQLite to run the query.

Let’s run one more query. Copy and paste this SQLite browser next and hit run:


SELECT *
FROM ad_info
WHERE total_budget > 60000

Hmmmm nothing changed. We still go the same number of rows in the result (149), and there are still rows that have a total_budget of greater than $60,000. Why? DB Browser imports all columns as text columns by default. So it isnt’ recognizing total_budget as a number, and therefore doesn’t know how to find values greater than $60,000. Don’t worry, we can fix this!

Modify the Column Types in the Tables

Since SQLite Browser automatically imports all columns in all tables as TEXT, we need to manually change the data type of the non-text columns. Go back to the Database Structure tab, and click on the ad_info table. You can tell you’ve selected it because it should be highlighted in blue. Then click the Modify Table button. Finally, change the Type dropdown for the total_budget column to integer. Once again, a GIF of the whole thing:

Now go back to the Execute SQL tab and try running the query again (just click the triangle again to re-run it).

Notice it now only returns 61 rows! And these are the correct rows – with total budgets over $60,000. It’s important to remember to change the data types as soon as you import data into SQLite Browser.

Takeaways

Congrats! If you made it here, you now have a pretty good idea of how to use SQLite Browser. This page is also a great reference to keep handy in case you forget how to do something like change the data type, or import tables.

Understanding JOINs in SQL

Joins in SQL

Joins are one of the most important (if not THE most important) concepts in SQL. If you take the time to solidly understand how joins work, you’ll be in an excellent place for writing queries. So, let’s dive in!

Join Definitions

Joining tables in SQL is a way of combining them. It is directly comparable to a VLOOKUP in Excel, so if you are familiar wtih VLOOKUPs you already understand how joins work. For example, let’s say you have a table that contains orders placed by your customers (each row in that table represents a unique order). Then you have a second table that is only items that have been returned by your customers (each row in that table represents a unique order too, but not all orders are returned so the table has fewer rows).

Orders Table

order_idproductcost_per_unittotal_amountprofitshipping_method
1A5705Air
2B1012097Air
3C15150135Air
4A52010Air
5B1011096Air
6C15305Air
7A5459Air
8B1015064Air
9C1521063Air
10A56544Air
11B1012076Air
12C153018Air
13A52019Ground
14B1010059Ground
15C153014Ground
16A55515Ground
17B10101Ground
18C1510528Ground
19A57537Ground
20B105041Ground
21C15159Ground
22A53022Ground
23B10120117Ground
24C15195166Ground
25A5201Ground

Returns Table

idamount_returnedReason
1510Broken
2115No Reason Given
2485Unsatisfied
715Unsatisfied
1220Unsatisfied
9130Unsatisfied
1710No Reason Given
2115No Reason Given
1870Broken
1710Broken

You can’t just copy and paste these tables next to each other to figure out what was returned, because the rows won’t match up. You need a way of comparing each row in the orders table to the returns table to see if a match exists, and then bring back the information from the returns table that you’re interested in. Enter joins!

Let’s take a look at join syntax first:


SELECT *
FROM table_1 INNER JOIN table_2
ON table_1.id = table_2.id
	

Notice the ON statement, which is required for every join. You need to pick a column from each table that you want SQL to compare in order to join them together, and that’s what the ON statement is specifying. This example says compare the id column in table_1 to the id column in table_2 when trying to match up rows between the two tables during the join.

How do you choose which colunns to use for the ON statement? Generally you want to pick a column that uniquely identifies each row, like an id number. The two columns don’t have to have the exact same name, but they need to be the same type (for example, integer or text) and represent the same thing. The order_id columnn from the orders table above is the same thing as the id column in the returns table – they both represent the id of the order.

There are four types of joins: left, right, inner and outer. The right join functionally does the same thing as the left join, so many SQL programs are choosing not even to support it anymore. For that reason, we won’t be covering it in this article.

As usual, we’ll start with a basic example. Say you have a database with the following two tables in it:

nutrition1

idnamecalciumserving_sizeweight
1apples101 ea138
2asparagus221/2 cup90
3avocado191 ea173
4bamboo shoots101 cup131
5banana71 ea114
6beets91/2 cup85
7blackberries461 cup144
8Blueberries raw91 cup145
9broccoli2051 spear180
10brussels sprouts561 cup156

nutrition2

idpercent_waterenergyproteinfood_type
184800.3FRUITS
292222.3VEGETABLES AND LEGUMES
3733054FRUITS
494252.3VEGETABLES AND LEGUMES
5741051.2FRUITS
1190942.4FRUITS
1288310.7VEGETABLES AND LEGUMES
1392151.2VEGETABLES AND LEGUMES
149560.3VEGETABLES AND LEGUMES
1581490.8FRUITS

The nutrition1 table contains information about foods, and the nutrition2 table contains different information about foods. We’re going to join on the id column (this will be our ON statement) because it’s the unique identifier for each row in both of the tables. So, take note that nutrition1 has id numbers 1-10, but nutrition2 has id numbers 1-5 and then skips to 11-15.

Inner Joins

When you inner join two different tables together, SQL will compare the columns you specify in the ON statement and only keep rows that those columns have in common. Looking at our nutrition tables, the only id numbers that the two have in common are 1 through 5. So when we run this query:


SELECT *
FROM nutrition1 INNER JOIN nutrition2
ON nutrition1.id = nutrition2.id

we’ll get this result:

idnamecalciumserving_sizeweightidpercent_waterenergyproteinfood_type
1apples101 ea138184800.3FRUITS
2asparagus221/2 cup90292222.3VEGETABLES AND LEGUMES
3avocado191 ea1733733054FRUITS
4bamboo shoots101 cup131494252.3VEGETABLES AND LEGUMES
5banana71 ea1145741051.2FRUITS

Cool! Now we have more information than we previously did about apples, asparagus, etc. Notice that the result displays 1-5 from nutrition1, lined up next to those columns 1-5 from nutrition2, and everything else is gone. No other rows appear in the result because inner joins only keep what the two tables have in common, and nothing else.

Left Joins

When you perform a left join, SQL keeps everything from the left table, no matter what, and then only pulls in information from the right table that matches the left based on the columns you specify in the ON statement. So when we run this query:


SELECT *
FROM nutrition1 INNER JOIN nutrition2
ON nutrition1.id = nutrition2.id

we’ll get this result:

idnamecalciumserving_sizeweightidpercent_waterenergyproteinfood_type
1apples101 ea138184800.3FRUITS
2asparagus221/2 cup90292222.3VEGETABLES AND LEGUMES
3avocado191 ea1733733054FRUITS
4bamboo shoots101 cup131494252.3VEGETABLES AND LEGUMES
5banana71 ea1145741051.2FRUITS
6beets91/2 cup85NULLNULLNULLNULLNULL
7blackberries461 cup144NULLNULLNULLNULLNULL
8Blueberries raw91 cup145NULLNULLNULLNULLNULL
9broccoli2051 spear180NULLNULLNULLNULLNULL
10brussels sprouts561 cup156NULLNULLNULLNULLNULL

For this result, we have the entire nutrition 1 table (id numbers 1-10) and lined up next to it the rows from the nutrition2 table that matched base on the id (just 1-5). Notice the NULL values. The nutrition2 table didn’t have rows for id numnbers 6-10, but we had to keep those rows because the definition of a left join tells us to keep everything from the LEFT table, so the nutrition2columns for id numbers 6-10 get filled in with NULLs.

Outer Joins

Finally, let’s take a look at what an outer join does. Here’s the query:


SELECT *
FROM nutrition1 INNER JOIN nutrition2
ON nutrition1.id = nutrition2.id

and the result:

idnamecalciumserving_sizeweightidpercent_waterenergyproteinfood_type
1apples101 ea138184800.3FRUITS
2asparagus221/2 cup90292222.3VEGETABLES AND LEGUMES
3avocado191 ea1733733054FRUITS
4bamboo shoots101 cup131494252.3VEGETABLES AND LEGUMES
5banana71 ea1145741051.2FRUITS
6beets91/2 cup85NULLNULLNULLNULLNULL
7blackberries461 cup144NULLNULLNULLNULLNULL
8Blueberries raw91 cup145NULLNULLNULLNULLNULL
9broccoli2051 spear180NULLNULLNULLNULLNULL
10brussels sprouts561 cup156NULLNULLNULLNULLNULL
NULLNULLNULLNULLNULL1190942.4FRUITS
NULLNULLNULLNULLNULL1288310.7VEGETABLES AND LEGUMES
NULLNULLNULLNULLNULL1392151.2VEGETABLES AND LEGUMES
NULLNULLNULLNULLNULL149560.3VEGETABLES AND LEGUMES
NULLNULLNULLNULLNULL1581490.8FRUITS

Outer joins keep everything, no matter what, and line up rows that match. Here, we have the entire nutrition1 table, and next to it the nutrition2 table but broken up so that it matches the id numbers of nutrition1 properly.

Takeaways

Understanding the definitions of the joins and how each fundamentally works is very important in SQL. Usually, you’ll be working with so much data that you have to join the tables together in SQL and can’t just pull everything into Excel and do a VLOOKUP. Excel can only handle so many rows (a little over a million) and so many columns (a little over 16,000). That sounds like a lot but SQL databases can hold millions and with more and more data being generated and available every day, there’s a good chance that even at a small company you’ll be dealing with more information than can fit into Excel. Sometimes too, your data will fit into Excel but the VLOOKUP will be so computationally taxing that Excel will simply crash or go so slow it isn’t worth it. SQL is built for stuff like this, so it will compute joins much faster. Check out my understanding SQL blog post for a more in-depth discussino of what SQL is and when to use it.

Understanding Multiple JOINs in SQL

Multiple joins are one of the toughest SQL concepts – in this post we’ll decode them and review some common pitfalls.

One of the best ways to learn is with an example. If you’d like to follow along, you can download this zip file that contains the three tables as .csv files here, and import them into DB Browser for SQLite. Read my post on how to use SQLite Browser here.

Say you work on the marketing team at a software company. Your database has the following tables.

ad_info

Each row represents a unique campaign your company has run. All campaigns are included in this table.

ad_id

INTEGER

date_launched

DATE

total_budget

INTEGER

launching_team

TEXT

internal_purpose

TEXT

fb_info

Each row represents a unique Facebook campaign. Only Facebook campaigns will show up in this table!

ad_id

INTEGER

fb_id

INTEGER

impressions

INTEGER

unique_reaach

INTEGER

unique_clicks

INTEGER

engagement_rate

FLOAT

ad_results

Each row represents a unique campaign your company has run. All campaigns are included in this table.

ad_id

INTEGER

customers_engaged

INTEGER

revenue

INTEGER

products_sold

INTEGER

employees_hired

INTEGER

new_customers

INTEGER

success_score

INTEGER

Your boss asks you: what percentage of ad campaigns that were launched by the Europe or Australia teams and had success scores above 4 were Facebook campaigns?

To answer this question, we need to do a few things. Let’s break it down:

  • Filter results so we are only seeing European and Australian campaigns that had success scores above 4. We’ll do this in the WHERE clause. Don’t pay too much attention to this part for this example – we’re doing this to establish why we need all three tables, but the point of this example is to understand the joins. I’m not trying to trick anyone with the WHERE clause 🙂

  • Join the tables together so all the information is in one place.

  • Get the percentage of these specific campaigns that were Facebook campaigns. To do that, we’re going to count the number of rows in our result that have any information in the fb_information columns, and divide by the total number of rows in the result.

Okay, now we have all of the information we need to get started. You go back to your desk and you think “hmmmm…. I’m not sure how to do this.” We’re going to go through a few incorrect answers first before we look at the correct one so you can understand why the correct one is the right way to approach the problem.

Attempt 1: All Inner Joins

So, first you try all inner joins. Here’s a graphical representation of the joins:

A box represents the table
—- a line represents an inner join
and an arrow represents a left outer join.

ad_info

facebook_info

ad_results

And this is what the query looks like. I chose only to select the columns we need to answer the question, plus a few extras in the SELECT statement because if I selected all the columns, the table would be too large to fit easily into this webpage. But, for your own practice, you could do SELECT * if you want to see all the columns.


SELECT ad_info.ad_id, launching_team, fb_id, impressions, unique_reach, unique_clicks, engagement_rate, success_score
FROM ad_info INNER JOIN facebook_info
ON facebook_info.ad_id = ad_info.ad_id
INNER JOIN ad_results ON ad_results.ad_id = ad_info.ad_id
WHERE success_score > 4 AND launching_team IN ('Europe', 'Australia');

And finally, this is the result it produces:

Result:

ad_idlaunching_teamfb_idimpressionsunique_reachunique_clicksengagement_ratesuccess_score
9Europefb-76475346371330540.8225154867
15Europefb-89829603950850750.5337610436
16Australiafb-37639997685451070.7451123435
22Australiafb-20113636246416140.6550324686
27Europefb-74482677170213940.8190364287
33Europefb-15267135418541430.9899641585
39Europefb-651310188626990.8109048727
40Australiafb-36808755578552120.9009507358
52Australiafb-15844808440927500.62372425
124Australiafb-38668283820943720.5325861866
130Australiafb-29772964287825560.88811674810
135Europefb-83596316530837580.7079879438
136Australiafb-95692174207818130.8724735327
142Australiafb-17907966574351860.9030123638
147Europefb-29334202303028870.9528052815

Remember, we are looking for the number of rows with data in the Facebook columns (fb_id, impressions, unique_reach, unique_clicks, engagement_rate) divided by the total number of rows in the result. This result looks like 100% of the European and Australian campaigns with a success_score greater than 4 were Facebook campaigns because all of the rows in the result have values in the Facebook columsn.

That should set off a red flag! Whenever you get something like this (100%, 0 rows, etc.) you should double-check your work. Remember, the scariest thing in SQL is not an error, but an incorrect result that we think is correct! What actually happened here is we’ve eliminated all of the non-Facebook European and Australian campaigns with success scores greater than 4.

Before I explain why, there are a few very important things to understand.

When you are doing all inner joins or all outer joins, whether you are joining 3 tables or 300, the order of the joins does not matter. Think back to the definition of the joins (check out this blog post for a refresher). Inner joins only keep what the tables have in common. So it doesn’t matter if I switch the order of the tables, I’ll still get a result that only contins rows that all tables have in common. An outer join will keep all rows from all tables, no matter what. So again, it doesn’t matter if I switch the order of the tables because SQL will always keep all rows.

It’s not the way it looks. First, it takes the entire ad_info table and inner joins it to the facebook_info table, and gets that result. Then it takes THAT RESULT and inner joins it to the ad_results table! So it’s not ad_info joined to facebook_info, and then facebook_info joined to ad_results. It’s the RESULT of ad_info-Inner-Join-facebook_info joined to ad_results.

It’s like you’re creating a cumulative table (but not always one that’s getting larger, it dependso n the type of JOINs you are doing). This makes multiple joins more difficult, because SQL will not show you the result of each join. You have to have an idea of what each result looks like at each

Therefore, the reason we got 100% is because the only thing these tables have in common is Facebook campaigns, since that is all that is in the facebook_info table. So, all inner joins is not the way to go.

Attempt 2: Mixed Joins Starting with facebook_info

Now let’s say we start with the facebook_info table, do a left outer join on the ad_info table next, and then an inner join to the ad_results table.

facebook_info

ad_info

ad_results

Here’s the query:


SELECT ad_info.ad_id, launching_team, fb_id, impressions, unique_reach, unique_clicks, engagement_rate, success_score
FROM facebook_info LEFT JOIN ad_info
ON facebook_info.ad_id = ad_info.ad_id
INNER JOIN ad_results ON ad_results.ad_id = ad_info.ad_id
WHERE success_score > 4 AND launching_team IN ('Europe', 'Australia');

Result:

ad_idlaunching_teamfb_idimpressionsunique_reachunique_clicksengagement_ratesuccess_score
9Europefb-76475346371330540.8225154867
15Europefb-89829603950850750.5337610436
16Australiafb-37639997685451070.7451123435
22Australiafb-20113636246416140.6550324686
27Europefb-74482677170213940.8190364287
33Europefb-15267135418541430.9899641585
39Europefb-651310188626990.8109048727
40Australiafb-36808755578552120.9009507358
52Australiafb-15844808440927500.62372425
124Australiafb-38668283820943720.5325861866
130Australiafb-29772964287825560.88811674810
135Europefb-83596316530837580.7079879438
136Australiafb-95692174207818130.8724735327
142Australiafb-17907966574351860.9030123638
147Europefb-29334202303028870.9528052815

We get the exact same result as the inner joins. Why? Again, think back to the definition of the joins. A left join keeps everything from the left table (no matter what) and then only pulls in information from the right table that matches. So here, we kept everything from the facebook_info table, and only pulled in information from the ad_info table that matched as our first join. And of course, as we saw from the first example, the only rows from ad_info that match facebook_info are Facebook campaigns. Then when we take that result of the first join (which only contains Facebook campaigns) and join it to the ad_results table, we still remove the non-Facebook campaigns because the inner join only keeps what’s common between the result of the first join and the ad_results table.

Attempt 3: Mixed Joins Starting with ad_info

Finally, let’s try starting with the ad_info table, left outer joining the facebook_info table, and then inner joining the ad_results table.

ad_info

facebook_info

ad_results

And the query:


SELECT ad_info.ad_id, launching_team, fb_id, impressions, unique_reach, unique_clicks, engagement_rate, success_score
FROM ad_info LEFT JOIN facebook_info
ON facebook_info.ad_id = ad_info.ad_id
INNER JOIN ad_results ON ad_results.ad_id = ad_info.ad_id
WHERE success_score > 4 AND launching_team IN ('Europe', 'Australia');

Result:

ad_idlaunching_teamfb_idimpressionsunique_reachunique_clicksengagement_ratesuccess_score
9Europefb-76475346371330540.8225154867
15Europefb-89829603950850750.5337610436
16Australiafb-37639997685451070.7451123435
22Australiafb-20113636246416140.6550324686
27Europefb-74482677170213940.8190364287
33Europefb-15267135418541430.9899641585
39Europefb-651310188626990.8109048727
40Australiafb-36808755578552120.9009507358
52Australiafb-15844808440927500.62372425
63EuropeNULLNULLNULLNULLNULL6
69EuropeNULLNULLNULLNULLNULL8
70AustraliaNULLNULLNULLNULLNULL10
75EuropeNULLNULLNULLNULLNULL7
81EuropeNULLNULLNULLNULLNULL8
82AustraliaNULLNULLNULLNULLNULL10
87EuropeNULLNULLNULLNULLNULL6
88AustraliaNULLNULLNULLNULLNULL10
94AustraliaNULLNULLNULLNULLNULL10
99EuropeNULLNULLNULLNULLNULL6
100AustraliaNULLNULLNULLNULLNULL7
105EuropeNULLNULLNULLNULLNULL5
111EuropeNULLNULLNULLNULLNULL8
124Australiafb-38668283820943720.5325861866
130Australiafb-29772964287825560.88811674810
135Europefb-83596316530837580.7079879438
136Australiafb-95692174207818130.8724735327
142Australiafb-17907966574351860.9030123638
147Europefb-29334202303028870.9528052815

That gets us the result we want! We can see that only 15 out of the 28 European and Australian campaigns with success rates greater than 4 were Facebook campaigns. That’s 53.57%, not 100%!

Takeaways


Okay, let’s review!

If you have all inner joins or all outer joins, the order doesn’t matter. But as soon as you start mixing and matching join types, the order can change your result. It’s important to understand the definitions of the joins, and also imagine the result of each join as you do it, because that result is what’s being joined to the next table. Think about what rows may be lost or gained with each join.

What if you didn’t know that the facebook_info table only contained rows Facebook ad compaigns? You might have stopped at the first attempt (which is incorrect!). You could be SQL Syntax Expert of the Universe, but if you don’t know the data, you’ll produce wrong answers.

It’s critically important to know what type of information can show up in any given table, any given column, and if you’re working across multiple servers, any given database or server! Think of another example: let’s say you’re looking at geographic data and there’s a STATE column. Throughout the history of your company, some people coded Pennsylvania as “PA”; others coded it as “Pa.”; still others coded it as “Penn.”, etc. SQL doesn’t know these are all Pennsylvania! So if you are tryign to query the database for all customers living in Pennsylvania and you don’t incorporate this knowledge into your query, you’ll get the wrong answer. Your query may be syntactically perfect, but you’ll still get the wrong answer.

This doesn’t apply 100% of the time, but it’s a good rule of thumb to get started if this is the first time you are working with multiple joins. As we saw in Attempt 2, when you start with the smaller table, it’s impossbile to get that data back without doing right or outer joins. Right joins are not supported by many SQL programs anymore since it is the same as the left join but backwards, and outer joins often will bring back more information than you need. In my experience, left and inner joins are the most common types of joins and what you’ll mostly use in practice. But, you *could* do this exact problem differently using outer or right joins.

People get frustrated when they see errors in SQL, but getting an error is WAY BETTER than getting an incorrect result and thinking it’s correct. Just because SQL produces a result, doesn’t mean it’s what you originally wanted. You have to understand all of the things mentioned above (the definitions of the joins, how multiple joins work, what your data looks like. etc.) to ensure you are getting the correct answer.

Phew! You made it to the end, congratulations! Comments are welcome, let me know if you have questions, spot a typo, want to suggest a correction, or simply post your thoughts!