Refresh on SQL

Definition of SQL, relational databases, tables, fields, records, and data types

The required classes to complete my MSDS touched on SQL, but as I’ve started to explore ways to branch out from my roots in social and marketing, I’m finding that many data related roles require more experience than I’ve been able to get since graduating. As part of my roadmap to eventually get into a more data focused position, I decided to spend some time refamiliarizing myself. In the process, figured I’d type my notes up and share on my blog in case others find it helpful.

First, I looked back at my coursework, then checked out some online courses. I pay for an annual Data Camp membership, but there are many other online resources that are free of charge such as https://www.w3schools.com/sql/sql_intro.asp. My local library has a lot of great books as well, although sometimes they’re a little dated. Regardless they will still be a great springboard.

Okay now, onto the good stuff. What is SQL? It’s structured query language that enables you to update, extract or parse out data from a relational database. What is a relational database? “A relational database is a type of database that stores and provides access to data points that are related to one another,” (source: https://www.oracle.com/database/what-is-a-relational-database/). In the example on Data Camp they reference a database from a library as an example. Their system tracks users, checked-out books, fines, etc. in various tables of data. This allows them in turn to analyze for things like how active their patrons are, how much they owe on average in fines, and so on.

When I was first learning about SQL, I remember thinking that these tables reminded me of Excel, and wondered why use databases over Excel files. There are a few key differences that can make databases a better option:

  • They have more storage than spreadsheet applications.
  • The storage is more secure.
  • Many users can write queries to gather insights from the data at the same time.

When data is queried, the data itself doesn’t change, rather it’s accessed and presented based on the instructions in the query. This is kind of like when you run a search on LinkedIn to find a social media professional in your market. Or if you’re a social media marketer and you used software like Brandwatch for monitoring brand chatter – you plug in a query for keywords, exclusions, and so forth until you retrieve the intel you’re seeking out. But at no point are you changing the data itself.

Going back to tables, which are a key component of relational databases, the rows are often referred to as “records,” and the columns as “fields.” Fields are limited to whatever is initially created, but the number or records ins unlimited. When naming tables, you should use all lower case, use underscores instead of spaces, and use plural form – e.g. “patrons” instead of “patron.”

Similarly, when working with fields (or columns), you should also use lowercase, exclude spaces, but should use singular since it’s referring to a singular record. Field names should be unique and lastly, should never be the same as a table name. This way there’s no confusion as to whether you’re referring to a field or table.

Records should also have a unique identifier. This is because in many cases you may have duplicate names or titles. For example, if you have a member database of first names, you may have multiple people named James.

Another helpful nugget of info is that it’s generally better to have more tables than fewer. This is because when you combine too much info you end up with duplicate records, which could make it more complicated to extract and analyze. So instead of combining customer_id with item_purchased, might be best to have one table for customers and one table for sales.

Data types are also important in SQL, and any programming language, because they take up different amounts of space, and effect the operators that can be used. For example, you can’t add text like you can add numbers.

Some other important terms to understand include:

  • Strings: sequences of characters (letter or punctuation). Remember how I mentioned data storage above? Well strings can be short (e.g. 250 characters), and short strings take up far less data. VARCHAR: popular string data type in SQL
  • Integers: store whole numbers. INT is a popular integer data type in SQL
  • Floats: store numbers that include a fractional part, like 5.03. NUMERIC is a popular float data type
  • Schemas: blueprints of databases. Shows its design, relationships for tables, and what type of data type is stored.

That feels like a good stopping point for now. Hopefully you have a better understanding of what SQL is, why it’s beneficial, and better understand relational databases!

Leave a comment