Choosing an appropriate database based on the type of data
Hello guys, I was just reading a bit about structured and unstructured data.
I read that social media posts are normally considered unstructured data, and that to store unstructured data we use databases like MongoDB (a kind of NoSQL database, I think).
My question: I was doing a small project on a social networking site, and I read in a post somewhere that it's better to use a relational database like Postgres rather than MongoDB. Can someone elaborate on why, please? Here we have both structured and unstructured data, so which type of database would be the most appropriate? Is it possible to use a combination of both?
18 Replies
I'm not sure I see how social media is unstructured data
you have posts, maybe media associated with posts, users, friends lists, replies... all are very easily normalized into a relational database
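To make "easily normalized" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, posts, friendships, reply_to) are made up for illustration and are not from any real site's schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: users, posts (a reply is a post pointing at another
# post), and a friends list as a join table between users.
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    reply_to  INTEGER REFERENCES posts(id),
    body      TEXT NOT NULL
);
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)
);
""")

conn.executemany(
    "INSERT INTO users (id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.execute(
    "INSERT INTO posts (id, author_id, reply_to, body) VALUES (1, 1, NULL, 'hello world')"
)
conn.execute(
    "INSERT INTO posts (id, author_id, reply_to, body) VALUES (2, 2, 1, 'hi alice')"
)
conn.execute("INSERT INTO friendships (user_id, friend_id) VALUES (1, 2)")

# All replies to post 1, together with who wrote them.
replies = conn.execute(
    """
    SELECT u.username, p.body
    FROM posts AS p
    JOIN users AS u ON u.id = p.author_id
    WHERE p.reply_to = ?
    """,
    (1,),
).fetchall()
print(replies)
```

Media attachments, likes and so on would follow the same pattern: one more table with foreign keys back to users and posts.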
the way you determine whether you use a document database (like mongo) or an RDBMS like MySQL or Postgres is very simple: You use the rdbms, unless you have very well defined reasons why you have to use a document db
the main reason for me personally, is that an RDBMS forces you to think about the data you're trying to store. You have to define columns and relationships, you have to define datatypes and lengths, choose tables, and use all of that consistently. With a document DB, you're entirely free to just dump whatever garbage you want in there, and it's your own responsibility to keep it organized and sensible
with a documentdb you can have multiple versions of your database schema existing simultaneously, which also means your code needs to be able to handle that. With an rdbms you're guaranteed that all your queries will return the same type and shape of data
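A rough sketch of what "your code needs to be able to handle that" can look like. Plain Python dicts stand in for documents here, and both document shapes and the helper names are invented for the example.

```python
# Two documents from the same hypothetical "users" collection, written by
# different versions of the app. Nothing in the database stops both shapes
# from existing side by side.
old_doc = {"name": "Alice Smith", "email": "alice@example.com"}
new_doc = {"first_name": "Bob", "last_name": "Jones", "emails": ["bob@example.com"]}


def display_name(doc: dict) -> str:
    """Return a display name regardless of which schema version wrote the doc."""
    if "name" in doc:  # old shape: single name field
        return doc["name"]
    return f"{doc['first_name']} {doc['last_name']}"  # new shape


def primary_email(doc: dict):
    """Return the first known email, or None if the document has none."""
    if "emails" in doc:  # new shape: list of emails
        return doc["emails"][0] if doc["emails"] else None
    return doc.get("email")  # old shape: single optional field


names = [display_name(d) for d in (old_doc, new_doc)]
print(names)
```

Every reader of the collection needs this kind of branching until the old documents are migrated, which is the work a relational schema migration would have forced up front.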
the reality is that practically no one needs a document db over a relational one, especially with the robust JSON support that modern rdbmses have. You can have a JSON column in most databases if you truly have some unstructured data that you can't fit in a database in another sensible way
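As a small illustration of the JSON column idea, here is a sketch with SQLite through Python's sqlite3 module. It assumes your SQLite build includes the JSON functions (most recent builds do), and the posts table and its extra column are hypothetical. Postgres and MySQL have their own JSON types and operators with different syntax.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured parts get real columns; the loosely structured leftovers go
# into a JSON document stored in the hypothetical "extra" column.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL, extra TEXT)"
)
conn.execute(
    "INSERT INTO posts (body, extra) VALUES (?, ?)",
    (
        "look at this",
        json.dumps({"link_preview": {"title": "Example", "url": "https://example.com"}}),
    ),
)
conn.execute(
    "INSERT INTO posts (body, extra) VALUES (?, ?)",
    ("plain post", json.dumps({})),
)

# json_extract reads a value out of the JSON text; a missing path gives NULL.
rows = conn.execute(
    "SELECT id, json_extract(extra, '$.link_preview.title') FROM posts ORDER BY id"
).fetchall()
print(rows)
```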
Mongo gained popularity because it's easy to teach the very basics, because the freedom it gives you to just dump whatever data you want in there means that you can set up a quick and dirty todo app without having to teach about tables and data types and normalization of data
This, basically, yeah. Use SQL unless you have a very good reason/need not to. MongoDB gained popularity with the Node.js crowd because it basically gave you the ability to store a JS object in the database. No need to worry about what the data is, it's just data.
Then Mongoose came around and tried to add structure to it, defining what could be stored and what type of data it was…or you could just use a SQL database and have that right out of the box :p
Though for a social media type setup a graph DB might be a better fit. Vertices are users or posts, and edges are the relationships between them: friend/family edges between users, or like/reply edges between users and posts. Graph DBs are better at large traversals that would either kill an RDBMS or require several SELECT calls.
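To show what a traversal over vertices and edges means, here is a toy breadth-first search in plain Python. This is not how a graph database is implemented or queried. The friends data and the within_hops helper are made up for the example.

```python
from collections import deque

# Toy friendship graph: each user (vertex) maps to the set of users they
# are connected to (edges). The data is invented for illustration.
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}


def within_hops(graph: dict, start: str, max_hops: int) -> dict:
    """Breadth-first search: map each user reachable within max_hops to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen[friend] = seen[user] + 1
                queue.append(friend)
    seen.pop(start)  # the starting user is not their own friend-of-friend
    return seen


reachable = within_hops(friends, "alice", 2)
print(reachable)
```

In a relational schema, each extra hop typically means one more self-join on the friendships table (or a recursive query), which is where the traversal cost mentioned above comes from.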
Yep, I see now. I was introduced to MongoDB without anyone really explaining why we were going to use it, but it was most likely because it's "beginner friendly". As for the unstructured part, I didn't know that we can use PG for both structured and "semi-structured" data, seems interesting
One thing, a graph db is for sure the best thing to use (I think), because we can traverse the graph to find "hidden" relationships, for things like recommendations etc...
My question is, in social networking sites, say X or Meta for example, do they use one single db for everything? Or do they have a combination of db types, like a graph db and a relational db?
I would assume one type of DB to make the DB management easier, but I don't run a site like that so I can't say for sure. I'm trying to think of data that would be better served as a relational DB than graph but everything I can think of would be better off as a graph DB. Likes/reshares, block lists, DMs, profiles, etc. Having them all in the one DB makes querying so much easier.
yep I see, thanks !
many many many many many many replicas of the most recent data in many many many many servers
and it all then goes from master to master to master, until it gets to one of the big boy master databases, which usually are replicated once or twice, and kept in sync
that's only due to scale
yes
if you're writing one solo, you use one database on one server, because managing two or three different DBMSes or replication is a pain in the ass you won't need until you have hundreds and hundreds of thousands of users
exactly, but it's good to know that nobody is insane enough to just use a single database for everything, when you reach a certain size
your project will never get to that size
so for the solo, learning dev the answer is: Yes, you use one database. Pick any SQL RDBMS and stick with it for a bit until you're comfortable with queries and database design. Even SQLite will serve you perfectly fine up to tens of thousands of unique daily users, probably up past 100k too
so, don't worry ... be happy
yes, for the solo. but the question was about x/twitter
by the way, it's not uncommon to use multiple types of databases either, at that scale. using some in-memory store (like redis or memcached) alongside a relational database (like postgresql) and a wide-column store (like cassandra) is pretty normal
but you are very far from needing all of that, and you shouldn't worry about it
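Just to show the idea of an in-memory store sitting in front of a relational database, here is a rough sketch of the common "check the cache first, fall back to the database" pattern. A plain dict stands in for redis/memcached, SQLite stands in for the relational database, and the profiles table and get_bio helper are invented for the example.

```python
import sqlite3

# SQLite stands in for the "real" relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, bio TEXT NOT NULL)")
conn.execute("INSERT INTO profiles (user_id, bio) VALUES (1, 'hello, I am alice')")

# A plain dict stands in for an in-memory store such as redis or memcached.
cache = {}
stats = {"db_reads": 0}  # counts how often we actually hit the database


def get_bio(user_id: int):
    """Return a user's bio, checking the cache before the database."""
    if user_id in cache:
        return cache[user_id]
    stats["db_reads"] += 1
    row = conn.execute(
        "SELECT bio FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    bio = row[0] if row else None
    if bio is not None:
        cache[user_id] = bio  # remember it for the next caller
    return bio


first = get_bio(1)   # goes to the database
second = get_bio(1)  # served from the cache
print(first, stats["db_reads"])
```

A real setup also has to expire or invalidate cached entries when the underlying row changes, which this sketch skips entirely.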
sorry, I was a bit late, so data is copied from database to database?
replicated
that's one of the correct terms
ah ok
I see
so basically, we can use a graph database to detect relationships and so on, and another database to store the rest of the data?
nothing stops you from doing that, yes
yep I see, that was good to know, thanks guys, very insightful conversation. I see myself using only 1 db for now, but it's good to know that things like that do exist, thanks 👍
using a single database (or replicating to a second one) is perfectly normal and nothing bad, so, don't worry about it