Choosing an appropriate database based on the type of data

Hello guys, I was just reading a bit about structured and unstructured data. I read that social media posts are normally considered unstructured data, and that to store unstructured data we use databases like MongoDB (a kind of NoSQL database, I think). My question: I remember doing a small project on a social networking site, and I read in a post somewhere that it's better to use a relational database like Postgres rather than MongoDB. Can someone elaborate on why, please? Here we have both structured and unstructured data, so which kind of database would be the most appropriate? Is it possible to use a combination of both?
18 Replies
Jochem, 6d ago
I'm not sure I see how social media is unstructured data. You have posts, maybe media associated with posts, users, friends lists, replies... all are very easily normalized into a relational database.
The way you determine whether you use a document database (like Mongo) or an RDBMS like MySQL or Postgres is very simple: you use the RDBMS, unless you have very well defined reasons why you have to use a document DB.
The main reason, for me personally, is that an RDBMS forces you to think about the data you're trying to store. You have to define columns and relationships, you have to define datatypes and lengths, choose tables, and use all of that consistently. With a document DB, you're entirely free to just dump whatever garbage you want in there, and it's your own responsibility to keep it organized and sensible.
With a document DB you can have multiple versions of your database schema existing simultaneously, which also means your code needs to be able to handle that. With an RDBMS you're guaranteed that a given query always returns the same type and shape of data.
The reality is that practically no one needs a document DB over a relational one, especially with the robust JSON support that modern RDBMSes have. You can have a JSON column in most databases if you truly have some unstructured data that you can't fit in a database in another sensible way.
Mongo gained popularity because it's easy to teach the very basics: the freedom it gives you to just dump whatever data you want in there means that you can set up a quick and dirty todo app without having to teach about tables, data types and normalization of data.
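To make the point about normalizing concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`users`, `posts`, `friendships`, `extra`) are made up for illustration and are not from any real site. It shows users, posts and replies as related tables, plus one text column holding JSON for the leftovers that genuinely have no fixed shape.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to

# Hypothetical schema: every name here is an assumption made for this example.
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    parent_id INTEGER REFERENCES posts(id),  -- NULL for a top-level post, set for a reply
    body      TEXT NOT NULL,
    extra     TEXT                           -- JSON text for free-form leftovers
);
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)
);
""")

conn.execute("INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob')")
conn.execute(
    "INSERT INTO posts (id, author_id, body, extra) VALUES (?, ?, ?, ?)",
    (1, 1, "hello world", json.dumps({"poll": ["yes", "no"]})),
)
conn.execute(
    "INSERT INTO posts (id, author_id, parent_id, body) VALUES (?, ?, ?, ?)",
    (2, 2, 1, "hi alice"),
)

# Replies to post 1, joined with their authors: always the same shape of row.
replies = conn.execute(
    """
    SELECT u.username, p.body
    FROM posts AS p
    JOIN users AS u ON u.id = p.author_id
    WHERE p.parent_id = ?
    """,
    (1,),
).fetchall()

# The free-form part is decoded in application code.
extra = json.loads(
    conn.execute("SELECT extra FROM posts WHERE id = ?", (1,)).fetchone()[0]
)

# The schema rejects a post whose author does not exist.
try:
    conn.execute("INSERT INTO posts (author_id, body) VALUES (?, ?)", (99, "orphan"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The last part is the "forces you to think about your data" argument in practice: the database itself refuses the orphaned post, where a document store would leave that check to your application code.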
13eck, 6d ago
This, basically, yeah. Use SQL unless you have a very good reason/need not to.
MongoDB gained popularity with the Node.js crowd because it was basically the ability to store a JS object in the database. No need to worry about what the data is, it's just data. Then Mongoose came around and tried to add structure to it, defining what could be stored and what type of data it was…or you could just use a SQL database and have that right out of the box :p
Though for a social media type setup a graph DB might be a better fit. Vertices are users or posts, and edges are the relationships between them: friend/family member edges between users, or like/reply edges between posts. Graph DBs are better at larger traversals that would either kill an RDBMS or require several SELECT calls.
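As a rough illustration of the vertices-and-edges idea, here is a plain-Python sketch of a friends-of-friends traversal. It is not a graph database and does not use any graph DB's query language. The `friends` data and the function names `within_hops` and `suggest_friends` are invented for the example.

```python
from collections import deque

# Hypothetical friendship graph: each key is a user (vertex),
# each set lists that user's friends (undirected edges).
friends = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev"},
    "cho": {"ana", "dev", "eli"},
    "dev": {"ben", "cho"},
    "eli": {"cho"},
}


def within_hops(graph, start, max_hops):
    """Return {user: hop_count} for everyone reachable within max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if distance[user] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(user, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


def suggest_friends(graph, user):
    """Friends of friends who are not already direct friends."""
    reach = within_hops(graph, user, 2)
    return sorted(name for name, hops in reach.items() if hops == 2)


suggestions = suggest_friends(friends, "ana")
```

Each extra hop here is one more pass over the edges. In a relational setup the same walk tends to become one more self-join or one more SELECT per hop, which is the cost being described above.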
Faker (OP), 6d ago
Yep, I see now. I was introduced to MongoDB without really being told why we were going to use it, but it was surely because it's "beginner friendly". As for the unstructured part, I didn't know that we can use PG for both structured and "semi-structured" data, that seems interesting.
One thing: a graph DB is for sure the best thing to use (I think), because we can traverse the graph to find "hidden" relationships, like recommending things etc...
My question is, in social networking sites, say X or Meta for example, do they use one single DB for everything? Or do they have a combination of DB types, like a graph DB and a relational DB?
13eck, 6d ago
I would assume one type of DB to make the DB management easier, but I don't run a site like that so I can't say for sure. I'm trying to think of data that would be better served by a relational DB than a graph DB, but everything I can think of would be better off in a graph DB: likes/reshares, block lists, DMs, profiles, etc. Having them all in the one DB makes querying so much easier.
Faker (OP), 6d ago
yep I see, thanks !
ἔρως, 6d ago
many many many many many many replicas of the most recent data on many many many many servers
and it all then goes from master to master to master, until it gets to one of the big boy master databases, which usually are replicated once or twice, and kept in sync
Jochem, 6d ago
that's only due to scale
ἔρως, 6d ago
yes
Jochem, 6d ago
if you're writing one solo, you use one database on one server, because managing two or three different DBMSes or replication is a pain in the ass that you won't need until you have hundreds and hundreds of thousands of users
ἔρως, 6d ago
exactly, but it's good to know that nobody is insane enough to just use a single database for everything when you reach a certain size
your project will never get to that size
Jochem, 6d ago
so for the solo, learning dev the answer is: yes, you use one database. Pick any SQL RDBMS and stick with it for a bit until you're comfortable with queries and database design. Even SQLite will serve you perfectly fine up to tens of thousands of unique daily users, probably up past 100k too
ἔρως, 6d ago
so, don't worry ... be happy
yes, for the solo. but the question was about x/twitter
by the way, it's not uncommon to use multiple types of databases either, at that scale
using some memory-only database (like redis or memcached) and some relational (like postgresql) and a wide-column store (like cassandra) isn't uncommon
but you are very far from needing all of that, and you shouldn't worry about it
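For a feel of how an in-memory store and a relational database can sit side by side, here is a toy read-through cache. A plain Python dict stands in for redis/memcached and an in-memory SQLite database stands in for the relational store. The `profiles` table, the `ProfileStore` class and its method names are all hypothetical, and real caches also need expiry and invalidation, which this sketch leaves out.

```python
import sqlite3


class ProfileStore:
    """Look up display names, checking a cache before the database."""

    def __init__(self):
        # Relational store (stand-in for something like PostgreSQL).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT NOT NULL)"
        )
        # In-memory cache (stand-in for redis/memcached).
        self.cache = {}
        self.db_reads = 0  # counts how often the database is actually hit

    def add_profile(self, user_id, display_name):
        self.db.execute(
            "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)",
            (user_id, display_name),
        )

    def get_display_name(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]  # cache hit: no database work
        self.db_reads += 1
        row = self.db.execute(
            "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None  # unknown users are not cached
        self.cache[user_id] = row[0]
        return row[0]


store = ProfileStore()
store.add_profile(1, "alice")
first = store.get_display_name(1)   # read from the database, then cached
second = store.get_display_name(1)  # served from the cache
missing = store.get_display_name(2)
```

The relational database stays the source of truth and the cache only absorbs repeated reads, which is why this kind of split tends to show up only once read traffic is large.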
Faker (OP), 5d ago
sorry, I was a bit late. so data is passed from database to database?
ἔρως, 5d ago
replicated
that's one of the correct terms
Faker (OP), 5d ago
ah ok I see
so basically, we can use a graph database to detect relationships and so on, and another database for storing the rest of the data?
ἔρως, 5d ago
nothing stops you from doing that, yes
Faker (OP), 5d ago
yep I see, that was good to know. thanks guys, very insightful conversation. I see myself using only 1 DB for now, but it's good to know that things like that do exist, thanks 👍
ἔρως, 5d ago
using a single database (or replicating to a 2nd one) is perfectly normal and nothing bad, so, don't worry about it