Advice on choosing technology for a large project (600 GB+ database)

Hello guys, I'm soon going to develop a project for a client that has a large database (over 600 GB). We will need to create a new system and migrate to a new database system.

I need consistency (no missing keys or invalid data), uptime, and no strange bugs. I need to do some updates (no deletes) and a lot of reads and writes. I also need to generate statistics, so I will need to be able to filter large amounts of data and build statistics from it.

The programming language will be PHP. Does anyone have experience with large databases?

Thank you

wullinkm

Hi Carlos,

Cool, that sounds like a challenging project! It's not so easy to provide a single solution that fits all scenarios. It depends a lot on the database as well; how much data is in each table? Another important question is: do you really need such a big database? Often it makes sense to archive data to reporting-only databases (a rough sketch of that is below). Supporting 600 GB of data with reads and writes is definitely not easy.
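
For example, a nightly archiving job can be as simple as copying old rows into an archive table and removing them from the hot table. A minimal sketch in PHP/PDO, assuming a hypothetical `orders` table with a `created_at` column and an `orders_archive` table with the same schema:

```php
<?php
// Hypothetical archiving job: move rows older than two years from the
// hot table into an archive table. Table and column names are assumptions.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$cutoff = (new DateTimeImmutable('-2 years'))->format('Y-m-d');

$pdo->beginTransaction();
try {
    // Copy the old rows into the archive table (same schema assumed).
    $pdo->prepare('INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < :cutoff')
        ->execute(['cutoff' => $cutoff]);

    // Remove them from the hot table so day-to-day queries stay small.
    $pdo->prepare('DELETE FROM orders WHERE created_at < :cutoff')
        ->execute(['cutoff' => $cutoff]);

    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}
```

On PostgreSQL you could also look at declarative partitioning by year, which makes detaching old data even cheaper than row-by-row deletes.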

Where is the data currently stored? Both MySQL and PostgreSQL are good databases that can handle a lot of data. Migrating that quantity of data to another database will take a long time, so if their data is already in MySQL, it's definitely easier to leave it there.

I hope that points you a bit in the right direction, but if you have more questions, hit me up!

Carlos Branco (Author)

Thanks for your comment. I'm thinking about PostgreSQL as well. Maybe archiving old data (like previous years) is also a good idea. Thanks again!

Sumit Datta

Hello Carlos, I have been part of teams handling data of this size or even larger. Here are some pointers I can share:

  • The size of the database matters less than the ingress/egress (write/read) throughput you need
  • Measure the current throughput so you know what you need now and might need in the future
  • INDEX, Index, index - yes, really
  • Find the slow queries and optimize, starting with the slowest and most-used ones first (see the first sketch after this list)
  • Use a separate search appliance like Solr/Elasticsearch if your users do search
  • If you have the budget and are OK with a cloud-managed database, get that - they are optimized for these loads
  • Egress is easier to scale with read replicas
  • PostgreSQL supports parallel queries if you need that for high read/query loads
  • Cache heavily with Memcached/Redis and make sure to fine-tune cache invalidation to your needs (see the caching sketch below)
  • Depending on your use case, you can run multi-node caches bound to app servers or a central cache, whatever works
  • HTTP handlers should not do compute-intensive work; hand it to a task queue (RabbitMQ/Redis) instead (see the queue sketch below)
  • If you have async patterns (notifications/chat/etc.) or "server push", then embrace async in your application layer
  • Analytics/statistics should not be done on the production database; use a data warehouse for that
  • If you process data in multiple stages after user input or have streaming data patterns, look at stream processors like Kafka
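
On the slow-query point, here is a minimal sketch of the loop I mean, assuming PostgreSQL 13+ with the pg_stat_statements extension enabled (the table, column, and index names are made up for illustration):

```php
<?php
// Find the queries that eat the most time, inspect one, then index for it.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. List the queries that consume the most total time.
$slow = $pdo->query(
    'SELECT query, calls, mean_exec_time
       FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT 10'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($slow as $row) {
    printf("%8.2f ms avg, %d calls: %s\n", $row['mean_exec_time'], $row['calls'], $row['query']);
}

// 2. Inspect one suspect query plan to see whether it does a sequential scan.
$plan = $pdo->query(
    "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42 AND created_at >= '2020-01-01'"
)->fetchAll(PDO::FETCH_COLUMN);
echo implode("\n", $plan), "\n";

// 3. If it does, add an index that matches the filter. CONCURRENTLY avoids
//    blocking writes while the index is built (it must run outside a transaction).
$pdo->exec('CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_created
            ON orders (customer_id, created_at)');
```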
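
On caching, a cache-aside sketch using the phpredis extension, assuming a hypothetical monthly revenue report (key names, TTLs, and the query are just examples):

```php
<?php
// Cache-aside: serve the report from Redis when possible, otherwise compute
// it once, cache it, and invalidate explicitly when the data changes.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');

function monthlyRevenue(PDO $pdo, Redis $redis, string $month): array
{
    $key = "report:monthly_revenue:$month";

    // Serve from cache when we can.
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);
    }

    // Otherwise compute it once and cache the result for an hour.
    $stmt = $pdo->prepare(
        "SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue
           FROM orders
          WHERE to_char(created_at, 'YYYY-MM') = :month
          GROUP BY 1 ORDER BY 1"
    );
    $stmt->execute(['month' => $month]);
    $report = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $redis->setex($key, 3600, json_encode($report));
    return $report;
}

// Invalidate whenever the underlying data changes for that month.
function invalidateMonthlyRevenue(Redis $redis, string $month): void
{
    $redis->del("report:monthly_revenue:$month");
}
```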
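
And on the task queue, a bare-bones sketch using a Redis list, again with phpredis. In a real Lumen/Laravel app you would more likely use its queue component with a Redis or RabbitMQ driver; this only shows the shape of the pattern (the queue name and job payload are made up):

```php
<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Producer side: the HTTP handler only enqueues the job and returns quickly.
$redis->lPush('jobs:reports', json_encode([
    'type'  => 'monthly_revenue',
    'month' => '2021-05',
]));

// Worker side (a separate long-running CLI process): block until a job
// arrives, then do the heavy work outside the request/response cycle.
while (true) {
    $item = $redis->brPop(['jobs:reports'], 30); // [queue, payload], or empty on timeout
    if (!$item) {
        continue;
    }
    $job = json_decode($item[1], true);
    // ... run the expensive query / build the report here ...
}
```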

Hope that helps, Sumit


Edit:

  • Added point on slow queries
Carlos Branco (Author)

Thank you for your advice.

Ryan Glass

This is gold. Good knowledge!

Krunal Shah

I worked on a similar kind of project (an 80 GB database). Here is what you can do infrastructure-wise.

The 80 GB of data was divided across multiple tables, so I built microservices with Lumen (Swoole) to expose a REST API and consumed it with React on the frontend.

  • DB: RDS with separate instances for read and write operations (see the connection sketch after this list).
  • ECS/ECR: To host the Lumen/Swoole microservices.
  • Amplify: To host the React frontend.
  • S3: To host the static files and DB backups.
  • Redis: ElastiCache to store the report output.
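
On the read/write split: Laravel/Lumen's database config can route reads to replicas and writes to the primary for you. A sketch of the connection entry in `config/database.php`, with hypothetical hostnames (shown for the `pgsql` driver since that's what Carlos is considering; the same structure works for `mysql` on RDS):

```php
<?php
// config/database.php (excerpt). SELECTs go to one of the 'read' hosts,
// everything else goes to the 'write' host.
return [
    'connections' => [
        'pgsql' => [
            'driver' => 'pgsql',
            'read' => [
                'host' => ['replica-1.internal', 'replica-2.internal'],
            ],
            'write' => [
                'host' => ['primary.internal'],
            ],
            // After a write, keep the same request on the write connection so
            // it can read its own changes before replication catches up.
            'sticky'   => true,
            'port'     => 5432,
            'database' => env('DB_DATABASE', 'app'),
            'username' => env('DB_USERNAME', 'app_user'),
            'password' => env('DB_PASSWORD', ''),
            'charset'  => 'utf8',
        ],
    ],
];
```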
