Streamlining EdTech Data Management
In the realm of education technology (EdTech), data is king. Online learning platforms generate a staggering volume of data, from student performance metrics to transaction records. As EdTech companies seek to leverage this data for insights and analytics, they often struggle to control the cost of storing and processing it.
In this blog post, we present a case study of an EdTech company struggling with the high cost of Amazon Relational Database Service (RDS). Our analysis showed that the company could significantly reduce costs by splitting its data into online transaction processing (OLTP) and online analytical processing (OLAP) streams.
Current Data Landscape
The client’s application uses APIs to write large volumes of data to Amazon RDS. This data serves both immediate transactional purposes (OLTP) and analytical processing (OLAP), such as generating insights and reports. This setup, while robust, must be provisioned for peak capacity, which drives up costs. As the business grew, the company was forced to migrate its database to a larger instance, which added operational burden on top of the rising costs.
Analysis and Proposal
Our initial analysis focused on understanding the nature of the data being stored in RDS. We found that the data consisted of transaction records, such as student logins, exam submissions, and quiz scores. This data was essential for the company’s day-to-day operations, and it needed to be stored in a highly performant database like RDS.
However, we also discovered that the company was storing a significant amount of metrics data in RDS. This data was used for analytics and reporting purposes, but it did not require the same level of performance as the transaction data.
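To make the distinction concrete, here is a minimal Python sketch of how records might be assigned to the two streams. The record types, the `type` field, and the names `Workload`, `OLAP_RECORD_TYPES`, `classify`, and `split_streams` are all hypothetical; the case study does not describe the company’s actual schema. The sketch treats only an explicit list of metrics types as OLAP and keeps everything else on the OLTP path.

```python
from enum import Enum
from typing import Iterable


class Workload(Enum):
    OLTP = "oltp"  # stays in RDS
    OLAP = "olap"  # rerouted to the analytics pipeline


# Hypothetical metrics record types; the real schema is not described in the case study.
OLAP_RECORD_TYPES = frozenset({"page_view", "video_watch_time", "session_duration"})


def classify(record: dict) -> Workload:
    """Return OLAP for known metrics types and OLTP for everything else."""
    if record.get("type") in OLAP_RECORD_TYPES:
        return Workload.OLAP
    return Workload.OLTP


def split_streams(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Partition records into (oltp, olap) lists, preserving input order."""
    oltp: list[dict] = []
    olap: list[dict] = []
    for record in records:
        (olap if classify(record) is Workload.OLAP else oltp).append(record)
    return oltp, olap


if __name__ == "__main__":
    events = [
        {"type": "exam_submission", "student_id": 7, "exam_id": 12},
        {"type": "page_view", "student_id": 7, "page": "/course/1"},
        {"type": "quiz_score", "student_id": 7, "score": 9},
    ]
    oltp, olap = split_streams(events)
    print(len(oltp), len(olap))  # 2 1
```

Defaulting unknown record types to OLTP is a deliberately conservative choice in this sketch: a new transactional record type can never be silently rerouted away from RDS, at the cost of occasionally keeping a metrics type there until the list is updated.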
Based on our analysis, we proposed splitting the company’s data into OLTP and OLAP streams:
- OLTP data: This data would continue to be stored in RDS.
- OLAP data: This data would be redirected to a more cost-effective storage and processing solution built on AWS native services.
- API Enhancement: The existing APIs would be modified to split the data flow by usage: OLTP data continues to RDS, while OLAP data is rerouted to the analytics pipeline.
- ETL Pipeline for OLAP:
  - Data Ingestion: Use AWS Lambda functions to intercept OLAP data and push it into an Amazon Kinesis data stream.
  - Data Transformation and Storage:
    – Use AWS Glue for data transformation and cleansing.
    – Store the transformed data in Amazon S3, an efficient and scalable object storage service.
  - Data Analysis and Reporting:
    – Use Amazon Athena to run SQL queries directly against the data in S3.
    – Integrate with Tableau, the company’s existing BI tool, for advanced analytics and report generation.
- Data Extraction: AWS Database Migration Service (DMS) would extract data from RDS. DMS would be configured with table-mapping selection rules so that it replicates only the OLAP data, while the transactional data remains in RDS.
The proposed architecture offered several benefits:
- Cost Savings: Moving the OLAP data out of RDS and into S3 would significantly reduce the company’s RDS costs, since S3 is a much more cost-effective option for long-term data retention.
- Improved Performance: Separating the OLTP and OLAP workloads would improve the performance of both. RDS could focus on transaction traffic, while analytics queries would run in Athena against the data in S3 without affecting the OLTP system.
- Scalability: The proposed solution was highly scalable. The ETL pipeline could be scaled to handle larger data volumes, and S3 scales automatically with virtually unlimited storage capacity.
- Enhanced Analytics: AWS Glue and Athena provide powerful tools for data transformation and querying, facilitating deeper insights.
- Seamless Integration: The proposed solution integrates smoothly with the existing Tableau setup, ensuring a familiar environment for analytics.
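The ingestion and storage steps of the ETL pipeline can be sketched in Python as well. This is a minimal, non-authoritative sketch under several assumptions: OLAP events are dicts carrying `student_id` and an epoch-seconds `ts` field, events are keyed by student, and transformed output lands under a hypothetical `olap/metrics/` prefix using Hive-style `dt=`/`hour=` folders (in UTC), a layout that lets Athena prune partitions once they are registered in the Glue Data Catalog. The function names are illustrative. The only external facts it relies on are the `Data`/`PartitionKey` entry shape of the Kinesis `PutRecords` API and its limit of 500 records per call.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Kinesis PutRecords accepts at most 500 records per call. Per-record and
# per-request size limits also apply; this sketch only enforces the count.
MAX_RECORDS_PER_PUT = 500


def to_kinesis_entries(events: Iterable[dict]) -> list[dict]:
    """Serialize OLAP events into the {"Data": bytes, "PartitionKey": str}
    entries that the Records parameter of Kinesis PutRecords expects."""
    return [
        {
            "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
            # Assumption: keying by student keeps each student's events on one
            # shard, so their relative order is preserved.
            "PartitionKey": str(event["student_id"]),
        }
        for event in events
    ]


def batches(entries: list[dict], size: int = MAX_RECORDS_PER_PUT) -> Iterator[list[dict]]:
    """Yield consecutive chunks no larger than one PutRecords call allows."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def partition_prefix(epoch_seconds: int, root: str = "olap/metrics") -> str:
    """Hive-style S3 prefix (dt=YYYY-MM-DD/hour=HH, in UTC) for one event."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{root}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"


def group_by_partition(events: Iterable[dict]) -> dict[str, list[dict]]:
    """Group transformed events by the S3 prefix they would be written under."""
    groups = defaultdict(list)
    for event in events:
        groups[partition_prefix(event["ts"])].append(event)
    return dict(groups)


if __name__ == "__main__":
    # 1,200 synthetic page views, one per minute, spread across three students.
    events = [
        {"student_id": i % 3, "metric": "page_view", "ts": 1_700_000_000 + 60 * i}
        for i in range(1200)
    ]
    entries = to_kinesis_entries(events)
    print([len(b) for b in batches(entries)])  # [500, 500, 200]
    print(sorted(group_by_partition(events))[0])  # olap/metrics/dt=2023-11-14/hour=22/
```

Sending each batch is left to a real client (for example, boto3’s Kinesis `put_records`), which keeps the batching and layout logic testable without AWS credentials. A production version would also need to retry any records reported as failed in the response’s `FailedRecordCount`.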
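For the DMS extraction step, the requirement to replicate only the OLAP data maps naturally onto DMS table mappings, a JSON document of selection rules. The sketch below builds one in Python. It assumes the OLAP data lives in its own tables rather than being mixed with transactional rows, and the schema name `edtech`, the table names, and the helper `dms_table_mappings` are hypothetical.

```python
import json


def dms_table_mappings(schema: str, olap_tables: list[str]) -> str:
    """Build a DMS table-mapping JSON document that includes only olap_tables."""
    if not olap_tables:
        # A DMS task needs at least one selection rule.
        raise ValueError("at least one OLAP table is required")
    rules = [
        {
            "rule-type": "selection",
            # rule-id and rule-name must be unique within the task;
            # sequential numbers satisfy that.
            "rule-id": str(n),
            "rule-name": str(n),
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }
        for n, table in enumerate(olap_tables, start=1)
    ]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # Hypothetical schema and OLAP table names.
    print(dms_table_mappings("edtech", ["page_views", "video_watch_events"]))
```

The resulting string could be supplied as the `TableMappings` setting of a DMS replication task. Because DMS migrates only the tables matched by an include rule, the transactional tables are left untouched in RDS.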
Results
By splitting its data into OLTP and OLAP streams, the EdTech company reduced its RDS costs by 70% and improved transaction performance by 23%. Scaling its analytics engine on AWS serverless services cost a fraction of what the previous setup did.