How Twitter Replicates Petabytes of Data to Google Cloud Storage (Cloud Next '19)

Published on 23 Apr 2019, 1:00
Twitter collects petabytes of data every day and faces the challenge of replicating it to multiple destinations based on users' use cases. One such destination is Google Cloud Storage, which acts as primary storage for tools such as BigQuery, Cloud Dataproc, and Cloud Dataflow. In this session, we deep dive into the design of this system, discuss the challenges we faced at scale, and share our learnings from extending Twitter's Replication Service to Cloud Storage. We explain how this self-service system enables users to set up and manage replication of datasets to Google Cloud Storage. Today our Replication Service has transferred several tens of petabytes of data and is built to be used by thousands of users replicating hundreds of petabytes to Cloud Storage.
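The session centers on a self-service model: users declare which datasets should be replicated and where, and the service carries out the copies idempotently. As a rough illustrative sketch only (this is not Twitter's actual implementation; the `ReplicationJob` name, config shape, and filesystem-to-filesystem copy are invented for illustration), a declarative replication job might look like:

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReplicationJob:
    """Hypothetical self-service replication config: a user declares a
    dataset name, a source root, and a destination root; the service
    copies over any files not yet present at the destination."""
    dataset: str
    source_root: Path
    dest_root: Path

    def run(self) -> list[Path]:
        """Replicate new files for this dataset; safe to re-run."""
        src = self.source_root / self.dataset
        dst = self.dest_root / self.dataset
        dst.mkdir(parents=True, exist_ok=True)
        copied = []
        for f in sorted(src.rglob("*")):
            if not f.is_file():
                continue
            target = dst / f.relative_to(src)
            # Skip files already replicated, so re-runs are idempotent.
            if target.exists():
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(target)
        return copied
```

In a real deployment the destination would be a Cloud Storage bucket rather than a local path, but the shape is the same: a small declarative config per dataset, and a runner that only transfers what is missing.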

Build with Google Cloud → bit.ly/2KdoExq

Watch more:
Next '19 Data Analytics Sessions here → bit.ly/Next19DataAnalytics
Next '19 All Sessions playlist → bit.ly/Next19AllSessions

Subscribe to the GCP Channel → bit.ly/GCloudPlatform


Speaker(s): Lohit VijayaRenu

Session ID: DA300