Cloud Computing

背景 2022 年 10 月 27 日马斯克以 440 亿美元完成 Twitter私有化交易并改名X.com，整整一年过去后，X的工程架构也发生巨大的变化，今年10月27日X工程团队发表了一个推文，总结一年的巨大变化，我觉得这个帖子很有代表性，给所有做engineering的同学新的启发。原文 https://twitter.com/XEng/status/1717754398410240018 This has been a year full of engineering excellence that sometimes can go unnoticed. Besides all the visible changes you see on our app, here are some of the most important improvements we have made under the hood. Consolidated the tech stacks for For you, Following, Search, Profiles, Lists, Communities and Explore around a singular product framework. Completely rebuilt the For you serving and ranking systems from the ground up, resulting in a decrease 90% reduction in lines of code from 700K to 70K, a 50% decrease in our compute footprint, and an 80% increase in the throughput of posts scored per request. Unified the For you and video personalization and ranking models, which significantly improved video recommendation quality. Refactored the API middleware layer of our tech stack and in doing so simplified the architecture by removing more than 100K lines of code and thousands of unused internal endpoints and eliminating unadopted client services. Reduced post metadata sourcing latency by 50%, and global API timeout errors by 90%. Blocked bots and content scrapers at a rate +37% greater than 2022. On average, we prevent more than 1M bots signup attacks each day and we’ve reduced DM spam by 95%. Shutdown the Sacramento data center and re-provisioned the 5,200 racks and 148,000 servers, which generated more than $100M in annual savings. In total, we freed up 48 MW of capacity and tore down 60k lbs. of network ladder rack before re-provisioning it to other data centers. Optimized our usage of cloud service providers and began doing much more on-prem. This shift has reduced our monthly cloud costs by 60%. Among the changes we made was a shift of all media/blob artifacts out of the cloud, which reduced our overall cloud data storage size by 60%, and separately, we succeeded in reducing cloud data processing costs by 75%. Built on-prem GPU Supercompute clusters and designed, developed, and delivered 43.2Tbps of new network fabric architecture to support the clusters. Scaled network backbone capacity and redundancy, which resulted in $13.9M/year in savings. Started automated peak traffic failover tests to validate the scalability and availability of the entire platform continuously. 中文版这是工程的卓越一年，但有时却以不被关注的方式进行。除了你在我们的应用程序上看到的所有可见的变化之外，以下是我们在幕后做出的一些最重要的改进。 ...