以“升舱”之名,谈谈云原生数据仓库AnalyticDB的核心技术
整体架构
使用场景与生态集成
产品形态与硬件平台
向量化执行引擎
每次读取和使用相同逻辑处理一批记录数据,能获得更高的CPU指令和数据缓存命中率[7]。 从一次函数调用处理一条记录,到一次函数调用处理一批数据,同时JIT则直接避免了函数调用,总体函数调用次数和开销[8]减少。 内存的分配回收,也从每条记录的分配回收,到每批记录的分配和回收,整体减少内存分配回收次数和碎片管理开销[9]。 在按批处理模型下,代码实现能更好地以向量化方式实现,一方面有利于CPU进行数据预取,另一方面尽可能减少程序的条件跳转(来自if...else...,switch等分支判断)和无条件跳转(来自函数调用),让CPU获得更好的指令流水线执行[10],减少分支预测[11]失败,同时也有利于编译器生成SIMD[12]指令,提高执行效率。
多态化存储引擎
自适应优化器
多租户资源隔离
云原生架构升级
[1]https://www.aliyun.com/product/gpdb
[2]https://arxiv.org/pdf/2103.11080.pdf
[3]https://www.postgresql.org/
[4]https://www.postgresql.org/docs/11/jit.html
[5]https://www.youtube.com/watch?v=Hn4l8XC3-PQ
[6]https://www.youtube.com/watch?v=DvhqgmRQuAk
[7]https://medium.com/@tilakpatidar/vectorized-and-compiled-queries-part-2-cd0d91fa189f
[8]https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.0?topic=performance-reducing-function-call-overhead
[9]https://en.pingcap.com/blog/linux-kernel-vs-memory-fragmentation-1/
[10]https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/
[11]https://www.geeksforgeeks.org/correlating-branch-prediction/
[12]http://ftp.cvut.cz/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html
[13]https://cvw.cac.cornell.edu/codeopt/memtimes
[14]https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-2-dependencies-and-data-hazard/
[15]https://www.interdb.jp/pg/pgsql01.html
[16]https://aws.amazon.com/cn/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-compound-and-interleaved-sort-keys/
[17]https://clickhouse.com/docs/en/guides/improving-query-performance/skipping-indexes/
[18]https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html
[19]https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html
[20]https://docs.snowflake.com/en/user-guide/tables-clustering-keys.html
[21]https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/
[22]https://docs.databricks.com/delta/optimizations/file-mgmt.html
[23]https://aws.amazon.com/cn/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-compound-and-interleaved-sort-keys/
[24]https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html
[25]https://docs.singlestore.com/managed-service/en/create-a-database/physical-database-schema-design/concepts-of-physical-database-schema-design/columnstore.html
[26]https://www.postgresql.org/docs/10/rules-materializedviews.html
[27]https://gpdb.docs.pivotal.io/6-2/ref_guide/sql_commands/CREATE_MATERIALIZED_VIEW.html
[28]https://oracle-base.com/articles/misc/materialized-views
[29]https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html
[30]https://docs.snowflake.com/en/user-guide/views-materialized.html
[31]https://www.postgresql.org/docs/current/routine-vacuuming.html
[32]https://www.postgresql.org/docs/current/routine-vacuuming.html
[33]https://docs.vmware.com/en/VMware-Tanzu-Greenplum/6/greenplum-database/GUID-best_practices-analyze.html
[34]https://www.aliyun.com/product/oss
[35]https://www.postgresql.org/docs/current/postgres-fdw.html
[36]https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html
[37]https://docs.snowflake.com/en/user-guide/tables-external-intro.html
[38]https://www.postgresql.org/docs/current/continuous-archiving.html
[39]https://docs.vmware.com/en/VMware-Tanzu-Greenplum-Backup-and-Restore/1.25/tanzu-greenplum-backup-and-restore/GUID-admin_guide-managing-backup-gpbackup.html?hWord=N4IghgNiBcIOYAcBOBTAzgFwPapAXyA
[40]https://www.aliyun.com/product/dbs
[41]https://15721.courses.cs.cmu.edu/spring2017/papers/15-optimizer2/p337-soliman.pdf
[42]https://gpdb.docs.pivotal.io/6-11/admin_guide/workload_mgmt.html
[43]https://gpdb.docs.pivotal.io/6-11/admin_guide/workload_mgmt_resgroups.html
[44]https://github.com/greenplum-db/diskquota
[45]http://event.cwi.nl/lsde/papers/p215-dageville-snowflake.pdf
[46]https://www.vertica.com/wp-content/uploads/2018/05/Vertica_EON_SIGMOD_Paper.pdf
[47]https://help.aliyun.com/document_detail/383266.html
[48]https://mp.weixin.qq.com/s/h15UK4--Js5ApFwuY96GKA
「阿里云数据库升舱计划实战峰会」
全网首发新一代云原生数仓解决方案,解构企业数智化转型新范式
7月19日,“千仓万库,轻云直上——阿里云数据库升舱计划实战峰会”即将在线上召开。本次峰会,我们将从技术、实践、生态三大维度,与您一起深入洞察企业级数据平台和数据仓库的发展现状、最佳实践和未来趋势。峰会汇聚了包括申万宏源等在内的多位金融、运营商领域知名企业领袖、技术大咖、生态合作伙伴,共议企业数字化转型新范式。同时,会上将发布升舱计划最新版本,进一步释放云原生红利,让业务价值敏捷化、在线化。
点击阅读原文查看详情。
微信扫码关注该文公众号作者