Blog | Databend

This Week in Databend #81

February 15, 2023 · 6 min read

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Accepted RFCs :flight_departure:

rfc: query result cache (#10014)

Features & Improvements ✨

Planner

support EXPLAIN ANALYZE statement to profile query execution (#10023)
derive new filter and push down (#10021)

Query

alter table add/drop column SQL support (#9851)
new table function infer_schema (#9936)
add privilege check for select (#9924)
improve signed numeric keys (#9987)
support to parse jwt metadata and add multiple identity issuer configuration (#9971)
support create file format (#10009)

Storage

adopt OpenDAL's native scan support (#9985)
add drop_table_by_id api (#9990)

Expression

add operation for decimal (#9926)

Functions

support array_any function (#9953)
support array_sort (#9941)

Sqllogictest

add time travel test for alter table (#9939)

Code Refactoring 🎉

Meta

move application level types such as user/role/storage-config to crate common-meta/app (#9944)
fix abuse of ErrorCode (#10056)

Query

use transform_sort_merge use heap to sort blocks (#10047)

Storage

introduction of FieldIndex and ColumnId types for clear differentiation of use (#10017)

Build/Testing/CI Infra Changes 🔌

run benchmark for clickbench result format (#10019)
run benchmark both s3 & fs (#10050)

Bug Fixes 🔧

Privilege

add privileges on system.one to PUBLIC by default (#10040)

Catalog

parts was not distributed evenly (#9951)

Planner

type assertion failed on subquery (#9937)
enable outer join to inner join optimization (#9943)
fix limit pushdown outer join (#10043)

Query

fix add column update bug (#10037)

Storage

fix sub-column of added-tuple column return default 0 bug (#9973)
new bloom filter that bind index with Column Id instead of column name (#10022)

What's On In Databend

Stay connected with the latest news about Databend.

RFC: Query Result Cache

Caching the results of queries against data that doesn't update frequently can greatly reduce response time. Once cached, the result will be returned in a much shorter time if you run the query again.

How to Write a Scalar / Aggregate Function

Did you know that you can enhance the power of Databend by creating your own scalar or aggregate functions? Fortunately, it's not a difficult task!

The following guides are intended for Rust developers and Databend users who want to create their own workflows. The guides provide step-by-step instructions on how to create and register your own functions using Rust, along with code snippets and examples of various types of functions to walk you through the process.

Profile-Guided Optimization

Profile-guided optimization (PGO) is a compiler optimization technique that collects execution data during the program runtime and allows for tailoring optimizations tailored to both hot and cold code paths.

In this blog, we'll guide you through the process of optimizing Databend binary builds using PGO. We'll use Databend's SQL logic tests as an example to illustrate the step-by-step procedure.

Please note that PGO always requires generating perf data using workloads that are statistically representative. However, there's no guarantee that performance will always improve. Decide whether to use it based on your actual needs.

Learn More

Blog | Optimizing Databend Binary Builds with Profile-guided Optimization

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

To make our documentation clearer and easier to understand, we plan to restructure our function-related documentation to follow the same format as DuckDB's documentation. This involves breaking down the task into smaller sub-tasks based on function categories, so that anyone who wants to help improve Databend's documentation can easily get involved.

Issue 10029: Tracking: re-org the functions doc

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	b41sh	Big-Wuu	BohuTANG	dantengsky	dependabot[bot]


drmingdrmer	everpcpc	flaneur2020	johnhaxx7	leiysky	lichuang


mergify[bot]	PsiACE	RinChanNOWWW	soyeric128	sundy-li	TCeason


wubx	Xuanwo	xudong963	xxchan	youngsofun	yufan022


zhang2014	ZhiHanZ	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

Optimizing Databend Binary Builds with Profile-guided Optimization

February 13, 2023 · 4 min read

PsiACE

Recently someone in the community suggested that we try profile-guided optimization (#9387). Let's see how we can use Rust to build a PGO-optimized Databend!

Background

Profile-guided Optimization is a compiler optimization technique, which collects typical execution data (possible branches) during program execution and then optimizes for inlining, conditional branches, machine code layout, register allocation, etc.

The reason to introduce this technique is that static analysis techniques only consider code performance improvements without actually executing the program. However, these optimizations may not be fully effective. In the absence of runtime information, the compiler cannot take into account the actual execution of the program.

PGO allows data to be collected based on application scenarios in a production environment, so the optimizer can optimize the speed for hot code paths and size for cold code paths and produce faster and smaller code for applications.

rustc supports PGO by building data collection into the binaries, then collecting perf data during runtime to prepare for the final compilation optimization. The implementation relies entirely on LLVM.

Workflow

Follow the workflow below to generate a PGO-optimized program:

Compile the program with instrumentation enabled.
Run the instrumented program to generate a profraw file.
Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool.
Compile the program again with the profiling data.

Preparations

The data collected during the run will be eventually converted with llvm-profdata. To do so, install the llvm-tools-preview component via rustup, or consider using the program provided by a recent LLVM or Clang version.

rustup component add llvm-tools-preview

After the installation, llvm-profdata may need to be added to the following PATH:

~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/

Step-By-Step

The following procedure uses Databend's SQL logic tests for demonstration purposes only to help us understand how it works, so you may not get positive results for performance. Use a typical workload for your production environment.

The caveat, however, is that the sample of data fed to the program during the profiling stage must be statistically representative of the typical usage scenarios; otherwise, profile-guided feedback has the potential to harm the overall performance of the final build instead of improving it.

Make sure there is no left-over profiling data from previous runs.
```
rm -rf /tmp/pgo-data
```
Build the instrumented binaries (with release profile), using the RUSTFLAGS environment variable in order to pass the PGO compiler flags to the compilation of all crates in the program.
```
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
    cargo build --release --target=x86_64-unknown-linux-gnu
```
Instrumented binaries were run with some typical workload and we strongly recommend using workload that is statistically representative of the real scenario. This example runs SQL logic tests for reference only.

Start a stand-alone Databend via a script, or a Databend cluster. Note that a production environment is more likely to run in cluster mode.

Import the dataset and run a typical query workload.

BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast

Merge the .profraw files into a .profdata file with llvm-profdata.

llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

Use the .profdata file for guiding optimizations. In fact, you can notice that both builds use the --release flag, because in an actual runtime case we always use the release build binary.
```
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" \
    cargo build --release --target=x86_64-unknown-linux-gnu
```

Run the compiled program again with the previous workload and check the performance:

BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast

References

This Week in Databend #80

February 8, 2023 · 4 min read

PsiACE

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

add databend-meta config grpc_api_advertise_host (#9835)

AST

select from stage with files/pattern (#9877)
parse decimal type (#9894)

Expression

add Decimal128 and Decimal256 type (#9856)

Functions

support array_indexof (#9840)
support array function array_unique, array_distinct (#9875)
support array aggregate functions (#9903)

Query

add column id in TableSchema; use column id instead of index when read and write data (#9623)
support view in system.columns (#9853)

Storage

ParquetTable support topk optimization (#9824)

Sqllogictest

leverage sqllogictest to benchmark tpch (#9887)

Code Refactoring 🎉

Meta

remove obsolete meta service api read_msg() and write_msg() (#9891)
simplify UserAPI and RoleAPI by introducing a method update_xx_with(id, f: FnOnce) (#9921)

Cluster

split exchange source to reader and deserializer (#9805)
split and eliminate the status for exchange transform and sink (#9910)

Functions

rename some array functions add array_ prefix (#9886)

Query

TableArgs preserve info of positioned and named args (#9917)

Storage

ParquetTable list file in read_partition (#9871)

Build/Testing/CI Infra Changes 🔌

support for running benchmark on PRs (#9788)

Bug Fixes 🔧

Functions

fix nullable and or domain cal (#9928)

Planner

fix slow planner when ndv error backtrace (#9876)
fix order by contains aggregation function (#9879)
prevent panic when delete with subquery (#9902)

Query

fix insert default value datatype (#9816)

What's On In Databend

Stay connected with the latest news about Databend.

Why You Should Try Sccache

Sccache is a ccache-like project started by the Mozilla team, supporting C/CPP, Rust and other languages, and storing caches locally or in a cloud storage backend. The community first added native support for the Github Action Cache Service to Sccache in version 0.3.3, then improved the functionality in v0.4.0-pre.6 so that the production CI can now use it.

Now, opendal, open-sourced by Datafuse Labs, acts as a storage access layer for sccache to interface with various storage services (s3/gcs/azlob etc.).

Learn More

Blog | Why You Should Try Sccache

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Try using `build-info`

To get information about git commits, build options and credits, we now use vergen and cargo-license.

build-info can collect build-information of your Rust crate. It might be possible to use it to refactor the relevant logic in common-building.

pub struct BuildInfo {
    pub timestamp: DateTime<Utc>,
    pub profile: String,
    pub optimization_level: OptimizationLevel,
    pub crate_info: CrateInfo,
    pub compiler: CompilerInfo,
    pub version_control: Option<VersionControl>,
}

Issue 9874: Refactor: Try using build-info

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	dependabot[bot]	drmingdrmer


everpcpc	flaneur2020	johnhaxx7	leiysky	lichuang	mergify[bot]


PsiACE	RinChanNOWWW	soyeric128	sundy-li	TCeason	Xuanwo


xudong963	youngsofun	zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

Why You Should Try Sccache

February 2, 2023 · 7 min read

Xuanwo

Sccache is a ccache-like project started by the Mozilla team, supporting C/CPP, Rust and other languages, and storing caches locally or in a cloud storage backend. In v0.3.3, Sccache added native support for the Github Action Cache Service; in the subsequent v0.4.0-pre.6, the community has continued to improve this functionality, and it is now ready for use in the production CI.

Alt text

I recently added sccache to the PyO3/maturin CI for testing, and found it to have the following advantages

Easy deployment configuration: No need to specify shared key, no need to worry about GHA's internal cache-from/cache-to logic, just configure SCCACHE_GHA_ENABLED: "true".
Multi-language support: sccache supports caching different compilers for C/CPP, Rust and nvcc at the same time.
Faster in most scenarios: sccache caches the compilation product, no need to load the entire cache in advance, and no need to upload the cached content after the build is complete.
Concurrent job friendly: sccache can share caches between multiple concurrent jobs/workflows, no need to wait until the end of the build.
No cache conflicts: sccache performs hash calculations on each build product input (parameters, environment variables, files, etc.) to build a global conflict-free cache without cache conflicts and without the need to specify additional different cache keys.
No vendor lock-in: sccache is built on [opendal] (https://github.com/datafuselabs/opendal) and naturally supports a variety of different storage services, allowing seamless migration to s3/gcs/azlob etc. in future CI evolutions without relying on GHA cache services.
Actively maintained: sccache is currently actively maintained by me, you can submit feedback directly if you encounter problems using it.

The following are the results of the maturin project tests.

Cases	Sccache vs rust-cache (2nd)	Sccache vs rust-cache (3rd)
Test (ubuntu-latest, 3.7)	59.72%	-1.68%
Test (ubuntu-latest, 3.8)	-4.70%	-8.22%
Test (ubuntu-latest, 3.9)	30.72%	10.81%
Test (ubuntu-latest, 3.10)	1.03%	12.15%
Test (ubuntu-latest, 3.11)	-10.16%	-29.35%
Test (ubuntu-latest, pypy3.8)	18.34%	-3.84%
Test (ubuntu-latest, pypy3.9)	5.13%	22.90%
Test (macos-latest, 3.7)	11.87%	5.65%
Test (macos-latest, 3.8)	-7.82%	-13.65%
Test (macos-latest, 3.9)	-17.98%	-45.20%
Test (macos-latest, 3.10)	-13.20%	-15.38%
Test (macos-latest, 3.11)	-17.44%	-29.55%
Test (macos-latest, pypy3.8)	14.83%	-23.32%
Test (macos-latest, pypy3.9)	-28.03%	-38.56%
Test (windows-latest, 3.7)	30.08%	24.22%
Test (windows-latest, 3.8)	35.11%	41.14%
Test (windows-latest, 3.9)	9.24%	-5.28%
Test (windows-latest, 3.10)	-8.56%	-15.81%
Test (windows-latest, 3.11)	-1.39%	-36.49%
Test (windows-latest, 3.8)	-19.99%	-35.54%
Test (windows-latest, 3.9)	18.95%	-8.55%

The table compares the difference between a second/third run with sccache and a run with rust-cache, with bolded plural entries indicating that sccache is faster than rust-cache. As you can see, as the cache hit rate increases, sccache achieves a maximum improvement of almost 50% over rust-cache.

Give sccache a try, and if you don't like it, go to issues for feedback, and feel free to contribute!

In the next section I will first describe the internal API of the Github Action Cache Service and how it works, and then compare the differences between the rust-cache / sccache implementations to show why sccache is better / faster.

Github Action Cache Service Principle

The Github Action Cache Service is essentially an immutable storage service that supports prefix queries and provides the following non-public API.

Query Cache

GET /cache?keys=abc,ab,a&version=v1

keys: Specify a comma-separated set of query keys, the result will be the latest key with the same version prefix match
version：Specify the namespace used by the cache key

Reserve Cache

POST /caches

inputs: {key: <cache_key>, version: <cache_version>}
outputs: {cache_id: <cache_id>}

After each set of (key, version) has been reserved, all subsequent requests with the same key & version will return an already exists error, meaning that the cache cannot be overwritten. A successful request will return a numeric cache_id, which will be used for subsequent uploads and cache creation.

Upload Cache

PATCH /caches/[cache_id]

Upload specific cache content, using Content-Range to mark the cache location for this upload.

Create Cache

POST /caches/[cache_id]

This API can be used to create a cache after all the cached content has been created. The cache will only be queried after a successful response from this API.

rust-cache implementation

On top of this internal API (which is reasonably suspected to be provided by Azure DevOps services), Github provides actions/cache for users to use, and rust-cache is based on the @actions/cache implementation.

rust-cache calculates a cache key based on the github job_id, rustc version, environment variables, Cargo.lock, etc., and then packages ~/.cargo and . /target into one package and upload it. If the cache hits, it will be loaded locally and unpacked, if the cache misses, the cache will be uploaded in the post action.

The advantage of rust-cache is that the GHA Cache API is only called once for the whole process and the rate limit is rarely triggered.

sccache implementation

The GHA implementation of sccache is completely file based.

Alt text

sccache calculates a hash as a cache key based on the environment variables, binary, compilation parameters, input files, etc. passed during each rustc call, and loads the file directly from the storage service if it exists, skipping this compilation operation, otherwise it compiles and writes the result to the storage service. This means that

sccache does not have to deal with cache conflicts caused by different inputs, and can use an always unique hash as the cache key.
sccache can download the required cache at rustc compile time, without having to load the entire contents beforehand, or upload them all after the job has finished.
Sccache's cache loading logic is not heavily dependent on Cargo.lock itself, so the cache can be reused even if a large number of dependencies change.
sccache can reuse the cache between concurrent jobs, as they all share the same non-conflicting memory space.

In addition, sccache can be used for compilation caching of languages such as c/cpp, and if compilers such as gcc/clang are also used in the project, the cache can be shared with a simple configuration.

To help users make better use of Sccache in Github Action, we have developed sccache-action. However, my PR is not currently merged and can be used first with my fork:

- name: Sccache Setup
  # Just for test, come back to upstream after released
  uses: Xuanwo/sccache-action@c94e27bef21ab3fb4a5152c8a878c53262b4abb0
  with:
    version: "v0.4.0-pre.6"

Next, only two environment variables need to be configured.

env:
    SCCACHE_GHA_ENABLED: "true"
    RUSTC_WRAPPER: "sccache"

At the end of each Job, sccache-action outputs the usage of this Cache.

/opt/hostedtoolcache/sccache/0.4.0-pre.6/x64/sccache --show-stats
Compile requests                   1887
Compile requests executed          1035
Cache hits                          836
Cache hits (C/C++)                   22
Cache hits (Rust)                   814
Cache misses                        189
Cache misses (Rust)                 189
Cache timeouts                        0
Cache read errors                     0
Forced recaches                       0
Cache write errors                    0
Compilation failures                 10
Cache errors                          0
Non-cacheable compilations            0
Non-cacheable calls                 852
Non-compilation calls                 0
Unsupported compiler calls            0
Average cache write               0.051 s
Average compiler                  1.132 s
Average cache read hit            0.000 s
Failed distributed compilations       0

Non-cacheable reasons:
crate-type                          521
-                                   320
unknown source language              11

Cache location                  ghac, name: sccache-v0.4.0-pre.6, prefix: /sccache/

Users can use this information to adjust their strategy for using sccache.

Conclusion

Sccache uses GHA Cache in a new way to accelerate the compilation of Rust projects, with the following advantages over existing solutions

Easy deployment and configuration
Multiple language support
Faster in most scenarios
Concurrent task friendly
No cache conflicts
No vendor lock-in
Active maintenance

Feel free to try and use Sccache in your own projects!

This Week in Databend #79

February 1, 2023 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

AST

add syntax about parsing presign options with content type (#9771)

Format

add TSV file format back (#9732)

Functions

support array functions prepend and append (#9844)
support array concat (#9804)

Query

add topn runtime filter in native storage format (#9738)
enable hashtable state pass from partial to final (#9809)

Storage

add pruning stats to EXPLAIN (#9724)
cache bloom index object (#9712)

Code Refactoring 🎉

'select from stage' use ParquetTable (#9801)

Meta

expose a single "kvapi" as public interface (#9791)
do not remove the last node from a cluster (#9781)

AST/Expression/Planner

unify Span and Result (#9713)

Executor

merge simple pipe and resize pipe (#9782)

Bug Fixes 🔧

Base

fix not linux and macos jemalloc fallback to std (#9786)

Config

fix table_meta_cache can't be disabled (#9767)

Meta

when import data to meta-service dir, the specified "id" has to be one of the "initial_cluster" (#9755)

Query

fix and refactor aggregator (#9748)
fix memory leak for data port (#9762)
fix panic when cast jsonb to string (#9813)

Storage

fix up max_file_size may oom (#9740)

What's On In Databend

Stay connected with the latest news about Databend.

DML Command - UPDATE

Modifies rows in a table with new values.

Note: Databend guarantees data integrity. In Databend, Insert, Update, and Delete operations are guaranteed to be atomic, which means that all data in the operation must succeed or all must fail.

Syntax

UPDATE <table_name>
SET <col_name> = <value> [ , <col_name> = <value> , ... ]
    [ FROM <table_name> ]
    [ WHERE <condition> ]

Learn More

Docs | DML Command - UPDATE

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Support Arrow Flight SQL Protocol

Currently Databend supports the MySQL protocol, and it would be great if Databend could support the Arrow Flight SQL protocol as well.

Typically a lakehouse stores data in parquet files using the MySQL protocol while Databend has to do deserialization from parquet to arrow and then back to MySQL data types. Again on the caller end users use data frames or MySQL result iterators, which also requires serialization of types. With Arrow Flight SQL all of these back and forth serialization costs can be avoided.

Issue 9832: Feature: Support Arrow Flight SQL protocol

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	dantengsky	dependabot[bot]


drmingdrmer	everpcpc	flaneur2020	johnhaxx7	leiysky	mergify[bot]


PsiACE	RinChanNOWWW	soyeric128	sundy-li	TCeason	Xuanwo


youngsofun	yufan022	zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

What's Fresh in Databend v0.9

January 31, 2023 · 5 min read

sundyli

Databend v0.9.0 was officially released on January 13, 2023! This release is the last major release of Databend before version 1.0, and it is also the version in which we have refactored the core code the most so far. Compared to v0.8.0, we have made more than 5,000 commits, 700+ optimizations and fixes, 4,347 file changes, and 340,000 lines of code changes in v0.9.0. Thanks to all the community partners who participated and to everyone who made Databend better!

In v0.9.0, we introduced a new type system, a new expression calculation framework, JSONB support, complete JOIN support and optimization, CBO support, a native storage format and other major feature optimizations. A lot of optimization and enhancements have also been made in terms of performance, stability, and usability.

Performance comparison

In the new version, we have made a lot of optimizations in the execution engine, optimizer, and storage layer. The efficiency gets doubled in most scenarios. The following is a performance comparison between v0.8 and v0.9 benchmarking the Hits dataset with the default FUSE engine on S3:

Alt text

Brand new type system

To give databend an easy-to-understand yet powerful type derivation system, we learned from the compiler internals of a number of good programming languages and then refined a subset of them for SQL. Based on the current purely static type system, we have a sophisticated type derivation mechanism that infers the execution of expressions as much as possible at the compile time of SQL, a minimalist expression function registration logic, and the implementation of generic derivation at the database type level.

On top of the new type system, modules such as constant folding, type derivation, function registration, query data trimming, etc. can benefit from the new type system.

Support for JSONB

In the new version, we implemented the Rust version of JSONB, the default JSON data type will be stored in JSONB, and it is also compatible with the old JsonText format. Based on the binary JSON format, both storage space and query performance have been significantly optimized.

Alt text

More info: https://databend.rs/doc/contributing/rfcs/json-optimization

Full support for JOINs

Full Join types are supported: inner/natural/cross/outer/semi/anti joins. In the past few months, according to the feedback from the community and online users, hash join has been deeply optimized to meet the performance requirements in most scenarios.

Support for CBO

We have added the statistical calculation logic of NDV in the statistical information, and users can now generate statistical information tables through the "Analyze" command similar to presto. JOINs can use the existing statistical information to optimize the logical plan based on cost. After the subsequent CBO support is improved, we will update the query performance data comparison of TPCH 100G data.

Native Storage Format

Databend supports Git-Like's Fuse engine. Based on this engine, we can quickly go back to a certain historical point in time to query, and realize "time travel" inside the database. Inside the Fuse engine, we also support a new Storage Format besides Parquet --- strawboat: https://github.com/sundy-li/strawboat.

Strawboat is based on Arrow's native storage format. Based on it, we can read data more efficiently than Parquet. In the hits dataset, the full table scan native format can be 2-3 times faster. In the hits data set, a very considerable improvement has been achieved in the local deployment scenario, and we will improve the performance comparison in clickbench later.

Efficient bloom filter filtering

In the new version, we introduced the xor filter to calculate and store the bloom filter for each column. Compared with the previous version, the new bloom filter has improved a lot in import query performance and occupied space. See https://databend.rs/blog/xor-filter

Designing and open-sourcing serverless DataSharing protocol

In the new version we created a zero-trust data sharing solution among multi-tenants based on object storage presign short-term access token. In the case of consistent basic performance, aws lambda will be used to implement data sharing in a serverless manner.

Stage

We implemented UserStage in the new version, similar to the home directory of linux: COPY INTO my_table FROM @~;

Stage's data import supports meta storage status, which means we can always save and import new files from stage.
Support exporting multiple files in different formats from Stage.
Importing tables from Stage supports parallelization.

Other improvements

The new release also includes these improvements:

Read_parquet of duckdb supports reading local parquet files directly without importing
Commonly used function performance optimization, commonly used GEO function support
Distinct performance optimization
Adaptive String HashTable
SQLancer integration
Parquet reading acceleration
The previous python version sqllogictest was rewritten to use Rust
NDJSON and JSON output format support
ALTER TABLE supports recluster
Support hyperloglog update and delete: https://db.in.tum.de/~freitag/papers/p23-freitag-cidr19.pdf

Download v0.9.0

If you are interested in Databend v0.9.0, go to https://github.com/datafuselabs/databend/releases/tag/v0.9.0-nightly to view all changelogs or download the release.

If you are using an old version of Databend, you can directly upgrade to the new version. For the upgrade process, please refer to: https://databend.rs/doc/operations/upgrade

Feedback

If you need any help when working with the new release, submit an issue on Github Issue. GitHub: https://github.com/datafuselabs/databend/

This Week in Databend #78

January 25, 2023 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

SQL

eliminate extra group by scalars (#9708)

Query

add privilege check for insert/delete/optimize (#9664)
enable empty projection (#9675)
add aggregate limit in final aggregate stage (#9716)
add optional column names to create/alter view statement (#9715)

Storage

add prewhere support in native storage format (#9600)

Code Refactoring 🎉

move io constants to common/io (#9700)
refine fuse/io/read (#9711)

Planner

rename Scalar to ScalarExpr (#9665)

Storage

refactor cache layer (#9672)
pruner.rs -> fuse_bloom_pruner.rs (#9710)
make pruner hierarchy to chain (#9714)

Build/Testing/CI Infra Changes 🔌

support setup minio storage & external s3 storage in docker image (#9676)

Bug Fixes 🔧

Expression

fix missing simple_cast (#9671)

Query

fix efficiently_memory_final_aggregator result is not stable (#9685)
fix max_result_rows only limit output results nums (#9661)
fix query hang in two level aggregator (#9694)

Storage

may get wrong datablocks if not sorted by output schema (#9470)
bloom filter is using wrong cache key (#9706)

What's On In Databend

Stay connected with the latest news about Databend.

Databend All-in-One Docker Image

Databend Docker Image now supports setting up MinIO storage and external AWS S3 storage.

Now you can easily use a Docker image for your first experiment with Databend.

Run with MinIO as backend

docker run \
    -p 8000:8000 \
    -p 9000:9000 \
    -e MINIO_ENABLED=true \
    datafuselabs/databend

Run with self managed query config

docker run \
    -p 8000:8000 \
    -e DATABEND_QUERY_CONFIG_FILE=/etc/databend/mine.toml \
    -v query_config_file:/etc/databend/mine.toml \
    datafuselabs/databend

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Vector Search

Vector search captures the meaning and context of unstructured data, and is commonly used for text or image processing, enabling the use of semantics to find similar results and obtain more valid results than traditional keyword retrieval.

Databend plans to provide users with a richer and more efficient means of querying by supporting vector search, and the introduction of Faiss Index may be an initial solution.

Issue 9699: feat: vector search (Faiss index)

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	dantengsky	dependabot[bot]


everpcpc	flaneur2020	johnhaxx7	leiysky	mergify[bot]	PsiACE


RinChanNOWWW	sandflee	sundy-li	xudong963	zhang2014	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

This Week in Databend #77

January 18, 2023 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

use expression::TableSchema to replace obsolete datavalues::DataSchema (#9506)
iter() iterate every tree and every records in theses trees (#9621)

Expression

add other base geo functions (#9588)

Optimizer

improve cardinality estimation for join based on histogram (#9594)

Planner

improve join reorder algorithm (#9571)

Query

support insert with placeholder (#9575)
set setting support expr (#9574)
add information_schema for sharding-jdbc (#9583)
support named params for table functions (#9630)

Storage

read_parquet page index (#9563)
update interpreter and storage support (#9261)

Code Refactoring 🎉

refine on_error mode (#9473)

Meta

remove unused meta types and conversion util (#9584)

Parser

more strict parser for format_options (#9635)

Expression

rearrange common_expression and common_function (#9585)

Build/Testing/CI Infra Changes 🔌

run sqllogictests with binary (#9603)

Bug Fixes 🔧

Expression

constant folder should run repeatedly until stable (#9572)
check_date() and to_string(boolean) may panic (#9561)

Planner

fix stack overflow when applying RuleFilterPushDownJoin (#9645)

Storage

fix range filter read stat with index (#9619)

Sqllogictest

sqllogic test hangs (cluster mod + clickhouse handler) (#9615)

What's On In Databend

Stay connected with the latest news about Databend.

Upgrade Databend Query from 0.8 to 0.9

Databend-query-0.9 introduces incompatible changes in metadata, these metadata has to be manually migrated. Databend provides a program for this job: databend-meta-upgrade-09, which you can find in a release package or can be built from source.

Upgrade

databend-meta-upgrade-09 --cmd upgrade --raft-dir "<./your/raft-dir/>"

Learn More

Release Proposal: Nightly v1.0

The call for proposals for the release of v1.0 is now open.

The preliminary plan is to release in March, mainly focusing on alter table, update, and group by spill.

Learn More

Release proposal: Nightly v1.0

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Type Checker for Sqllogictest

We can check if each row's each element's type is correct.

databend/tests/sqllogictests/src/client/mysql_client.rs

 // Todo: add types to compare 
 Ok(DBOutput::Rows { 
     types, 
     rows: parsed_rows, 

Issue 9647: Feature: Add type checker for sqllogictest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	dantengsky	drmingdrmer


everpcpc	leiysky	mergify[bot]	PsiACE	RinChanNOWWW	soyeric128


sundy-li	TCeason	Xuanwo	xudong963	youngsofun	yufan022


zhang2014	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

This Week in Databend #76

January 11, 2023 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

add reader-min-msg-ver and msg-min-reader-ver in proto-conv (#9535)

Planner

support tuple.1 and get(1)(tuple) (#9493)
support display estimated rows in EXPLAIN (#9528)

Query

efficiently memory two level group by in standalone mode (#9504)

Storage

support nested type in read_parquet (#9486)
add build options table (#9502)

Code Refactoring 🎉

merge new expression (#9411)
remove and rename crates (#9481)
bump rust version (#9540)

Expression

move negative functions to binder (#9484)
use error_to_null() to eval try_cast (#9545)

Functions

replace h3ron to h3o (#9553)

Format

extract AligningStateTextBased (#9472)
richer error context (#9534)

Query

use ctx to store the function evaluation error (#9501)
refactor map access to support view read tuple inner (#9516)

Storage

bump opendal for streaming read support (#9503)
refactor bloom index to use vectorized siphash function (#9542)

Bug Fixes 🔧

HashTable

fix memory leak for unsized hash table (#9551)

Storage

fix row group stats collection (#9537)

What's On In Databend

Stay connected with the latest news about Databend.

New Year, New Expression!

We're so thrilled to tell you that Databend now fully works with New Expression after more than a half year of dedicated work. New Expression introduces a formal type system to Databend and supports type-safe downward casting , making the definition of functions easier.

New Expression is still being tuned, and a new version (v0.9) of Databend will be released once the tuning work is complete.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

`UNNEST` Function

The UNNEST function takes an array as a parameter, and returns a table containing each element of the array in a row.

Syntax

UNNEST(ARRAY) [WITH OFFSET]

If you're interested in becoming a contributor, helping us develop the UNNEST function would be a good start.

Issue 9549: Feature: Support unnest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	ClSlaid	dantengsky


dependabot[bot]	drmingdrmer	everpcpc	flaneur2020	leiysky	mergify[bot]


PsiACE	RinChanNOWWW	soyeric128	sundy-li	TCeason	wubx


Xuanwo	xudong963	youngsofun	zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

This Week in Databend #75

January 4, 2023 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

basic output format JSON (#9447)

Query

check connection params (#9437)
add max_query_row_nums (#9406)

Storage

support prewhere in hive (#9427)
add generic cache trait for different object reader (#9436)
add metrics for new cache (#9445)

New Expression

migrate hash func to func-v2 (#9402)

Sqllogictest

run all tests in parallel (#9400)

Code Refactoring 🎉

Storage

add to_bytes and from_bytes for CachedObject (#9439)
better table-meta and parquet reader function (#9434)
convert fuse_snapshot unit tests to sqlloigc test (#9428)

Bug Fixes 🔧

Format

catch unwind when read split (#9420)

User

lazy load JWKS (#9446)

Planner

create Stage URL's path should ends with / (#9450)

What's On In Databend

Stay connected with the latest news about Databend.

Databend 2022 Recap

Let's look back and see how Databend did in 2022.

Open source: got 2,000+ stars, merged 2,400+ PRs, and solved 1,900 issues.
From data warehouse to lakehouse: Brand-new design with enhanced capabilities.
Rigorous testing: SQL Logic Tests, SQLancer, and https://perf.databend.rs.
Building the ecosystem: More customers chose, trusted, and grew with Databend, including Kuaishou and SAP.
Databend Cloud: Built on top of Databend, the next big data analytics platform.

We wish everyone a Happy New Year and look forward to engaging with you.

Learn More

Blog | Databend 2022 Recap

Databend 2023 Roadmap

As the new year approaches, Databend is also actively planning its roadmap for 2023.

We will continue to polish the Planner and work on data and query caching. Enhancing storage and query issues for PB-level data volumes is also on our list.

Try Databend and join the roadmap discussion.

Learn More

Issue | Roadmap 2023

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Profile-Guided Optimization (PGO)

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

rustc supports doing profile-guided optimization (PGO). We expect to be able to use it to enhance the build.

Issue 9387: Feature: Add PGO Support

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


ariesdevil	BohuTANG	dantengsky	dependabot[bot]	everpcpc	flaneur2020


hantmac	leiysky	mergify[bot]	PsiACE	sandflee	soyeric128


sundy-li	TCeason	Xuanwo	xudong963	youngsofun	zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

Databend 2022 Recap

December 31, 2022 · 6 min read

PsiACE

The year is coming to an end, and Databend is about to enter its third year. Before we count down the new year, it's a good idea to look back and see how Databend did in 2022.

Open Source: Receiving Increased Attention

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

databend

The open-source philosophy has guided Databend from the very beginning. The entire team works seamlessly on GitHub where the Rust community and many data pros are fully involved. In 2022, the Databend repository:

Got 2,000+ stars, totaling 5,000 .
Merged 2,400+ PRs, totaling 5,600 .
Solved 1,900 issues, totaling 3,000 .
Received 16,000 commits, totaling 23,000 .
Attracted more contributors, totaling 138 .

Development: Inspired by Real Scenarios

Databend brought many new features and improvements in 2022 to help customers with their real work scenarios.

databend arch

Brand-New Data Warehouse

As a data warehouse inspired by and benchmarking itself against Snowflake and Clickhouse, Databend fully took advantage of "Cloud Native" to bring you a new design and implementation without breaking the balance between performance and maintainability:

Added support for Stage and Data Sharing, helping users manage their data life cycle with more options.
Introduced a new planner with user-friendly error prompts and efficient optimization techniques for the execution plan.
Redesigned the type system to support type checking and type-safe downward casting.
Enhanced the new processor framework: It can now work in both Pull and Push modes.
Added experimental support for Native Format to improve performance when running on a local disk.

Databend as Lakehouse

Storing and managing massive data is key to our vision "Databend as Lakehouse" . A lot of efforts have been made in 2022 for a larger data payload and a wider range of accepted data sources:

Adopted OpenDAL in the data access layer as a unified interface.
Expanded support for structured and semi-structured data.
Added the ability to keep multiple catalogs: This makes integrations with custom catalogs such as Hive much easier.
Added the ability to query data directly from a local, staged, or remote file.

Optimal Efficiency Ratio

After a year of continuous tuning, we brought Databend to a new stage featuring elastic scheduling and separating storage from compute. We're thrilled to see a significant improvement in the efficiency ratio:

In some scenarios, Databend works as efficiently as Clickhouse.
Lowered costs by 90% compared to Elasticsearch, and by 30% compared to Clickhouse.

Testing: Put Us at Ease

Comprehensive tests help make a database management system robust. While optimizing performance, we also care about the accuracy and reproducibility of SQL results returned from Databend.

databend perf

Correctness Testing

In 2022, we replaced stateless tests with SQL Logic Tests for Databend in the first place. We also migrated a large number of mature test cases to cover as many scenarios as possible. Afterward, we started to use a Rust native test program called sqllogictest-rs instead of the previous Python one, which saved us a lot of time on CI without losing the maintainability of the tests.

Furthermore, we also planned and implemented three types of automated testing (TLP, QPS, and NoREC) supported by SQLancer. All of them have been successfully merged into the main branch with dozens of bug fixes.

Performance Testing

Performance testing is also essential for us. In 2022, we launched a website (https://perf.databend.rs/) to track daily performance changes and spot potential issues. Meanwhile, we actively evaluated Databend against Clickbench and some other benchmarks.

Ecosystem: Give and Take

The Databend ecosystem and users benefit from each other. More and more users were attracted to the ecosystem and joined the community in 2022. As they brought their own creative ideas to Databend and made them come true, the Databend ecosystem made tremendous progress and started to flourish in the field.

Positive Expansion

We build and value the Databend ecosystem. Databend is now compatible with the MySQL protocol and Clickhouse HTTP Handler, and can seamlessly integrate with the following data services or utilities:

Airbyte
DBT
Addax (Datax)
Vector
Jupyter Notebook
DBeaver

To help users develop and customize services based on Databend, we developed drivers in multiple languages, including Python and Go.

Growing with Users

Users are the basis of Databend. They help develop Databend and stir up the whole community.

In 2022, Databend added support for the Hive Catalog with the help of Kuaishou Technology. This connected Databend to the Hive ecosystem and encouraged us to consider the possibility of multiple catalogs. DMALL implemented and verified data archiving with Databend. We also appreciate SHAREit, Voyance, DigiFinex, and Weimob for their trust and support.

The Databend ecosystem includes a few projects that are loved and trusted by other products:

OpenDAL now manages the data access layer for sccache , which provides further support for Firefox CI . Other database and data analysis projects, such as GreptimeDB and deepeth/mars , also used OpenDAL for data access.
OpenRaft was used to implement a Feature Registry (database to hold feature metadata) in Azure/Feathr. SAP, Huobi, and Meituan also used it in a few internal projects.
The MySQL protocol implementation in OpenSrv has been applied to multiple database projects such as GreptimeDB and CeresDB .

In 2022, the Databend community launched the "Data Infra Club" for knowledge sharing. Our friends from PingCAP, Kuaishou Technology, DMALL, and SHAREit were invited to share their insights on big data platforms, Data Mesh, and Modern Data Stack. You can find all the video replays on Bilibili if you're interested.

Going Cloud: Sky's the Limit

Going cloud is part of Databend's business strategy where most Databend users come from the cloud.

Built on top of Databend, Databend Cloud is a big-data analytics platform of the next generation, featuring easy-to-use , low-cost , and high-performance . Two versions of Databend Cloud are now available and open for trial:

https://databend.com (for international users)
https://databend.cn (for China users only)

Rewrite sqllogictest framework in rust

December 29, 2022 · 3 min read

Xudong

Rewriting sqllogictest Framework with Rust

This post is about a big move we've made for Databend. We successfully switched the sqllogictest framework from Python to Rust using sqllogictest-rs, a robust implementation of the sqllogictest framework for the Rust ecosystem. Sqllogictest was designed with SQLite in mind. Benefiting from its neutrality towards database engines, we can use Sqllogictest to verify the accuracy of a SQL database engine as well. This is done by comparing query results from multiple SQL engines running the same query.

Why sqllogictest-rs?

The original sqllogictest framework (RFC for sqllogictest) was written in Python. We planned a switch to sqllogictest-rs for the following reasons:

The entire Databend team is proficient in Rust. Working with a unified codebase written in Rust would boost our productivity over the long term.
The previous framework lacked a strict parser at the front end and resulted in errors going undetected.
As the sqllogictest-rs crate is maturing, building a new sqllogictest framework based on it would save us a lot of effort in the long run.
We expected a 10x performance boost from the switch to Rust. The Python sqllogictest had been experiencing suboptimal runtime that resulted in a slower CI.

How We Nailed It

Our first version of sqllogictest doesn't strictly follow the sqllogictest wiki. This is a little bit frustrating because we have to manually adjust the format of the test files, for example, in some cases like these:

Extra blank lines between the query and ----.
Non-identical comment formats.
Confusing empty strings: It displays results from queries like select ' ' with , rather than (empty).

Databend supports three types of client handlers: MySQL, HTTP, and ClickHouse. Each type of them returns content in a different format. The HTTP handler returns content in JSON format and the ClickHouse handler returns it in TSV, both of which require the following substitutions:

inf -> Infinity
nan -> NaN
\N -> NULL

We introduced sandbox tenant to increase parallelism. Each test file now runs in parallel in its own sandbox environment that is separated from each other. The benefits of doing so include preventing a database or table from being dropped by mistake and significantly reducing test time.

Unsolved Issues

We're still figuring out the most effective way to test a query that returns dynamic results. For example, the Create_time in the result returned from SHOW TABLE STATUS.

After the Switch

We're glad to see an efficiency improvement after going with sqllogictest-rs and this will benefit the entire Databend community. Our special thanks go to sqllogictest-rs for the great support, and everyone who has been involved. If you're also a fan of sqllogictest, stay tuned for more exciting news by visiting the following links:

This Week in Databend #74

December 28, 2022 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

remove stream when a watch client is dropped (#9334)

Planner

support selectivity estimation for range predicates (#9398)

Query

support copy on error (#9312)
support databend-local (#9282)
external storage support location part prefix (#9381)

Storage

rangefilter support in (#9330)
try to improve object storage io read (#9335)
support table compression (#9370)

Metrics

add more metrics for fuse compact and block write (#9399)

Sqllogictest

add no-fail-fast support (#9391)

Code Refactoring 🎉

adopt rustls entirely, removing all deps to native-tls (#9358)

Format

remove format_xxx settings (#9360)
adjust interface of FileFormatOptionsExt (#9395)

Planner

remove SyncTypeChecker (#9352)

Query

split fuse source to read data and deserialize (#9353)
avoid io copy in read parquet data (#9365)
add uncompressed buffer for parquet reader (#9379)

Storage

add read/write settings (#9359)

Bug Fixes 🔧

Format

fix align_flush with header only (#9327)

Settings

use logical CPU number as default value of num_cpus (#9396)

Processors

the data type on both sides of the union does not match (#9361)

HTTP Handler

false alarm (warning log) about query not exists (#9380)

Sqllogictest

refactor sqllogictest http client and fix expression string like (#9363)

What's On In Databend

Stay connected with the latest news about Databend.

Introducing databend-local

Inspired by clickhouse-local, databend-local allows you to perform fast processing on local files, without the need of launching a Databend cluster.

> export CONFIG_FILE=tests/local/config/databend-local.toml
> cargo run --bin=databend-local -- --sql="SELECT * FROM tbl1" --table=tbl1=/path/to/databend/docs/public/data/books.parquet

exec local query: SELECT * FROM tbl1
+------------------------------+---------------------+------+
| title                        | author              | date |
+------------------------------+---------------------+------+
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+---------------------+------+
4 rows in set. Query took 0.015 seconds.

Learn More

PR | support databend-local

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Compressing Short Strings

When processing the same queries with short strings involved, Databend usually reads more data than other databases, such as Snowflake.

SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;

Such queries might be more efficient if short strings (URLs, etc) are compressed.

Issue 9001: performance: compressing for short strings

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	BohuTANG	dantengsky	drmingdrmer	eastfisher


everpcpc	leiysky	mergify[bot]	PsiACE	RinChanNOWWW	soyeric128


sundy-li	Xuanwo	xudong963	youngsofun	zhang2014	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

This Week in Databend #73

December 21, 2022 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

implement show tables (from|in catalog.database) (#9153)

Planner

introduce histogram in column statistics (#9310)

Query

support attaching stage for insert values (#9249)
add native format in fuse table (#9279)
add internal_enable_sandbox_tenant config and sandbox_tenant (#9277)

Sqllogictest

introduce rust native sqllogictest framework (#9150)

Code Refactoring 🎉

unify apply_file_format_options for copy & insert (#9323)

remove unused code (#9266)

meta

test watcher count (#9324)

Planner

replace TableContext in planner with PlannerContext (#9290)

Bug Fixes 🔧

Base

try fix SIGABRT when catch unwind (#9269)
replace #[thread_local] to thread_local macro (#9280)

Query

fix unknown database in query without relation to this database (#9250)
fix wrong current_role when drop the role (#9276)

What's On In Databend

Stay connected with the latest news about Databend.

Introduced a Rust Native Sqllogictest Framework

Sqllogictest verifies the results returned from a SQL database engine by comparing them with the results of other engines for the same queries.

In the past, Databend ran such tests using a program written in Python and migrated a large number of test cases from other popular databases. We implemented the program again with sqllogictest-rs in recent days.

Learn More

Experimental: Native Format

PA is a native storage format based on Apache Arrow. Similar to Arrow IPC, PA aims at optimizing the storage layer.

Databend is introducing PA as a native storage format in the hope of getting a performance boost, though it's still at an early stage of development.

create table tmp (a int) ENGINE=FUSE STORAGE_FORMAT='native';

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Checking File Existence Before Returning Presigned URL

When presigning a file, Databend currently returns a potentially valid URL based on the filename without checking if the file really exists. Thus, the 404 error might occur if the file doesn't exist at all.

Issue 8702: Before return presign url add file exist judgement

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


ariesdevil	b41sh	BohuTANG	ClSlaid	drmingdrmer	everpcpc


leiysky	mergify[bot]	PsiACE	sandflee	soyeric128	sundy-li


Xuanwo	xudong963	youngsofun	zhang2014	ZhiHanZ	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

CPU and Memory Profiling for Rust

December 19, 2022 · 2 min read

BohuTANG

Profiling CPU and memory for Go applications is easy and can be of great help in performance troubleshooting, for example, with flamegraphs. For Rust applications, however, the profiling requires extra work. This post explains how to use flamegraphs to visualize performance data of your CPU and memory for Databend.

To support CPU and memory profiling, some APIs need to be included in the application. For example, Databend includes the following in the code:

CPU Profiling

To do a CPU profiling, simply run the following command on the Databend server:

go tool pprof -http="0.0.0.0:8081" http://localhost:8080/debug/pprof/profile?seconds=30

localhost:8080: Databend management address.
0.0.0.0:8081: pprof server address.
seconds=30: Profiling lasts for 30 seconds.

Then open the URL <your-ip>:8081/ui/flamegraph in your browser to view the flamegraph:

Alt text

Memory Profiling

Compared to CPU profiling, memory profiling is a bit more involved, and can be done in the following steps:

1. Enable Memory Profiling

cargo build --bin databend-query --release --features memory-profiling

2. Start with MALLOC_CONF

MALLOC_CONF=prof:true,lg_prof_interval:30 ./target/release/databend-query

lg_prof_interval:30: Profiles are dumped into a file for each allocation of 1 GiB (2^30 bytes).

3. Replace add2line with a Faster One

This will rocket your jeprof from 30 minutes to 3 seconds.

git clone https://github.com/gimli-rs/addr2line
cd addr2line
cargo b --examples -r
cp ./target/release/examples/addr2line <your-addr2line-find-with-whereis-addr2line>

4. Upgrade jeprof to the Latest Version

jeprof needs an upgrade because the old version doesn't support some parameters for creating flamegraphs. jeporf is a perl script, so the way to upgrade it is a little bit rough-and-ready.

First, find out the path of your local jeprof file:

whereis jeprof

Open and copy the latest version of jeprof, then overwrite your local copy with the copied script EXCEPT for the following two parameters:

my $JEPROF_VERSION = "5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756";
my $PPROF_VERSION = "2.0";

5. Create a Flamegraph

jeprof ./databend-query-main ./jeprof.206330.563.i563.heap --collapse | flamegraph.pl --reverse --invert --minwidth 3 > heap.svg

flamegraph.pl: Download from GitHub.
databend-query-main: Path to your executable.
jeprof.206330.563.i563.heap: Selects a heap file.

Alt text

References

FlameGraph
https://github.com/jemalloc/jemalloc/blob/dev/bin/jeprof.in
Databend, Cloud Lakehouse: https://github.com/datafuselabs/databend

This Week in Databend #72

December 14, 2022 · 5 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

extends show databases SQL (#9152)

Stage

support select from URI (#9247)

Streaming Load

support file_format syntax in streaming load insert sql (#9063)

Planner

push down limit to union (#9210)

Query

use analyze table instead of optimize table statistic (#9143)
fast parse insert values (#9214)

Storage

use distinct count calculated by the xor hash function (#9159)
read_parquet read meta before read data (#9154)
push down filter to parquet reader (#9199)
prune row groups before reading (#9228)

Open Sharing

add prototype open sharing and add sharing stateful tests (#9177)

Code Refactoring 🎉

simplify the global data registry logic (#9187)

Storage

refactor deletion (#8824)

Build/Testing/CI Infra Changes 🔌

release databend deb package and databend with hive (#9138, #9241, etc.)

Bug Fixes 🔧

Format

support ASCII control code hex as format field delimiter (#9160)

Planner

prewhere_column empty and predicate is not const will return empty (#9116)
don't push down topk to Merge when it's child is Aggregate (#9183)
fix nullable column validity not equal (#9220)

Query

address unit test hang on test_insert (#9242)

Storage

too many io requests for read blocks during compact (#9128)
collect orphan snapshots (#9108)

What's On In Databend

Stay connected with the latest news about Databend.

Breaking Change: Unified File Format Options

To simplify, we're rolling out a set of unified file format options as follows for the COPY INTO command, the Streaming Load API, and all the other cases where users need to describe their file formats:

[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]

Please note that the current format options starting with format_* will be deprecated.
... FORMAT CSV ... will still be accepted by the ClickHouse handler.
Support for customized formats created by CREATE FILE FORMAT ... will be added in a future release: ... FILE_FORMAT = (format_name = 'MyCustomCSV') .... .

Learn More

Open Sharing is a simple and secure data-sharing protocol designed for databend-query nodes running in a multi-cloud environment.

Simple & Free: Open Sharing is open-source and basically a RESTful API implementation.
Secure: Open Sharing verifies incoming requesters' identities and access permissions, and provides an audit log.
Multi-Cloud: Open Sharing supports a variety of public cloud platforms, including AWS, Azure, GCP, etc.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

We're about to run stage-related tests again using the Streaming Load API to move files to a stage instead of an AWS command like this:

aws --endpoint-url ${STORAGE_S3_ENDPOINT_URL} s3 cp s3://testbucket/admin/data/ontime_200.csv s3://testbucket/admin/stage/internal/s1/ontime_200.csv >/dev/null 2>&1

This is because Databend users do not need to take care of, or do not even know the stage paths that the AWS command requires.

Issue 8528: refactor stage related tests

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


ariesdevil	b41sh	BohuTANG	Chasen-Zhang	ClSlaid	dantengsky


drmingdrmer	hantmac	lichuang	mergify[bot]	PsiACE	RinChanNOWWW


soyeric128	sundy-li	wubx	Xuanwo	xudong963	youngsofun


ZhiHanZ	zhyass	zzzdong

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

Databend Indexing Explained

December 13, 2022 · 4 min read

wubx

When working with Databend, you don't bother maintaining indexes. Databend takes advantage of these indexing techniques to automatically build and manage indexes on the fly:

Min/Max index
Bloom index
Cluster key

Min/Max Index

Min/Max Index is the key indexing technique for OLAP databases. Databend Fuse Engine uses it as the main indexing method to build indexes and store them in snapshots, segments, and blocks. The following shows how the Min/Max Index works for a table in Databend.

First, use SHOW CREATE TABLE to find the initial snapshot file created for the table:

show create table ontime(
`Year` INT,  -- First column
...
) ENGINE=FUSE SNAPSHOT_LOCATION='1/458/_ss/71b460c61fa943d1a391d3118ebd984c_v1.json'

Downdoad and open the snapshot file with VSCODE:

{
    "format_version": 1,
    "snapshot_id": "71b460c6-1fa9-43d1-a391-d3118ebd984c",
    "timestamp": "2022-11-29T03:44:03.419194Z",
    "prev_snapshot_id": null,
    "schema": {
        "fields": [
            ... -- Field definitions
        ],
        "metadata": {}
    },  
    "summary": {
    "row_count": 90673588,
    "block_count": 200,
    "perfect_block_count": 0,
    "uncompressed_byte_size": 65821591614,
    "compressed_byte_size": 2761791374,
    "index_size": 1194623,
    "col_stats": {
           ...
           "0": {  --  Min/Max indexes for the first column 'Year' in the table
                "min": {
                    "Int64": 1987
                },
                "max": {
                    "Int64": 2004
                },
                "null_count": 0,
                "in_memory_size": 362694352,
                "distinct_of_values": 0
            },
           ...
        }，
    },
    "segments": [
        ...
        [
            "1/458/_sg/ddccbb022ba74387be0b41eefd16bbbe_v1.json",
            1
        ],
        ...
    ],
    "cluster_key_meta": null
}

The file above indicates that the min value of the first column is 1987 and the max is 2004. The indexes in a snapshot file can tell you whether the data you want to retrieve exists in the table. For example, no data would be returned for the following query if Databend cannot find a matching Min/Max interval in all snapshots:

select avg(DepDelay) from ontime where Year='2003';

Databend Fuse Engine stores the most important indexes in segment files. At the end of a snashot file, you can find information about which segments are related to the snapshot. Here's a sample segment file:

{
   "format_version":1,
   "blocks":[
       { -- block ...
           ...
                   "row_count": 556984,
                   "block_size": 405612604,
                   "file_size": 25302413,
                    "col_stats": {
                    ...
                   "0": {
                   "min": {
                       "Int64": 2003
                   },
                   "max": {
                       "Int64": 2003
                   },
                   "null_count": 0,
                   "in_memory_size": 2227936,
                   "distinct_of_values": 1
               },
                    ...
                    },
         "col_metas": {
          -- Used to record the start position and length of each column
         },
           "cluster_stats": null,
           "location": [
               "1/458/_b/e4f3795c79004f22b80ed5ee821edf23_v0.parquet",
               0
           ],
           "bloom_filter_index_location": [
               "1/458/_i_b_v2/e4f3795c79004f22b80ed5ee821edf23_v2.parquet",
               2
           ],
           "bloom_filter_index_size": 60207,
           "compression": "Lz4Raw"
           ...
       }    
   ],
   "summary": {
       "row_count": 11243809,
       "block_count": 25,
       "perfect_block_count": 25,
       "uncompressed_byte_size": 8163837349,
       "compressed_byte_size": 339392734,
       "index_size": 1200133,
       "col_stats": {
       ...
          "0": {
               "min": {
                   "Int64": 1988
               },
               "max": {
                   "Int64": 2003
               },
               "null_count": 0,
               "in_memory_size": 44975236,
               "distinct_of_values": 0
           },
       ...
       }
   }
}

From the sample above, we can see that a segment file contains its own Min/Max index information. So does a block file. The Min/Max indexes are layered and distributed among snapshots, segments, and blocks like this:

Alt text

When retrieving data for a query, Databend starts from the snapshot indexes and locates the corresponding segment by matching the Min/Max interval. Then, it looks up the indexes in the segment file to find the block where the required data is stored and reads data from the block file with information about the start position from col_metas. So Databend literally processes a query by finding the right segments and blocks with the Min/Max Index.

Bloom Index

For queries requiring an exact string match, Databend uses the Min/Max Index to find the right block first, and then locates the offsets with the bloom index information in bloom_filter_index_location to retrieve data from the block.

For more information about the Bloom Index, see https://databend.rs/blog/xor-filter.

Cluster Key

The Min/Max Index seems to work perfectly for Databend, but in fact, data is usually written into a table out of order. As a result, segments and blocks might be created with overlapped Min/Max intervals.

For example, you need to access up to three parquet files for a query condition like Age = 20 & Age = 35. If Age is set as the cluster key, Databend will sort the data by the Age column and combine as many small parquet files as possible.

Alt text

For more information about the cluster key, see https://databend.rs/doc/sql-commands/ddl/clusterkey/.

This Week in Databend #71

December 7, 2022 · 4 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Planner

optimize topk in cluster mode (#9092)

Query

support select * exclude [column_name | (col_name, col_name,...)] (#9009)
alter table flashback (#8967)
new table function read_parquet to read parquet files as a table (#9080)
support select * from @stage (#9123)

Storage

cache policy (#9062)
support hive nullable partition (#9064)

Code Refactoring 🎉

Memory Tracker

keep tracker state consistent (#8973)

REST API

drop ctx after query finished (#9091)

Bug Fixes 🔧

Configs

add more tests for hive config loading (#9074)

Planner

try to fix table name case sensibility (#9055)

Functions

vector_const like bug fix (#9082)

Storage

update last_snapshot_hint file when purge (#9060)

Cluster

try fix broken pipe or connect reset (#9104)

What's On In Databend

Stay connected with the latest news about Databend.

RESTORE TABLE

By the snapshot ID or timestamp you specify in the command, Databend restores the table to a prior state where the snapshot was created. To retrieve snapshot IDs and timestamps of a table, use FUSE_SNAPSHOT.

-- Restore with a snapshot ID
ALTER TABLE <table> FLASHBACK TO (SNAPSHOT => '<snapshot-id>');
-- Restore with a snapshot timestamp
ALTER TABLE <table> FLASHBACK TO (TIMESTAMP => '<timestamp>'::TIMESTAMP);

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Adding Build Information to Error Report

An error report currently only contains an error code and some information about why the error occurred. When build information is available, troubleshooting will become easier.

"Code: xx. Error: error msg... (version ...)"

Issue 9117: Add Build Information to the Error Report

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	b41sh	BohuTANG	dantengsky	drmingdrmer	everpcpc


lichuang	mergify[bot]	PsiACE	RinChanNOWWW	sandflee	soyeric128


sundy-li	TCeason	Xuanwo	xudong963	youngsofun	zhang2014


ZhiHanZ

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Official Website
GitHub Discussions (Feature requests, bug reports, and contributions)
Twitter (Stay in the know)
Slack Channel (Chat with the community)

This Week in Databend #70

November 30, 2022 · 5 min read

PsiACE

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

better checking of format options (#8981)
add basic schema infer for parquet (#9043)

Query

QualifiedName support 'db.table.' and 'table.' (#8965)
support bulk insert without exprssion (#8966)

Storage

add cache layer for fuse engine (#8830)
add system table system.memory_statistics (#8945)
add optimize statistic ddl support (#8891)

Code Refactoring 🎉

Base

remove common macros (#8936)

Format

TypeDeserializer get rid of FormatSetting (#8950)

Planner

refactor extract or predicate (#8951)

Processors

optimize join by merging build data block (#8961)

New Expression

allow sparse column id in chunk, redo #8789 with a new approach. (#9008)

Documentation 📔

i18n support with crowdin (#8987, #8997, etc.)

Bug Fixes 🔧

Base

try fix lost tracker (#8932)

Meta

fix share db bug, create DatabaseIdToName if need (#9006)

Mysql handler

fix mysql conns leak (#8894)

Processors

try fix update list memory leak (#9023)

Storage

read and write block in parallel when compact (#8921)

What's On In Databend

Stay connected with the latest news about Databend.

Infer Schema at a Glance

You usually need to create a table before loading data from a file stored on a stage or somewhere. Unfortunately, sometimes you might not know the file schema to create the table or are unable to input the schema due to its complexity.

Introducing the capability to infer schema from an existing file will make the work much easier. You will even be able to query data directly from a stage using a SELECT statement like select * from @my_stage.

INFER 's3://mybucket/data.csv' FILE_FORMAT = ( TYPE = CSV );
+-------------+---------+----------+
| COLUMN_NAME | TYPE    | NULLABLE |
|-------------+---------+----------|
| CONTINENT   | TEXT    | True     |
| COUNTRY     | VARIANT | True     |
+-------------+---------+----------+

We've added support for inferring the basic schema from parquet files in #9043, and we're now working on #7211 to implement select from @stage.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Tls Support for Mysql Handler

opensrv-mysql v0.3.0 that was released recently includes support for TLS. It sounds like a good idea to introduce it to Databend.

let (is_ssl, init_params) = opensrv_mysql::AsyncMysqlIntermediary::init_before_ssl(
    &mut shim,
    &mut r,
    &mut w,
    &Some(tls_config.clone()),
)
.await
.unwrap();

opensrv_mysql::secure_run_with_options(shim, w, ops, tls_config, init_params).await

Issue 8983: Feature: tls support for mysql handler

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.


andylokandy	ariesdevil	b41sh	BohuTANG	dantengsky	drmingdrmer


everpcpc	flaneur2020	leiysky	lichuang	mergify[bot]	PsiACE


sandflee	soyeric128	sundy-li	TCeason	TracyZYJ	Xuanwo


xudong963	youngsofun	yufan022	zhang2014	zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

Databend Website
GitHub Discussions (Feature/Bug reports, Contributions)
Twitter (Get the news fast)
Slack Channel (For live discussion with the Community)

Speed up hive queries with databend

November 28, 2022 · 2 min read

sandflee

Backgroup

now databend support hive catalog to run hive queries, this docs shows how to set up databend-hive environment and run hive sqls.

How to set up databend-hive cluster

hiveserver&metastore&hdfs is supposed to be pre-installed.

download a databend-release with hive support, or build from source

## make sure JAVA_HOME is set
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
cargo build --features hive,storage-hdfs

setup a databend cluster, refer to deploying-databend
add hive catalog and hdfs storage to databend-query.toml

[storage]
type = "hdfs"

[storage.hdfs]
# hdfs namenode address,such as 127.0.0.1:8020
name_node = "xx"
root = ""

[catalogs.hive]
type = "hive"
# hive metastore address, such as 127.0.0.1:9083
address = "xx"

run databend-query with java&hadoop environment

export HADOOP_HOME=xxx
export JAVA_HOME=xxx, such as /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH

./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &

setup hive related settings with mysql client

set global sql_dialect = 'hive';

suggest settings:

-- for chinese users
set global timezone = 'Asia/Shanghai';
set global max_execute_time = 180000;

-- support hive nvl function
create FUNCTION nvl as (a,b) -> ifnull(a,b);

query hive data using mysql client or mysql jdbc client. Note: hive tables must be referred as hive.db.table

select * from hive.$db.$table limit 10;

Limititions

only support parquet table, not support orc,txt
not support struct&map&decimal hive data types
only support hive select queries, not support DDL, insert, DML sqls
not support hive udfs, hive functions are limited supported

hive features is now in beta stage, please feel free to report bugs&suggestions in databend issues.

Replace Bloom Filter with Xor Filter to Speed up Queries

November 23, 2022 · 4 min read

BohuTANG

Many optimizations are usually required for big data analytics to "reduce the distance to data." For example, using the Bloom Filter Index, queries can be filtered to determine whether data should be fetched from backend storage:

Alt text

Why We Replaced the Bloom Filter

Most popular databases use Bloom Filters to handle equivalent queries and avoid useless data readings. Databend also used the classic Bloom Filter algorithm in the first version (databend#6639). However, we found the Bloom Filter Index required plenty of storage space which even exceeded the data storage size (Databend automatically created Bloom indexes for some data types to make it easier for users to work with). The Bloom Filter Index didn't show a significant performance improvement because it is not much different from reading data directly from storage.

The reason is that Bloom Filter does not know the cardinality of the data when it is generated. Take the Boolean type as an example, the algorithm allocates space for it without considering the cardinality (2, True or False).

As a result, the Databend community began exploring new solutions and determined a feasible solution that uses HyperLoglog to sort out distinct values before allocating space.

At the TiDB User Conference on a Saturday in September, I met XP (@drmingdrmer) and talked about the solution again with him. He came up with using Trie to solve the problem. An excellent idea, but lots of work.

XP is a master for Trie, so I'm sure implementation would be no big deal for him. But I think some existing technologies might be able to help.

Alt text

Why Xor Filter?

A few explorations later, the Xor Filter algorithm, proposed by Daniel Lemire and his team in 2019: [Xor Filters: Faster and Smaller Than Bloom Filters, caught my attention(https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/).

Alt text

I did a test (Xor Filter Bench) with the Rust version (xorfilter) and got a very positive result, so we replaced Bloom Filters with Xor Filters by databend#7860. Let's do a test and see how it performs with Xor Filters.

u64: 
xor bitmap encode:1230069 bytes, raw:8000000 bytes, ratio:0.15375863

bool:
xor bitmap encode:61 bytes, raw:1000000 bytes, ratio:0.000061

string:
xor bitmap encode:123067 bytes, raw:3000000 bytes, ratio:0.041022334

100000 records of the same key:
xor bitmap encode: 61 bytes, raw:3000000 bytes, ratio:0.000020333333

Test Environment

Databend: v0.8.122-nightly, single node
VM: 32 vCPU, 32 GiB (Cloud VM)
Object Storage: S3
Dataset: 10 billion records, 350G Raw Data, Xor Filter Index 700MB, all indexes and data are stored in object storage.
Table:

mysql> desc t10b;
+-------+-----------------+------+---------+-------+
| Field | Type            | Null | Default | Extra |
+-------+-----------------+------+---------+-------+
| c1    | BIGINT UNSIGNED | NO   | 0       |       |
| c2    | VARCHAR         | NO   |         |       |
+-------+-----------------+------+---------+-------+

Deploying Databend

Step 1: Download installation package

wget https://github.com/datafuselabs/databend/releases/download/v0.8.122-nightly/databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz 
tar zxvf databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz

You can find the following content after extracting the package:

tree
.
├── bin
│   ├── databend-meta
│   ├── databend-metabench
│   ├── databend-metactl
│   └── databend-query
├── configs
│   ├── databend-meta.toml
│   └── databend-query.toml
├── readme.txt
└── scripts
    ├── start.sh
    └── stop.sh

Step 2: Start Databend Meta

./bin/databend-meta -c configs/databend-meta.toml

Step 3：Configure Databend Query

For more information, refer to https://databend.rs/doc/deploy/deploying-databend

vim configs/databend-query.toml

... ...

[meta]
endpoints = ["127.0.0.1:9191"]
username = "root"
password = "root"
client_timeout_in_second = 60
auto_sync_interval = 60

# Storage config.
[storage]
# fs | s3 | azblob | obs
type = "s3"

# To use S3-compatible object storage, uncomment this block and set your values.
 [storage.s3]
bucket = "<your-bucket-name>"
endpoint_url = "<your-s3-endpoint>"
access_key_id = "<your-key>"
secret_access_key = "<your-access-key>"

Step 4: Start Databend Query

./bin/databend-query -c configs/databend-query.toml

Step 5: Construct the test data set

mysql -uroot -h127.0.0.1 -P3307

Construct 10 billion rows of test data (time: 16 min 0.41 sec):

create table t10b as select number as c1, cast(rand() as string) as c2 from numbers(10000000000)

Run a query (no cache, all data and indexes are in object storage):

mysql> select * from t10b where  c2='0.6622377673133426';
+-----------+--------------------+
| c1        | c2                 |
+-----------+--------------------+
| 937500090 | 0.6622377673133426 |
+-----------+--------------------+
1 row in set (20.57 sec)
Read 40000000 rows, 1009.75 MiB in 20.567 sec., 1.94 million rows/sec., 49.10 MiB/sec.

With the Xor Filter Index, it takes about 20 seconds for a single-node Databend to complete a point query on a scale of 10 billion rows. You can also speed up point queries by expanding a single Databend node into a cluster. Refer to https://databend.rs/doc/deploy/expanding-to-a-databend-cluster#deploying-a-new-query-node for details.

References

[1] Arxiv: Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

[2] Daniel Lemire’s blog: Xor Filters: Faster and Smaller Than Bloom Filters

[3] Databend, Cloud Lakehouse: https://github.com/datafuselabs/databend

Time Travel - Going Back to Move Forward

October 10, 2022 · 4 min read

wubx

Life is a journey, sometimes you must go back to move forward. When working with databases, you need to access the data history now and then. Time Travel is one of the most valuable features that Databend has rolled out. The time travel feature acts as a data recovery utility that enables you to restore a dropped table or get back a previous version of your data in a table. For example, when you accidentally delete a table or update some rows by mistake, you will need the help from Time Travel. This post sheds some light on what you can do with Time Travel in Databend.

First things first, you must know that not all the historical data can be restored based on the Databend retention policy. The default retention period is 24 hours, which means you can restore your historical data within 24 hours after it is deleted or outdated.

If you run SHOW TABLES HISTORY against a database, you will find the dropped tables (if any) and their drop time. The command does not list the dropped tables that have passed their retention period.

Restore a Dropped Table

When you delete a file from your computer, the file goes to the trash bin and you can restore the file by putting it back to its original folder.

In Databend, restoring a table is as easy as you restore a file from the trash bin. The UNDROP TABLE command makes a dropped table become available again with the data of the latest version. The "latest version" means that Databend recovers a table as well as the data that the table was holding when you deleted it.

Query Old Data

This is the most glorious part of the Time Travel story in Databend. When we say that you can get back a previous version of your data in a table, it does not mean that you roll back your table to an earlier point in time, it shows you the table's data at that point instead.

Databend automatically takes and saves a snapshot of your tables after each transaction that updates your table data. A version of a table's data practically refers to a snapshot that saves the data of the table when the snapshot was taken.

Databend provides a system function named FUSE_SNAPSHOT that enables you to find the saved snapshots. Each snapshot comes with a snapshot ID and a timestamp.

The saved snapshots are the behind-the-scenes heroes that make the time travel become true. So when you try to get back your history data, you need to tell Databend which version you want by the snapshot ID or the timestamp with an AT clause in the SELECT statement.

Create a New Table from Old Data

The Time Travel feature makes it possible to create an OLD table, which means you can create a table to hold and move on from a previous version of your data.

The CREATE TABLE statement can include a SNAPSHOT_LOCATION clause that allows you to specify a snapshot file that holds your old data. This command enables you to insert the data stored in the snapshot file when you create a table. Please note that the table you create must have same column definitions as the data from the snapshot.

Go without Time Travel

Tables in Databend support Time Travel out-of-the-box. However, you might not need it for some cases, for example, when you're running low of your storage space or the data is big but unimportant. Databend currently does not provide a setting to switch it off, but you can CREATE TRANSIENT TABLE.

Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Dataebend does not hold historical data for a transient table so you will not be able to query from a previous version of the transient table with the Time Travel feature, for example, the AT clause in the SELECT statement will not work for transient tables. Please note that you can still drop and undrop a transient table.

Designing and Querying JSON in Databend

September 14, 2022 · 6 min read

baishen

JSON (JavaScript Object Notation) is a commonly used semi-structured data type. With the self-describing schema structure, JSON can hold all data types, including multi-level nested data types, such as Array, Object, etc. JSON takes advantage of high flexibility and easy dynamic expansion compared with the structured data types that must strictly follow the fields in a tabular data structure.

As data volume increases rapidly in recent years, many platforms have started to use and get the most out of semi-structured data types (such as JSON). For example, the JSON data shared by various platforms through open interfaces, and the public datasets and application logs stored in JSON format.

Databend supports structured data types, as well as JSON. This post dives deeply into the JSON data type in Databend.

Working with JSON in Databend

Databend stores semi-structured data as the VARIANT (also called JSON) data type:

CREATE TABLE test
  (
     id INT32,
     v1 VARIANT,
     v2 JSON
  );

The JSON data needs to be generated by calling the parse_json or try_parse_json function. The input string must be in the standard JSON format, including Null, Boolean, Number, String, Array, and Object. In case of parsing failure due to invalid string, the parse_json function will return an error while the try_parse_json function will return a NULL value.

INSERT INTO test VALUES
  (1, parse_json('{"a":{"b":1,"c":[1,2]}}'), parse_json('[["a","b"],{"k":"a"}]')),
  (2, parse_json('{"a":{"b":2,"c":[3,4]}}'), parse_json('[["c","d"],{"k":"b"}]'));

SELECT * FROM test;
+----+-------------------------+-----------------------+
| id | v1                      | v2                    |
+----+-------------------------+-----------------------+
| 1  | {"a":{"b":1,"c":[1,2]}} | [["a","b"],{"k":"a"}] |
| 2  | {"a":{"b":2,"c":[3,4]}} | [["c","d"],{"k":"b"}] |
+----+-------------------------+-----------------------+

JSON usually holds data of Array or Object type. Due to the nested hierarchical structure, the internal elements can be accessed through JSON PATH. The syntax supports the following delimiters:

:: Colon can be used to obtain the elements in an object by the key.
.: Dot can be used to obtain the elements in an object by the key. Do NOT use a dot as the first delimiter in a statement, or Databend would consider the dot as the delimiter to separate the table name from the column name.
[]: Brackets can be used to obtain the elements in an object by the key or the elements in an array by the index.

You can mix the three types of delimiters above.

SELECT v1:a.c, v1:a['b'], v1['a']:c, v2[0][1], v2[1].k FROM test;

+--------+-----------+-----------+----------+---------+
| v1:a.c | v1:a['b'] | v1['a']:c | v2[0][1] | v2[1].k |
+--------+-----------+-----------+----------+---------+
| [1,2]  | 1         | [1,2]     | "b"      | "a"     |
| [3,4]  | 2         | [3,4]     | "d"      | "b"     |
+--------+-----------+-----------+----------+---------+

The internal elements extracted through JSON PATH are also of JSON type, and they can be converted to basic types through the cast function or using the conversion operator ::.

SELECT cast(v1:a.c[0], int64), v1:a.b::int32, v2[0][1]::string FROM test;

+--------------------------+---------------+------------------+
| cast(v1:a.c[0] as int64) | v1:a.b::int32 | v2[0][1]::string |
+--------------------------+---------------+------------------+
| 1                        | 1             | b                |
| 3                        | 2             | d                |
+--------------------------+---------------+------------------+

Parsing JSON from GitHub

Many public datasets are stored in JSON format. We can import these data into Databend for parsing. The following introduction uses the GitHub events dataset as an example.

The GitHub events dataset (downloaded from GH Archive) uses the following JSON format:

{
  "id":"23929425917",
  "type":"PushEvent",
  "actor":{
    "id":109853386,
    "login":"teeckyar-bot",
    "display_login":"teeckyar-bot",
    "gravatar_id":"",
    "url":"https://api.github.com/users/teeckyar-bot",
    "avatar_url":"https://avatars.githubusercontent.com/u/109853386?"
  },
  "repo":{
    "id":531248561,
    "name":"teeckyar/Times",
    "url":"https://api.github.com/repos/teeckyar/Times"
  },
  "payload":{
    "push_id":10982315959,
    "size":1,
    "distinct_size":1,
    "ref":"refs/heads/main",
    "head":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
    "before":"0a2871cb7e61ce47a6790adaf09facb6e1ef56ba",
    "commits":[
      {
        "sha":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
        "author":{
          "email":"support@teeckyar.ir",
          "name":"teeckyar-bot"
        },
        "message":"1662804002 Timehash!",
        "distinct":true,
        "url":"https://api.github.com/repos/teeckyar/Times/commits/670e7ca4085e5faa75c8856ece0f362e56f55f09"
      }
    ]
  },
  "public":true,
  "created_at":"2022-09-10T10:00:00Z",
  "org":{
    "id":106163581,
    "login":"teeckyar",
    "gravatar_id":"",
    "url":"https://api.github.com/orgs/teeckyar",
    "avatar_url":"https://avatars.githubusercontent.com/u/106163581?"
  }
}

From the data above, we can see that the actor, repo, payload, and org fields have a nested structure and can be stored as JSON. Others can be stored as basic data types. So we can create a table like this:

CREATE TABLE `github_data`
             (
                          `id`   VARCHAR,
                          `type` VARCHAR,
                          `actor` JSON,
                          `repo` JSON,
                          `payload` JSON,
                          `public` BOOLEAN,
                          `created_at` timestamp,
                          `org` json
             );

Use the COPY INTO command to load the data:

COPY INTO github_data
FROM 'https://data.gharchive.org/2022-09-10-10.json.gz'
FILE_FORMAT = (
  compression = auto
  type = NDJSON
);

The following code returns the top 10 projects with the most commits:

SELECT   repo:name,
         count(id)
FROM     github_data
WHERE    type = 'PushEvent'
GROUP BY repo:name
ORDER BY count(id) DESC
LIMIT    10;

+----------------------------------------------------------+-----------+
| repo:name                                                | count(id) |
+----------------------------------------------------------+-----------+
| "Lombiq/Orchard"                                         | 1384      |
| "maique/microdotblog"                                    | 970       |
| "Vladikasik/statistic"                                   | 738       |
| "brokjad/got_config"                                     | 592       |
| "yanonono/booth-update"                                  | 537       |
| "networkoperator/demo-cluster-manifests"                 | 433       |
| "kn469/web-clipper-bed"                                  | 312       |
| "ufapg/jojo"                                             | 306       |
| "bj5nj7oh/bj5nj7oh"                                      | 291       |
| "appseed-projects2/500f32d3-8019-43ee-8f2a-a273163233fb" | 247       |
+----------------------------------------------------------+-----------+

The following code returns the top 10 users with the most forks:

SELECT   actor:login,
         count(id)
FROM     github_data
WHERE    type='ForkEvent'
GROUP BY actor:login
ORDER BY count(id) DESC
LIMIT    10;

+-----------------------------------+-----------+
| actor:login                       | count(id) |
+-----------------------------------+-----------+
| "actions-marketplace-validations" | 191       |
| "alveraboquet"                    | 59        |
| "ajunlonglive"                    | 50        |
| "Shutch420"                       | 13        |
| "JusticeNX"                       | 13        |
| "RyK-eR"                          | 12        |
| "DroneMad"                        | 10        |
| "UnqulifiedEngineer"              | 9         |
| "PeterZs"                         | 8         |
| "lgq2015"                         | 8         |
+-----------------------------------+-----------+

Performance Optimization

The JSON data generally is saved in plaintext format and needs to be parsed to generate the enumeration value of serde_json::Value every time the data is read. Compared to other basic data types, handling JSON data takes more parsing time and needs more memory space.

Databend has improved the read performance of JSON data using the following methods:

To speed up the parsing and reduce memory usage, Databend stores the JSON data as JSONB in binary format and uses the built-in j_entry structure to hold data type and offset position of each element.
Adding virtual columns to speed up the queries. Databend extracts the frequently queried fields and the fields of the same data type and stores them as separate virtual columns. Data will be directly read from the virtual columns when querying, which makes Databend achieve the same performance as querying other basic data types.

Find Peter Parker in Databend

September 8, 2022 · 2 min read

wubx

The most impressive part of the movie Spider-Man: No Way Home is "three generations coming together. In the story, when the spider-man's friend repeats the magic words "Find Peter Parker!", we surprisingly see two old friends on the screen, the previous generations of spider-man. They travel from other universes to join forces and develop cures for the villains.

Did you know that you have a similar magic power in Databend? That is, you can always get back the previous versions of your data in a few simple steps whenever you need them. The secret is Databend automatically creates and saves snapshots with timestamps of your tables when a data updating occurs, so you can track the history of a table and see how what it looked like at a time point in the past.

The following code creates a table first, and then inserts values with three separate SQL statements:

create table spiderman(gen int, nickname varchar);

insert into spiderman values(1,'Peter-1');
insert into spiderman values(2,'Peter-2');
insert into spiderman values(3,'Peter-3');

Databend creates and saves three snapshots for the code above. Each one holds a historical version of the data in the table.

To find them, use the system function FUSE_SNAPSHOT. The function returns everything you may need to know about the saved snapshots of a table, such as the snapshot IDs, timestamps, and locations.

select snapshot_id,previous_snapshot_id, timestamp from fuse_snapshot('default','spiderman');

---
+----------------------------------+----------------------------------+----------------------------+
| snapshot_id                      | previous_snapshot_id             | timestamp                  |
+----------------------------------+----------------------------------+----------------------------+
| 34b8df220edc4d8cb9e3e76118788686 | 4bb479751b7144d8aa2b53e5b281453f | 2022-08-30 01:18:53.202724 |
| 4bb479751b7144d8aa2b53e5b281453f | a2801ed9656d42c9812f2921214f0795 | 2022-08-30 01:18:35.597615 |
| a2801ed9656d42c9812f2921214f0795 | NULL                             | 2022-08-30 01:18:21.750208 |
+----------------------------------+----------------------------------+----------------------------+

You can now query the history data with a snapshot or timestamp by including the AT clause in the SELECT statement:

select * from spiderman at(snapshot=>'a2801ed9656d42c9812f2921214f0795');

---
+------+----------+
| gen  | nickname |
+------+----------+
| 1    | Peter-1  |
+------+----------+

select * from spiderman at(timestamp=>'2022-08-30 01:18:21.750208'::timestamp);

---
+------+----------+
| gen  | nickname |
+------+----------+
| 1    | Peter-1  |
+------+----------+

The "magic" explained above is part of the powerful Time Travel feature of Databend that enables you to query, back up, or restore from a specified historical version of your data. That's not all about the feature. You can do more with the snapshots to make your work easier. Join the Databend community to find out more "magic tricks".

What's Fresh in Databend v0.8

August 18, 2022 · 7 min read

Xuanwo

Hello, everyone! I'm Xuanwo. Today, on behalf of the Databend community, I would like to announce the official release of v0.8.

Development of Databend v0.8 started on March 28th, with 5000+ commits and 4600+ file changes. In the last 5 months, the community of 120+ contributors added 420K lines of code and removed 160K lines, equivalent to rewriting Databend once. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.

Click here to download Databend v0.8

Let's see what has been done in v0.8.

What's Databend?

Databend is a modern cloud data warehouse based on Rust that enables high-performance, elastic and scalable real-time data analysis and activates the data potential of users.

databend-arch

Significant improvements

New Planner: JOIN! JOIN! JOIN!

To better support complex SQL queries and improve user experience, Databend v0.8 is designed with a new Planner framework.

Databend has added JOIN and proper subquery support, driven by New Planner.

select vip_info.Client_ID, vip_info.Region 
    from vip_info right 
    join purchase_records 
    on vip_info.Client_ID = purchase_records.Client_ID;

New Parser: The Best Parser！

While refactoring Planner, the databend community has implemented a new nom-based Parser that balances development efficiency with user experience.

New Parser makes it easy for developers to design/develop/test complex SQL syntax in an intuitive way

COPY
    ~ INTO ~ #copy_unit
    ~ FROM ~ #copy_unit
    ~ ( FILES ~ "=" ~ "(" ~ #comma_separated_list0(literal_string) ~ ")")?
    ~ ( PATTERN ~ "=" ~ #literal_string)?
    ~ ( FILE_FORMAT ~ "=" ~ #options)?
    ~ ( VALIDATION_MODE ~ "=" ~ #literal_string)?
    ~ ( SIZE_LIMIT ~ "=" ~ #literal_u64)?

It also gives the user specific and precise information about the error.

MySQL [(none)]> select number from numbers(10) as t inner join numbers(30) as t1 using(number);
ERROR 1105 (HY000): Code: 1065, displayText = error:
  --> SQL:1:8
  |
1 | select number from numbers(10) as t inner join numbers(30) as t1 using(number)
  |        ^^^^^^ column reference is ambiguous

No more worrying about not knowing what's wrong with SQL. Visit The New Databend SQL Planner for more information.

New Features

In addition to the newly designed Planner, the Databend community has implemented a number of new features.

COPY Enhancement

COPY capabilities have been greatly enhanced, and Databend can now:

Copy data from any supported storage service (even https!)

COPY 
    INTO ontime200 
    FROM 'https://repo.databend.rs/dataset/stateful/ontime_2006_[200-300].csv' 
    FILE_FORMAT = (TYPE = CSV)

Support for copying compressed files

COPY 
    INTO ontime200 
    FROM 's3://bucket/dataset/stateful/ontime.csv.gz' 
    FILE_FORMAT = (TYPE = CSV COMPRESSION=AUTO)

UNLOAD data to any supported storage service

COPY 
    INTO 'azblob://bucket/'  
    FROM ontime200
    FILE_FORMAT = (TYPE = PARQUET)

Hive Support

Databend v0.8 designed and developed the Multi Catalog and implemented Hive Metastore support on top of it!

Databend can now interface directly to Hive and read data from HDFS.

select * from hive.default.customer_p2 order by c_nation;

Time Travel

A long time ago, the Databend community shared an implementation of the underlying FUSE Engine, From Git to Fuse Engine, where one of the most important features was the support for time travel, allowing us to query data tables at any point in time.

Starting from v0.8, this feature is now officially installed and we can now

Query the data table for a specified time

-- Travel to the time when the last row was inserted
select * from demo at (TIMESTAMP => '2022-06-22 08:58:54.509008'::TIMESTAMP); 
+----------+
| c        |
+----------+
| batch1.1 |
| batch1.2 |
| batch2.1 |
+----------+

Recover mistakenly deleted data tables

DROP TABLE test;

SELECT * FROM test;
ERROR 1105 (HY000): Code: 1025, displayText = Unknown table 'test'.

-- un-drop table
UNDROP TABLE test;

-- check
SELECT * FROM test;
+------+------+
| a    | b    |
+------+------+
|    1 | a    |
+------+------+

Make business data have more security!

CTE Support

CTE (Common Table Expression) is a frequently used feature in OLAP business to define a temporary result set within the execution of a single statement, which is valid only during the query period, enabling the reuse of code segments, improving readability and better implementation of complex queries.

Databend v0.8 re-implements the CTE based on New Planner and now users can happily use WITH to declare the CTE.

WITH customers_in_quebec 
     AS (SELECT customername, 
                city 
         FROM   customers 
         WHERE  province = 'Québec') 
SELECT customername 
FROM   customers_in_quebec
WHERE  city = 'Montréal' 
ORDER  BY customername; 

In addition to these features mentioned above, Databend v0.8 also supports UDFs, adds DELETE statements, further enhances support for semi-structured data types, not to mention the numerous SQL statement improvements and new methods added. Thanks to all the contributors to the Databend community, without you all the new features mentioned here would not have been possible!

Quality Enhancement

Feature implementation is just the first part of product delivery. In Databend v0.8, the community introduced the concept of engineering quality, which evaluates the quality of Databend development in three dimensions: users, contributors, and community.

Reassuring users

In order for users to use Databend with confidence, the community has added a lot of tests over the last three months, fetching stateless test sets from YDB and others, adding stateful tests for ontime, hits and other datasets, putting SQL Logic Test online to cover all interfaces, and enabling SQL Fuzz testing to cover boundary cases.

Furthermore, the community has also gone live with Databend Perf to do continuous performance testing of Databend in production environments to catch unexpected performance regressions in time.

Make contributors comfortable

Databend is a large Rust project that has been criticized by the community for its build time.

To improve this issue and make contributors feel comfortable, the community went live with a highly configurable, specially tuned Self-hosted Runner to perform integration tests for PR and enabled several services or tools such as Mergify, mold, dev-tools, etc. to optimize the CI process.

We also initiated a new plan to restructure the Databend project, splitting the original huge query crate into multiple sub-crates to avoid, as much as possible, the situation of changing one line of code and check execution for five minutes.

Keeping the community happy

Databend is a contributor and participant in the open source community. During the development of v0.8, the Databend community established the principle of Upstream First, actively following and adopting the latest upstream releases, giving feedback on known bugs, contributing their own patches, and starting Tracking issues of upstream first violation to keep up with the latest developments.

The Databend community is actively exploring integration with other open source projects and has already implemented integration and support for third-party drivers such as Vector, sqlalchemy, clickhouse-driver, etc.

Next Steps

Databend v0.8 is a solid foundation release with a new Planner that makes it easier to implement features and make optimizations. In version 0.9, we expect improvements in the following areas.

Query Result Cache
JSON Optimization
Table Share
Processor Profiling
Resource Quota
Data Caching

Please check the Release proposal: Nightly v0.9 for the latest news~

Get going now!

Visit the release log and download the latest version to learn more, and feel free to submit feedback using Github Issues if you encounter problems!

Benchmarking Databend using TPC-H

August 8, 2022 · 5 min read

BohuTANG

This post gives you a general idea about the TPC-H benchmark and explains how to run a TPC-H benchmark on Databend.

What's TPC-H?

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

The TPC-H benchmark simulates a system for online sales of parts and components and defines eight tables in total. The structure, data volume, and mutual relationship of each table are shown in the figure below:

The benchmark workload consists of twenty-two decision support queries that must be executed as part of the TPC-H benchmark. Each TPC-H query asks a business question and includes the corresponding query to answer the question. More information about TPC-H can be found at https://www.tpc.org/tpch/.

Running TPC-H Benchmark on Databend

This section describes the steps to run the TPC-H benchmark on Databend and provides the related scripts.

Step 1: Generate test data with TPC-H Docker

The following code pulls a docker image and allocates the data in the path where you are running the TPC-H benchmark.

docker pull ghcr.io/databloom-ai/tpch-docker:main
docker run -it -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main dbgen -vf -s 1

TPC-H comes with various data set sizes to test different scale factors. You can use the -s option to set scale factor in the command (for example, the code above sets the scale factor to 1). For more information about the command, see https://github.com/databloom-ai/TPCH-Docker.

SF (Gigabytes)	Size
1	Consists of the base row size (several million elements).
10	Consists of the base row size x 10.
100	Consists of the base row size x 100 (several hundred million elements).
1000	Consists of the base row size x 1000 (several billion elements).

Step 2: Create database and tables

CREATE DATABASE IF NOT EXISTS tpch;

USE tpch;

CREATE TABLE IF NOT EXISTS nation
  (
     n_nationkey INTEGER NOT NULL,
     n_name      VARCHAR NOT NULL,
     n_regionkey INT NOT NULL,
     n_comment   VARCHAR
  );

CREATE TABLE IF NOT EXISTS region
  (
     r_regionkey INT NOT NULL,
     r_name      VARCHAR NOT NULL,
     r_comment   VARCHAR
  );

CREATE TABLE IF NOT EXISTS part
  (
     p_partkey     INT NOT NULL,
     p_name        VARCHAR NOT NULL,
     p_mfgr        VARCHAR NOT NULL,
     p_brand       VARCHAR NOT NULL,
     p_type        VARCHAR NOT NULL,
     p_size        INT NOT NULL,
     p_container   VARCHAR NOT NULL,
     p_retailprice FLOAT NOT NULL,
     p_comment     VARCHAR NOT NULL
  );

CREATE TABLE IF NOT EXISTS supplier
  (
     s_suppkey   INT NOT NULL,
     s_name      VARCHAR NOT NULL,
     s_address   VARCHAR NOT NULL,
     s_nationkey INT NOT NULL,
     s_phone     VARCHAR NOT NULL,
     s_acctbal   FLOAT NOT NULL,
     s_comment   VARCHAR NOT NULL
  );

CREATE TABLE IF NOT EXISTS partsupp
  (
     ps_partkey    INT NOT NULL,
     ps_suppkey    INT NOT NULL,
     ps_availqty   INT NOT NULL,
     ps_supplycost FLOAT NOT NULL,
     ps_comment    VARCHAR NOT NULL
  );

CREATE TABLE IF NOT EXISTS customer
  (
     c_custkey    INT NOT NULL,
     c_name       VARCHAR NOT NULL,
     c_address    VARCHAR NOT NULL,
     c_nationkey  INT NOT NULL,
     c_phone      VARCHAR NOT NULL,
     c_acctbal    FLOAT NOT NULL,
     c_mktsegment VARCHAR NOT NULL,
     c_comment    VARCHAR NOT NULL
  );

CREATE TABLE IF NOT EXISTS orders
  (
     o_orderkey      INT NOT NULL,
     o_custkey       INT NOT NULL,
     o_orderstatus   VARCHAR NOT NULL,
     o_totalprice    FLOAT NOT NULL,
     o_orderdate     DATE NOT NULL,
     o_orderpriority VARCHAR NOT NULL,
     o_clerk         VARCHAR NOT NULL,
     o_shippriority  INT NOT NULL,
     o_comment       VARCHAR NOT NULL
  );

CREATE TABLE IF NOT EXISTS lineitem
  (
     l_orderkey      INT NOT NULL,
     l_partkey       INT NOT NULL,
     l_suppkey       INT NOT NULL,
     l_linenumber    INT NOT NULL,
     l_quantity      FLOAT NOT NULL,
     l_extendedprice FLOAT NOT NULL,
     l_discount      FLOAT NOT NULL,
     l_tax           FLOAT NOT NULL,
     l_returnflag    VARCHAR NOT NULL,
     l_linestatus    VARCHAR NOT NULL,
     l_shipdate      DATE NOT NULL,
     l_commitdate    DATE NOT NULL,
     l_receiptdate   DATE NOT NULL,
     l_shipinstruct  VARCHAR NOT NULL,
     l_shipmode      VARCHAR NOT NULL,
     l_comment       VARCHAR NOT NULL
  ); 

Step 3: Load test data to Databend

This step uses the HTTP API v1/streaming_load to load the test data to Databend. More information about this API can be found at https://databend.rs/doc/load-data/local.

The code below connects to Databend using the Root user. Please note that the root user only works when you access Databend from localhost. You will need to create new users and grant proper privileges first to connect to Databend remotely.

#!/bin/bash

for t in customer lineitem nation orders partsupp part region supplier
do
    echo "$t"
    insert_sql="insert into tpch.$t file_format = (type = CSV skip_header = 0 field_delimiter = '|' record_delimiter = '\n')"
    curl -s -u root: -XPUT "http://localhost:8000/v1/streaming_load" -H "insert_sql: ${insert_sql}" -F 'upload=@"./'$t'.tbl"' > /dev/null 2>&1
done

Step 4: Run TPC-H queries

All the definitions of the TPC-H queries can be found at https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf. You can simply run them by copying and pasting the scripts to Databend.

The Databend team ran the TPC-H benchmark around two months ago and uploaded their queries and results to GitHub. You can find them at https://github.com/datafuselabs/databend/tree/main/tests/suites/0_stateless/13_tpch. Please note that Databend now uses the new planner by default, so you DO NOT need to enable it any more before running the queries.

Many Ways to Load Data into Databend

July 28, 2022 · One min read

Eric

Data is important. If you're moving your business to the cloud to take advantage of the cloud-native features, the first thing you need to consider might be how to load your data to the cloud.

There are many ways to load data into Databend. You can use the command-line interface (CLI), API, or the mysqldump client utility, depending on where your data is stored.

The powerful COPY INTO commands allow you to load data from:

Files in an S3 bucket or a blob storage container.
Staged files (internal or external).
Files in a remote server.

If you have local data files to load, use the HTTP API v1/streaming_load to upload them to Databend.

Please note that Databend supports loading data from files in these formats:

CSV
JSON
Parquet

If you're coming from MySQL, Databend can also use a dump file (*.sql) created by the mysqldump client utility to load your data from MySQL.

Detailed explanations about loading data in different scenarios can be found at https://databend.rs/doc/load-data.

They Use Mergify - Databend

July 20, 2022 · 5 min read

Xuanwo

tip

This post was originally published by Anne-Laure Civeyrac on https://mergify.com.

Every day, major projects use Mergify to automate their GitHub workflow. Whether they have a core team of 3 or 50 people, the one thing they all have in common is that the project leads are willing to let their developers focus on what’s really important—code. So we decided to meet with some of them to get to know more about the challenges they face and discover how Mergify helps their teams be more efficient when it comes to pull requests. This time, we sat down (virtually) with Xuanwo, an infrastructure engineer who oversees automation, distributed systems and storage for the Databend project.

Xuanwo

Please could you give us a brief outline of the Databend project.

Of course! So Databend is a modern elasticity and performance cloud data warehouse. It uses the latest techniques in vectorized query processing to allow people to do blazing-fast data analytics on object storage platforms like S3, Azure Blob, or MinIO. It was mainly inspired by ClickHouse and Snowflake and focuses on online analytical processing.

How many people are currently working on the project?

Right now, we have more than 100 contributors on Databend, with about 30 of them contributing continuously.

What’s your GitHub workflow on this project?

Databend is a very new project and it doesn’t have a stable release yet, so our pull request [PR] workflow is quite simple. All our contributions go through GitHub PRs. For every PR, we use GitHub Actions as the CI, where we run cargo check, cargo fmt, cargo clippy, and all our test cases. If all the checks pass, we merge the PR. And once on the main branch, the PR runs production CI, which goes through all test cases with the release build. We then release a nightly version daily, uploading our release builds to GitHub releases and the Docker Hub Registry.

How many people need to approve PRs?

We need two approvals for a PR to be merged.

What are the main challenges with this workflow?

Mainly the PR merge speed! Our developers are coding in Rust, which is known for not being very good with compilation speed. We typically need about 30 minutes to finish all our checks, and most of this time is consumed by rustc, the compiler for Rust. To make Rust compile faster, we set up our own self-hosted GitHub Actions runners, which have a very high performance. Thanks to this change, our PR merge time was reduced from 60 to 30 minutes.

What made the project team start using Mergify in the first place?

Before coming across Mergify, we enabled the option on GitHub that requires code to be updated before merging. But it added a lot of work for our maintainers, who had to merge the main branch repeatedly.

So to make their lives easier, we implemented Mergify to update the branch automatically and merge PRs after all tests have passed.

Which Mergify features do you use the most?

The automatic merge! And with the help of the Mergify team, we enabled the merge queue feature as well. So now, we can merge multiple PRs simultaneously with only one CI check. This helped us significantly reduce the waiting time for the CI.

What is your favorite Mergify feature and why?

Oh, I love PR actions. Although I can implement the same features with GitHub Actions, I find Mergify’s PR actions more simple and exciting. For example, last week, we introduced a new requirement that every PR must be semantic—we want all PRs to contain a valid title starting with a type like “fix”, “feat”, or “refactor”.

Actions

With the help of Mergify, I only needed to create some rules, such as adding corresponding labels if the PR title starts with “fix”, comment in the PR with a help message if the PR title doesn’t fulfill the requirements, and add post checks so that we can mark the PR as not mergeable.

This feature is very cool, and I would love to have it on issues too!

What has been the most significant impact of using Mergify on your team’s performance so far?

Our teams don’t need to worry about merges anymore! We can start jobs without having to wait on the PRs.

What would be your n°1 tip for someone new to Mergify?

Don’t try to migrate all your workloads to Mergify in one go. Migrating things one by one and progressively will make your lives easier.

If you had time to contribute to the Mergify project, what would your contribution be about?

I used to contribute to Mergify regarding a small documentation typo. But I am not comfortable contributing to more complex issues because Mergify uses Python, a programming language I’m unfamiliar with. But if I could contribute to one feature, it would be about commands. Mergify only supports a small set of commands. Maybe we could define new commands in PR rules and allow users to call them with Mergify bots. This feature would be exciting to me!

The New Databend SQL Planner

July 19, 2022 · 5 min read

Yu Lei

To support complex SQL queries and improve user experience, a large-scale refactoring work for Databend's SQL planner was started several months ago. At present, the refactoring is coming to an end. You can now modify the Session settings of Databend as follows to enable the new planner for early access:

Feature Highlights

A more friendly query experience

Data analysts and developers usually get various errors when coding SQL queries, and troubleshooting can be a nightmare when the queries are complex. I hate MySQL's error prompts because I have coded a query with dozens of JOIN clauses.

The new planner now includes some passes for strict semantic checking so that most errors can be intercepted during the compilation. A new error prompt algorithm was also introduced to help users locate the errors. When there is invalid syntax in your SQL query (for example, misspelled keywords or missing clauses), you will receive an error message that is more instructive.

If your SQL query has a semantic error (for example, you reference a column that is ambiguous, or a column does not exist at all), Databend can help you locate it.

You can also get a better experience when coding complex queries:

Support for JOIN queries and correlated subqueries

The new SQL planner supports JOIN queries (INNER JOIN, OUTER JOIN, CROSS JOIN) and correlated subqueries, and provides a Hash Join algorithm to execute JOIN queries.

For more information about how to use JOIN in Databend, go to https://databend.rs/doc/reference/sql/query-syntax/dml-join

JOIN is a very important part of the OLAP query. In traditional star and snowflake schemas, we join dimensional tables with fact tables through the JOIN query to generate the resulting report.

TPC-H Benchmark is a set of OLAP query benchmarks developed by the TPC committee to evaluate the OLAP capabilities of database systems. It contains the following eight tables:

Lineitem: Holds product information.
Orders: Holds order information.
Customer: Holds customer information.
Part: Holds parts information.
Supplier: Holds supplier information.
Partsupp: Parts-Supplier Relationship Table
Nation: Holds nation information.
Region: Holds region information.
TPC-H includes 22 complex queries, corresponding to different business needs. The new SQL planner now supports the Q9 query that calculates the profit amount for a specified year and region using a large number of JOIN calculations:

Correlated subqueries are also an essential part of SQL for coding complex queries. The Q4 query of TPC-H shows the order delivery status of various priority levels over a period of time and uses a correlated subquery with the EXISTS clause to filter overdue orders:

Brand New Architecture
We redesigned the process of SQL parsing for the new SQL planner to support more complex semantic analysis and SQL optimization.

After the client sends a SQL statement to the databend-query server, the components in the new SQL planner process the SQL statement in the order shown in the flowchart below before returning the query result to the client:

The Parser starts to parse a SQL query after receiving it. If a syntax error is found during the parsing, the error information will be directly returned to the client; If the parsing is successful, an AST (Abstract Syntax Tree) for the query will be constructed.

Parser

To provide more powerful syntax analysis functions and a better development experience, we have developed a DSL (Domain Specific Language) nom-rule based on the nom Parser combinator and rewritten SQL Parser based on this framework.

With this framework, we can easily define the syntax for a statement. Taking the CREATE TABLE statement as an example, we can use DSL to describe it as follows:

The elegant syntax brings more fun to the work of coding a parser. Try it out if you’re interested.

Binder

After the AST is successfully parsed by the Parser, we will semantically analyze it through Binder and generate an initial logical plan. During this process, we perform different types of semantic analysis:

Name resolution: Check the validity of the variables referenced in the SQL query by querying the relevant table and column object information in the Databend Catalog and bind the valid variables to their corresponding objects for subsequent analysis.
Type check: Check the validity of the expression according to the information obtained in the name resolution, and find a proper return type for the expression.
Subquery unnesting: Extract the subquery from the expression and translate it into relational algebra.
Grouping check: For queries with aggregate calculations, check whether non-aggregate columns are referenced.

With semantic analysis, we can eliminate most semantic errors and return them to the user during the compilation to provide the best troubleshooting experience.

Optimizer

After getting the initial logical plan, the optimizer will rewrite and optimize it and, finally, generate an executable physical plan.

The new planner introduced a set of Transformer Rule-based optimizer frameworks (Volcano/Cascades). An independent rule can be implemented by defining a relational algebra sub-tree structure pattern with related transform logic.

Take Predicate Push Down as a simple example:

We only need to define the pattern of the input plan:

And then implement a conversion function:

Interpreter

After the physical plan is generated by the Optimizer, we will translate it into an executable pipeline and hand it over to Databend's processor execution framework for calculation.

What's Next

Building a SQL planner from the ground up is a very challenging job, but the redesign and development let us find the most suitable architecture and functionalities for the system. In the future, we will continue to improve and consolidate the new SQL planner on these functions:

Cost-based Optimization (CBO)
Distributed query optimization
More optimization rules

Currently, we’re in the middle of migrating to the new SQL planner. We will release an announcement when the migration is complete (around July 2022). Stay tuned.

From Git to Fuse Engine

July 5, 2022 · 7 min read

BohuTANG

Databend, developed with Rust, is a new open-source data warehouse architected toward the cloud. It is committed to providing fast elastic expansion capabilities and a pay-as-you-go user experience. GitHub：https://github.com/datafuselabs/databend

Introduction

This post explains the Databend base: Fuse Engine, a powerful columnar storage engine. The engine was designed by the Databend community with the following principles: Powerful performance, simple architecture, and high reliability.

Before we start, check out a challenging task that Databend completed: With the Fuse Engine deployed on AWS S3, a transaction wrote 22.89 TB of raw data in around one and a half hour.

mysql> INSERT INTO ontime_new SELECT * FROM ontime_new;
Query OK, 0 rows affected (1 hour 34 min 36.82 sec)
Read 31619274180 rows, 22.89 TB in 5675.207 sec., 5.57 million rows/sec., 4.03 GB/sec.

Meanwhile, the following conditions were met as well:

Distributed transactions: Multiple computing nodes can read and write the same data simultaneously (This is the first problem that an architecture that separates storage from compute must solve).
Snapshot isolation: Different versions of data do not affect each other so you can do Zero-Copy Cloning for tables.
Retrospective ability: You are allowed to switch to any version of the data, so you can recover with the Time Travel feature.
Data merging: A new version of data can be generated after merging.
Simple and robust: Data relationships are described using files, and you can recover entire data system based on these files.

From above, you will find that Fuse Engine is "Git-inspired". Before introducing the design of Fuse Engine, let's take a look at how the bottom layer of Git works.

How Git Works

Git implements data version control (including branch, commit, checkout, and merge) in a distributed environment. Based on Git semantics, it is possible to create a distributed storage engine. There are also some products built on Git-like on the market, such as Nessie - Transactional Catalog for Data Lakes and lakeFS.

To better explore the underlying working mechanism of Git, we use Git semantics to complete a series of "data" operations from the perspective of the database.

Prepare a file named cloud.txt with the content:

2022/05/06, Databend, Cloud

Commit the file cloud.txt to Git.

git commit -m "Add cloud.txt"

Git generates a snapshot (Commit ID: 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71).

git log 
commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date:   Fri May 6 16:44:21 2022 +0800

    Add cloud.txt

Prepare another file named warehouse.txt.

2022/05/07, Databend, Warehouse

Commit the file warehouse.txt to Git.

git commit -m "Add warehouse.txt"

Git generates another snapshot (Commit ID: 15af34e4d16082034e1faeaddd0332b3836f1424).

commit 15af34e4d16082034e1faeaddd0332b3836f1424 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date:   Fri May 6 17:41:43 2022 +0800

    Add warehouse.txt

commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
Author: BohuTANG <overred.shuttler@gmail.com>
Date:   Fri May 6 16:44:21 2022 +0800

    Add cloud.txt

Git now keeps two versions of the data:

ID 15af34e4d16082034e1faeaddd0332b3836f1424，Version2
ID 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71，Version1

We can switch between versions by the Commit ID, which implements the functions of Time Travel and Table Zero-Copy. How does Git make it possible in the bottom layer? It's not rocket science. Git introduces these types of object files to describe the relationship:

Commit: Describes tree object information
Tree: Describes blob object information
Blob: Describes file information

HEAD File

First, we need to know the HEAD pointer:

cat .git/HEAD
15af34e4d16082034e1faeaddd0332b3836f1424

Commit File

The Commit file records metadata related to the commit, such as the current tree and parent, as well as the committer, etc.

File path:

.git/objects/15/af34e4d16082034e1faeaddd0332b3836f1424

File content:

git cat-file -p 15af34e4d16082034e1faeaddd0332b3836f1424

tree 576c63e580846fa6df2337c1f074c8d840e0b70a
parent 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
author BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800
committer BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800

Add warehouse.txt

Tree File

The Tree file records all the files of the current version.

File path:

.git/objects/57/6c63e580846fa6df2337c1f074c8d840e0b70a

File content:

git cat-file -p 576c63e580846fa6df2337c1f074c8d840e0b70a

100644 blob 688de5069f9e873c7e7bd15aa67c6c33e0594dde    cloud.txt
100644 blob bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb    warehouse.txt

Blob File

The Blob files are raw data files. You can view the file content using git cat-file (if you use Git to manage code, blobs are the code files).

git cat-file -p 688de5069f9e873c7e7bd15aa67c6c33e0594dde
2022/05/06, Databend, Cloud

git cat-file -p bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb
2022/05/07, Databend, Warehouse

Fuse Engine

Databend's Fuse Engine was designed in a way similar to Git. It introduces three description files：

Snapshot: Describes segment object information.
Segment: Describes block object information.
Block: Describes parquet file information.

Let's repeat the operations we just did with Git in Fuse Engine.

Create a table.

CREATE TABLE git(file VARCHAR, content VARCHAR);

Write cloud.txt to Fuse Engine.

 INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');

Fuse Engine generates a snapshot (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

CALL system$fuse_snapshot('default', 'git');
*************************** 1. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1
previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1
  bytes_uncompressed: 68
    bytes_compressed: 351

Write warehouse.txt to Fuse Engine.

INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');

Fuse Engine generates another snapshot (Snapshot ID efe2687fd1fc48f8b414b5df2cec1e19) that is linked to the previous one (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

CALL system$fuse_snapshot('default', 'git');
*************************** 1. row ***************************
         snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
   snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
      format_version: 1
previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
       segment_count: 2
         block_count: 2
           row_count: 2
*************************** 2. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1
previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1

Fuse Engine now keeps two versions of the data:

ID efe2687fd1fc48f8b414b5df2cec1e19，Version2
ID 6450690b09c449939a83268c49c12bb2，Version1

That's very similar to Git. Right?

HEAD

Git needs a HEAD as an entry. So does Fuse Engine. Check the HEAD of Fuse Engine:

SHOW CREATE TABLE git\G;
*************************** 1. row ***************************
       Table: git
Create Table: CREATE TABLE `git` (
  `file` VARCHAR,
  `content` VARCHAR
) ENGINE=FUSE SNAPSHOT_LOCATION='53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json'

SNAPSHOT_LOCATION is the HEAD, which by default points to the latest snapshot efe2687fd1fc48f8b414b5df2cec1e19, then how do we switch to the snapshot data whose ID is 6450690b09c449939a83268c49c12bb2? First, check the snapshot information of the current table:

CALL system$fuse_snapshot('default', 'git')\G;
*************************** 1. row ***************************
         snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
   snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
      format_version: 1
previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
       segment_count: 2
         block_count: 2
           row_count: 2
*************************** 2. row ***************************
         snapshot_id: 6450690b09c449939a83268c49c12bb2
   snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
      format_version: 1
previous_snapshot_id: NULL
       segment_count: 1
         block_count: 1
           row_count: 1

Then create a new table (git_v1) and point SNAPSHOT_LOCATION to the snapshot file you need:

CREATE TABLE git_v1(`file` VARCHAR, `content` VARCHAR) SNAPSHOT_LOCATION='53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json';

SELECT * FROM git_v1;
+-----------+-----------------------------+
| file      | content                     |
+-----------+-----------------------------+
| cloud.txt | 2022/05/06, Databend, Cloud |
+-----------+-----------------------------+

Snapshot File

Stores the segment information.

File path:

53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json

File content:

{
   "format_version":1,
   "snapshot_id":"efe2687f-d1fc-48f8-b414-b5df2cec1e19",
   "prev_snapshot_id":[
      "6450690b-09c4-4993-9a83-268c49c12bb2",
      1
   ],
   
   "segments":[
      [
         "53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json"
      ],
      [
         "53/133/_sg/d0bff902b98846469480b23c2a8f93d7_v1.json"
      ]
   ]
   ... ...
}

Segment File

Stores block information.

File path:

 53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json

File content:

{
   "format_version":1,
   "blocks":[
      {
         "row_count":1,
         "block_size":76,
         "file_size":360,
         "location":[
            "53/133/_b/ba4a60013e27479e856f739aefeadfaf_v0.parquet",
            0
         ],
         "compression":"Lz4Raw"
      }
   ]
   ... ...
}

Block File

The underlying data of Fuse Engine uses Parquet format, and each file is composed of multiple blocks.

Summary

In the early design period (October 2021) of Databend's Fuse Engine, the requirements were very clear, but the solution selection didn't go smoothly. At that time, the Databend community investigated a large number of Table Format solutions (such as Iceberg) on the market. The challenge was to choose between using an existing solution and building a new one. Finally, we decided to develop a simple and suitable Storage Engine that uses the Parquet standard as the storage format. Fuse Engine stores the Parquet Footer separately to reduce unnecessary Seek operations, and introduces a more flexible indexing mechanism, for example, operations such as Aggregation and Join can have their own indexes for acceleration.

Feel free to deploy Fuse Engine with your object storage to have a different experience on the big data analysis: https://databend.rs/doc/deploy

Databend on GitHub: https://github.com/datafuselabs/databend

Deploying Databend on Your Laptop in Minutes

June 9, 2022 · 3 min read

Eric

Deploying Databend on Your Laptop in Minutes

Deploying a data warehouse sounds like a big job to you? Definitely NOT. Databend can be deployed to your laptop and uses the local file system as storage. You can complete the deployment in a few minutes even if you're new to Databend. Now let's get started!

tip

Databend requires a scalabe storage (for example, object storage) to work. This blog uses local file system to provide you a hands-on experience. Never use a local file system as storage for production purposes.

STEP 1. Downloading Databend

a. Create a folder named databend in the directory /usr/local. Then create the following subfolders in the folder databend:

bin
data
etc
logs

b. Download and extract the latest Databend package for your platform from https://github.com/datafuselabs/databend/releases.

c. Move the extracted files databend-meta and databend-query to the folder /usr/local/databend/bin.

STEP 2. Deploying a Standalone databend-meta

a. Create a file named databend-meta.toml in the folder /usr/local/databend/etc with the following content:

dir = "metadata/_logs"
admin_api_address = "127.0.0.1:8101"
grpc_api_address = "127.0.0.1:9101"

[raft_config]
id = 1
single = true
raft_dir = "metadata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-meta:

./databend-meta -c ../etc/databend-meta.toml > meta.log 2>&1 &

d. Run the following command to check if databend-meta was started successfully:

curl -I  http://127.0.0.1:8101/v1/health

STEP 3. Deploying a Standalone databend-query

a. Create a file named databend-query.toml in the folder /usr/local/databend/etc with the following content:

[log]
level = "INFO"
dir = "benddata/_logs"

[query]
# For admin RESET API.
admin_api_address = "127.0.0.1:8001"

# Metrics.
metric_api_address = "127.0.0.1:7071"

# Cluster flight RPC.
flight_api_address = "127.0.0.1:9091"

# Query MySQL Handler.
mysql_handler_host = "127.0.0.1"
mysql_handler_port = 3307


# Query HTTP Handler.
http_handler_host = "127.0.0.1"
http_handler_port = 8081

tenant_id = "tenant1"
cluster_id = "cluster1"

[meta]
address = "127.0.0.1:9101"
username = "root"
password = "root"

[storage]
# s3
type = "fs"

[storage.fs]
data_path = "benddata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-meta:

./databend-query -c ../etc/databend-query.toml > query.log 2>&1 &

d. Run the following command to check if databend-meta was started successfully:

curl -I  http://127.0.0.1:8001/v1/health

There you go! You have successfully deployed Databend on your computer. If you have a SQL client on your computer, try the steps below to verify the deployment:

a. Create a connection to 127.0.0.1 from your SQL client. In the connection, set the port to 3307, and set the username to root.

b. Run the following commands to check if the query is successful:

CREATE TABLE t1(a int);

INSERT INTO t1 VALUES(1), (2);

SELECT * FROM t1;

Announcing Databend v0.7.0 - Deploy easier, query faster

March 28, 2022 · 2 min read

BohuTANG

Continuing breaking ground, activate your object storage for big data analytics in sub-second!

What's New​

Accepted RFCs :flight_departure:​

Features & Improvements ✨​

Code Refactoring 🎉​

Build/Testing/CI Infra Changes 🔌​

Bug Fixes 🔧​

What's On In Databend​

RFC: Query Result Cache​

How to Write a Scalar / Aggregate Function​

Profile-Guided Optimization​

What's Up Next​

Restructure function-related documentation​

Changelog​

Contributors​

Connect With Us​

Background​

Workflow​

Preparations​

Step-By-Step​

References​

What's New​

Features & Improvements ✨​

Code Refactoring 🎉​

Build/Testing/CI Infra Changes 🔌​

Bug Fixes 🔧​

What's On In Databend​

Why You Should Try Sccache​

What's Up Next​

Try using build-info​

Changelog​

Contributors​

Connect With Us​

Github Action Cache Service Principle​

Query Cache​

Reserve Cache​

Upload Cache​

Create Cache​

rust-cache implementation​

sccache implementation​

Conclusion​

What's New​

Features & Improvements ✨​

Code Refactoring 🎉​

Bug Fixes 🔧​

What's On In Databend​

DML Command - UPDATE​

What's Up Next​

Support Arrow Flight SQL Protocol​

Changelog​

Contributors​

Connect With Us​

Performance comparison​

Brand new type system​

Support for JSONB​

Full support for JOINs​

Support for CBO​

Native Storage Format​

Efficient bloom filter filtering​

Designing and open-sourcing serverless DataSharing protocol​

Stage​

Other improvements​

Download v0.9.0​

Feedback​

What's New​

Features & Improvements ✨​

Code Refactoring 🎉​

Build/Testing/CI Infra Changes 🔌​

Bug Fixes 🔧​

What's On In Databend​

Databend All-in-One Docker Image​

What's Up Next​

Vector Search​

Changelog​

Contributors​

Connect With Us​

What's New​

Features & Improvements ✨​

Code Refactoring 🎉​

Build/Testing/CI Infra Changes 🔌​

Bug Fixes 🔧​

What's New

Accepted RFCs :flight_departure:

Features & Improvements ✨

Code Refactoring 🎉

Build/Testing/CI Infra Changes 🔌

Bug Fixes 🔧

What's On In Databend

RFC: Query Result Cache

How to Write a Scalar / Aggregate Function

Profile-Guided Optimization

What's Up Next

Restructure function-related documentation

Changelog

Contributors

Connect With Us

Background

Workflow

Preparations

Step-By-Step

References

What's New

Features & Improvements ✨

Code Refactoring 🎉

Build/Testing/CI Infra Changes 🔌

Bug Fixes 🔧

What's On In Databend

Why You Should Try Sccache

What's Up Next

Try using `build-info`

Changelog

Contributors

Connect With Us

Github Action Cache Service Principle

Query Cache

Reserve Cache

Upload Cache

Create Cache

rust-cache implementation

sccache implementation

Conclusion

What's New

Features & Improvements ✨

Code Refactoring 🎉

Bug Fixes 🔧

What's On In Databend

DML Command - UPDATE

What's Up Next

Support Arrow Flight SQL Protocol

Changelog

Contributors

Connect With Us

Performance comparison

Brand new type system

Support for JSONB

Full support for JOINs

Support for CBO

Native Storage Format

Efficient bloom filter filtering

Designing and open-sourcing serverless DataSharing protocol

Stage

Other improvements

Download v0.9.0

Feedback

What's New

Features & Improvements ✨

Code Refactoring 🎉

Build/Testing/CI Infra Changes 🔌

Bug Fixes 🔧

What's On In Databend

Databend All-in-One Docker Image

What's Up Next

Vector Search

Changelog

Contributors

Connect With Us

What's New

Features & Improvements ✨

Code Refactoring 🎉

Build/Testing/CI Infra Changes 🔌

Bug Fixes 🔧