Crail is open source and integrates seamlessly with Apache ecosystem projects such as Spark, Flink, and Parquet. Crail is based on Crail Store, a high-performance distributed data store for temporary data, and a series of modules that interface with the compute engine. Crail modules provide standard interfaces (e.g., HDFS) and can be loaded transparently at runtime (e.g., Spark shuffle).
Crail is built explicitly for user-level I/O (RDMA, NVMe over Fabrics, etc.), allowing storage and networking hardware to directly access I/O memory within the data processing engine. Bypassing the OS and JVM during data access delivers bare-metal I/O performance to analytics workloads. For example, Crail achieves data access at rates close to the 100 Gb/s network limit with latencies below 10 µs.
Crail orchestrates I/O operations across different storage tiers, including DRAM, flash, and GPU memory. Besides providing fine-grained control over which storage tier is used when storing data, Crail also supports horizontal tiering, where higher-performing storage resources are filled up across the cluster before lower-performing tiers are used, making effective use of the storage hardware.
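To make the idea of horizontal tiering concrete, here is a minimal sketch in Python. It is not Crail's actual allocator or API: the `Node` class, the tier names, and `allocate_horizontal` are hypothetical names introduced only for illustration. The sketch assumes each node advertises a count of free blocks per tier and that the faster tier should be exhausted on every node in the cluster before any slower tier is touched.

```python
from dataclasses import dataclass

# Hypothetical tier names, ordered from fastest to slowest.
TIERS = ["dram", "flash"]


@dataclass
class Node:
    """A storage node with a free-block count per tier (illustrative only)."""
    name: str
    free_blocks: dict


def allocate_horizontal(nodes, tiers=TIERS):
    """Return (node name, tier) for the next block.

    Horizontal tiering: look for free space in the fastest tier on ANY
    node first; fall back to a slower tier only when the faster tier is
    full across the whole cluster.
    """
    for tier in tiers:
        candidates = [n for n in nodes if n.free_blocks.get(tier, 0) > 0]
        if candidates:
            # Spread load by choosing the node with the most free blocks.
            node = max(candidates, key=lambda n: n.free_blocks[tier])
            node.free_blocks[tier] -= 1
            return node.name, tier
    raise RuntimeError("no free blocks left in any tier")


if __name__ == "__main__":
    cluster = [
        Node("a", {"dram": 1, "flash": 2}),
        Node("b", {"dram": 1, "flash": 1}),
    ]
    for _ in range(5):
        print(allocate_horizontal(cluster))
```

In this sketch, both nodes' DRAM blocks are handed out before any flash block is used, even though the writer might be local to a node that still has flash available. A purely per-node (vertical) policy would instead spill to local flash as soon as local DRAM ran out.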
New blog post about Crail’s metadata performance and scalability
Crail features in the FLOSS Weekly podcast
New blog post about SparkRDMA and Crail shuffle plugins
Crail on OpenPower discussed by Peter Hofstee on YouTube
DiSNI, the RDMA and NVMe user-level stack used in Crail, is now available on Maven Central