硅谷信息港 (Silicon Valley Information Port)

8/25 Tachyon meetup

Posted 2014-08-21 09:55:23



  • Monday, August 25, 2014
    7:00 PM

  • Classroom 9/10, Building E, Yahoo!
    700 1st Avenue, Sunnyvale, CA
  • Welcome to the first Tachyon meetup! This will be a chance to learn about Tachyon from its developers, hear about other people's experiences with Tachyon, network, and learn about future development plans.
    Thanks to Yahoo! for hosting this event and providing food and drinks.
    Abstract:
    Memory is the key to fast Big Data processing. Many have recognized this, and frameworks such as Spark and Shark already exploit memory for performance. As data sets continue to grow, however, storage is increasingly becoming a critical bottleneck in many workloads.
    To address this need, we have developed Tachyon, a memory-centric, fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. The result of over two years of research, Tachyon achieves memory speed and fault tolerance by using memory aggressively and leveraging lineage information. Tachyon caches working-set files in memory and lets different jobs, queries, and frameworks access those cached files at memory speed. As a result, frequently read datasets do not have to be loaded from disk.
    Tachyon is Hadoop-compatible: existing Spark and MapReduce programs can run on top of it without any code changes. Tachyon is the default off-heap option in Spark, which means that RDDs can automatically be stored inside Tachyon to make Spark more resilient and avoid GC overheads. The project is open source and is already deployed at multiple companies. Tachyon has more than 40 contributors from over 15 institutions, including Yahoo, Intel, Red Hat, and Pivotal. It is the storage layer of the Berkeley Data Analytics Stack (BDAS) and is also part of the Fedora distribution.
    In this meetup, Haoyuan Li will give an overview of the project, including its motivation, current status, and roadmap. We will also have a Tachyon tutorial.
    Bio:
    Haoyuan Li is a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, where he works with Prof. Scott Shenker and Prof. Ion Stoica on big data and cloud computing. He leads Tachyon, an open source, memory-centric distributed file system enabling reliable file sharing at memory speed across cluster frameworks. He is a founding committer of Apache Spark and a co-creator of Spark Streaming. Before Berkeley, he worked at Conviva and Google, where he co-created the PFP-Growth algorithm, which is included in Apache Mahout. Haoyuan holds an M.S. from Cornell University and a B.S. from Peking University, both in Computer Science.
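The abstract credits Tachyon's fault tolerance to lineage: rather than replicating every in-memory file, the system remembers how each file was produced and recomputes it if the cached copy is lost. The Python below is a minimal, hypothetical sketch of that idea under simplifying assumptions (a single process, deterministic derivation functions, and a plain dict standing in for durable disk storage). The class and method names (`LineageStore`, `add_input`, `write`, `read`, `evict`) are invented for illustration and are not Tachyon's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model of lineage-based recovery; NOT Tachyon's API.
Derivation = Tuple[Callable[..., bytes], List[str]]


class LineageStore:
    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}        # cached copies (may be lost)
        self.durable: Dict[str, bytes] = {}       # stands in for disk-backed inputs
        self.lineage: Dict[str, Derivation] = {}  # how each derived file was produced

    def add_input(self, name: str, data: bytes) -> None:
        # Durable input: persisted once, and also cached in memory.
        self.durable[name] = data
        self.memory[name] = data

    def write(self, name: str, fn: Callable[..., bytes], inputs: List[str]) -> bytes:
        # Record lineage first, then compute the output and cache it in memory only.
        self.lineage[name] = (fn, inputs)
        data = fn(*(self.read(i) for i in inputs))
        self.memory[name] = data
        return data

    def read(self, name: str) -> bytes:
        if name in self.memory:                    # fast path: memory hit
            return self.memory[name]
        if name in self.durable:                   # reload a durable input
            data = self.durable[name]
        else:                                      # recompute from recorded lineage
            fn, inputs = self.lineage[name]
            data = fn(*(self.read(i) for i in inputs))
        self.memory[name] = data                   # re-cache the result
        return data

    def evict(self, name: str) -> None:
        # Simulate losing a cached copy (e.g. a worker's memory is lost).
        self.memory.pop(name, None)


store = LineageStore()
store.add_input("raw", b"hello world")
store.write("upper", lambda d: d.upper(), ["raw"])
store.write("reversed", lambda d: d[::-1], ["upper"])
store.evict("upper")
store.evict("reversed")
recovered = store.read("reversed")  # rebuilds "upper", then "reversed", from lineage
```

The trade-off this sketch illustrates is that losing an in-memory copy costs a recomputation instead of requiring every write to be replicated up front. A real system also has to cope with non-deterministic jobs and limit how far recomputation can cascade (for example, by checkpointing), which this toy model ignores.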

