About Me

TL;DR

Seasoned Engineer and Technical Leader with a strong background and track record of success at Google, ByteDance, Alibaba and Intel.
Extensive technical expertise in AI/ML solutions, software/hardware co-design, AI/ML & cloud infrastructure, accelerators, embedded systems, memory/storage technologies and computer architecture.
Strong experience in XFN collaboration, communication and working with partners & system integrators.
Ph.D. in Computer Engineering, Certified Google Cloud Architect & TensorFlow Developer.

See my LinkedIn profile for more details.

Things I’ve Worked On

Google (Present)

TPU ASIC Software & Firmware

ByteDance

Applied Research/Infrastructure Lab

Led the SW/HW co-design projects for accelerating storage and data analytics workloads. Drove the XFN effort E2E, including prototyping, ROI estimation, project proposal, architectural design, planning and execution.
Led the research and exploration of emerging memory/storage technologies, including CXL, disaggregated memory architecture, Persistent Memory/NVRAM, in-memory computing and Neuromorphic computing.
Next-gen infrastructure for AI/ML: As one of the few core members of the task force, I drove the XFN effort to explore and plan the next-gen infra for AI/ML. Provided important recommendations to C-level management and helped incubate new projects optimizing data pipelines for AI/ML workloads.

Google

Hardware Partner Service

Led the XFN project for enabling next-gen streaming platform for ecosystem partners.
Worked with Google Product Areas and BD teams with focus on SoC integration, hardware ecosystem scaling, program development and management.
Onboarded partner devices with Google technologies and helped partners scale.
Managed internal and external stakeholders and product lifecycle from end to end.
Tool development and automation, solution design, prototyping and technical troubleshooting.
Successfully launched Google Assistant/Chromecast on embedded devices from top partners.
Successfully onboarded top SoC partners with Project Matter and Google Ecosystem.

Edge TPU (Collaboration with Google Brain)

“Moonshot” project of using Machine Learning to improve TPU design and model architecture.
Developed Learned Performance Model, a key component for model architecture exploration.
Developed the end-to-end pipeline for automated performance modeling on TPU.
Work was featured by Google AI and published in CVPR-22 (ECV22) with patents pending.

Alibaba

Storage Innovation with Hardware/Software Co-optimization

Created and led the AliFlash team for storage innovation.
Designed new storage architecture with software/hardware co-design and optimization.
Launched the first production Open Channel SSD in industry (AliFlash V3).
Project management with internal/external collaborations.

Storage Innovation with Intel 3D XPoint Technology

Led the XFN effort of deploying 3D XPoint technology in Alibaba’s infrastructure, making Alibaba one of the earliest adopters of this disruptive technology.

Research and Pathfinding of Innovative Technologies and Solutions

Evaluation of emerging technologies, strategic analysis & recommendations for executives.

Intel

3D XPoint Storage Technology (Optane)

As core developer of this disruptive product, I was responsible for media management, NVMe, Error Recovery/Injection, test automation, media characterization and performance analysis.

NVMe SSD Products

Core developer of P3700 (Intel’s first NVMe SSD) and P4500 (Intel’s first 3D NAND SSD).

Streaming Media

Core developer of streaming driver, firmware and platform software for Intel media SoC.
Bootloader & kernel enablement, performance optimization of the embedded device.

Academic

University of Pittsburgh (PhD, Computer Engineering)

New memory/storage technologies (Phase Change Memory, STT-RAM, Memristor)
Computer architecture
Memory hierarchy, modeling, performance, power

My Publications

Exploring CXL-based KV Cache Storage for LLM Serving, Yupeng Tang, Runxiang Cheng, Ping Zhou, Tongping Liu, Fei Liu, Wei Tang, Kyoungryun Bae, Jianjun Chen, Wu Xiang, Rui Shi, to appear in NeurIPS 2024 Workshop MLforSys, Dec. 2024.
Exploring Performance and Cost Optimization with ASIC-Based CXL Memory, Yupeng Tang, Ping Zhou, Wenhui Zhang, Henry Hu, Qirui Yang, Hao Xiang, Tongping Liu, Jiaxin Shan, Ruoyun Huang, Cheng Zhao, Cheng Chen, Hui Zhang, Fei Liu, Shuai Zhang, Xiaoning Ding, Jianjun Chen, EuroSys 2024 (Best Paper Award Runner-Up), Apr. 2024.
Dynamic storage for adaptive mapping for data compression on a storage device, Ping Zhou, et al., US Patent US20230273727A1, Aug. 2023.
Space manager for transparent block device compression, Ping Zhou, et al., US Patent US20230229324A1, Jul. 2023.
Multi-dimensional solid state drive block access, Ping Zhou, et al., US Patent US20230195345A1, Jun. 2023.
Adaptive mapping for transparent block device level compression, Ping Zhou, et al., US Patent US20230176734A1, Jun. 2023.
System and method for allocating memory space, Ping Zhou, et al., US Patent US20230122533A1, Apr. 2023.
Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs, Ping Zhou, et al., 2022 Conference on Computer Vision and Pattern Recognition (CVPR-2022) ECV22 Workshop
Universal and automatic end-to-end testing of smart TVs, Ping Zhou, et al., Technical Disclosure Commons (Invention Disclosure), Dec. 2019
Alibaba Open Channel SSD for Next-Generation Data Centers, Ping Zhou, et al., Flash Memory Summit, Aug. 2018
Throughput Enhancement for Phase Change Memories, Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, IEEE Transactions on Computers (TC), DOI: 10.1109/TC.2013.76, Mar. 2013
The Design of Sustainable Wireless Sensor Network Node using Solar Energy and Phase Change Memory, Ping Zhou, Youtao Zhang, Jun Yang, Design, Automation & Test in Europe (DATE), March 1, 2013
Towards Successful Application of Phase Change Memories: Address Challenges from Write Operations, Ping Zhou, PhD Dissertation, 2012
MRAC: A Memristor-based Reconfigurable Framework for Adaptive Cache Replacement, Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen, The 20th International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2011
Fine-Grained QoS Scheduling for DRAM/PCM Hybrid Memory Systems, Ping Zhou, Yu Du, Youtao Zhang, Jun Yang, Non-Volatile Memories Workshop (NVMW), March 2011
Fine-Grained QoS Scheduling for PCM-based Main Memory Systems, Ping Zhou, Yu Du, Youtao Zhang, Jun Yang, The 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS-2010), April 2010
Phase Change Technology and the Future of Main Memory, Benjamin Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, Doug Burger, IEEE Micro Top Picks, vol. 30, no. 1, pp. 143-143, February 2010
Energy Reduction for STT-RAM Using Early Write Termination, Ping Zhou, Bo Zhao, Jun Yang, Youtao Zhang, IEEE/ACM 2009 International Conference on Computer-Aided Design (ICCAD-2009), pp. 264-268, November, 2009
A Durable and Energy Efficient Main Memory Using Phase Change Memory Technology, Ping Zhou, Bo Zhao, Jun Yang, Youtao Zhang, The 36th International Symposium on Computer Architecture (ISCA-2009), pp. 14-23, June, 2009. Among the 15 most cited papers in the history of ISCA.
Frequent Value Compression in Packet-based NoC Architectures, Ping Zhou, Bo Zhao, Yu Du, Yi Xu, Youtao Zhang, Jun Yang, Li Zhao, The 14th Asia and South Pacific Design Automation Conference (ASP-DAC 2009), pp. 13-18, January 2009

My Inventions/Patents

Dynamic storage for adaptive mapping for data compression on a storage device (US20230273727A1)
Space manager for transparent block device compression (US20230229324A1)
Multi-dimensional solid state drive block access (US20230195345A1)
Adaptive mapping for transparent block device level compression (US20230176734A1)
System and method for allocating memory space (US20230122533A1)
NEURAL NETWORK ARCHITECTURE FOR IMPLEMENTING GROUP CONVOLUTIONS (WO2023059336A1)
HARDWARE ACCELERATOR OPTIMIZED NEURAL NETWORK MODELS USING GROUP CONVOLUTIONS (WO2023059335A1)
Universal and automatic end-to-end testing of smart TVs (defensive publication)
SYSTEM AND METHOD FOR FLASH STORAGE MANAGEMENT USING MULTIPLE OPEN PAGE STRIPES (US 20210034301 A1)
SYSTEM AND METHOD FOR OPTIMIZATION OF GLOBAL DATA PLACEMENT TO MITIGATE WEAR-OUT OF WRITE CACHE AND NAND FLASH (US 20200159419 A1)
COLLABORATIVE COMPRESSION IN A DISTRIBUTED STORAGE SYSTEM (US 20200042500 A1)
METHOD AND SYSTEM FOR FACILITATING ATOMICITY ASSURANCE ON METADATA AND DATA BUNDLED STORAGE (US 20200034079 A1)
RAPID SIDE-CHANNEL ACCESS TO STORAGE DEVICES (US 20190347204 A1)
METHOD AND SYSTEM FOR DATA DESTRUCTION IN A PHASE CHANGE MEMORY-BASED STORAGE DEVICE (US 20190087587 A1)
METHOD AND SYSTEM FOR ACTIVE PERSISTENT STORAGE VIA A MEMORY BUS (US 20190073132 A1)
METHOD AND SYSTEM FOR MITIGATING WRITE AMPLIFICATION IN A PHASE CHANGE MEMORY-BASED STORAGE DEVICE (US 20190012111 A1)
SYSTEM AND METHOD FOR FINE-GRAINED POWER CONTROL MANAGEMENT IN A HIGH CAPACITY COMPUTER CLUSTER (US 20180364795 A1)
METHOD AND SYSTEM FOR IMPLEMENTING BYTE-ALTERABLE WRITE CACHE (US 20180349041 A1)
HIGH-VOLUME, LOW-LATENCY DATA PROCESSING IN FLEXIBLY CONFIGURED LOCAL HETEROGENEOUS COMPUTING ENVIRONMENTS (US 20180329632 A1)
PERSISTENT MEMORY FOR KEY-VALUE STORAGE (US 20180307620 A1)

Things I’m Interested In

AI/ML stuff
Quantum computing
Cloud computing & infrastructure
Linux, Emacs, Lisp, Python, C/C++, Go & other programming stuff
Astronomy, physics, math
Embedded systems, smart devices, IoT
Emerging memory & storage technologies (PCM, STT-RAM, ReRAM, …)
In-memory computing & Neuromorphic computing
(and many more…)