
Ngo Huy Cu

System Technology R&D Engineer

Kioxia

Biography

Hi, welcome to my homepage!

Ngo Huy Cu received the B.Eng. degree in electrical and electronic engineering and the M.Eng. degree in physical electronics from the Tokyo Institute of Technology, Tokyo, Japan, in 2015 and 2017, respectively. He was involved in mixed-signal and synthesizable digital phase-locked loop (PLL) designs. His master's thesis focused on designing a synthesizable fractional-N injection-locked PLL using a digital-to-time converter (DTC).
In 2017, he joined the Device Technology Laboratories, NTT Corporation, Atsugi, Japan, where he was involved in research on deep learning accelerators using FPGAs. In 2019, he joined the Institute of Memory Technology Research & Development, Kioxia Corporation (formerly Toshiba Memory Corporation), Kawasaki, Japan, where he is involved in the design of analog mixed-signal circuits and architectures for advanced high-speed wireline communication. His current interests include high-speed wireline transceivers, high-speed low-power analog-to-digital converters, and efficient hardware accelerators for deep learning applications.
Mr. Ngo was a recipient of the Japanese Government (MEXT) Scholarship from 2009 to 2017.

Interests

  • Mixed-signal integrated circuit design
  • High-speed wireline transceiver
  • High-speed, low-power analog-to-digital converter
  • All-digital phase-locked loop
  • Deep learning hardware accelerator

Education

  • M.E. in physical electronics, 2017

    Tokyo Institute of Technology

  • B.E. in electrical and electronic engineering, 2015

    Tokyo Institute of Technology

  • A.E. in Electronic Control Engineering, 2013

    Kagoshima College of Technology

Experience

R&D Engineer

Kioxia

Jul 2019 – Present Japan

Working on high-speed wireline transceivers. Responsibilities include:

  • Researching
  • Designing

Researcher

NTT

Apr 2017 – Jun 2019 Japan

Worked on deep learning accelerators using FPGAs. Responsibilities included:

  • Researching
  • Designing

Research Assistant

Tokyo Institute of Technology

Apr 2014 – Mar 2017 Japan

Research topics:

  • Synthesizable analog circuits
  • Synthesizable PLL
  • All-digital PLL/Injection-Locked PLL

Accomplishments

TOEIC-905

Deep Learning Specialization


Structuring Machine Learning Projects


Sequence Models


Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization


Convolutional Neural Networks


Neural Networks and Deep Learning


JLPT N1

Recent Posts

Writing technical content in Academic

Academic is designed to give technical content creators a seamless experience. You can focus on the content and Academic handles the rest. Highlight your code snippets, take notes on math classes, and draw diagrams from textual representation.

Recent Publications


A Fully Synthesizable Fractional-N MDLL With Zero-Order Interpolation-Based DTC Nonlinearity Calibration and Two-Step Hybrid Phase Offset Calibration

In this paper, a fully synthesizable digital-to-time converter (DTC)-based fractional-N multiplying delay-locked loop (MDLL) is presented. The fractional spur is less than −59.0 dBc, and the reference spur is −64.5 dBc. The power consumptions are 1.85 mW and 1.22 mW, corresponding to figures of merit (FOM) of −240.4 dB and −245.5 dB.

Distributed Deep Learning with FPGA Ring Allreduce

In this work, we propose a new In-Network Computing system that supports Ring Allreduce. To minimize communication overhead, we apply layer-based overlap of computation and communication and optimize it for the proposed In-Network Computing system.
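For readers unfamiliar with Ring Allreduce, the sketch below simulates the generic algorithm in plain Python: a reduce-scatter pass followed by an allgather pass over a logical ring of workers. It is only an illustration of the textbook algorithm, not the FPGA In-Network Computing system described in the paper; the function name `ring_allreduce` and the list-of-lists data layout are assumptions made for this example.

```python
from typing import List


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-Allreduce over a logical ring (illustrative sketch).

    data[w] is the vector held by worker w. All vectors must have the same
    length, and that length must be divisible by the number of workers.
    """
    n = len(data)
    size = len(data[0])
    if any(len(v) != size for v in data) or size % n != 0:
        raise ValueError("vectors must be equal-length and divisible by worker count")
    chunk = size // n
    buf = [list(v) for v in data]  # each worker's local buffer

    def span(c: int) -> range:
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, worker w sends chunk (w - s) mod n
    # to its right neighbour, which accumulates it into its own buffer.
    for s in range(n - 1):
        sends = [((w - s) % n, [buf[w][i] for i in span((w - s) % n)])
                 for w in range(n)]  # snapshot so all sends are simultaneous
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n
            for i, v in zip(span(c), payload):
                buf[dst][i] += v

    # Worker w now holds the fully reduced chunk (w + 1) mod n.
    # Phase 2: allgather. Reduced chunks circulate around the ring and
    # overwrite the stale partial sums.
    for s in range(n - 1):
        sends = [((w + 1 - s) % n, [buf[w][i] for i in span((w + 1 - s) % n)])
                 for w in range(n)]
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n
            for i, v in zip(span(c), payload):
                buf[dst][i] = v

    return buf


if __name__ == "__main__":
    workers = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(workers))
```

Each worker sends and receives only one chunk per step, so the per-link traffic stays roughly constant as the number of workers grows, which is why the ring pattern is popular for large gradient exchanges.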

A 0.4-ps-jitter −52-dBc-spur synthesizable injection-locked PLL with self-clocked nonoverlap update and slope-balanced subsampling BBPD

In this letter, a fully synthesizable injection-locked phase-locked loop (IL-PLL) is presented. The PLL achieved a 0.4-ps integrated jitter at 1-GHz output frequency with a −52-dBc reference spur. The power consumption is 1.2 mW, corresponding to a figure of merit of −247.2 dB.
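The figure of merit quoted above can be reproduced from the jitter and power numbers, assuming the commonly used jitter-power definition FOM = 10·log10((σ / 1 s)² · (P / 1 mW)). The abstract does not spell out the definition, so treat the formula and the helper name `pll_fom_db` as assumptions of this sketch.

```python
import math


def pll_fom_db(jitter_rms_s: float, power_w: float) -> float:
    """Jitter-power FOM in dB, assuming 10*log10((sigma/1 s)^2 * (P/1 mW))."""
    return 10.0 * math.log10((jitter_rms_s ** 2) * (power_w / 1e-3))


if __name__ == "__main__":
    # 0.4 ps integrated jitter and 1.2 mW, as stated in the abstract.
    print(f"{pll_fom_db(0.4e-12, 1.2e-3):.1f} dB")  # prints "-247.2 dB"
```

Under this definition, a lower (more negative) value is better: halving the jitter improves the FOM by about 6 dB, while halving the power improves it by about 3 dB.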

Large-Message Size Allreduce at Wire Speed for Distributed Deep Learning

To reduce latency, we devised a dataflow architecture with an Allreduce-specific hardware accelerator that performs data aggregation and reduction while data is being transferred. The accelerator is designed to start the Allreduce operation immediately, before an entire message is received.

A Sub-mW Fractional-N ADPLL With FOM of −246 dB for IoT Applications

This paper presents a sub-mW fractional-N all-digital phase-locked loop (ADPLL) with scalable power consumption, which achieves a figure of merit (FOM) of −246 dB.

Contact