-
Efficiency and Scalability of Multi-Lane Capsule Networks (MLCN)
CIENCIA 2019, Encontro com a Ciência e Tecnologia em Portugal
[PDF]
-
ASR Automatic Speech Recognition for European Portuguese with the Kaldi Framework
[PDF]
CIENCIA-2018, Encontro com a Ciência e Tecnologia em Portugal
-
Microarchitecture, Computing Systems, High Performance Computing and End-to-End Deep Neural Networks (Keynote)
WSCAD-2018, XIX Simpósio em Sistemas Computacionais de Alto Desempenho, 3/10/2018
São Paulo, Brazil
[PDF]
-
E2eML: High Performance, Power Efficient Application of Machine Learning Systems
NII Shonan Meeting Seminar 134, 08/20/2018 http://shonan.nii.ac.jp/seminar/134/
[PDF]
-
AMD’s Open Compute and Open Source Cross Platform Solutions for Machine Learning
Invited Lecture
Deep Learning Tools and Methods Workshop, IDIAP, EPFL Martigny, 2016
https://www.idiap.ch/workshop/dltm/front-page
[talk]
-
Optimizing Big Data Analytics on Heterogeneous Processors
M. Daga, J. Gu, M. Breternitz
Tutorial, 2015 IEEE Conference on Big Data
[PDF]
- 10,558,466 System and method for parallelization of data processing in a processor
[link]
- 10,318,340 NVRAM-aware data processing system
- 10,318,153 Techniques for changing management modes of multilevel memory hierarchy
- 10,271,008 Enhanced resolution video and security via machine learning
- 10,198,349 Programming in-memory accelerators to improve the efficiency of datacenter operations
- 10,089,155 Power aware work stealing [link]
- 10,067,709 Page migration acceleration using a two-level bloom filter on high bandwidth memory systems
- 10,019,365 Adaptive value range profiling for enhanced system performance
- 9,817,644 Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
- 9,766,936 Selecting a resource from a set of resources for performing an operation
- 9,658,895 System and method for configuring boot-time parameters of nodes of a cloud computing system
- 9,639,140 Power management of interactive workloads driven by direct and indirect user feedback
- 9,479,449 Workload partitioning among heterogeneous processing nodes
- 9,274,585 Combined dynamic and static power and performance optimization on data centers
- 9,262,231 System and method for modifying a hardware configuration of a cloud computing system
- 9,251,069 Mechanisms to bound the presence of cache blocks with specific properties in caches
- 9,223,714 Instruction boundary prediction for variable length instruction set
- 9,183,055 Selecting a resource from a set of resources for performing an operation
- 9,170,854 Thread assignment for power and performance efficiency using multiple power states
- 9,152,601 Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units
- 9,152,532 System and method for configuring a cloud computing system with a synthetic test workload
- 9,146,846 Programmable physical address mapping for memory
- 9,146,844 Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
- 9,116,703 Semi-static power and performance optimization of data centers
- 8,935,472 Processing device with independently activatable working memory bank and methods
- 8,929,220 Processing system using virtual network interface controller addressing as flow control metadata
- 8,887,056 System and method for configuring cloud computing systems
- 8,782,645 Automatic load balancing for heterogeneous cores
- 8,738,877 Processor with garbage-collection based classification of memory
- 8,683,468 Automatic kernel migration for heterogeneous cores
- 8,549,504 Apparatus, method, and system for providing a decision mechanism for conditional commits in an atomic region
- 8,146,106 On-demand emulation via user-level exception handling
- 8,099,587 Compressing and accessing a microcode ROM
- 7,840,953 Method and system for reducing program code size
- 7,757,221 Apparatus and method for dynamic binary translator to support precise exceptions with minimal optimization constraints
- 7,725,887 Method and system for reducing program code size
- 7,694,281 Two-pass MRET trace selection for dynamic optimization
- 7,620,781 Efficient Bloom filter
- 7,451,121 Genetic algorithm for microcode compression
- 7,430,574 Efficient execution and emulation of bit scan operations
- 7,428,731 Continuous trip count profiling for loop optimizations in two-phase dynamic binary translators
- 7,095,342 Compressing microcode
- 6,823,070 Method for key escrow in a communication system and apparatus therefor
- 6,523,095 Method and data processing system for using quick decode instructions
- 6,484,228 Method and apparatus for data compression and decompression for a data processor system
- 6,381,739 Method and apparatus for hierarchical restructuring of computer code
- 6,343,354 Method and apparatus for compression, decompression, and execution of program code
- 6,216,213 Method and apparatus for compression, decompression, and execution of program code
- 6,044,220 Method and apparatus for operating a data processor to execute software written using a foreign instruction set
- 5,966,143 Data allocation into multiple memories for concurrent access
- 5,889,999 Method and apparatus for sequencing computer instruction execution in a data processing system
- 5,805,895 Method and apparatus for code translation optimization
- 5,737,576 Method and system for efficient instruction execution in a data processing system having multiple prefetch units
- 5,659,699 Method and system for managing cache memory utilizing multiple hash functions
- 5,634,025 Method and system for efficiently fetching variable-width instructions in a data processing system having multiple prefetch units
- 5,537,620 Redundant load elimination on optimizing compilers
Plus 55 more U.S. patents pending