- 3D Gaussian Splatting(1)
- 3D NAND Flash(1)
- 3D accelerator(1)
- 3D reconstruction(1)
- 3D spatial computing(1)
- 3D-Stacked DRAM(2)
- 3D-Stacked Memory(2)
- 4-bit matrix multiplication(1)
- AESPA(1)
- AI(2)
- AI accelerator(3)
- AI processor(1)
- AR/VR(1)
- ARMv8-A(1)
- ASPLOS 2025(1)
- Accelerator(5)
- Accelerator-in-Memory(1)
- Activation Compression(1)
- Address Translation(1)
- Algorithm-Hardware Co-Design(1)
- Approximate Computing(1)
- Approximate Nearest Neighbor(1)
- Arbitration(1)
- Ascend NPU(1)
- Associative Processor(1)
- AsyncDIMM(1)
- Atomic Operations(1)
- Atomic Regions(1)
- Auto-Tuning(1)
- Autoscaling(1)
- BM1684X(1)
- Bandwidth Utilization(1)
- Bank-Level Parallelism(1)
- Big Data(1)
- Bit-Serial-SIMD(1)
- Bit-serial SIMD PUD(1)
- Bit-slice Architecture(1)
- BitNet(1)
- Bitwise Operations(3)
- Block Floating Point(1)
- Brain-Computer Interface(1)
- Branch Prediction(1)
- BreakHammer(1)
- Bulk Data Copy(1)
- Bulk bitwise operations(1)
- CARS(1)
- CGRA(1)
- CKKS-TFHE(1)
- CNN(1)
- CNN accelerator(3)
- CNN training accelerator(1)
- CNN/DNN Accelerator(1)
- CPU Kernel(1)
- CPU Offloading(1)
- CUDA Graph(1)
- CUDA VMM(1)
- CXL(7)
- Cache Coherence(1)
- Cache Tiling(1)
- Cacheline Locking(1)
- Cambricon-C(1)
- Chiplet(2)
- Cloud(1)
- Cloud Storage(1)
- Code Generation(1)
- Coherence(1)
- Collective-Communication(1)
- Command Processor(1)
- Comparator-based Neural Network(1)
- Compiler(1)
- Compiler Framework(1)
- Compiler Optimization(1)
- Composition-of-Experts(1)
- Compute-in-Memory(1)
- Computing-in-Memory(2)
- Concurrency Control(1)
- Consistency(1)
- Continuous Batching(1)
- Cost-Optimization(1)
- D-RaNGe(1)
- DDR DRAM(1)
- DDR5(2)
- DIMM-Link(1)
- DIMM-NMP(1)
- DLRM(1)
- DMA Descriptor(1)
- DNN Accelerator(1)
- DNN compiler(1)
- DNN compression(1)
- DNN systems(1)
- DNN training(1)
- DNN-accelerator(1)
- DRAM(28)
- DRAM Cache(2)
- DRAM PIM(9)
- DRAM Throughput(1)
- DRAM mapping(1)
- DRAM-based FPGA(1)
- DRAM↔PIM data transfer(1)
- DVFS(2)
- Data Encoding(1)
- Data-Movement(1)
- Data-Parallel Processor(1)
- Datacenter accelerators(1)
- Dataflow(1)
- Dataflow Architecture(1)
- Deep Learning(1)
- Dense Prediction(1)
- Die-Stacked DRAM(1)
- Diffusion Model(1)
- Disaggregated Memory(1)
- Distributed Caching(1)
- Distributed Systems(1)
- Distributed Training(1)
- Domain-Specific Architecture(2)
- Domain-Wall Memory(1)
- Domain-wall Logic(1)
- Drift Detection(1)
- Dynamic Data Structure(1)
- Dynamic Memory Management(1)
- Dynamic Partial Reconfiguration(1)
- Dynamic Scheduling(1)
- Dynamic Sparsity(1)
- ECC(1)
- EDA(1)
- ENNS(1)
- Early Exit(1)
- Edge AI(3)
- Edge Deployment(1)
- Edge inference(2)
- Efficient Inference(1)
- Energy Efficiency(3)
- Energy-efficient architecture(1)
- Error-Correcting Code(1)
- Event-based HAR(1)
- Execution scheduling(1)
- FFT(1)
- FHE(1)
- FP-INT GEMM(1)
- FPGA(10)
- FaaS(1)
- Fibonacci-coding(1)
- Fine-grained Activation(1)
- Fine-grained memory access(1)
- Fine-grained DRAM(1)
- FlashAttention(1)
- Floating-Point(1)
- Floating-Point Data(1)
- Formal Methods(1)
- Fully Homomorphic Encryption(1)
- GCN(1)
- GDMA(1)
- GPGPU(1)
- GPGPU simulation(1)
- GPU(9)
- GPU Architecture(1)
- GPU Cluster(1)
- GPU Compression(1)
- GPU Inference(1)
- GPU Memory(1)
- GPU Memory Management(1)
- GPU Optimization(1)
- GPU Performance Forecasting(1)
- GPU enclave(1)
- GPU sharing(1)
- GPU synchronization(1)
- GPU training systems(1)
- GhostMinion(1)
- Graph Analytics(1)
- Graph Computing(1)
- Graph Neural Network(1)
- Graph Neural Network Accelerator(1)
- Graph Processing(3)
- HBM(2)
- HBM2(1)
- HBM3(1)
- HLS(1)
- Hadamard transform(1)
- Halide(1)
- Hardware(1)
- Hardware Accelerator(5)
- Hardware Architecture(1)
- Hardware Security(1)
- Hardware Transactional Memory(2)
- Hardware-Software Co-Design(2)
- Hash Table(1)
- Hashed Page Table(1)
- Heterogeneous Architecture(2)
- Heterogeneous Computing(1)
- Heterogeneous Memory(1)
- Hierarchical Backbone(1)
- Hierarchical Search(1)
- High-Level Synthesis (HLS)(1)
- Hybrid Accelerator Design(1)
- Hybrid Memory Cube(3)
- Hybrid Signaling(1)
- Hybrid Memory System(1)
- Hypergraph Neural Network(1)
- ILP(1)
- IO-aware kernels(1)
- Image Classification(1)
- Image Processing(1)
- In-Cache Computing(1)
- In-DRAM Computing(1)
- In-Flash Computing(1)
- In-Flash Processing(1)
- In-Memory Computing(3)
- In-Order Processor(2)
- In-Situ Accelerator(2)
- In-band ECC(1)
- Inter-DIMM Communication(1)
- Interconnection Network(1)
- Job Scheduling(1)
- KV cache(6)
- KV cache compression(14)
- KV cache eviction(1)
- KV cache quantization(2)
- Kernel Fusion(1)
- LLM(5)
- LLM Systems(1)
- LLM Training(1)
- LLM acceleration(1)
- LLM compression(1)
- LLM inference(23)
- LLM quantization(3)
- LLM serving(4)
- LRDIMM(2)
- LUT accelerator(3)
- LUT-NN(1)
- LUT-based multiplication(1)
- Large Language Model(3)
- Large-Scale Pretraining(1)
- Locality(1)
- Logarithmic Number System(1)
- Logic-PIM(1)
- Long-context LLM(1)
- Lookup Table(2)
- Lossless Compression(1)
- Low-Bit Inference(1)
- Low-Power Interface(1)
- Low-Rank Approximation(2)
- Low-Rank Projection(1)
- Low-bit Quantization(1)
- Low-power Memory(1)
- MAGIC logic(1)
- MCM(1)
- MICRO 2022(1)
- MICRO 2024(1)
- MILP(1)
- MIMD(1)
- ML checkpointing(1)
- MLLM(1)
- MLOps(1)
- MPU-Sim(1)
- Machine Learning Inference(1)
- Mamba(2)
- Mapping Optimization(1)
- Matrix Computation(1)
- Matrix Multiplication(3)
- Matryoshka training(1)
- Max/Min Search(1)
- Memory(1)
- Memory Controller(1)
- Memory Expander(1)
- Memory Hierarchy(1)
- Memory Pooling(1)
- Memory Reliability(1)
- Memory Tiering(1)
- Memory management unit(2)
- Memory-Level Parallelism(1)
- Memory-Wall(1)
- Memory Model(1)
- Memristive CIM(1)
- Memristor(1)
- Message Passing(1)
- Microarchitecture(2)
- Microarchitecture Security(1)
- MiniCPM-V(1)
- Mixture-of-Experts(1)
- MoE(2)
- Mobile ML(1)
- MosaicCPU(1)
- MosaicScheduler(1)
- Multi-Instance(1)
- Multi-Tenant Storage(1)
- Multi-chip(1)
- Multicore(1)
- Multilingual(1)
- N:M pruning(1)
- NAND-Flash(1)
- NAND-Flash-Controller(1)
- NDP(1)
- NPU(4)
- NTT(1)
- NUMA(2)
- NVM crossbar(1)
- NeRF(2)
- Near-DRAM Acceleration(1)
- Near-DRAM Processing(1)
- Near-Data Processing(5)
- Near-Memory Computing(1)
- Near-Memory Processing(5)
- Near-bank(1)
- Near-bank computing(1)
- Network-on-Chip(1)
- Neu10(1)
- NeuISA(1)
- NeuSight(1)
- NeurIPS 2024(1)
- Neural Network Acceleration(1)
- Neural Network Accelerator(2)
- Neural rendering(1)
- Neural Network Inference(1)
- Neuromorphic Computing(1)
- Neuromorphic Processor(1)
- Non-Volatile Memory(2)
- Non-blocking Miss Handling(1)
- Normalized Effective Rank(1)
- OCR(1)
- On-Device LLMs(1)
- On-device AI(1)
- OpenSSD(1)
- Operand Collector(1)
- PCIe(1)
- PCN Accelerator(1)
- PF-DRAM(1)
- PIM(11)
- PIM accelerator(1)
- PIM-enabled Instructions(1)
- PIVOT(1)
- PRAM(1)
- PVT Variation(1)
- Page Table(1)
- Page-Table-based(1)
- PagedAttention(1)
- Parallelism(2)
- Performance Modeling(1)
- Performance debugging(1)
- Point Cloud(1)
- Pointer Traversal(1)
- Polymorphic ECC(1)
- Power Management(1)
- Power Modeling(3)
- Prefetching(1)
- Prefill(1)
- Processing-in-Memory(40)
- Processing-using-DRAM(4)
- Processing-using-Memory(1)
- Programmable Accelerator(1)
- Programmable Switch(1)
- PuM(1)
- PyPIM(1)
- Python Tensor Library(1)
- QAT(1)
- QoS(1)
- Quantization(6)
- Quantum Computing(1)
- RAG(2)
- RISC-V(3)
- RISC-V Vector(1)
- ROC(1)
- RRAM PIM(1)
- RTL Verification(1)
- Racetrack Memory(2)
- Range Lock(1)
- Ray Tracing(2)
- ReRAM(3)
- ReRAM PIM(2)
- Real-Time Systems(1)
- Recommendation System(1)
- Reconfigurable logic(1)
- Reconfigurable Dataflow Unit(1)
- Register File(1)
- Register-based Addressing(1)
- Reinforcement Learning(1)
- Reliability(1)
- Resource Management(1)
- Resource Partitioning(1)
- Resource-constrained systems(1)
- Retention Time(1)
- RoPE(1)
- Roofline(1)
- Roofline Model(1)
- Root Cause Analysis(1)
- RowClone(2)
- RowHammer(2)
- Runahead(1)
- SLAM(2)
- SN40L(1)
- SNN Accelerator(1)
- SOT-MRAM(1)
- SPASM(1)
- SRAM CIM(3)
- SSD Virtualization(1)
- SSD Architecture(1)
- SSM(1)
- Scheduling(2)
- ScopeAdvice(1)
- Secure Speculation(1)
- Security(1)
- Server CPU(1)
- Serverless(1)
- Serverless Computing(1)
- Shifted Window(1)
- Side-Channel Attack(1)
- Simulation(1)
- Simulator(3)
- Slice-level Sparsity(1)
- SoC(1)
- Software Prefetching(1)
- Software Transactional Memory(1)
- Software-Defined Storage(1)
- SpMM(1)
- SpMV(3)
- SpTRSV(1)
- SpaceA(1)
- Sparse Accelerator(1)
- Sparse Attention(2)
- Sparse Data Structures(1)
- Sparse Embedding Similarity(1)
- Sparse Matrix(4)
- Sparse Tensor Algebra(1)
- Sparsity(1)
- Speculative Lock Elision(1)
- Speculative Value Forwarding(1)
- Spike-Driven Processing(1)
- Spiking Neural Network(4)
- Spintronic(1)
- SplitSync(1)
- Stage-Customization(1)
- State Space Model(1)
- Static Analysis(1)
- Stiefel Manifold(1)
- Stochastic Computing(1)
- Storage Optimization(1)
- Stream-based DRAM Cache(1)
- Subarray-Level Parallelism(1)
- Synchronization(3)
- Systems(1)
- Systolic Array(4)
- TAGE(1)
- TLB(3)
- TPU(2)
- TYR(1)
- Table Lookup(1)
- Tags(1)
- Temporal Parallelism(1)
- Temporal Similarity(1)
- Tensor compiler(1)
- Ternary weight network(1)
- Tesseract(1)
- Test-Time Adaptation(1)
- Thermal Management(1)
- Time-Domain Interface(1)
- Token Merging(1)
- Top-K SpMV(1)
- Training and Inference(1)
- Transformer(2)
- Transformer Accelerator(2)
- Transformer Models(1)
- Tree Traversal(1)
- Triple-row activation(1)
- UPMEM(5)
- Unified Virtual Memory(1)
- VM scheduling(1)
- VMM latency optimization(1)
- Value Prediction(1)
- Vandermonde(1)
- Variational Quantum Algorithm(1)
- Vector Database(1)
- Vector Quantization(3)
- Vector Similarity Search(1)
- Virtual Memory(2)
- Vision Transformer(3)
- Visual Encoding(1)
- Wear-Leveling(1)
- Weight-only quantization(1)
- accelerator(2)
- active message(1)
- actor model(1)
- address translation(1)
- all-SRAM accelerator(1)
- approximate computing(1)
- associative accelerator(1)
- attention accelerator(1)
- attention quantization(1)
- auto-decomposition(1)
- bank-level parallelism(1)
- binary attention(1)
- bit-pipelining(1)
- bit-serial architecture(1)
- branch prediction(2)
- cache hierarchy(1)
- cache indexing(1)
- cache replacement(1)
- cache side-channel(1)
- chiplet(1)
- cloud oversubscription(1)
- cloud-platform(1)
- clustering(1)
- code deformation(1)
- cold start optimization(1)
- collective communication(2)
- commodity GPUs(1)
- communication scheduling(1)
- compiler optimization(1)
- computation-in-memory(1)
- computer-architecture-simulation(1)
- computing-in-memory(1)
- confidential computing(1)
- content addressable memory(1)
- coupled quantization(1)
- cross-point RAM(1)
- database accelerator(1)
- datacenter networking(1)
- debugging-tool(1)
- decoding speed(1)
- deep learning hardware(1)
- deep learning systems(1)
- die-stacked DRAM(1)
- differentiable KMeans(1)
- diffusion LLM(1)
- diffusion transformer(1)
- direct-attached accelerators(1)
- distributed deep learning(1)
- distributed machine learning(1)
- distributed on-chip memory(1)
- distributed training(1)
- dynamic defects(1)
- eDRAM(1)
- eDRAM/eNVM Accelerator(1)
- early termination(1)
- edge inference(2)
- einsum cascade(1)
- embedding model training(1)
- energy efficiency(1)
- entropy(1)
- fault tolerance(1)
- formal verification(1)
- function calls(1)
- gem5(1)
- geo-distributed inference(1)
- ghost arbitration(1)
- graph pattern mining(1)
- graph scheduling(1)
- graph-analytics(1)
- hardware accelerator(2)
- hardware compression(1)
- hardware generation(1)
- hardware-software co-design(3)
- helper threads(1)
- heterogeneous GPU(1)
- heterogeneous memory architecture(1)
- hot page management(1)
- hybrid fidelity(1)
- hypercube(1)
- hyperdimensional computing(1)
- image projection accelerator(1)
- in-DRAM PIM(1)
- in-memory computing(1)
- incremental SVD(1)
- inference optimization(1)
- inferential statistics(1)
- inner product estimation(1)
- input-stationary dataflow(1)
- inter-DIMM-broadcast(1)
- issue queue(1)
- k-means(1)
- kernel scheduling(1)
- large language models(3)
- last-level cache(1)
- latency-critical data center(1)
- lattice codebook(1)
- leakage contracts(1)
- long-context LLM(4)
- long-sequence-modeling(1)
- lookup table(1)
- low-bit LLM(1)
- low-bit inference(1)
- low-precision quantization(1)
- low-rank approximation(4)
- low-rank attention(1)
- low-rank decomposition(1)
- low-rank projection(1)
- mMPU(1)
- max-flow(1)
- memory bandwidth partitioning(1)
- memory management(1)
- memory oversubscription(1)
- memory-centric architecture(1)
- memory-security(1)
- memory-system(1)
- memory maintenance(1)
- memristor(2)
- microarchitectural side channel(1)
- microarchitecture(4)
- mixed-precision(1)
- mixed-signal accelerator(1)
- mobile inference(1)
- mpGEMM(1)
- multi-GPU communication(1)
- multi-camera system(1)
- multi-model scheduling(1)
- near-data computing(1)
- near-data processing(1)
- near-memory computing(1)
- near-memory processing(1)
- nearest neighbor search(1)
- network stack(1)
- network topology(1)
- network-on-chip(1)
- neuromorphic(1)
- normalized effective rank(1)
- on-device inference(1)
- out-of-order core(1)
- performance isolation(1)
- persistent memory(1)
- pipeline parallelism(1)
- pipelining(1)
- post-training quantization(3)
- precomputation(1)
- prefill(1)
- processing-in-memory(1)
- processing-using-memory(2)
- python(2)
- quantization(1)
- quantum error correction(1)
- real-time-monitoring(1)
- register stack(1)
- reinforcement learning(1)
- resistive memory(1)
- resource allocation(1)
- resource management(1)
- secure prefetching(1)
- self-attention(1)
- server workloads(1)
- serverless LLM inference(1)
- side-channel attack(1)
- simulator(1)
- sparse acceleration(1)
- sparse attention(1)
- sparse iterative solver(1)
- spatial accelerator(1)
- spatial architecture(1)
- spiking neural network(2)
- spot VMs(1)
- stall analysis(1)
- structured sparsity(1)
- subarray-level parallelism(1)
- surface code(1)
- system optimization(1)
- systolic array(1)
- task scheduling(1)
- ternary quantization(1)
- throttling(1)
- throughput(1)
- token editing(1)
- uPIMulator(1)
- uv(2)
- vector quantization(5)
- virtual memory(3)
- virtualization(1)
- vision transformer(1)
- wakeup logic(1)
- weight clustering(1)
- weight-only quantization(1)
- Compute-by-lookup(1)
- On-device LLM(1)
- Unified Memory(2)
- Quantization(1)