CS 470
Multiprocessing
Explicit vs. implicit
Threads vs. processes
Thread safety
Non-determinism
Race conditions
Deadlock
Synchronization
Mutual exclusion
Locks
Semaphores
Conditions
Monitors
Barriers
OpenMP
Pragmas
parallel
for
task
single
master
critical
barrier
Clauses
default/private/shared
reduction
schedule
collapse
firstprivate/lastprivate
nowait
Functions
omp_get_wtime
omp_get_num_threads
omp_get_max_threads
Loop-carried dependencies
Teams and parallel regions
Locks
Parallel languages
Productivity vs. performance
Partitioned global address spaces (PGAS)
High-Performance Fortran
CAF and UPC
X10 and Fortress
Chapel
Python and Julia
Performance analysis
Speedup and efficiency
Amdahl's and Gustafson's laws
Linear speedup
Critical path analysis
Temporal vs. spatial locality
Weak vs. strong scaling
CPU time vs. wall time
Communication overhead
Bandwidth
Latency
Bisection
Contention
Energy usage
Energy (work) vs. power (rate)
Power caps
Dynamic voltage and frequency scaling (DVFS)
Analysis tools
Debuggers
Profilers
Hardware counters
Timer resolution
Sampling
Overhead
Perturbation
Skid
Tool frameworks
Performance modeling
Roofline model
Autotuning
Distributed issues
Naming
Flat namespaces
Hierarchical namespaces
IPv4, IPv6, and DNS
Partitioned global address spaces (PGAS)
Overlay networks
Distributed hash tables
Virtual address space
Finger / lookup tables
Chord
Synchronization
Message passing
Clocks
Physical
Lamport clocks
Vector clocks
NTP
Barriers
Consensus protocols
Transactions
Elections
One-phase vs. two-phase
Paxos
Replication and consistency
Partial vs. total orderings
Data-centric
Continuous
Sequential
Causal
Client-centric
Monotonic reads
Monotonic writes
Read-your-writes
Writes-follow-reads
Distributed version control
Fault tolerance
CAP theorem
Consistency
Strong
Eventual
Weak
Availability
Active / passive
Active / active
Partition tolerance
Soft vs. hard failure
Permanent vs. intermittent vs. transient faults
MTBF and FIT
Failure types
Crash
Omission
Timing
Response
Arbitrary (Byzantine)
Failure handling
Detection
Prevention
Avoidance
Recovery
Techniques
DMR vs. TMR
Checksums / hashes
Hamming codes
Reed-Solomon codes
Checkpointing
Security
Attacks
Brute force password cracking
Replay attacks
Man-in-the-middle attacks
Principle of least privilege
Trust
Policies
Encryption
One-way hash functions
Cryptographic systems
Symmetric vs. asymmetric
MD5 / SHA
DES / RSA
Authentication
Shared-key challenge/response
Needham-Schroeder
Kerberos
Key exchange parties
Diffie-Hellman key exchange
Certificate authorities
Authorization
Firewalls
Unix file permissions
Access control lists
LDAP and AD
Auditing
Append-only logs
Blockchains (Bitcoin)
File systems
Design issues
File-level vs. block-level
Remote access vs. upload/download
Centralized vs. decentralized
Symmetric vs. asymmetric
Striping
Remote procedure calls
Function stubs
Parameter marshalling
Synchronous vs. asynchronous
Networked file systems
Exports
Mounts
Static vs. automatic
Protocols
NFS
AFS
GoogleFS
Lustre
BitTorrent
Freenet
Middleware
Scheduling
SLURM
Interactive vs. batch jobs
Parameterized MPI jobs
Monitoring
Load balancing
Checkpoint/restart
Parallel patterns and concepts
Task vs. data decomposition
Shared-memory vs. distributed-memory
Locality
Data access patterns
Spatial vs. temporal locality
NUMA effects
Caching
Mirroring
Content delivery networks
Foster's methodology
Partitioning
Communication
Aggregation
Mapping
Communication patterns
Naturally ("embarrassingly") parallel
Reduction trees
Nearest-neighbor
Producer/consumer
Map/reduce
Pipelines and streams
Collective operations
Broadcast
Reduction
Scatter
Gather
Allgather
Allreduce
All-to-all
Matrix operations
Sparse vs. dense
Access patterns
Linear system solvers
Linear algebra
Architectures and technologies
Flynn's taxonomy
SISD
SIMD
MIMD
SPMD
SIMT
Instruction-level parallelism
von Neumann bottleneck
Pipelining instructions
Superscalar processing
Speculative execution
Vector processing
Shared memory
Threading libraries
Pthreads
Java threads
Windows threads
OpenMP
Manycore
Coprocessors and accelerators
GPUs / GPGPUs
SIMT
Streaming multiprocessors
Warps and divergence
Host vs. device memory
CUDA
Kernels
Thread blocks and grids
Grid-stride access pattern
Atomic operations
Fast barrier
OpenACC
Distributed clusters
Open MPI and MPICH
Homogeneous vs. heterogeneous
Hybrid w/ accelerators
Topologies
Bus
Crossbar switches
Star
Ring
Grid / Mesh
Torus
Hypercube
Fat trees
Interconnects
Ethernet
InfiniBand
Omni-Path
Supercomputers
Wide-area networks
End-to-end principle
Sockets
OSI model
QoS concerns
Routing
Circuit switching vs. packet switching
Unicast
Multicast
Broadcast
Web protocols
IP / DNS
TCP / UDP
HTTP / HTML
SSL / TLS
NTP
XML / SOAP / JSON
Peer-to-peer
BitTorrent
Tor
Freenet
Clouds and grids
Infrastructure-as-a-service
Virtualization
Type-1 vs. type-2 hypervisors
Virtual machines
Containers / Docker
Cloud providers
Amazon Web Services (AWS)
Google Cloud
Microsoft Azure
Rackspace
Work-sharing
Condor
GIMPS
Novel architectures
Chiplets
Memory-centric
Neuromorphic
Quantum
Optical
Nanosheet transistors
Turnkey AI solutions