McqMate
These multiple-choice questions (MCQs) are designed to enhance your knowledge and understanding in the following area: Computer Science Engineering (CSE).
1. |
A CUDA program is comprised of two primary components: a host and a _____. |
A. | gpu kernel |
B. | cpu kernel |
C. | os |
D. | none of the above |
Answer» A. gpu kernel |
2. |
The kernel code is identified by the ________ qualifier with void return type |
A. | __host__ |
B. | __global__ |
C. | __device__ |
D. | void |
Answer» B. __global__ |
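As a minimal sketch of the qualifier in question 2 (the kernel name `scale` and its parameters are illustrative, not from the source):

```cuda
// A kernel is marked __global__ and must have a void return type.
// It executes on the device but is launched from the host.
__global__ void scale(int *data, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique per-thread index
    data[i] *= factor;
}
```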
3. |
The kernel code is only callable by the host |
A. | true |
B. | false |
Answer» A. true |
4. |
The kernel code is executable on the device and host |
A. | true |
B. | false |
Answer» B. false |
5. |
Calling a kernel is typically referred to as _________. |
A. | kernel thread |
B. | kernel initialization |
C. | kernel termination |
D. | kernel invocation |
Answer» D. kernel invocation |
6. |
Host codes in a CUDA application can initialize a device |
A. | true |
B. | false |
Answer» A. true |
7. |
Host codes in a CUDA application can allocate GPU memory |
A. | true |
B. | false |
Answer» A. true |
8. |
Host codes in a CUDA application cannot invoke kernels |
A. | true |
B. | false |
Answer» B. false |
9. |
CUDA offers the Chevron Syntax to configure and execute a kernel. |
A. | true |
B. | false |
Answer» A. true |
10. |
The BlockPerGrid and ThreadPerBlock parameters are related to the ________ model supported by CUDA. |
A. | host |
B. | kernel |
C. | thread abstraction |
D. | none of the above |
Answer» C. thread abstraction |
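The chevron syntax from questions 9 and 10 can be sketched as follows (the values, the kernel name `scale`, and the device pointer `d_data` are illustrative; it assumes a `__global__` kernel named `scale` is defined elsewhere):

```cuda
// <<<blocksPerGrid, threadsPerBlock>>> configures the thread hierarchy
// for one kernel launch (illustrative values).
dim3 blocksPerGrid(4);       // 4 blocks in the grid
dim3 threadsPerBlock(256);   // 256 threads in each block
scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2);  // d_data: device pointer
```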
11. |
_________ is Callable from the device only |
A. | __host__ |
B. | __global__ |
C. | __device__ |
D. | none of the above |
Answer» C. __device__ |
12. |
______ is Callable from the host |
A. | __host__ |
B. | __global__ |
C. | __device__ |
D. | none of the above |
Answer» B. __global__ |
13. |
______ is Callable from the host |
A. | __host__ |
B. | __global__ |
C. | __device__ |
D. | none of the above |
Answer» A. __host__ |
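Questions 11–13 can be summarized in one hedged sketch (the function names are illustrative):

```cuda
__device__ float square(float x) { return x * x; }   // callable from the device only

__global__ void kernel(float *out) {                 // callable from the host, runs on the device
    out[threadIdx.x] = square((float)threadIdx.x);
}

__host__ void setup(void) {                          // callable from (and runs on) the host;
    /* allocate memory, launch kernels, etc. */      // __host__ is the default for plain functions
}
```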
14. |
CUDA supports ____________ in which code in a single thread is executed by all other threads. |
A. | thread division |
B. | thread termination |
C. | thread abstraction |
D. | none of the above |
Answer» C. thread abstraction |
15. |
In CUDA, a single invoked kernel is referred to as a _____. |
A. | block |
B. | thread |
C. | grid |
D. | none of the above |
Answer» C. grid |
16. |
A grid is comprised of ________ of threads. |
A. | blocks |
B. | bunches |
C. | hosts |
D. | none of the above |
Answer» A. blocks |
17. |
A block is comprised of multiple _______. |
A. | threads |
B. | bunches |
C. | hosts |
D. | none of the above |
Answer» A. threads |
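The grid → block → thread hierarchy of questions 15–17 is what makes per-thread indexing work; a common sketch (the kernel name is illustrative):

```cuda
// Each thread computes a unique global index from its block and thread
// coordinates, so the grid's threads cover the output array exactly once.
__global__ void whoAmI(int *out) {
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    out[globalIdx] = globalIdx;
}
// Launched as, e.g., whoAmI<<<8, 128>>>(d_out);  // 8 blocks x 128 threads each
```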
18. |
A solution to the problem of representing parallelism in an algorithm is |
A. | cud |
B. | pta |
C. | cda |
D. | cuda |
Answer» D. cuda |
19. |
Host codes in a CUDA application cannot reset a device |
A. | true |
B. | false |
Answer» B. false |
20. |
Host codes in a CUDA application can transfer data to and from the device |
A. | true |
B. | false |
Answer» A. true |
21. |
Host codes in a CUDA application cannot deallocate memory on the GPU |
A. | true |
B. | false |
Answer» B. false |
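Questions 6–8 and 19–21 together describe the host-side lifecycle. A self-contained sketch using the standard CUDA runtime calls (kernel name illustrative; error checking omitted for brevity):

```cuda
#include <stdio.h>

#define N 16

__global__ void doubleIt(int *a) { a[threadIdx.x] *= 2; }

int main(void) {
    int h_a[N];
    for (int i = 0; i < N; ++i) h_a[i] = i;

    int *d_a;
    cudaMalloc((void**)&d_a, N * sizeof(int));                      // allocate GPU memory
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);  // transfer to device
    doubleIt<<<1, N>>>(d_a);                                        // invoke a kernel
    cudaMemcpy(h_a, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);  // transfer back
    cudaFree(d_a);                                                  // deallocate GPU memory
    cudaDeviceReset();                                              // reset the device

    printf("h_a[5] = %d\n", h_a[5]);  // 10 after doubling
    return 0;
}
```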
22. |
Any condition that causes a processor to stall is called a _____. |
A. | hazard |
B. | page fault |
C. | system error |
D. | none of the above |
Answer» A. hazard |
23. |
The time lost due to branch instruction is often referred to as _____. |
A. | latency |
B. | delay |
C. | branch penalty |
D. | none of the above |
Answer» C. branch penalty |
24. |
_____ method is used in centralized systems to perform out of order execution. |
A. | scorecard |
B. | scoreboarding |
C. | optimizing |
D. | redundancy |
Answer» B. scoreboarding |
25. |
The computer cluster architecture emerged as an alternative for ____. |
A. | isa |
B. | workstation |
C. | supercomputers |
D. | distributed systems |
Answer» C. supercomputers |
26. |
NVIDIA CUDA Warp is made up of how many threads? |
A. | 512 |
B. | 1024 |
C. | 312 |
D. | 32 |
Answer» D. 32 |
27. |
Out-of-order instruction execution is not possible on GPUs. |
A. | true |
B. | false |
C. | -- |
D. | -- |
Answer» B. false |
28. |
CUDA supports programming in .... |
A. | c or c++ only |
B. | java, python, and more |
C. | c, c++, third party wrappers for java, python, and more |
D. | pascal |
Answer» C. c, c++, third party wrappers for java, python, and more |
29. |
FADD, FMAD, FMIN, FMAX are ----- supported by Scalar Processors of NVIDIA GPU. |
A. | 32-bit ieee floating point instructions |
B. | 32-bit integer instructions |
C. | both |
D. | none of the above |
Answer» A. 32-bit ieee floating point instructions |
30. |
Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SP). |
A. | 1024 |
B. | 128 |
C. | 512 |
D. | 8 |
Answer» D. 8 |
31. |
Each NVIDIA GPU has ------ Streaming Multiprocessors |
A. | 8 |
B. | 1024 |
C. | 512 |
D. | 16 |
Answer» D. 16 |
32. |
CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----. |
A. | “programming-overhead”, 2 clock |
B. | “zero-overhead”, 1 clock |
C. | 64, 2 clock |
D. | 32, 1 clock |
Answer» B. “zero-overhead”, 1 clock |
33. |
Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation. |
A. | simd (single instruction multiple data) |
B. | simt (single instruction multiple thread) |
C. | sisd (single instruction single data) |
D. | sist (single instruction single thread) |
Answer» B. simt (single instruction multiple thread) |
34. |
Limitations of CUDA Kernel |
A. | recursion, call stack, static variable declaration |
B. | no recursion, no call stack, no static variable declarations |
C. | recursion, no call stack, static variable declaration |
D. | no recursion, call stack, no static variable declarations |
Answer» B. no recursion, no call stack, no static variable declarations |
35. |
What is a Unified Virtual Machine? |
A. | it is a technique that allows both the cpu and gpu to read from a single virtual machine, simultaneously. |
B. | it is a technique for managing separate host and device memory spaces. |
C. | it is a technique for executing device code on host and host code on device. |
D. | it is a technique for executing general purpose programs on device instead of host. |
Answer» A. it is a technique that allows both the cpu and gpu to read from a single virtual machine, simultaneously. |
36. |
_______ became the first language specifically designed by a GPU company to facilitate general-purpose computing on ____. |
A. | python, gpus. |
B. | c, cpus. |
C. | cuda c, gpus. |
D. | java, cpus. |
Answer» C. cuda c, gpus. |
37. |
The CUDA architecture consists of --------- for parallel computing kernels and functions. |
A. | risc instruction set architecture |
B. | cisc instruction set architecture |
C. | zisc instruction set architecture |
D. | ptx instruction set architecture |
Answer» D. ptx instruction set architecture |
38. |
CUDA stands for --------, designed by NVIDIA. |
A. | common union discrete architecture |
B. | complex unidentified device architecture |
C. | compute unified device architecture |
D. | complex unstructured distributed architecture |
Answer» C. compute unified device architecture |
39. |
The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false. |
A. | true |
B. | false |
C. | --- |
D. | --- |
Answer» A. true |
40. |
The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device. |
A. | 128, 256, 512 |
B. | 32, 64, 128 |
C. | 64, 128, 256 |
D. | 256, 512, 1024 |
Answer» A. 128, 256, 512 |
41. |
NVIDIA 8-series GPUs offer --------. |
A. | 50-200 gflops |
B. | 200-400 gflops |
C. | 400-800 gflops |
D. | 800-1000 gflops |
Answer» A. 50-200 gflops |
42. |
IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by Scalar Processors of NVIDIA GPU. |
A. | 32-bit ieee floating point instructions |
B. | 32-bit integer instructions |
C. | both |
D. | none of the above |
Answer» B. 32-bit integer instructions |
43. |
The CUDA hardware programming model supports: |
A. | a,c,d,f |
B. | b,c,d,e |
C. | a,d,e,f |
D. | a,b,c,d,e,f |
Answer» D. a,b,c,d,e,f |
44. |
In the CUDA memory model, the following memory types are available: |
A. | a, b, d, f |
B. | a, c, d, e, f |
C. | a, b, c, d, e, f |
D. | b, c, e, f |
Answer» C. a, b, c, d, e, f |
45. |
What is the equivalent of this general C program in CUDA C: int main(void) { printf("Hello, World!\n"); return 0; } |
A. | int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; } |
B. | __global__ void kernel( void ) { } int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; } |
C. | __global__ void kernel( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; } |
D. | __global__ int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; } |
Answer» B. __global__ void kernel( void ) { } int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; } |
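Option B above, reflowed for readability (this is the same code as the flattened answer):

```cuda
#include <stdio.h>

__global__ void kernel(void) { }   // empty kernel; runs on the device

int main(void) {
    kernel<<<1,1>>>();             // launch with 1 block of 1 thread
    printf("Hello, World!\n");     // printed by the host
    return 0;
}
```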
46. |
Which function runs on the device (i.e., the GPU): a) __global__ void kernel ( void ) { } b) int main ( void ) { ... return 0; } |
A. | a |
B. | b |
C. | both a,b |
D. | --- |
Answer» A. a |
47. |
A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } where __global__ is a CUDA C keyword which indicates that: |
A. | add() will execute on device, add() will be called from host |
B. | add() will execute on host, add() will be called from device |
C. | add() will be called and executed on host |
D. | add() will be called and executed on device |
Answer» A. add() will execute on device, add() will be called from host |
48. |
If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory to dev_a: |
A. | cudaMalloc( &dev_a, sizeof( int ) ) |
B. | malloc( &dev_a, sizeof( int ) ) |
C. | cudaMalloc( (void**) &dev_a, sizeof( int ) ) |
D. | malloc( (void**) &dev_a, sizeof( int ) ) |
Answer» C. cudaMalloc( (void**) &dev_a, sizeof( int ) ) |
49. |
If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from variable a to variable dev_a: |
A. | memcpy( dev_a, &a, size ); |
B. | cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice ); |
C. | memcpy( (void*) dev_a, &a, size ); |
D. | cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost ); |
Answer» B. cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice ); |
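Questions 47–49 fit together into one complete program; a sketch assuming the standard CUDA runtime calls (error checking omitted for brevity):

```cuda
#include <stdio.h>

__global__ void add(int *a, int *b, int *c) { *c = *a + *b; }

int main(void) {
    int a = 2, b = 7, c;
    int *dev_a, *dev_b, *dev_c;
    size_t size = sizeof(int);

    cudaMalloc((void**)&dev_a, size);   // allocate device memory (question 48)
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);  // copy inputs in (question 49)
    cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice);

    add<<<1,1>>>(dev_a, dev_b, dev_c);  // invoke the kernel (question 47)

    cudaMemcpy(&c, dev_c, size, cudaMemcpyDeviceToHost);  // copy the result back
    printf("2 + 7 = %d\n", c);

    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}
```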
50. |
What does the triple angle bracket mark in a statement inside the main function indicate? |
A. | a call from host code to device code |
B. | a call from device code to host code |
C. | less than comparison |
D. | greater than comparison |
Answer» A. a call from host code to device code |