How does cudaEventRecord handle asynchronous events?

When you call cudaEventRecord, you enqueue an event in the given stream. If work is queued in the stream ahead of the event, the event stays pending in the stream's FIFO until every operation ahead of it has completed. cudaEventRecord, like kernel launches, is asynchronous with respect to the calling host thread: it returns as soon as the event is queued, not when the event completes.
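A minimal sketch that makes this asynchrony visible, assuming a CUDA-capable device and the CUDA runtime API. The kernel busy_kernel, the CHECK macro and the function count_polls_until_event_done are hypothetical names for illustration. cudaEventRecord returns immediately, and the host then polls the event with cudaEventQuery, which reports cudaErrorNotReady until the kernel queued ahead of the event has finished.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      std::fprintf(stderr, "%s failed: %s\n", #call,                   \
                   cudaGetErrorString(err_));                          \
      std::exit(1);                                                    \
    }                                                                  \
  } while (0)

// Hypothetical kernel that burns some time so the event has work ahead of it.
__global__ void busy_kernel(float* data, int n, int iters) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
    data[i] = v;
  }
}

// Enqueue a kernel and then an event on the same stream, and poll the event
// from the host. Returns how many times cudaEventQuery reported
// cudaErrorNotReady before the event completed.
long count_polls_until_event_done(int n, int iters) {
  assert(n > 0 && iters >= 0);
  float* d_data = nullptr;
  cudaStream_t stream;
  cudaEvent_t done;
  CHECK(cudaMalloc(&d_data, n * sizeof(float)));
  CHECK(cudaStreamCreate(&stream));
  CHECK(cudaEventCreate(&done));
  CHECK(cudaMemsetAsync(d_data, 0, n * sizeof(float), stream));

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  busy_kernel<<<blocks, threads, 0, stream>>>(d_data, n, iters);
  CHECK(cudaGetLastError());

  // Returns right away: the event is only queued behind the kernel.
  CHECK(cudaEventRecord(done, stream));

  long polls = 0;
  cudaError_t status;
  while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
    ++polls;  // the host thread is free to do other work here
  }
  CHECK(status);  // anything other than cudaSuccess is a real error

  CHECK(cudaEventDestroy(done));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaFree(d_data));
  return polls;
}
```

The poll count depends on timing and can legitimately be zero if the kernel finishes before the first query; the point is only that the host is not blocked by cudaEventRecord itself.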

Why do we need to synchronize on events?

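When events are used as timers, the pattern is: record event1, launch the kernel, record event2, then synchronize on event2 (for example with cudaEventSynchronize) before computing the elapsed time. Synchronizing on event2 is important because it guarantees that everything queued before it has executed. Since both events and the kernel are in the same stream, and a stream preserves order, event1 and the kernel have necessarily completed as well.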
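A minimal sketch of that timing pattern, assuming the CUDA runtime API. event1 and event2 follow the names used in the answer; the kernel scale, the CHECK macro and the function time_scale_ms are hypothetical. One concrete reason the synchronization matters: cudaEventElapsedTime returns cudaErrorNotReady if either event has not completed yet.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      std::fprintf(stderr, "%s failed: %s\n", #call,                   \
                   cudaGetErrorString(err_));                          \
      std::exit(1);                                                    \
    }                                                                  \
  } while (0)

// Hypothetical kernel being timed.
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Times one launch of `scale` with a start/stop event pair on one stream.
float time_scale_ms(int n) {
  assert(n > 0);
  float* d = nullptr;
  cudaStream_t stream;
  cudaEvent_t event1, event2;
  CHECK(cudaMalloc(&d, n * sizeof(float)));
  CHECK(cudaStreamCreate(&stream));
  CHECK(cudaEventCreate(&event1));
  CHECK(cudaEventCreate(&event2));
  CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), stream));

  CHECK(cudaEventRecord(event1, stream));  // start marker
  scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(event2, stream));  // stop marker

  // Block the host until event2 completes. The stream is in order, so
  // event1 and the kernel have completed too.
  CHECK(cudaEventSynchronize(event2));

  float ms = 0.0f;
  CHECK(cudaEventElapsedTime(&ms, event1, event2));

  CHECK(cudaEventDestroy(event1));
  CHECK(cudaEventDestroy(event2));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaFree(d));
  return ms;
}
```

Calling time_scale_ms(1 << 20) from main returns the kernel's elapsed time in milliseconds.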

Is it possible to call cudaStreamSynchronize or cudaThreadSynchronize instead of synchronizing on the event?

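You could call cudaStreamSynchronize, or even cudaThreadSynchronize (deprecated; cudaDeviceSynchronize replaces it), instead, but both are overkill in this case. cudaStreamSynchronize blocks until all work queued in the stream so far has finished, including anything enqueued after event2, and cudaThreadSynchronize blocks until all work on the device has finished. Synchronizing on event2 waits only for the work the measurement depends on.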
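A sketch of the three options side by side, assuming the CUDA runtime API. Because cudaThreadSynchronize is deprecated, the sketch uses its replacement, cudaDeviceSynchronize. The enum SyncMode, the kernel add_one, the CHECK macro and the function elapsed_with are hypothetical names for illustration. All three modes give a valid timing; they differ only in how much work the host waits for.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      std::fprintf(stderr, "%s failed: %s\n", #call,                   \
                   cudaGetErrorString(err_));                          \
      std::exit(1);                                                    \
    }                                                                  \
  } while (0)

// Hypothetical kernel being timed.
__global__ void add_one(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;
}

// Hypothetical selector for how the host waits before reading the timer.
enum class SyncMode { Event, Stream, Device };

float elapsed_with(SyncMode mode, int n) {
  assert(n > 0);
  float* d = nullptr;
  cudaStream_t stream;
  cudaEvent_t event1, event2;
  CHECK(cudaMalloc(&d, n * sizeof(float)));
  CHECK(cudaStreamCreate(&stream));
  CHECK(cudaEventCreate(&event1));
  CHECK(cudaEventCreate(&event2));
  CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), stream));

  CHECK(cudaEventRecord(event1, stream));
  add_one<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
  CHECK(cudaGetLastError());
  CHECK(cudaEventRecord(event2, stream));

  switch (mode) {
    case SyncMode::Event:   // waits only until event2 completes
      CHECK(cudaEventSynchronize(event2));
      break;
    case SyncMode::Stream:  // waits for everything queued in `stream`
      CHECK(cudaStreamSynchronize(stream));
      break;
    case SyncMode::Device:  // waits for all work on the device
      CHECK(cudaDeviceSynchronize());
      break;
  }

  float ms = 0.0f;
  CHECK(cudaEventElapsedTime(&ms, event1, event2));

  CHECK(cudaEventDestroy(event1));
  CHECK(cudaEventDestroy(event2));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaFree(d));
  return ms;
}
```

In this small program nothing else is queued, so the three modes wait for the same work; the difference only shows up when other streams or later work in the same stream are busy.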

