How do waiting threads affect performance?
I am fairly new to multithreading and threads in general; please bear with
me, as the idea is not only hard for me to understand in terms of
programming, but even more so when it comes to what the computer is doing
and how multiple threads affect performance. I searched extensively for an
answer to this question online, but to no avail.
Anyway, I am currently writing a program that performs a relatively
expensive calculation when the worst-case scenario is reached. I have
tried creating threads dynamically, and this works most of the time, but
when the worst case comes around, the execution time exceeds what I am
allotted for these calculations, largely due to the creation and
destruction of the threads. This has led me to an idea I have used in the
past: create the threads before execution begins and have them wait on a
condition until a calculation is needed, instead of creating and
destroying them dynamically.
Normally I wouldn't think twice about doing this, but because I will be
creating lots of threads when the system initializes, I am concerned about
how this will affect the performance of the system. This raises a
question: how do threads that are waiting on a condition affect the
system, if at all? Is creating threads during the program's initialization
and notifying them only when I need to perform a calculation the correct
way to approach this problem, or is there a better solution that I am
unaware of? I have also thought about using a thread pool for this. Would
a thread pool be best for this situation?
Some information that you may find helpful to better answer this question:
--I am using the Boost library (version 1.54.0) to multithread the
program.
--I am using Windows 7 and Visual Studio.
--If I create the threads when the program initializes, I will be creating
200-1000 threads (this number is predetermined as a #define and I won't
necessarily be using all threads every time I need to do the calculation).
--The number of threads needed varies each time I need to perform this
calculation; it is dependent on the number of inputs received which
changes every time the calculation is performed, but can never exceed a
maximum value (the maximum number being determined at compile time as a
#define).
--The computer I am using has 32 cores.
I am sorry if this question isn't up to par; I am a new Stack Overflow
user, so feel free to ask for more information and to critique how I can
better explain the situation and problem. Thank you in advance for your
help!
UPDATE
Here is the source code (some variables have been renamed to comply with
my company's terms and conditions):
for(int i = curBlob.boundingBoxStartY; i < curBlob.boundingBoxStartY + curBlob.boundingBoxHeight; ++i)
{
    for(int j = curBlob.boundingBoxStartX; j < curBlob.boundingBoxStartX + curBlob.boundingBoxWidth; ++j)
    {
        for(int k = 0; k < NUM_FILTERS; ++k)
        {
            if((int)arrayOfBinaryValues[channel][k].at<uchar>(i,j) == 1)
            {
                for(int p = 0; p < NUM_FILTERS; ++p)
                {
                    if(p != k)
                    {
                        if((curBlob.boundingBoxStartX + 1 < (curBlob.boundingBoxStartX + curBlob.boundingBoxWidth)) &&
                           ((int)arrayOfBinaryValues[channel][k].at<uchar>(i + 1, j) == 1))
                            ++count;
                        if((curBlob.boundingBoxStartY + 1 < (curBlob.boundingBoxStartY + curBlob.boundingBoxHeight)) &&
                           ((int)arrayOfBinaryValues[channel][k].at<uchar>(i, j + 1) == 1))
                            ++count;
                    }
                }
            }
        }
    }
}
Source code provided is strictly to show the complexity of the algorithm.