🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 7 (from laksa107)
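The shard formula itself is not shown in this record. A common scheme in distributed crawlers, and a plausible reading given the Root Hash reported at the bottom of this page, is a stable hash of the URL key modulo the shard count. A minimal sketch, assuming 16 shards (the real shard count and hash function are not given here):

```python
# Illustrative sketch only: shard selection as root-hash modulo shard count.
# NUM_SHARDS = 16 is an assumption; the crawler's real shard count and hash
# function are not shown in this record.
NUM_SHARDS = 16

def shard_for(root_hash: int, num_shards: int = NUM_SHARDS) -> int:
    return root_hash % num_shards

print(shard_for(3261792786456953207))  # the Root Hash from this record
```

With these assumptions, 3261792786456953207 % 16 happens to equal 7, matching the shard reported here, though that agreement alone does not confirm the scheme.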

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:
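The query and response bodies for this check are not shown. For reference, an allow/deny decision of this kind can be sketched with Python's standard-library `urllib.robotparser`; the rules below are hypothetical, not the site's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the site's real robots.txt is not
# reproduced in this record.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://bnikolic.co.uk/blog/python-numerical-gil"))  # True
print(rp.can_fetch("*", "https://bnikolic.co.uk/private/notes"))              # False
```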

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled


📄 INDEXABLE · CRAWLED (1 day ago) · 🤖 ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | `download_http_code = 200` | HTTP 200 |
| Age cutoff | PASS | `download_stamp > now() - 6 MONTH` | 0 months ago |
| History drop | PASS | `isNull(history_drop_reason)` | No drop reason |
| Spam/ban | PASS | `fh_dont_index != 1 AND ml_spam_score = 0` | ml_spam_score=0 |
| Canonical | PASS | `meta_canonical IS NULL OR = '' OR = src_unparsed` | Not set |
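The conditions above are SQL-style expressions. Restated as a plain-Python predicate, with field names taken from the table (the dict layout is assumed for illustration):

```python
from datetime import datetime, timedelta

def is_indexable(page: dict) -> bool:
    # Plain-Python restatement of the filter table above; field names come
    # from the table, the dict layout is an assumption for illustration.
    six_months = timedelta(days=183)  # approximation of the 6 MONTH interval
    return (
        page["download_http_code"] == 200
        and page["download_stamp"] > datetime.now() - six_months
        and page["history_drop_reason"] is None
        and page["fh_dont_index"] != 1
        and page["ml_spam_score"] == 0
        and page["meta_canonical"] in (None, "", page["src_unparsed"])
    )
```

A record with the values shown in the Page Details section below would pass every filter.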

Page Details

| Property | Value |
|---|---|
| URL | https://bnikolic.co.uk/blog/python-numerical-gil |
| Last Crawled | 2026-04-10 17:37:08 (1 day ago) |
| First Indexed | 2023-04-10 22:32:02 (3 years ago) |
| HTTP Status Code | 200 |
| Meta Title | B. Nikolic: Python Global Interpreter Lock (GIL) + numerical data processing |
| Meta Description | null |
| Meta Canonical | null |
Markdown
[Bojan Nikolic](https://bnikolic.co.uk/blog/python-numerical-gil)

# Python Global Interpreter Lock (GIL) + numerical data processing

The global interpreter lock may naively seem to imply that a Python process can only use one core of a processor. That is *not* at all the case: one (well-known) reason is that Python extension code (such as the ubiquitous NumPy/SciPy kernels) can itself use multiple threads. This is what I'd call fine-grained threading, which is easy to use and effective when the kernels have sufficient parallelism available. Coarse-grained parallelism through the threading module can, however, also be effective, for the following reason. The Python GIL ensures that the Python interpreter in a process is only ever executing one opcode (Python instruction) at any one time. Instructions in a compiled extension to Python, or a syscall, do not intrinsically need the lock to be held, and indeed the lock is released before most such calls, e.g., before NumPy kernels are called. For this reason many different NumPy kernels from *different Python threads* can be executing at the same time, utilising multiple CPU cores.

## Illustration

As an illustration I've slightly adapted the code Travis posted here (<http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html>) so that it uses either threading or multiprocessing for coarse-grained parallelism. The multiprocessing approach uses multiple Python processes and so cannot be affected by the GIL.
```python
# Adapted from: http://technicaldiscovery.blogspot.co.uk/2011/06/speeding-up-python-numpy-cython-and.html
from numpy import zeros, array_split, linspace
import threading
import multiprocessing

NT = 1
dx = 0.1
dy = 0.1
dx2 = dx*dx
dy2 = dy*dy

def num_update(u):
    u[1:-1, 1:-1] = ((u[2:, 1:-1] + u[:-2, 1:-1])*dy2 +
                     (u[1:-1, 2:] + u[1:-1, :-2])*dx2) / (2*(dx2+dy2))

def py_update(u):
    nx, ny = u.shape
    for i in range(1, nx-1):
        for j in range(1, ny-1):
            u[i, j] = ((u[i+1, j] + u[i-1, j]) * dy2 +
                       (u[i, j+1] + u[i, j-1]) * dx2) / (2*(dx2+dy2))

def calc(N, Niter=100, func=py_update, icl=[1]):
    for ic in icl:
        u = zeros([N, N])
        u[0] = ic
        for i in range(Niter):
            func(u)

def blot(np=100, N=30, mp=False, num=True):
    threads = []
    update = num_update if num else py_update
    for icl in array_split(linspace(1, 2, np), NT):
        if mp:
            t = multiprocessing.Process(target=calc, args=(N, 30, update, icl))
        else:
            t = threading.Thread(target=calc, args=(N, 30, update, icl))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

Measuring the performance of this example as a function of the number of threads (NT) gives the following results:

![figs/python-gil/measurements.png](https://bnikolic.co.uk/blog/figs/python-gil/measurements.png)

The graph labelled Threading+PurePy is for the pure Python version of the problem. It can be seen that there is a small *increase* in the execution time as more threads are added. This is due to the GIL: each pure Python thread has to acquire the lock in order to execute any instructions, so execution can never be faster than single-threaded. The small increases in execution time are due to the overheads of acquiring the GIL and switching between threads. The graph labelled Threading+NumPy uses NumPy kernels for the computation. It can be seen that as the number of threads increases, execution time decreases close to linearly and then levels off. A similar pattern can be seen in MultiProc+NumPy, in which the multiprocessing module is used instead of threading.
In both of these scenarios the speed of execution with multiple threads exceeds the single-core case, demonstrating that the GIL is not limiting execution to the equivalent of one core. It can be seen that multiprocessing performance levels off at slightly faster than threading, but it isn't clear whether this is due to lock-acquisition and thread-switching overheads or to OS-level effects.

© 2008 to 2019, [Bojan Nikolic](http://www.bnikolic.co.uk/). Send comments/enquiries to: [webs@bnikolic.co.uk](mailto:webs@bnikolic.co.uk). [Terms and Conditions](http://www.bnikolic.co.uk/siteterms).
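The crawled article's claim, that NumPy releases the GIL around its kernels so several Python threads can compute concurrently, can be checked with a minimal self-contained timing harness. This is a sketch, not the author's benchmark: the matrix sizes are arbitrary, and results also depend on whether the underlying BLAS library multithreads individual kernels.

```python
import threading
import time
import numpy as np

def work():
    # NumPy releases the GIL inside kernels such as matmul, so several of
    # these workers can run on different cores concurrently.
    a = np.random.rand(300, 300)
    for _ in range(20):
        a = a @ a
        a /= np.abs(a).max()  # keep values bounded across iterations

def run(n_threads):
    # Wall-clock time to run `work` in n_threads threads concurrently.
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

t1 = run(1)
t4 = run(4)
print(f"1 thread: {t1:.3f}s, 4 threads: {t4:.3f}s")
```

On a multi-core machine the four-thread run should take well under four times the single-thread time; a pure-Python `work` function would show no such gain, for the reasons the article describes.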
Shard: 7 (laksa)
Root Hash: 3261792786456953207
Unparsed URL: uk,co,bnikolic!/blog/python-numerical-gil s443
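The unparsed-URL key appears to be the host labels reversed and comma-joined, a `!` before the path, and a scheme suffix (`s443` for HTTPS). That format is inferred from this single example; a sketch that reproduces it:

```python
# Inferred from the one example in this record; the crawler's real
# normalisation rules may differ (ports, query strings, etc. are not covered).
from urllib.parse import urlparse

def to_unparsed(url: str) -> str:
    p = urlparse(url)
    host = ",".join(reversed(p.hostname.split(".")))
    suffix = " s443" if p.scheme == "https" else ""
    return f"{host}!{p.path}{suffix}"

print(to_unparsed("https://bnikolic.co.uk/blog/python-numerical-gil"))
# → uk,co,bnikolic!/blog/python-numerical-gil s443
```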