Local memory is so named because its scope is local to the thread, not because of its physical location. In fact, local memory is off-chip. Hence, access to local memory is as expensive as access to global memory. Like global memory, local memory is not cached on devices of compute capability 1.x. In other words, the term “local” in the name does not imply faster access.
Local memory is used only to hold automatic variables. This is done by the nvcc compiler when it determines that there is insufficient register space to hold the variable. Automatic variables that are likely to be placed in local memory are large structures or arrays that would consume too much register space and arrays that the compiler determines may be indexed dynamically.
Inspection of the PTX assembly code (obtained by compiling with -ptx or -keep command-line options to nvcc) reveals whether a variable has been placed in local memory during the first compilation phases. If it has, it will be declared using the .local mnemonic and accessed using the ld.local and st.local mnemonics. If it has not, subsequent compilation phases might still decide otherwise, if they find the variable consumes too much register space for the targeted architecture. There is no way to check this for a specific variable, but the compiler reports total local memory usage per kernel (lmem) when run with the --ptxas-options=-v option.