Understanding arenas and heaps in malloc()

9 minute read

Published:

Understanding arenas and heaps in glibc’s malloc()

When malloc() is called, what happens behind the scene?

What is arena

According to the GNU C Library Manual, arena is a large contiguous area of memory which manages its memory allocation. Glibc malloc maintains multiple arenas for multi-thread applications, and they do not interleave with each other. It decreases the usage of lock because each thread has its own arena.

There are two types of arenas in Glibc malloc: main arena and thread arena. Main arena manages heap area in a process, and thread arenas manage the memory mapping segments in a process. Memory in the main arena is allocated through sbrk(), and memory in the thread arena is allocated through mmap().

Arena linked list

In Glibc malloc, arena is a struct, malloc_state. The definition of malloc_state shows that an arena manages top chunk, memory allocated in this arena, its fast, small, and large bins. Also, every arena has a next pointer, which points to te next arena in the arena linked list. Each thread has a variable thread_arena in its TLS(thread local storage), which binds to its arena.

struct malloc_state
{
    ...

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

    ...
};

Picture 1 shows how arena linked list works. The next pointer of the last arena points to the first arena, which is the main arena. When searching for an arena in the linked list starting from the main arena, if the pointer goes back to the main arena, it means that it has iterated a loop.

Arena linked list

In the source code of Glibc malloc, after creating a new arena using new_heap(), this new arena will be inserted to the start of the linked list. Before modifying the linked list, we should lock it first so that multi threads will get synchronized. The code snippet below is a part of _int_new_arena() in arena.c and it shows how to insert a new arena.

  thread_arena = a;
  __libc_lock_init (a->mutex);

  __libc_lock_lock (list_lock);

  /* Add the new arena to the global list.  */
  a->next = main_arena.next;
  /* FIXME: The barrier is an attempt to synchronize with read access
     in reused_arena, which does not acquire list_lock while
     traversing the list.  */
  atomic_write_barrier ();
  main_arena.next = a;

  __libc_lock_unlock (list_lock);

Getting an arena

Even though each thread has their own arena, there will not be 1,000 arenas if you create 1,000 threads. Glibc malloc sets a limit for the total number of arenas. This value is set to be 8*number_of_cpu according to your device.

When a thread calls malloc() for the first time, if number of arenas has not accrossed narenas_limit, malloc will always create a new arena for this thread. Then when this thread calls malloc() again, memory will be allocated from its arena (thread arena). Before allocating memory, the thread will try to get the mutex of its arena. If the mutex is not available (maybe other threads is using this arena because reusing happens when narena_limit is accrossed), it will wait until it is available.

When a thread calls malloc() for the first time, if number of arenas has accrossed the limit, malloc will reuse arenas. It will search on the linked list starting from main arena. It tries to lock each arena, and obtain the first one it can lock.

When there are no more memory to obtain in an arena, malloc will retry to get an arena. It will create a new arena, or search on the linked list for the next available arena, and also avoid current one. However, it will not guarantee the success of memory allocation, because the next available arena is also possible to be full when we have low memory.

Picture 2 shows the call graph of malloc(). It will call arena_get() first, and then call int_malloc() to allocate memory from the arena. When there are no more memory in current arena, it will call arena_get_retry().

Call graph of malloc()

Initializing an arena

Every time when we call malloc() in our program, it will get an arena first, and then allocate memory on it.

There are three ways to get an arena:

  1. If the thread already has an arena (has called malloc() before), try to lock it. If cannot lock it, then wait.
  2. _int_new_arena(): If the thread calls malloc() for the first time, and limit has not been accrossed, create a new arena and allocate a large contiguous memory segment for it.
  3. reused_arena(): Search on the arena list and reuse the first available arena.

Initialization of arenas is completed in malloc_init_state(). Main arena, as a static global variable in a process, is initialized in ptmalloc_init(). Main arena (struct malloc_state) is located in the data segment of a program, and it is accessible for all threads. main_arena.top points to program’s heap segment. Initialization of a thread arena is in int_new_arena(). Thread arena (struct malloc_state) and its memory segment are both located in the memory mapping segment.

Because memory of main arena and thread arena locate in different place, they call different system functions to obtain memory. Thread arena calls new_heap(), and new_heap() calls mmap() to get a large memory segment from kernel. Main arena calls sbrk() to expand its memory in the heap segment.

Picture 3 shows the call graph of getting an arena.

Call graph of arena_get()

Free list: Free list of arena is separated from the arena linked list. Glibc creates a free list of arena when we call fork(). It will copy an arena list in the child process. (why not just use arena list?) The next_free pointer of arena will point to the next free arena.

void
__malloc_fork_unlock_child (void)
{
  if (__malloc_initialized < 1)
    return;

  /* Push all arenas to the free list, except thread_arena, which is
     attached to the current thread.  */
  __libc_lock_init (free_list_lock);
  if (thread_arena != NULL)
    thread_arena->attached_threads = 1;
  free_list = NULL;
  for (mstate ar_ptr = &main_arena;; )
    {
      __libc_lock_init (ar_ptr->mutex);
      if (ar_ptr != thread_arena)
        {
	  /* This arena is no longer attached to any thread.  */
	  ar_ptr->attached_threads = 0;
          ar_ptr->next_free = free_list;
          free_list = ar_ptr;
        }
      ar_ptr = ar_ptr->next;
      if (ar_ptr == &main_arena)
        break;
    }

  __libc_lock_init (list_lock);
}

Allocating memory from arena

Memory is taken from the heap for an arena. For main arena, it locates in the heap segment, and for thread arena, it locates in the memory mapping segment. An arena has an initial heap, and it would obtain more heap when the initial one runs out of space. Heaps for a thread arena is managed by heap_info. Main arena does not have heap_info.

Picture 4 (from https://sourceware.org/glibc/wiki/MallocInternals)shows how heaps for an arena links to each other. The ar_ptr pointers of those heaps point to the same arena.

Heaps and arenas

sysmalloc()

Allocation of memory is completed in _int_malloc(). It would first search in the bins (bins will not be introduced in this article). If no available bins can be used, it would call sysmalloc().

Picture 5 shows the call graph of sysmalloc(). What it does can be concluded as followed:

  1. If arena is NULL, or size of memory to be allocated is large (more than 128K), it calls mmap() directly.
  2. If size of memory is smaller than 128K, it calls grow_heap() to grow the top chunk of heap, and obtain memory from the top chunk.
  3. If heap is full and grow_heap() failed, it calls new_heap() to obtain a new heap and allocate memory from it.
  4. If grow_heap() and new_heap() failed, it will try to mmap() directly.

Call graph of sysmalloc().

new_heap()

Initialization of thread arena, and sysmalloc() would call new_heap() to obtain a heap from kernel. There are three mmap() in new_heap():

  1. If the starting address of heap is known (calculated in last malloc()), it calls mmap() to allocate HEAP_MAX_SIZE(usually 64M) memory from the starting address.
  2. If the starting address is not known, it calls mmap() to allocate 2xHEAP_MAX_SIZE memory. Then it would unmap() HEAP_MAX_SIZE memory and calculate the starting address of next mmap() to be HEAP_MAX_SIZE from current starting point.
  3. If both of 1 and 2 failed, it calls mmap() to allocate HEAP_MAX_SIZE memory. (just give it a try)

An interesting fact I observe when I print log in the first and the second mmap() is that, malloc() always calls them alternatively. It is because malloc() would always call the second mmap() first, and then it calculates the starting address for the next mmap(), so it calls the first mmap() at the next time.

I think the reason is that, Glibc malloc() tries to make memory segments as contiguous as possible. By allocating 2xHEAP_MAX_SIZE memory first, and calculate the starting address for next malloc(), it makes heaps “side by side”. It can decrease the memory fragments in kernel. Also, it tells the next mmap() where to start, so that it will not overlap with current heap, and leads to faults.

Picture 6 shows the call graph of new_heap().

Call graph of new_heap().

Conclusion

In conclusion, what happens behind malloc() can be concluded in Picture 7.

Call graph of malloc().