알고리즘, 파이썬을 곁들인 (5)

알고리즘, 파이썬을 곁들인 (5) - 선택 정렬

2023. 1. 3. 22:07ㆍ기술 공부/알고리즘

오늘은 선택 정렬 알고리즘에 대해 알아보자.

선택 정렬이란,
순차적으로 작은(또는 큰) 값을 골라내 정렬을 하는 것이다.

무슨 말인고 하니,

nums = [5, 2, 1, 4, 0, 3, 9, 8, 7, 6]

이렇게 생긴 nums List로 선택 정렬 알고리즘을 작성해 보자.

정렬한 데이터를 저장할 List를 만든다.
순차 탐색 알고리즘을 사용해 nums List에서 가장 작은 값을 찾는다.
찾아낸 가장 작은 값을 새로운 List에 저장한다.
찾아낸 가장 작은 값을 nums List에서 지운다.
다시 2번으로 돌아간다.

구현해 보자.

nums = [5, 2, 1, 4, 0, 3, 9, 8, 7, 6]
sorted_num = []

for _ in range(len(nums)):
    min_index = 0

    for index, num in enumerate(nums):
        if nums[min_index] > num:
            min_index = index

    sorted_num.append(nums[min_index])
    del nums[min_index]

print(sorted_num)

출력 결과 : 

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

잘 정렬되었다.

하지만 For문의 반복 횟수를 대략 세보면,
10 * (10 + 9 +... + 1) = 550번을 반복한다.

그렇다. 데이터의 개수인 10(n)보다, 훨씬 더 많이 연산을 해야 한다.
우린 이런 경우를 O(n^2) (n의 제곱)이라고 부르기로 했었다.

그리고, 우리 Python의 List Class는 정렬 Method를 제공하고 있다.
어떻게 구현되어 있을까?

/* An adaptive, stable, natural mergesort.  See listsort.txt.
 * Returns Py_None on success, NULL on error.  Even in case of error, the
 * list will be some permutation of its input state (nothing is lost or
 * duplicated).
 */
/*[clinic input]
list.sort
    *
    key as keyfunc: object = None
    reverse: bool = False
Sort the list in ascending order and return None.
The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
order of two equal elements is maintained).
If a key function is given, apply it once to each list item and sort them,
ascending or descending, according to their function values.
The reverse flag can be set to sort in descending order.
[clinic start generated code]*/

static PyObject *
list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
/*[clinic end generated code: output=57b9f9c5e23fbe42 input=a74c4cd3ec6b5c08]*/
{
    MergeState ms;
    Py_ssize_t nremaining;
    Py_ssize_t minrun;
    sortslice lo;
    Py_ssize_t saved_ob_size, saved_allocated;
    PyObject **saved_ob_item;
    PyObject **final_ob_item;
    PyObject *result = NULL;            /* guilty until proved innocent */
    Py_ssize_t i;
    PyObject **keys;

    assert(self != NULL);
    assert(PyList_Check(self));
    if (keyfunc == Py_None)
        keyfunc = NULL;

    /* The list is temporarily made empty, so that mutations performed
     * by comparison functions can't affect the slice of memory we're
     * sorting (allowing mutations during sorting is a core-dump
     * factory, since ob_item may change).
     */
    saved_ob_size = Py_SIZE(self);
    saved_ob_item = self->ob_item;
    saved_allocated = self->allocated;
    Py_SET_SIZE(self, 0);
    self->ob_item = NULL;
    self->allocated = -1; /* any operation will reset it to >= 0 */

    if (keyfunc == NULL) {
        keys = NULL;
        lo.keys = saved_ob_item;
        lo.values = NULL;
    }
    else {
        if (saved_ob_size < MERGESTATE_TEMP_SIZE/2)
            /* Leverage stack space we allocated but won't otherwise use */
            keys = &ms.temparray[saved_ob_size+1];
        else {
            keys = PyMem_Malloc(sizeof(PyObject *) * saved_ob_size);
            if (keys == NULL) {
                PyErr_NoMemory();
                goto keyfunc_fail;
            }
        }

        for (i = 0; i < saved_ob_size ; i++) {
            keys[i] = PyObject_CallOneArg(keyfunc, saved_ob_item[i]);
            if (keys[i] == NULL) {
                for (i=i-1 ; i>=0 ; i--)
                    Py_DECREF(keys[i]);
                if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
                    PyMem_Free(keys);
                goto keyfunc_fail;
            }
        }

        lo.keys = keys;
        lo.values = saved_ob_item;
    }


    /* The pre-sort check: here's where we decide which compare function to use.
     * How much optimization is safe? We test for homogeneity with respect to
     * several properties that are expensive to check at compare-time, and
     * set ms appropriately. */
    if (saved_ob_size > 1) {
        /* Assume the first element is representative of the whole list. */
        int keys_are_in_tuples = (Py_IS_TYPE(lo.keys[0], &PyTuple_Type) &&
                                  Py_SIZE(lo.keys[0]) > 0);

        PyTypeObject* key_type = (keys_are_in_tuples ?
                                  Py_TYPE(PyTuple_GET_ITEM(lo.keys[0], 0)) :
                                  Py_TYPE(lo.keys[0]));

        int keys_are_all_same_type = 1;
        int strings_are_latin = 1;
        int ints_are_bounded = 1;

        /* Prove that assumption by checking every key. */
        for (i=0; i < saved_ob_size; i++) {

            if (keys_are_in_tuples &&
                !(Py_IS_TYPE(lo.keys[i], &PyTuple_Type) && Py_SIZE(lo.keys[i]) != 0)) {
                keys_are_in_tuples = 0;
                keys_are_all_same_type = 0;
                break;
            }

            /* Note: for lists of tuples, key is the first element of the tuple
             * lo.keys[i], not lo.keys[i] itself! We verify type-homogeneity
             * for lists of tuples in the if-statement directly above. */
            PyObject *key = (keys_are_in_tuples ?
                             PyTuple_GET_ITEM(lo.keys[i], 0) :
                             lo.keys[i]);

            if (!Py_IS_TYPE(key, key_type)) {
                keys_are_all_same_type = 0;
                /* If keys are in tuple we must loop over the whole list to make
                   sure all items are tuples */
                if (!keys_are_in_tuples) {
                    break;
                }
            }

            if (keys_are_all_same_type) {
                if (key_type == &PyLong_Type &&
                    ints_are_bounded &&
                    Py_ABS(Py_SIZE(key)) > 1) {

                    ints_are_bounded = 0;
                }
                else if (key_type == &PyUnicode_Type &&
                         strings_are_latin &&
                         PyUnicode_KIND(key) != PyUnicode_1BYTE_KIND) {

                        strings_are_latin = 0;
                    }
                }
            }

        /* Choose the best compare, given what we now know about the keys. */
        if (keys_are_all_same_type) {

            if (key_type == &PyUnicode_Type && strings_are_latin) {
                ms.key_compare = unsafe_latin_compare;
            }
            else if (key_type == &PyLong_Type && ints_are_bounded) {
                ms.key_compare = unsafe_long_compare;
            }
            else if (key_type == &PyFloat_Type) {
                ms.key_compare = unsafe_float_compare;
            }
            else if ((ms.key_richcompare = key_type->tp_richcompare) != NULL) {
                ms.key_compare = unsafe_object_compare;
            }
            else {
                ms.key_compare = safe_object_compare;
            }
        }
        else {
            ms.key_compare = safe_object_compare;
        }

        if (keys_are_in_tuples) {
            /* Make sure we're not dealing with tuples of tuples
             * (remember: here, key_type refers list [key[0] for key in keys]) */
            if (key_type == &PyTuple_Type) {
                ms.tuple_elem_compare = safe_object_compare;
            }
            else {
                ms.tuple_elem_compare = ms.key_compare;
            }

            ms.key_compare = unsafe_tuple_compare;
        }
    }
    /* End of pre-sort check: ms is now set properly! */

    merge_init(&ms, saved_ob_size, keys != NULL, &lo);

    nremaining = saved_ob_size;
    if (nremaining < 2)
        goto succeed;

    /* Reverse sort stability achieved by initially reversing the list,
    applying a stable forward sort, then reversing the final result. */
    if (reverse) {
        if (keys != NULL)
            reverse_slice(&keys[0], &keys[saved_ob_size]);
        reverse_slice(&saved_ob_item[0], &saved_ob_item[saved_ob_size]);
    }

    /* March over the array once, left to right, finding natural runs,
     * and extending short natural runs to minrun elements.
     */
    minrun = merge_compute_minrun(nremaining);
    do {
        int descending;
        Py_ssize_t n;

        /* Identify next run. */
        n = count_run(&ms, lo.keys, lo.keys + nremaining, &descending);
        if (n < 0)
            goto fail;
        if (descending)
            reverse_sortslice(&lo, n);
        /* If short, extend to min(minrun, nremaining). */
        if (n < minrun) {
            const Py_ssize_t force = nremaining <= minrun ?
                              nremaining : minrun;
            if (binarysort(&ms, lo, lo.keys + force, lo.keys + n) < 0)
                goto fail;
            n = force;
        }
        /* Maybe merge pending runs. */
        assert(ms.n == 0 || ms.pending[ms.n -1].base.keys +
                            ms.pending[ms.n-1].len == lo.keys);
        if (found_new_run(&ms, n) < 0)
            goto fail;
        /* Push new run on stack. */
        assert(ms.n < MAX_MERGE_PENDING);
        ms.pending[ms.n].base = lo;
        ms.pending[ms.n].len = n;
        ++ms.n;
        /* Advance to find next run. */
        sortslice_advance(&lo, n);
        nremaining -= n;
    } while (nremaining);

    if (merge_force_collapse(&ms) < 0)
        goto fail;
    assert(ms.n == 1);
    assert(keys == NULL
           ? ms.pending[0].base.keys == saved_ob_item
           : ms.pending[0].base.keys == &keys[0]);
    assert(ms.pending[0].len == saved_ob_size);
    lo = ms.pending[0].base;

succeed:
    result = Py_None;
fail:
    if (keys != NULL) {
        for (i = 0; i < saved_ob_size; i++)
            Py_DECREF(keys[i]);
        if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
            PyMem_Free(keys);
    }

    if (self->allocated != -1 && result != NULL) {
        /* The user mucked with the list during the sort,
         * and we don't already have another error to report.
         */
        PyErr_SetString(PyExc_ValueError, "list modified during sort");
        result = NULL;
    }

    if (reverse && saved_ob_size > 1)
        reverse_slice(saved_ob_item, saved_ob_item + saved_ob_size);

    merge_freemem(&ms);

keyfunc_fail:
    final_ob_item = self->ob_item;
    i = Py_SIZE(self);
    Py_SET_SIZE(self, saved_ob_size);
    self->ob_item = saved_ob_item;
    self->allocated = saved_allocated;
    if (final_ob_item != NULL) {
        /* we cannot use _list_clear() for this because it does not
           guarantee that the list is really empty when it returns */
        while (--i >= 0) {
            Py_XDECREF(final_ob_item[i]);
        }
        PyMem_Free(final_ob_item);
    }
    return Py_XNewRef(result);
}
#undef IFLT
#undef ISLT

int
PyList_Sort(PyObject *v)
{
    if (v == NULL || !PyList_Check(v)) {
        PyErr_BadInternalCall();
        return -1;
    }
    v = list_sort_impl((PyListObject *)v, NULL, 0);
    if (v == NULL)
        return -1;
    Py_DECREF(v);
    return 0;
}

음. 아주 길다.

맨 위의 요약문을 읽어보자면, 튜닝된(?) 병합 정렬을 사용한다는 것 같다.
자세한 내용은 listsort.txt 파일을 참고하라고 나온다.

병합 정렬은 곧 다룰 예정이니 넘어가자.

List Class의 정렬 Method 맛이나 보자.

nums = [5, 2, 1, 4, 0, 3, 9, 8, 7, 6]
nums.sort()
print(nums)

출력 결과 :

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

마지막으로 Python의 내장 함수 sorted도 사용해 보자.
CPython으로 구현되어 있지만, 어느 알고리즘을 사용했는지는 나오지 않는다.

nums = [5, 2, 1, 4, 0, 3, 9, 8, 7, 6]
print(sorted(nums))

출력 결과 :

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

출처

Python Git - cpython/listobjdect.c at main (2228번째 줄 ~ , list_sort_impl function)
https://github.com/python/cpython/blob/main/Objects/listobject.c

GitHub - python/cpython: The Python programming language

The Python programming language. Contribute to python/cpython development by creating an account on GitHub.

github.com

Python Git - cpython/listsort.txt at main
https://github.com/python/cpython/blob/main/Objects/listsort.txt

GitHub - python/cpython: The Python programming language

The Python programming language. Contribute to python/cpython development by creating an account on GitHub.

github.com

Python Git - cpython/bltinmodule.c at main (2339번째 줄 ~ , builtin_sorted function)
https://github.com/python/cpython/blob/main/Python/bltinmodule.c

GitHub - python/cpython: The Python programming language

The Python programming language. Contribute to python/cpython development by creating an account on GitHub.

github.com

저작자표시 비영리 변경금지 (새창열림)

'기술 공부 > 알고리즘' 카테고리의 다른 글

알고리즘, 파이썬을 곁들인 (7) - 병합 정렬 (0)	2023.01.05
알고리즘, 파이썬을 곁들인 (6) - 삽입 정렬 (0)	2023.01.04
알고리즘, 파이썬을 곁들인 (4) - 순차 탐색 (0)	2023.01.03
알고리즘, 파이썬을 곁들인 (3) (0)	2022.12.21
알고리즘, 파이썬을 곁들인 (2) (0)	2022.12.21

MoonsRainbow

MoonsRainbow

태그

최근글

댓글

공지사항

아카이브

'기술 공부 > 알고리즘' 카테고리의 다른 글

관련글

티스토리툴바