[Paper Review] On the Unreasonable Effectiveness of Centroids in Image Retrieval

by 탶선 2023. 3. 14.

Abstract

Achieves SOTA on re-identification tasks
[Figure: overall architecture of the proposed method]

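Existing image retrieval and re-identification approaches mainly search a gallery for images similar to the query, using metric learning (learning distances between input data)
- Hard negative sampling (mining)
  - A method for addressing class imbalance
    - Hard negatives (negative samples wrongly predicted as positive) are collected and added to the original dataset for retraining, which makes the model more robust to false positives
    - Problem: computing the distance between every pair of samples in the batch is costly
    - Problem: when combined with triplet loss, the point-to-point nature of the loss makes training vulnerable to noisy labels
- Triplet loss
  - A loss function that compares a baseline (anchor) input with a positive input and a negative input
  - Goal: minimize the anchor-positive distance / maximize the anchor-negative distance
  - Problem: easily gets stuck in local minima → "training does not go well"
- To address the problems caused by the point-to-point nature of triplet loss, the paper proposes switching to a point-to-set / point-to-centroid formulation
  - Robust to outliers and noisy labels
  - Faster training, with performance at least as good as point-to-point loss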
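
The cost of in-batch mining noted above can be seen in a minimal NumPy sketch. It assumes the "batch-hard" variant (farthest positive and closest negative per anchor), which is one common form of hard mining and not necessarily the exact scheme the paper compares against. The function names and the toy data are made up for illustration.

```python
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distance between every pair of rows in x (B x D)."""
    sq = np.sum(x * x, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by rounding

def batch_hard_indices(embeddings, labels):
    """For each anchor, pick the farthest positive and the closest negative."""
    labels = np.asarray(labels)
    d = pairwise_sq_dists(embeddings)          # B x B matrix: the quadratic cost
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos = np.where(same & not_self, d, -np.inf)  # mask out non-positives
    neg = np.where(~same, d, np.inf)             # mask out non-negatives
    return pos.argmax(axis=1), neg.argmin(axis=1)

# Toy batch: 2 classes x 3 samples, embeddings on a line
emb = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0],
                [5.0, 0.0], [6.0, 0.0], [9.0, 0.0]])
labels = [0, 0, 0, 1, 1, 1]
hard_pos, hard_neg = batch_hard_indices(emb, labels)
```

Every anchor is compared against every other sample, so the distance matrix grows quadratically with the batch size, and a single mislabeled sample can be selected as the "hardest" positive or negative.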

Proposed method

Centroid Triplet loss
- Conventional triplet loss formulation
- $\alpha$: margin between the positive and negative pairs
- A: anchor image / P: positive / N: negative
- Goal: minimize the A-P distance / maximize the A-N distance

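Triplet Loss formula (with $f$ the embedding network and $[z]_+ = \max(z, 0)$):

$$L_{triplet} = \left[ \lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha \right]_+$$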
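
As a rough illustration of the standard triplet loss, here is a minimal NumPy sketch that works directly on embedding vectors. The margin value 0.5 and the example points are arbitrary choices for this sketch, not values from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin) on embedding vectors."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([1.0, 0.0])
easy_n = np.array([3.0, 0.0])   # far from the anchor: margin already satisfied
hard_n = np.array([0.5, 1.0])   # almost as close as the positive

loss_easy = triplet_loss(a, p, easy_n)
loss_hard = triplet_loss(a, p, hard_n)
```

The easy negative contributes no loss, while the hard negative yields a positive loss, which is why triplet training depends so heavily on which individual points are sampled.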

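Proposed Centroid Triplet Loss formula (with $C_p$ / $C_n$ the positive / negative class centroids and $\alpha_c$ the margin):

$$L_{ctl} = \left[ \lVert f(A) - C_p \rVert_2^2 - \lVert f(A) - C_n \rVert_2^2 + \alpha_c \right]_+$$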

- Proposed Centroid Triplet Loss (CTL) formulation (A: anchor image / $C_p$: positive class centroid / $C_n$: negative class centroid)
- Goal: minimize the A-$C_p$ distance / maximize the A-$C_n$ distance
- Aggregating item representations
  - $k$: a class in the mini-batch
  - Batch size: P × M (during training, each mini-batch contains P distinct item classes with M samples per class; for example, P = 2 classes with M = 3 images each gives a batch of 6 images)
  - $f$: the image-encoding neural network
  - $S_k$: the set of samples for class $k$, e.g. $S_k = \{x_1, ..., x_M\}$, where $x_i$ is the embedding of the $i$-th sample, $x_i \in R^D$, and $D$ is the sample representation size
  - $q_k$: a query sample drawn from $S_k$
    - At each training step, every sample in $S_k$ is used in turn as the query $q_k$, and the remaining $M-1$ samples are used to build the prototype centroid $C_{k_p}$
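
The leave-one-out centroid construction can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the negative centroid is taken to be the closest centroid among the other classes in the batch, the margin 0.5 is arbitrary, and the function name is hypothetical. The paper's actual sampling of negative centroids may differ.

```python
import numpy as np

def centroid_triplet_loss(emb, labels, margin=0.5):
    """Mean CTL over a batch of P classes x M samples.

    Each sample acts once as the query. Its positive centroid is the mean of
    the other M-1 samples of its class; the negative centroid is the closest
    centroid among the other classes (an assumption made for this sketch).
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    full_centroids = {k: emb[labels == k].mean(axis=0) for k in classes}
    losses = []
    for i, (q, k) in enumerate(zip(emb, labels)):
        others = labels == k
        others[i] = False                     # leave the query itself out
        c_pos = emb[others].mean(axis=0)
        d_pos = np.sum((q - c_pos) ** 2)
        d_neg = min(np.sum((q - full_centroids[j]) ** 2)
                    for j in classes if j != k)
        losses.append(max(0.0, d_pos - d_neg + margin))
    return float(np.mean(losses))

# P = 2 classes, M = 3 samples, well separated
separated = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                      [10.0, 0.0], [11.0, 0.0], [10.0, 1.0]])
loss_separated = centroid_triplet_loss(separated, [0, 0, 0, 1, 1, 1])

# P = 2 classes, M = 2 samples, interleaved on a line
mixed = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
loss_mixed = centroid_triplet_loss(mixed, [0, 0, 1, 1])
```

Because each query is compared against an average of several samples rather than a single point, one outlier or mislabeled image shifts the centroid only slightly instead of defining the whole triplet.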

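During evaluation, query images come from the query set Q
Before retrieval is performed, the centroid of each class $k$ is precomputed
To build these centroids, all embeddings in the gallery set $G_k$ for class $k$ are used
The centroid of each class, $C_k \in R^D$, is computed as the mean of all embeddings belonging to that class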
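
The evaluation procedure can be sketched as below: precompute one centroid per gallery class, then rank classes by distance to the query embedding. Squared Euclidean distance, the function names, and the toy gallery are assumptions for this sketch.

```python
import numpy as np

def build_centroids(gallery_emb, gallery_labels):
    """Mean embedding per class, computed once before retrieval."""
    labels = np.asarray(gallery_labels)
    classes = np.unique(labels)
    centroids = np.stack([gallery_emb[labels == k].mean(axis=0)
                          for k in classes])
    return classes, centroids

def retrieve(query_emb, classes, centroids):
    """Return class ids ranked by squared Euclidean distance to the query."""
    d = np.sum((centroids - query_emb) ** 2, axis=1)
    return classes[np.argsort(d)]

gallery = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
gallery_labels = [7, 7, 9, 9]
classes, centroids = build_centroids(gallery, gallery_labels)
ranking = retrieve(np.array([9.0, 9.0]), classes, centroids)
```

With centroids, each query is compared against one vector per class instead of every gallery image, so the search space shrinks from the number of gallery images to the number of classes.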

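CTL and centroid computation are applied to the state-of-the-art fashion retrieval model described in [A Strong Baseline for Fashion Retrieval with Person Reidentification Models]
Images are embedded with a baseline CNN (a variant of the ResNet architecture) and passed through a simple feed-forward architecture with average pooling and batch normalization
Three separate loss functions are computed at different stages of forward propagation
Centroid computation for training is added right after the CNN embedding
Centroids for inference are computed in the next stage (after batch normalization), for consistency with the original model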
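
To make the stages concrete, here is a rough NumPy sketch of the head that follows the CNN backbone. The backbone itself is not modeled, the batch normalization omits learned scale/shift and running statistics, and the shapes and names are made up for illustration, so this shows only where each quantity is taken from, not the actual model.

```python
import numpy as np

def forward_head(feature_maps, classifier_w, eps=1e-5):
    """feature_maps: (B, C, H, W) output of a CNN backbone (not modeled here)."""
    # Global average pooling -> raw embedding (B, C).
    # Per the notes: triplet loss and training-time centroids use this stage.
    raw = feature_maps.mean(axis=(2, 3))
    # Simplified batch normalization using the statistics of this batch.
    # Per the notes: classification loss and inference-time centroids use this stage.
    mu, var = raw.mean(axis=0), raw.var(axis=0)
    bn = (raw - mu) / np.sqrt(var + eps)
    # Linear classifier producing class logits (B, num_classes).
    logits = bn @ classifier_w
    return raw, bn, logits

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 16, 4, 4))   # toy batch: 8 images, 16 channels
w = rng.normal(size=(16, 5))            # toy classifier for 5 classes
raw, bn, logits = forward_head(fmap, w)
```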

Implementation Details

Tested with two backbones: Resnet-50 / Resnet50-IBN-A
Stride of 1 in the last convolutional layer, with Resnet-50's native 2048-dimensional embedding size
Loss functions:
center loss - used as an auxiliary loss
classification loss - computed on the batch-normalized embedding
triplet loss - computed on the raw embedding
Adam optimizer with a base learning rate of $1 \times 10^{-4}$ and a multistep learning rate scheduler
The learning rate is decreased by a factor of 10 after the $40^{th}$ and $70^{th}$ epochs
Center loss is optimized separately by an SGD optimizer with LR = 0.5
Each model was trained 3 times, for 120 epochs each
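
The learning rate schedule described above can be written out as a small pure-Python function. It assumes a zero-based epoch index and that the drop takes effect from epochs 40 and 70 onward; the exact epoch indexing in the original training code may differ.

```python
def multistep_lr(epoch, base_lr=1e-4, milestones=(40, 70), gamma=0.1):
    """Learning rate for a zero-based epoch index under a multistep schedule."""
    drops = sum(epoch >= m for m in milestones)  # milestones already passed
    return base_lr * gamma ** drops

# Learning rate for each of the 120 training epochs
schedule = [multistep_lr(e) for e in range(120)]
```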

Conclusion

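Uses a loss function composed of three parts
Triplet Loss computed on the raw embedding
Center Loss as an auxiliary loss
Classification Loss computed on the batch-normalized embedding
[Centroid Triplet Loss: a new loss function for instance retrieval tasks]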
