Ömer Veysel Çağatan

LG
h-index14
7papers
105citations
Novelty44%
AI Score47

7 Papers

LGJan 30
Clipping-Free Policy Optimization for Large Language Models

Ömer Veysel Çağatan, Barış Akgün, Gözde Gül Şahin et al. · berkeley

Reinforcement learning has become central to post-training large language models, yet dominant algorithms rely on clipping mechanisms that introduce optimization issues at scale, including zero-gradient regions, reward hacking, and training instability. We propose Clipping-Free Policy Optimization (CFPO), which replaces heuristic clipping with a convex quadratic penalty derived from Total Variation divergence constraints, yielding an everywhere-differentiable objective that enforces stable policy updates without hard boundaries. We evaluate CFPO across both reasoning and alignment settings. In reasoning, CFPO matches clipping-based methods on downstream benchmarks while extending the stable training regime. In alignment, CFPO mitigates verbosity exploitation and reduces capability degradation, while achieving competitive instruction-following performance. CFPO requires only a one-line code change and no additional hyperparameters. Our results suggest that CFPO is a promising drop-in alternative to clipping-based methods for LLM post-training.

35.7LGMay 11Code
Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning

Raphael Trumpp, Ömer Veysel Çağatan, Barış Akgün et al.

Pixel-based deep reinforcement learning agents are typically trained on heavily downsampled visual observations, a convention inherited from early benchmarks rather than grounded in principled design. In this work, we show that observation resolution is a critical yet overlooked variable for policy learning: higher-resolution inputs can substantially improve both performance and generalization, provided the network architecture can process them effectively. We find that the widely used Impala encoder, which flattens spatial features into a vector, suffers from quadratic parameter growth as resolution increases and fails to leverage the additional visual detail. Replacing this operation with global average pooling, as in the Impoola architecture, decouples parameter count from resolution and yields consistent improvements across resolutions and network widths - at their respective best conditions, visual scaling unlocks a 28 % performance gain for Impoola over Impala. These gains are strongest in environments that require precise perception of small or distant objects, and gradient saliency analysis confirms that the underlying mechanism is a more spatially localized visual attention of the policy at higher resolutions. Our results challenge the prevailing practice of aggressive input downsampling and position resolution-independent architectures as a simple, effective path toward scalable visual deep RL. To facilitate future research on resolution scaling in deep RL, we publicly release the open-source code for the Procgen-HD benchmark: https://github.com/raphajaner/procgen-hd.

LGOct 22, 2024
Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL

Ömer Veysel Çağatan, Barış Akgün

In this study, we investigate the effect of SSL objective modifications within the SPR framework, focusing on specific adjustments such as terminal state masking and prioritized replay weighting, which were not explicitly addressed in the original design. While these modifications are specific to RL, they are not universally applicable across all RL algorithms. Therefore, we aim to assess their impact on performance and explore other SSL objectives that do not accommodate these adjustments like Barlow Twins and VICReg. We evaluate six SPR variants on the Atari 100k benchmark, including versions both with and without these modifications. Additionally, we test the performance of these objectives on the DeepMind Control Suite, where such modifications are absent. Our findings reveal that incorporating specific SSL modifications within SPR significantly enhances performance, and this influence extends to subsequent frameworks like SR-SPR and BBF, highlighting the critical importance of SSL objective selection and related adaptations in achieving data efficiency in self-predictive reinforcement learning.

CVOct 22, 2024
SigCLR: Sigmoid Contrastive Learning of Visual Representations

Ömer Veysel Çağatan

We propose SigCLR: Sigmoid Contrastive Learning of Visual Representations. SigCLR utilizes the logistic loss that only operates on pairs and does not require a global view as in the cross-entropy loss used in SimCLR. We show that logistic loss shows competitive performance on CIFAR-10, CIFAR-100, and Tiny-IN compared to other established SSL objectives. Our findings verify the importance of learnable bias as in the case of SigLUP, however, it requires a fixed temperature as in the SimCLR to excel. Overall, SigCLR is a promising replacement for the SimCLR which is ubiquitous and has shown tremendous success in various domains.

LGSep 24, 2025
Failure Modes of Maximum Entropy RLHF

Ömer Veysel Çağatan, Barış Akgün

In this paper, we show that Simple Preference Optimization (SimPO) can be derived as Maximum Entropy Reinforcement Learning with length-normalized temperature, providing a theoretical foundation for this reference-free method. Motivated by SimPO's strong performance in offline preference optimization, we investigate whether Maximum Entropy RL can achieve similar results in online RLHF settings. Our experiments find that Maximum Entropy RL consistently exhibits overoptimization and unstable KL dynamics, even at very low learning rates. Unlike KL-constrained methods that maintain stable training, entropy regularization fails to prevent reward hacking and appears to correlate with overoptimization. Lastly, we discuss possible explanations for why SimPO succeeds in offline settings while Maximum Entropy RL struggles in online scenarios. Our findings suggest that reference-free approaches may face distinct challenges when applied to online or offline preference learning.

CVMar 8, 2025
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision

Ömer Veysel Çağatan, Ömer Faruk Tal, M. Emre Gürsoy

Self-supervised learning (SSL) has advanced significantly in visual representation learning, yet comprehensive evaluations of its adversarial robustness remain limited. In this study, we evaluate the adversarial robustness of seven discriminative self-supervised models and one supervised model across diverse tasks, including ImageNet classification, transfer learning, segmentation, and detection. Our findings suggest that discriminative SSL models generally exhibit better robustness to adversarial attacks compared to their supervised counterpart on ImageNet, with this advantage extending to transfer learning when using linear evaluation. However, when fine-tuning is applied, the robustness gap between SSL and supervised models narrows considerably. Similarly, this robustness advantage diminishes in segmentation and detection tasks. We also investigate how various factors might influence adversarial robustness, including architectural choices, training duration, data augmentations, and batch sizes. Our analysis contributes to the ongoing exploration of adversarial robustness in visual self-supervised representation systems.

CLJan 27, 2024
UNSEE: Unsupervised Non-contrastive Sentence Embeddings

Ömer Veysel Çağatan

We present UNSEE: Unsupervised Non-Contrastive Sentence Embeddings, a novel approach that outperforms SimCSE in the Massive Text Embedding benchmark. Our exploration begins by addressing the challenge of representation collapse, a phenomenon observed when contrastive objectives in SimCSE are replaced with non-contrastive objectives. To counter this issue, we propose a straightforward solution known as the target network, effectively mitigating representation collapse. The introduction of the target network allows us to leverage non-contrastive objectives, maintaining training stability while achieving performance improvements comparable to contrastive objectives. Our method has achieved peak performance in non-contrastive sentence embeddings through meticulous fine-tuning and optimization. This comprehensive effort has yielded superior sentence representation models, showcasing the effectiveness of our approach.