====== Publications ======

The :!: marks denote my favorite or most representative works.

===== Theses =====

[2] **Ortega, P.A.**\\
:!: A Unified Framework for Resource-Bounded Autonomous Agents Interacting with Unknown Environments\\
PhD Thesis, Dept. of Engineering, University of Cambridge, 2011.\\
Thesis supervisor: [[http://mlg.eng.cam.ac.uk/zoubin/|Zoubin Ghahramani]]\\
Thesis committee: [[http://www.hutter1.net/|Marcus Hutter]] and [[http://learning.eng.cam.ac.uk/carl/|Carl E. Rasmussen]]\\
{{:papers:thesis.pdf|[PDF]}}

[1] **Ortega, P.A.**\\
Design of Interactive Processing Mechanisms for the Analysis of Brain Waves (in Spanish)\\
Dissertation, School of Physical and Mathematical Sciences, University of Chile, 2005.\\
{{:papers:memoria.pdf|[PDF]}}

===== Articles and Conference Papers =====

==== 2023 ====

[54] :!: Neural Networks and the Chomsky Hierarchy\\
Delétang G., Ruoss A., Grau-Moya J., Genewein T., Wenliang L.K., Catt E., Cundy C., Hutter M., Legg S., Veness J., **Ortega P.A.**\\
International Conference on Learning Representations (ICLR), 2023\\
[[https://arxiv.org/pdf/2207.02098.pdf|[PDF]]]

==== 2022 ====

[53] :!: Beyond Bayes-optimality: meta-learning what you know you don't know\\
Grau-Moya J., Delétang G., Kunesch M., Genewein T., Catt E., Li W.K., Ruoss A., Cundy C., Veness J., Wang J.X., Hutter M., Summerfield C., Legg S., **Ortega P.A.**\\
ArXiv:2209.15618, 2022\\
[[https://arxiv.org/pdf/2209.15618.pdf|[PDF]]]

[52] Your Policy Regularizer is Secretly an Adversary\\
Brekelmans R., Genewein T., Grau-Moya J., Delétang G., Kunesch M., Legg S., **Ortega P.A.**\\
Transactions on Machine Learning Research, 2022\\
[[https://arxiv.org/pdf/2203.12592.pdf|[PDF]]]

==== 2021 ====

[51] Model-Free Risk-Sensitive Reinforcement Learning\\
Delétang G., Grau-Moya J., Kunesch M., Genewein T., Brekelmans R., Legg S., **Ortega P.A.**\\
DeepMind Technical Report, ArXiv:2111.02907, 2021\\
[[https://arxiv.org/pdf/2111.02907.pdf|[PDF]]]

[50] :!: Shaking the foundations:
delusions in sequence models for interaction and control\\
**Ortega P.A.**, Kunesch M., Delétang G., Genewein T., Grau-Moya J., Veness J., Buchli J., Degrave J., Piot B., Perolat J., Everitt T., Tallec C., Parisotto E., Erez T., Chen Y., de Freitas N., Legg S.\\
DeepMind Technical Report, ArXiv:2110.10819, 2021\\
[[https://arxiv.org/pdf/2110.10819.pdf|[PDF]]]

[49] Causal Analysis of Agent Behavior for AI Safety\\
Delétang G., Grau-Moya J., Martic M., Genewein T., McGrath T., Mikulik V., Kunesch M., Legg S., **Ortega P.A.**\\
ArXiv:2103.03938, 2021\\
[[https://arxiv.org/pdf/2103.03938.pdf|[PDF]]]

[48] From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization\\
Perolat J., Munos R., Lespiau J.-B., Omidshafiei S., Rowland M., **Ortega P.A.**, Burch N., Anthony T., Balduzzi D., De Vylder B., Piliouras G., Lanctot M., Tuyls K.\\
ICML 2021\\
[[https://arxiv.org/pdf/2002.08456.pdf|[PDF]]]

==== 2020 ====

[47] Agent Incentives: A Causal Perspective\\
Everitt T., Carey R., Langlois E., **Ortega P.A.**, Legg S.\\
AAAI Conference on Artificial Intelligence, 2020.\\
[[https://ojs.aaai.org/index.php/AAAI/article/view/17368/17175|[PDF]]]

[46] Meta-trained agents implement Bayes-optimal agents\\
Mikulik V., Delétang G., McGrath T., Genewein T., Martic M., Legg S., **Ortega P.A.**\\
Neural Information Processing Systems (NIPS), 2020.\\
[[https://arxiv.org/pdf/2010.11223.pdf|[PDF]]]

[45] Algorithms for Causal Reasoning in Probability Trees\\
Genewein T., McGrath T., Delétang G., Mikulik V., Martic M., Legg S., **Ortega P.A.**\\
ArXiv:2010.12237, 2020\\
[[http://arxiv.org/pdf/2010.12237.pdf|[PDF]]] [[https://colab.research.google.com/github/deepmind/deepmind_research/blob/master/causal_reasoning/Causal_Reasoning_in_Probability_Trees.ipynb|[Colab Tutorial]]]

[44] Action and Perception as Divergence Minimization\\
Hafner D., **Ortega P.A.**, Ba J., Parr T., Friston K., Heess N.\\
arXiv:2009.01791, 2020\\
[[https://arxiv.org/pdf/2009.01791.pdf|[PDF]]]

==== 2019 ====

[43] Meta reinforcement learning as task inference\\
Humplik J., Galashov A., Hasenclever L., **Ortega P.A.**, Teh Y.W., Heess N.\\
arXiv:1905.06424, 2019\\
[[https://arxiv.org/pdf/1905.06424.pdf|[PDF]]]

[42] Intrinsic Social Motivation via Causal Influence in Multi-Agent RL\\
Jaques N., Lazaridou A., Hughes E., Gulcehre C., **Ortega P.A.**, Strouse D.J., Leibo J.Z., de Freitas N.\\
International Conference on Machine Learning (ICML), 2019\\
[[http://proceedings.mlr.press/v97/jaques19a/jaques19a.pdf|[PDF]]]

[41] :!: Meta-learning of Sequential Strategies\\
**Ortega P.A.**, Wang J.X., Rowland M., Genewein T., Kurth-Nelson Z., Pascanu R., Heess N., Veness J., Pritzel A., Sprechmann P., Jayakumar S.M., McGrath T., Miller K., Azar M., Osband I., Rabinowitz N., György A., Chiappa S., Osindero S., Teh Y.W., van Hasselt H., de Freitas N., Botvinick M., Legg S.\\
DeepMind Technical Report, 2019\\
[[https://arxiv.org/pdf/1905.03030|[PDF]]]

[40] Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings\\
Everitt T., **Ortega P.A.**, Barnes E., Legg S.\\
arXiv:1902.09980, 2019\\
[[https://arxiv.org/pdf/1902.09980.pdf|[PDF]]]

[39] Causal Reasoning from Meta-reinforcement Learning\\
Dasgupta I., Wang J., Chiappa S., Mitrovic J., **Ortega P.A.**, Raposo D., Hughes E., Battaglia E., Botvinick M., Kurth-Nelson Z.\\
arXiv:1901.08162, 2019\\
[[https://arxiv.org/pdf/1901.08162.pdf|[PDF]]]

==== 2018 ====

[38] Bayesian Optimistic Kullback-Leibler Exploration\\
Lee K., Kim G.-H., **Ortega P.A.**, Lee D.D., and Kim K.-E.\\
Machine Learning, 2018\\
[[http://link.springer.com/article/10.1007/s10994-018-5767-4|[PDF]]]

[37] Modelling Friends and Foes\\
**Ortega, P.A.** and Legg, S.\\
ArXiv:1807.00196, 2018\\
[[https://arxiv.org/pdf/1807.00196.pdf|[PDF]]]

==== 2017 ====

[36] AI safety gridworlds.\\
Leike, J., Martic, M., Krakovna, V., **Ortega, P.A.**, Everitt, T., Lefrancq, A., Orseau, L.
and Legg, S.\\
ArXiv:1711.09883, 2017\\
[[https://arxiv.org/pdf/1711.09883.pdf|[PDF]]]

==== 2016 ====

[35] **Ortega, P.A.** and Tishby, N.\\
:!: Memory controls time perception and intertemporal choices\\
ArXiv:1604.05129, 2016\\
[[http://arxiv.org/pdf/1604.05129.pdf|[PDF]]]

[34] :!: Human Decision-Making under Limited Time.\\
**Ortega, P.A.** and Stocker, A.A.\\
Neural Information Processing Systems (NIPS), 2016.\\
[[https://papers.nips.cc/paper/6249-human-decision-making-under-limited-time.pdf|[PDF]]]

[33] Bayesian Reinforcement Learning with Behavioral Feedback.\\
Hong, T., Lee, J., Kim, K.-E., **Ortega, P.A.**, and Lee, D.D.\\
International Joint Conference on Artificial Intelligence (IJCAI), 2016.\\
[[http://www.ijcai.org/Proceedings/16/Papers/225.pdf|[PDF]]]

[32] Decision-making under ambiguity is modulated by visual framing, but not by motor vs. non-motor context. Experiments and an information-theoretic ambiguity model.\\
Grau-Moya, J. and **Ortega, P.A.** and Braun, D.A.\\
PLoS One, 11(4):e0153179, 2016.\\
[[http://bit.ly/24pankA|[PDF]]]

==== 2015 ====

[31] **Ortega, P.A.**, Braun, D.A., Dyer, J.S., Kim, K.-E., and Tishby, N.\\
:!: Information-Theoretic Bounded Rationality\\
ArXiv:1512.06789, 2015\\
[[http://arxiv.org/pdf/1512.06789.pdf|[PDF]]]

[30] Commentary: What is epistemic value in free energy models of learning and acting? A bounded rationality perspective.\\
**Ortega, P.A.** and Braun, D.A.\\
Cognitive Neuroscience, 2015.\\
[[http://www.tandfonline.com/doi/abs/10.1080/17588928.2015.1051525|[PDF]]]

[29] :!: Subjectivity, Bayesianism, and Causality\\
**Ortega, P.A.**\\
Special Issue on Philosophical Aspects of Pattern Recognition\\
Pattern Recognition Letters, pp. 63-70, 2015\\
[[http://arxiv.org/pdf/1407.4139.pdf|[PDF]]]

[28] Causal reasoning in a prediction task with hidden causes\\
**Ortega, P.A.** and Lee, D.D.
and Stocker, A.A.\\
37th Annual Cognitive Science Society Meeting (CogSci), 2015\\
{{:papers:cogsci2015-submission.pdf|[PDF]}} {{:papers:causalityprediction.pdf|[PDF Slides]}} {{:papers:causality-prediction-task-poster.pdf|[Poster]}}

[27] Reactive bandits with attitude\\
**Ortega, P.A.** and Kim, K.-E. and Lee, D.D.\\
18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015\\
[[http://jmlr.org/proceedings/papers/v38/ortega15.pdf|[PDF]]] {{:papers:attitude-poster.pdf|[Poster]}}

[26] Belief flows for robust online learning\\
**Ortega, P.A.** and Crammer, K. and Lee, D.D.\\
Information Theory and Applications (ITA), pp. 70-77, 2015\\
[[http://ita.ucsd.edu/workshop/15/files/paper/paper_481.pdf|[PDF]]] {{:papers:belief-flows-talk-ita.pdf|[PDF Slides]}}

[25] Perceptual adaptation: Getting ready for the future\\
Wei, X.-X. and **Ortega, P.A.** and Stocker, A.A.\\
Computational and Systems Neuroscience (Cosyne), 2015\\
:!: Won best poster award

==== 2014 ====

[24] Information-theoretic bounded rationality and $ϵ$-optimality\\
Braun, D.A. and **Ortega, P.A.**\\
Entropy 16(8), 4662-4676, 2014\\
[[http://www.mdpi.com/1099-4300/16/8/4662|[PDF]]]

[23] **Ortega, P.A.** and Lee, D.D.\\
An Adversarial Interpretation of Information-Theoretic Bounded Rationality\\
Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI '14), 2014\\
[[http://arxiv.org/pdf/1404.5668.pdf|[PDF]]] {{::adversarial-poster.pdf|[Poster]}}

[22] Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference\\
**Ortega, P.A.** and Braun, D.A.\\
Complex Adaptive Systems Modeling 2:2, 2014\\
[[http://www.casmodeling.com/content/2/1/2|[PDF]]]

[21] Dynamic Belief State Representations\\
Lee, D.D., **Ortega, P.A.** and Stocker, A.\\
Current Opinion in Neurobiology 25, pp.
221–227, 2014\\
[[http://www.sciencedirect.com/science/article/pii/S0959438814000348|[PDF]]]

[20] :!: Monte Carlo Methods for Exact & Efficient Solution of the Generalized Optimality Equations\\
**Ortega, P.A.**, Braun, D.A. and Tishby, N.\\
IEEE International Conference on Robotics and Automation (ICRA), 2014\\
{{::ortegabrauntishby_icra14.pdf|[PDF]}}

==== 2013 ====

[19] An Adversarial Interpretation of Information-Theoretic Bounded Rationality\\
**Ortega, P.A.** and Lee, D.\\
NIPS Workshop on Planning with Information Constraints, 2013\\
{{:game.pdf|[PDF]}}

[18] :!: Thermodynamics as a theory of decision-making with information processing costs\\
**Ortega, P.A.** and Braun, D.A.\\
Proceedings of the Royal Society A 20120683, 2013.\\
[[http://rspa.royalsocietypublishing.org/content/469/2153/20120683.abstract|[PDF]]]

[17] Metabolic cost as an organizing principle for cooperative learning\\
Balduzzi D., **Ortega, P.A.** and Besserve, M.\\
Advances in Complex Systems 2013.\\
[[http://arxiv.org/abs/1202.4482|[PDF]]]

==== 2012 ====

[16] Adaptive Coding of Actions and Observations\\
**Ortega, P.A.** and Braun, D.A.\\
NIPS Workshop on Information in Perception and Action, 2012.\\
{{:papers:adaptivecodingactionsobservations.pdf|[PDF]}}

[15] A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function\\
**Ortega, P.A.**, Grau-Moya, J., Genewein, T., Balduzzi, D. and Braun, D.A.\\
Neural Information Processing Systems (NIPS) 2012\\
[[http://arxiv.org/pdf/1206.1898|[PDF]]] [[:argmaxprior|[CODE]]]

[14] Risk-Sensitivity in Bayesian Sensorimotor Integration\\
Grau-Moya, J., **Ortega, P.A.** and Braun, D.A. (2012)\\
PLOS Computational Biology 8(9): e1002698\\
{{:papers:risksensitivitybayesiansensorimotor.pdf|[PDF]}}

[13] :!: Free Energy and the Generalized Optimality Equations for Sequential Decision Making\\
**Ortega, P.A.** and Braun, D.A.
\\
European Workshop on Reinforcement Learning 2012\\
{{:papers:generalizedoptimality.pdf|[PDF]}}

==== 2011 ====

[12] **Ortega, P.A.**\\
:!: Bayesian Causal Induction\\
NIPS Workshop on Philosophy and Machine Learning, 2011.\\
{{:papers:bayesiancausalinduction.pdf|[PDF]}}

[11] Information, Utility and Bounded Rationality\\
**Ortega, P.A.** and Braun, D.A.\\
The fourth conference on artificial general intelligence, pp. 269-274, 2011.\\
{{:papers:utilityinfoboundedrationality.pdf|[PDF]}}

[10] Reinforcement Learning and the Bayesian Control Rule\\
**Ortega, P.A.** and Braun, D.A. and Godsill, S.J.\\
The fourth conference on artificial general intelligence, pp. 281-285, 2011.\\
{{:papers:actor-critic.pdf|[PDF]}}

[9] Motor coordination: When two have to act as one\\
Braun, D.A. and **Ortega P.A.** and Wolpert D.M.\\
2011 Special issue of Experimental Brain Research on Joint Action\\
{{:papers:motorcoordination.pdf|[PDF]}}

[8] Path Integral Control and Bounded Rationality\\
Braun, D.A. and **Ortega, P.A.** and Theodorou, E. and Schaal, S.\\
2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris.\\
{{:papers:pathintboundedrationality.pdf|[PDF]}}

==== 2010 ====

[7] :!: A minimum relative entropy principle for learning and acting\\
**Ortega, P.A.** and Braun, D.A.\\
Journal of Artificial Intelligence Research 38, pp. 475-511, 2010.\\
{{:papers:minimumrelentropylearningacting.pdf|[PDF]}}

[6] A minimum relative entropy principle for adaptive control in linear quadratic regulators\\
Braun, D.A. and **Ortega, P.A.**\\
Proceedings of the 7th international conference on informatics in control, automation and robotics, pp. 103-108, 2010\\
{{:papers:mreforlqr.pdf|[PDF]}}

[5] A conversion between utility and information\\
**Ortega, P.A.** and Braun, D.A.\\
The third conference on artificial general intelligence, pp.
115-120, 2010\\
{{:papers:utilityinfoconversion.pdf|[PDF]}}

[4] :!: A Bayesian rule for adaptive control based on causal interventions\\
**Ortega, P.A.** and Braun, D.A.\\
The third conference on artificial general intelligence, pp. 121-126, 2010\\
{{:papers:bayesianruleforcontrol.pdf|[PDF]}}

==== 2009 and earlier ====

[3] :!: Nash equilibria in multi-agent motor interactions\\
Braun D.A., **Ortega P.A.** & Wolpert D.M. (2009)\\
PLoS Computational Biology 5 (8):e1000468\\
{{:papers:braortwol09.pdf|[PDF]}}

[2] :!: Error Backpropagation with Generalized Functional Composition\\
Bassi, A. and **Ortega, P.A.**\\
Technical Report, Department of Computer Science, University of Chile (2006)\\
{{:papers:bpmachine.pdf|[PDF]}}

[1] A Medical Claim Fraud/Abuse Detection System based on Data Mining: A Case Study in Chile\\
**Ortega, P.A.** and Figueroa, C. and Ruz, G.\\
DMIN 2006:224-231\\
{{:papers:frauddetection.pdf|[PDF]}}