Georgia Gkioxari

I am an Assistant Professor of Computing + Mathematical Sciences at Caltech and a William H. Hurt scholar. I am also a visiting researcher at Meta AI on the Embodied AI team. From 2016 to 2022, I was a research scientist on Meta's FAIR team. I received my PhD from UC Berkeley, where I was advised by Jitendra Malik. I did my bachelor's in ECE at NTUA in Athens, Greece, where I worked with Petros Maragos. I am the recipient of the PAMI Young Researcher Award (2021). My teammates and I received the PAMI Mark Everingham Award (2021) for the Detectron Library Suite. I was named one of 30 influential women advancing AI in 2019 by ReWork and was nominated for the Women in AI Awards in 2020 by VentureBeat. Read more about me and my work in this Q&A.
The goal of my work is to design visual perception models that bridge the gap between 2D imagery and our 4D world. My research interests lie in computer vision and machine learning. I want to build intelligent systems that perceive the world from as little as a single image -- just like humans do! Our world is complex: it is three-dimensional and dynamic. Computational models observe this world through imagery, but only partially, since visual data does not fully capture the richness of the world we live in. Below I highlight work that attempts to transform visual data into semantic scene representations in 2D and 3D.
Caltech students (undergrads and grads): If you are at Caltech and wish to work with me, please read the information in this doc.
Prospective postdocs: If you are interested in a postdoc position and want to conduct research in computer vision, 3D understanding and visual perception, please contact me directly with your CV and a short research statement.
Prospective PhD students: I am looking for PhD students to join my group. If you are interested, please apply directly to the CMS department and mention my name in your statement of purpose. There is no need to email me.
Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Multiview Compressive Coding for 3D Reconstruction
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
Learning 3D Object Shape and Layout without 3D Supervision
Differentiable Stereopsis: Meshes from multiple views using differentiable rendering
Recognizing Scenes from Novel Viewpoints
PyTorch3D
3D Shape Reconstruction from Vision and Touch
SynSin: End-to-end View Synthesis from a Single Image
Mesh R-CNN
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
Multi-Target Embodied Question Answering
Neural Modular Control for Embodied Question Answering
Building Generalizable Agents with a Realistic and Rich 3D Environment
Detecting and Recognizing Human-Object Interactions
Embodied Question Answering
Detect-and-Track: Efficient Pose Estimation in Videos
Data Distillation: Towards Omni-Supervised Learning
Mask R-CNN
Learn2Smile: Learning Non-verbal Interaction through Observation
Chained Predictions Using Convolutional Neural Networks
Contextual Action Recognition with R*CNN
Actions and Attributes from Wholes and Parts
Finding Action Tubes
R-CNNs for Pose Estimation and Action Detection
Using k-poselets for detecting people and localizing their keypoints
Articulated Pose Estimation using Discriminative Armlet Classifiers
Website template stolen from Jon Barron.