Arxiv Daily - 2024-03-06

Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2309.14737v2 Volumetric Semantically Consistent 3D Panoptic Mapping Yang Miao 2023-09-26 Show Abstract PDF Link
1 2403.02175v2 LiSTA: Geometric Object-Based Change Detection in Cluttered Environments Joseph Rowell 2024-03-04 Show Abstract PDF
2 2309.15065v2 Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding Christina Kassab 2023-09-26 Show Abstract PDF
3 2403.02280v1 Tightly-Coupled LiDAR-Visual-Inertial SLAM and Large-Scale Volumetric Occupancy Mapping Simon Boche 2024-03-04 Show Abstract PDF
4 2403.02235v1 Structure from WiFi (SfW): RSSI-based Geometric Mapping of Indoor Environments Junseo Kim 2024-03-04 Show Abstract PDF
5 2309.06635v3 Collaborative Dynamic 3D Scene Graphs for Automated Driving Elias Greve 2023-09-12 Show Abstract PDF Link Link
6 2309.10314v4 Dive Deeper into Rectifying Homography for Stereo Camera Online Self-Calibration Hongbo Zhao 2023-09-19 Show Abstract PDF
7 2402.03246v3 SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM Mingrui Li 2024-02-05 Show Abstract PDF
8 2403.01110v1 Grid-based Fast and Structural Visual Odometry Zhang Zhihe 2024-03-02 Show Abstract PDF
9 2306.14137v2 BotanicGarden: A High-Quality Dataset for Robot Navigation in Unstructured Natural Environments Yuanzhi Liu 2023-06-25 Show Abstract PDF Link
10 2403.00976v1 Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor Junlin Song 2024-03-01 Show Abstract PDF
11 2402.14591v2 High-Speed Detector For Low-Powered Devices In Aerial Grasping Ashish Kumar 2024-02-22 Show Abstract PDF
12 2309.10225v2 VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition Adam D. Hines 2023-09-19 Show Abstract PDF Link
13 2403.00228v1 DISORF: A Distributed Online NeRF Training and Rendering Framework for Mobile Robots Chunlin Li 2024-03-01 Show Abstract PDF
14 2309.05134v2 Benchmarking ground truth trajectories with robotic total stations Effie Daum 2023-09-10 Show Abstract PDF
15 2402.18771v1 NARUTO: Neural Active Reconstruction from Uncertain Target Observations Ziyue Feng 2024-02-29 Show Abstract PDF
16 2402.18318v1 SD-SLAM: A Semantic SLAM Approach for Dynamic Scenes Based on LiDAR Point Clouds Feiya Li 2024-02-28 Show Abstract PDF
17 2402.18174v1 Generation of skill-specific maps from graph world models for robotic systems Koen de Vos 2024-02-28 Show Abstract PDF
18 2402.03762v4 MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction Heng Zhou 2024-02-06 Show Abstract PDF
19 2402.13609v2 VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks Yutong Wang 2024-02-21 Show Abstract PDF Link
20 2402.16082v1 Modeling Point Uncertainty in Radar SLAM Yang Xu 2024-02-25 Show Abstract PDF
21 2402.14345v2 An Error-Matching Exclusion Method for Accelerating Visual SLAM Shaojie Zhang 2024-02-22 Show Abstract PDF
22 2402.13488v2 A Feature Matching Method Based on Multi-Level Refinement Strategy Shaojie Zhang 2024-02-21 Show Abstract PDF
23 2402.15961v1 VOLoc: Visual Place Recognition by Querying Compressed Lidar Map Xudong Cai 2024-02-25 Show Abstract PDF Link
24 2402.11790v2 CoLRIO: LiDAR-Ranging-Inertial Centralized State Estimation for Robotic Swarms Shipeng Zhong 2024-02-19 Show Abstract PDF Link
25 2402.14308v1 Ground-Fusion: A Low-cost Ground SLAM System Robust to Corner Cases Jie Yin 2024-02-22 Show Abstract PDF
26 2402.14280v1 Secure Navigation using Landmark-based Localization in a GPS-denied Environment Ganesh Sapkota 2024-02-22 Show Abstract PDF
27 2402.13817v1 Khronos: A Unified Approach for Spatio-Temporal Metric-Semantic SLAM in Dynamic Environments Lukas Schmid 2024-02-21 Show Abstract PDF Link
28 2402.13537v1 EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization Zhendong Xiao 2024-02-21 Show Abstract PDF
29 2402.13255v1 How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey Fabio Tosi 2024-02-20 Show Abstract PDF
30 2402.07429v2 Particle Filter SLAM for Vehicle Localization Tianrui Liu 2024-02-12 Show Abstract PDF
31 2402.12551v1 Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment Ganesh Sapkota 2024-02-19 Show Abstract PDF
32 2402.12149v1 MLFEF: Machine Learning Fusion Model with Empirical Formula to Explore the Momentum in Competitive Sports Ruixin Peng 2024-02-19 Show Abstract PDF
33 2402.11680v1 3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods Till Beemelmanns 2024-02-18 Show Abstract PDF Link
34 2402.09944v1 Loopy-SLAM: Dense Neural SLAM with Loop Closures Lorenzo Liso 2024-02-14 Show Abstract PDF
35 2402.08897v1 RB5 Low-Cost Explorer: Implementing Autonomous Long-Term Exploration on Low-Cost Robotic Hardware Adam Seewald 2024-02-14 Show Abstract PDF Link
36 2402.08846v1 An Embarrassingly Simple Approach for LLM with Strong ASR Capacity Ziyang Ma 2024-02-13 Show Abstract PDF
37 2309.06950v3 3D Active Metric-Semantic SLAM Yuezhan Tao 2023-09-13 Show Abstract PDF
38 2402.08125v1 Customizable Perturbation Synthesis for Robust SLAM Benchmarking Xiaohao Xu 2024-02-12 Show Abstract PDF Link
39 2402.07537v1 UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments Ahmed Radwan 2024-02-12 Show Abstract PDF
40 2402.06951v1 Semantic Object-level Modeling for Robust Visual Camera Relocalization Yifan Zhu 2024-02-10 Show Abstract PDF
41 2212.04745v3 SLAM for Visually Impaired People: a Survey Marziyeh Bamdad 2022-12-09 Show Abstract PDF
42 2402.06131v1 PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes Xinggang Hu 2024-02-09 Show Abstract PDF
43 2309.14063v3 Preferential Multi-Target Search in Indoor Environments using Semantic SLAM Akash Chikhalikar 2023-09-25 Show Abstract PDF
44 2402.05254v1 Online and Certifiably Correct Visual Odometry and Mapping Devansh R Agrawal 2024-02-07 Show Abstract PDF
45 2402.05003v1 Efficient Invariant Kalman Filter for Inertial-based Odometry with Large-sample Environmental Measurements Xinghan Li 2024-02-07 Show Abstract PDF Link
46 2309.14641v2 Adaptive Denoising-Enhanced LiDAR Odometry for Degeneration Resilience in Diverse Terrains Mazeyu Ji 2023-09-26 Show Abstract PDF
Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2402.11431v2 A Robust Error-Resistant View Selection Method for 3D Reconstruction Shaojie Zhang 2024-02-18 Show Abstract PDF
1 2402.14650v1 GaussianPro: 3D Gaussian Splatting with Progressive Propagation Kai Cheng 2024-02-22 Show Abstract PDF
2 2402.12025v1 Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? Marco Gaido 2024-02-19 Show Abstract PDF
3 2402.11287v1 Dense Matchers for Dense Tracking Tomáš Jelínek 2024-02-17 Show Abstract PDF
4 2209.03910v2 PixTrack: Precise 6DoF Object Pose Tracking using NeRF Templates and Feature-metric Alignment Prajwal Chidananda 2022-09-08 Show Abstract PDF Link
5 2309.11883v2 On-the-Fly SfM: What you capture is What you get Zongqian Zhan 2023-09-21 Show Abstract PDF Link
6 2311.17245v4 LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS Zhiwen Fan 2023-11-28 Show Abstract PDF Link
7 2312.10109v2 Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement Xiaofeng Zhang 2023-12-15 Show Abstract PDF Link
8 2401.17592v1 Local Feature Matching Using Deep Learning: A Survey Shibiao Xu 2024-01-31 Show Abstract PDF
9 2304.07250v3 Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments Felix Ott 2023-04-14 Show Abstract PDF
10 2306.15667v4 PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment Jianyuan Wang 2023-06-27 Show Abstract PDF
11 2401.14289v1 Speech foundation models on intelligibility prediction for hearing-impaired listeners Santiago Cuervo 2024-01-24 Show Abstract PDF
12 2401.11711v1 HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs Zelin Gao 2024-01-22 Show Abstract PDF
13 2401.10886v1 SCENES: Subpixel Correspondence Estimation With Epipolar Supervision Dominik A. Kloepfer 2024-01-19 Show Abstract PDF
14 2401.09252v1 3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey Thiago Lopes Trugillo da Silveira 2024-01-17 Show Abstract PDF
15 2401.08937v1 ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization Weiyao Wang 2024-01-17 Show Abstract PDF
16 2401.08043v1 Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions Yi-Fan Zuo 2024-01-16 Show Abstract PDF Link
17 2301.08422v3 A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles Zhefan Xu 2023-01-20 Show Abstract PDF Link
18 2304.03930v2 Photometric Correction for Infrared Sensors Jincheng Zhang 2023-04-08 Show Abstract PDF
19 2401.05236v1 Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects Tianhang Cheng 2024-01-10 Show Abstract PDF Link
20 2401.03450v1 A Classification of Critical Configurations for any Number of Projective Views Martin Bråtelund 2024-01-07 Show Abstract PDF Link
21 2312.11153v2 Research on Multilingual Natural Scene Text Detection Algorithm Tao Wang 2023-12-18 Show Abstract PDF
22 2306.09012v3 Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization Dror Aiger 2023-06-15 Show Abstract PDF Link
23 2310.03704v3 Pose-Free Generalizable Rendering Transformer Zhiwen Fan 2023-10-05 Show Abstract PDF
24 2312.15471v1 Residual Learning for Image Point Descriptors Rashik Shrestha 2023-12-24 Show Abstract PDF
25 2312.13977v2 NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views Han Huang 2023-12-21 Show Abstract PDF
26 2312.10529v1 Transformers in Unsupervised Structure-from-Motion Hemang Chawla 2023-12-16 Show Abstract PDF Link
27 2312.08863v1 HeadRecon: High-Fidelity 3D Head Reconstruction from Monocular Video Xueying Wang 2023-12-14 Show Abstract PDF
28 2312.08760v1 CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning Qingsong Yan 2023-12-14 Show Abstract PDF
29 2312.07504v1 COLMAP-Free 3D Gaussian Splatting Yang Fu 2023-12-12 Show Abstract PDF
30 2312.06865v1 Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach Travis Driver 2023-12-11 Show Abstract PDF
31 2312.06741v1 Gaussian Splatting SLAM Hidenobu Matsuki 2023-12-11 Show Abstract PDF
32 2308.08479v3 DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching Johan Edstedt 2023-08-16 Show Abstract PDF Link
33 2312.05889v1 SuperPrimitive: Scene Reconstruction at a Primitive Level Kirill Mazur 2023-12-10 Show Abstract PDF
34 2312.04563v1 Visual Geometry Grounded Deep Structure From Motion Jianyuan Wang 2023-12-07 Show Abstract PDF
35 2308.15984v2 Learning Structure-from-Motion with Graph Attention Networks Lucas Brynte 2023-08-30 Show Abstract PDF
36 2312.00451v1 FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting Zehao Zhu 2023-12-01 Show Abstract PDF Link
37 2311.18801v1 Distributed Global Structure-from-Motion with a Deep Front-End Ayush Baid 2023-11-30 Show Abstract PDF Link
38 2307.11702v3 SACReg: Scene-Agnostic Coordinate Regression for Visual Localization Jerome Revaud 2023-07-21 Show Abstract PDF
39 2311.11808v2 Robot Hand-Eye Calibration using Structure-from-Motion Nicolas Andreff 2023-11-20 Show Abstract PDF
Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2403.03218v1 The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Nathaniel Li 2024-03-05 Show Abstract PDF
1 2403.03203v1 CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham 2024-03-05 Show Abstract PDF
2 2403.01548v2 In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation Shiqi Chen 2024-03-03 Show Abstract PDF Link
3 2403.03194v1 MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets Hossein Aboutalebi 2024-03-05 Show Abstract PDF
4 2403.01777v2 NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models Lizhou Fan 2024-03-04 Show Abstract PDF
5 2403.03188v1 Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement Rafaela Martelo 2024-03-05 Show Abstract PDF Link
6 2403.01616v2 Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models Nguyen Quang Duc 2024-03-03 Show Abstract PDF
7 2310.00194v2 A Prefrontal Cortex-inspired Architecture for Planning in Large Language Models Taylor Webb 2023-09-30 Show Abstract PDF Link
8 2403.03170v1 SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection Peng Qi 2024-03-05 Show Abstract PDF
9 2403.03167v1 PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset Arda Uzunoğlu 2024-03-05 Show Abstract PDF Link
10 2403.03163v1 Design2Code: How Far Are We From Automating Front-End Engineering? Chenglei Si 2024-03-05 Show Abstract PDF
11 2306.04735v2 Soft-prompt Tuning for Large Language Models to Evaluate Bias Jacob-Junqi Tian 2023-06-07 Show Abstract PDF
12 2403.03141v1 Language Guided Exploration for RL Agents in Text Environments Hitesh Golchha 2024-03-05 Show Abstract PDF
13 2305.14342v4 Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu 2023-05-23 Show Abstract PDF Link
14 2403.03121v1 Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution Flor Miriam Plaza-del-Arco 2024-03-05 Show Abstract PDF
15 2403.03102v1 "In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning Chuanqi Cheng 2024-03-05 Show Abstract PDF
16 2403.03101v1 KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents Yuqi Zhu 2024-03-05 Show Abstract PDF Link
17 2305.14824v3 Mitigating Temporal Misalignment by Discarding Outdated Facts Michael J. Q. Zhang 2023-05-24 Show Abstract PDF Link
18 2403.03031v1 Learning to Use Tools via Cooperative and Interactive Agents Zhengliang Shi 2024-03-05 Show Abstract PDF
19 2309.15065v2 Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding Christina Kassab 2023-09-26 Show Abstract PDF
20 2403.03029v1 Socratic Reasoning Improves Positive Text Rewriting Anmol Goel 2024-03-05 Show Abstract PDF
21 2403.03028v1 Word Importance Explains How Prompts Affect Language Model Outputs Stefan Hackmann 2024-03-05 Show Abstract PDF
22 2403.03017v1 OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following Haochen Shi 2024-03-05 Show Abstract PDF
23 2403.03008v1 Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations Hasan Abu-Rasheed 2024-03-05 Show Abstract PDF
24 2401.06071v5 GroundingGPT:Language Enhanced Multi-modal Grounding Model Zhaowei Li 2024-01-11 Show Abstract PDF Link
25 2403.03003v1 Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models Gen Luo 2024-03-05 Show Abstract PDF Link
26 2403.00818v2 DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Wei He 2024-02-26 Show Abstract PDF Link
27 2403.02993v1 Localized Zeroth-Order Prompt Optimization Wenyang Hu 2024-03-05 Show Abstract PDF
28 2403.02990v1 Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges Bosheng Ding 2024-03-05 Show Abstract PDF
29 2403.00867v2 Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes Xiaomeng Hu 2024-03-01 Show Abstract PDF
30 2403.02969v1 Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception Junwen He 2024-03-05 Show Abstract PDF
31 2403.02966v1 Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering Sungho Ko 2024-03-05 Show Abstract PDF
32 2403.00884v2 Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment Margherita Martorana 2024-03-01 Show Abstract PDF
33 2403.02965v1 ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities Ahmad Hassanpour 2024-03-05 Show Abstract PDF
34 2403.02962v1 WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction Zheng Li 2024-03-05 Show Abstract PDF
35 2403.02959v1 SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents Zhitao He 2024-03-05 Show Abstract PDF Link
36 2312.09979v3 LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin Shihan Dou 2023-12-15 Show Abstract PDF
37 2403.02951v1 Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation Bin Zhang 2024-03-05 Show Abstract PDF
38 2312.16044v4 LLMLight: Large Language Models as Traffic Signal Control Agents Siqi Lai 2023-12-26 Show Abstract PDF Link
39 2403.02939v1 PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers Yoonjoo Lee 2024-03-05 Show Abstract PDF
40 2402.18180v3 Human Simulacra: A Step toward the Personification of Large Language Models Qiuejie Xie 2024-02-28 Show Abstract PDF
41 2308.06354v2 Large Language Models to Identify Social Determinants of Health in Electronic Health Records Marco Guevara 2023-08-11 Show Abstract PDF Link
42 2403.02910v1 ImgTrojan: Jailbreaking Vision-Language Models with ONE Image Xijia Tao 2024-03-05 Show Abstract PDF
43 2402.18240v2 Prospect Personalized Recommendation on Large Language Model-based Agent Platform Jizhi Zhang 2024-02-28 Show Abstract PDF Link
44 2403.02901v1 A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods Hanlei Jin 2024-03-05 Show Abstract PDF
45 2401.06509v3 AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents Yuanzhi Liang 2024-01-12 Show Abstract PDF
46 2403.02889v1 In Search of Truth: An Interrogation Approach to Hallucination Detection Yakir Yehuda 2024-03-05 Show Abstract PDF
47 2403.02884v1 MathScale: Scaling Instruction Tuning for Mathematical Reasoning Zhengyang Tang 2024-03-05 Show Abstract PDF
Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2403.03203v1 CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham 2024-03-05 Show Abstract PDF
1 2403.03134v1 Simplicity in Complexity Kevin Shen 2024-03-05 Show Abstract PDF
2 2403.02164v2 Cognition is All You Need -- The Next Layer of AI Above Large Language Models Nova Spivack 2024-03-04 Show Abstract PDF
3 2403.02752v1 HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents Sam Yu-Te Lee 2024-03-05 Show Abstract PDF
4 2403.02571v1 DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training Zihao Wang 2024-03-05 Show Abstract PDF
5 2403.02522v1 HeAR -- Health Acoustic Representations Sebastien Baur 2024-03-04 Show Abstract PDF
6 2403.01699v1 Brilla AI: AI Contestant for the National Science and Maths Quiz George Boateng 2024-03-04 Show Abstract PDF
7 2403.01626v1 Using LLMs for Tabletop Exercises within the Security Domain Sam Hays 2024-03-03 Show Abstract PDF
8 2403.01476v1 CCC: Color Classified Colorization Mrityunjoy Gain 2024-03-03 Show Abstract PDF
9 2403.01418v1 A Simple-but-effective Baseline for Training-free Class-Agnostic Counting Yuhao Lin 2024-03-03 Show Abstract PDF
10 2311.05112v4 A Survey of Large Language Models in Medicine: Progress, Application, and Challenge Hongjian Zhou 2023-11-09 Show Abstract PDF Link
11 2403.01323v1 A non-cubic space-filling modular robot Tyler Hummer 2024-03-02 Show Abstract PDF
12 2403.01271v1 Employing LLMs for Incident Response Planning and Review Sam Hays 2024-03-02 Show Abstract PDF
13 2401.05638v2 MatSAM: Efficient Extraction of Microstructures of Materials via Visual Large Model Changtai Li 2024-01-11 Show Abstract PDF Link
14 2402.16338v3 BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM Li Zhang 2024-02-26 Show Abstract PDF
15 2310.05670v2 Reinforcement learning for freeform robot design Muhan Li 2023-10-09 Show Abstract PDF
16 2303.08774v5 GPT-4 Technical Report OpenAI 2023-03-15 Show Abstract PDF Link
17 2303.18242v2 $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States Sam Bond-Taylor 2023-03-31 Show Abstract PDF Link
18 2403.00574v1 Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms Toki Tahmid Inan 2024-03-01 Show Abstract PDF
19 2403.00334v1 NOVA: A visual interface for assessing polarizing media coverage Keshav Dasu 2024-03-01 Show Abstract PDF
20 2402.04140v3 Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs) Michael De'Shazer 2024-02-06 Show Abstract PDF
21 2402.19145v1 A SAM-guided Two-stream Lightweight Model for Anomaly Detection Chenghao Li 2024-02-29 Show Abstract PDF Link
22 2402.19102v1 FlatNAS: optimizing Flatness in Neural Architecture Search for Out-of-Distribution Robustness Matteo Gambella 2024-02-29 Show Abstract PDF
23 2402.19004v1 RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation Jie Zhang 2024-02-29 Show Abstract PDF
24 2312.05760v2 RepViT-SAM: Towards Real-Time Segmenting Anything Ao Wang 2023-12-10 Show Abstract PDF Link Link
25 2307.09283v7 RepViT: Revisiting Mobile CNN From ViT Perspective Ao Wang 2023-07-18 Show Abstract PDF Link Link
26 2402.18728v1 Not All the Same: Understanding and Informing Similarity Estimation in Tile-Based Video Games Sebastian Berns 2024-02-28 Show Abstract PDF Link Link
27 2402.18659v1 Large Language Models and Games: A Survey and Roadmap Roberto Gallotta 2024-02-28 Show Abstract PDF
28 2402.18204v1 ConvDTW-ACS: Audio Segmentation for Track Type Detection During Car Manufacturing Álvaro López-Chilet 2024-02-28 Show Abstract PDF
29 2309.00655v4 RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion Zhiqiang Yan 2023-09-01 Show Abstract PDF
30 2310.10010v2 Black-box Targeted Adversarial Attack on Segment Anything (SAM) Sheng Zheng 2023-10-16 Show Abstract PDF
31 2402.17972v1 From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments Kanyifeechukwu J. Oguine 2024-02-28 Show Abstract PDF
32 2311.02189v3 FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling Yu Tian 2023-11-03 Show Abstract PDF Link
Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2403.03203v1 CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham 2024-03-05 Show Abstract PDF
1 2403.03145v1 Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization Yuxin Guo 2024-03-05 Show Abstract PDF Link
2 2403.03134v1 Simplicity in Complexity Kevin Shen 2024-03-05 Show Abstract PDF
3 2305.14342v4 Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu 2023-05-23 Show Abstract PDF Link
4 2309.15065v2 Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding Christina Kassab 2023-09-26 Show Abstract PDF
5 2403.02991v1 MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer Jianjian Cao 2024-03-05 Show Abstract PDF Link
6 2403.02781v1 PromptKD: Unsupervised Prompt Distillation for Vision-Language Models Zheng Li 2024-03-05 Show Abstract PDF Link
7 2403.02714v1 DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization Feng Hou 2024-03-05 Show Abstract PDF
8 2403.02677v1 Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Weizhi Wang 2024-03-05 Show Abstract PDF
9 2403.02626v1 Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use Imad Eddine Toubal 2024-03-05 Show Abstract PDF
10 2403.02580v1 What do we learn from inverting CLIP models? Hamid Kazemi 2024-03-05 Show Abstract PDF Link
11 2403.02522v1 HeAR -- Health Acoustic Representations Sebastien Baur 2024-03-04 Show Abstract PDF
12 2311.12075v3 BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning Siyuan Liang 2023-11-20 Show Abstract PDF
13 2403.02041v1 A Generative Approach for Wikipedia-Scale Visual Entity Recognition Mathilde Caron 2024-03-04 Show Abstract PDF
14 2310.06836v2 What Does Stable Diffusion Know about the 3D Scene? Guanqi Zhan 2023-10-10 Show Abstract PDF Link
15 2403.01849v1 One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models Lin Li 2024-03-04 Show Abstract PDF Link
16 2403.01840v1 FreeA: Human-object Interaction Detection using Free Annotation Labels Yuxiao Wang 2024-03-04 Show Abstract PDF
17 2403.01560v1 Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition Kun-Yu Lin 2024-03-03 Show Abstract PDF Link
18 2312.05588v2 Language-assisted Vision Model Debugger: A Sample-Free Approach to Finding and Fixing Bugs Chaoquan Jiang 2023-12-09 Show Abstract PDF
19 2403.01422v1 MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies Zhende Song 2024-03-03 Show Abstract PDF
20 2311.00453v2 CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection Xuhai Chen 2023-11-01 Show Abstract PDF
21 2403.01209v1 Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning Shuo Yang 2024-03-02 Show Abstract PDF
22 2308.15109v2 DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection Henghao Zhao 2023-08-29 Show Abstract PDF
23 2306.12105v2 Mass-Producing Failures of Multimodal Systems with Language Models Shengbang Tong 2023-06-21 Show Abstract PDF Link
24 2403.00939v1 G3DR: Generative 3D Reconstruction in ImageNet Pradyumna Reddy 2024-03-01 Show Abstract PDF
25 2403.00436v1 Abductive Ego-View Accident Video Understanding for Safe Driving Perception Jianwu Fang 2024-03-01 Show Abstract PDF
26 2403.00376v1 Invariant Test-Time Adaptation for Vision-Language Model Generalization Huan Ma 2024-03-01 Show Abstract PDF
27 2306.08173v2 Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training Alyssa Huang 2023-06-13 Show Abstract PDF Link
28 2402.19467v2 TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning Kate Sanders 2024-02-29 Show Abstract PDF
29 2402.15021v2 CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models Santiago Castro 2024-02-22 Show Abstract PDF Link
30 2403.00219v1 Multi-modal Attribute Prompting for Vision-Language Models Xin Liu 2024-03-01 Show Abstract PDF
31 2402.19479v1 Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Tsai-Shien Chen 2024-02-29 Show Abstract PDF
32 2309.16782v2 Surgical Tattoos in Infrared: A Dataset for Quantifying Tissue Tracking and Mapping Adam Schmidt 2023-09-28 Show Abstract PDF
33 2403.00853v1 Distributed Momentum Methods Under Biased Gradient Estimations Ali Beikmohammadi 2024-02-29 Show Abstract PDF
34 2402.19150v1 Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts Hao Cheng 2024-02-29 Show Abstract PDF
35 2402.19091v1 Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection Christos Koutlis 2024-02-29 Show Abstract PDF
36 2401.04350v2 Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness Sibo Wang 2024-01-09 Show Abstract PDF
37 2402.18490v1 TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding Zhihao Zhang 2024-02-28 Show Abstract PDF
38 2402.18400v1 Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning Hanyao Wang 2024-02-28 Show Abstract PDF
39 2310.01018v2 Controlling Vision-Language Models for Multi-Task Image Restoration Ziwei Luo 2023-10-02 Show Abstract PDF Link
40 2402.13250v3 Video ReCap: Recursive Captioning of Hour-Long Videos Md Mohaiminul Islam 2024-02-20 Show Abstract PDF
41 2402.17412v2 DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models Shyam Marjit 2024-02-27 Show Abstract PDF
42 2402.17930v1 Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning Tan Zhi-Xuan 2024-02-27 Show Abstract PDF Link
43 2402.17535v1 Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control Thong Nguyen 2024-02-27 Show Abstract PDF Link
44 2402.17205v1 Measuring Vision-Language STEM Skills of Neural Models Jianhao Shen 2024-02-27 Show Abstract PDF
Index Arxiv ID Title First Author Submit Date Abstract PDF Links Github Code Paper With Code
0 2310.06836v2 What Does Stable Diffusion Know about the 3D Scene? Guanqi Zhan 2023-10-10 Show Abstract PDF Link
1 2403.00459v2 Deformable One-shot Face Stylization via DINO Semantic Guidance Yang Zhou 2024-03-01 Show Abstract PDF Link
2 2309.10726v3 Few-Shot Panoptic Segmentation With Foundation Models Markus Käppeler 2023-09-19 Show Abstract PDF Link
3 2402.18362v1 Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model Sangjoon Park 2024-02-28 Show Abstract PDF
4 2402.15687v1 General Purpose Image Encoder DINOv2 for Medical Image Registration Xinrui Song 2024-02-24 Show Abstract PDF
5 2402.14976v1 Unsupervised Domain Adaptation within Deep Foundation Latent Spaces Dmitry Kangin 2024-02-22 Show Abstract PDF
6 2402.14957v1 The Common Stability Mechanism behind most Self-Supervised Learning Approaches Abhishek Jha 2024-02-22 Show Abstract PDF Link
7 2402.14566v1 Self-supervised Visualisation of Medical Image Datasets Ifeoma Veronica Nwabufo 2024-02-22 Show Abstract PDF Link
8 2402.13181v1 DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models Norman Di Palo 2024-02-20 Show Abstract PDF
9 2402.10513v2 Understanding Delays in AF_XDP-based Applications Killian Castillon du Perron 2024-02-16 Show Abstract PDF
10 2402.10793v1 Masked Attention is All You Need for Graphs David Buterez 2024-02-16 Show Abstract PDF
11 2402.10717v1 BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion Raktim Kumar Mondol 2024-02-16 Show Abstract PDF
12 2311.18237v2 Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models Raviteja Vemulapalli 2023-11-30 Show Abstract PDF
13 2402.09608v1 Exact, Fast and Expressive Poisson Point Processes via Squared Neural Families Russell Tsuchida 2024-02-14 Show Abstract PDF Link
14 2402.06287v1 AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems Clara Punzi 2024-02-09 Show Abstract PDF
15 2402.03138v1 Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations Stefan Sylvius Wagner 2024-02-05 Show Abstract PDF
16 2402.02851v1 Enhancing Compositional Generalization via Compositional Feature Alignment Haoxiang Wang 2024-02-05 Show Abstract PDF Link
17 2402.02352v1 Region-Based Representations Revisited Michal Shlapentokh-Rothman 2024-02-04 Show Abstract PDF
18 2304.07193v2 DINOv2: Learning Robust Visual Features without Supervision Maxime Oquab 2023-04-14 Show Abstract PDF Link
19 2401.17981v1 Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Qirui Jiao 2024-01-31 Show Abstract PDF
20 2401.17632v1 What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis Takanori Ashihara 2024-01-31 Show Abstract PDF
21 2310.08873v2 Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models Zhen Zhang 2023-10-13 Show Abstract PDF
22 2401.05925v3 CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion Bin Dou 2024-01-11 Show Abstract PDF
23 2401.14555v1 Revisiting Active Learning in the Era of Vision Foundation Models Sanket Rajan Gupte 2024-01-25 Show Abstract PDF Link
24 2401.14159v1 Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Tianhe Ren 2024-01-25 Show Abstract PDF Link
25 2401.13987v1 Cross-Domain Few-Shot Learning via Adaptive Transformer Networks Naeem Paeedeh 2024-01-25 Show Abstract PDF Link
26 2401.11673v1 MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo Chenjie Cao 2024-01-22 Show Abstract PDF Link
27 2401.11311v1 A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models Reda Bensaid 2024-01-20 Show Abstract PDF
28 2401.10815v1 RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision Fernando Pérez-García 2024-01-19 Show Abstract PDF
29 2401.07951v1 Image Similarity using An Ensemble of Context-Sensitive Models Zukang Liao 2024-01-15 Show Abstract PDF
30 2401.06013v2 Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery Beilei Cui 2024-01-11 Show Abstract PDF Link
31 2305.14093v4 Weakly Supervised 3D Open-vocabulary Segmentation Kunhao Liu 2023-05-23 Show Abstract PDF Link
32 2401.02957v1 Denoising Vision Transformers Jiawei Yang 2024-01-05 Show Abstract PDF
33 2401.02361v2 An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Xiangyu Zhao 2024-01-04 Show Abstract PDF Link
34 2211.12735v2 Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration Yunjie Tian 2022-11-23 Show Abstract PDF Link

© 2023 Li Yingping - Powered by Arxiv API.
