(West Meeting Room 118-120, Vancouver, December 14, 2024)

Accepted Papers

  1. Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting
  2. Probabilistic Proof State Compression: Optimizing LLM-Guided Formal Verification
  3. Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning
  4. Synchronizing Verbal Responses and Board Writing for Multimodal Math Instruction with LLMs
  5. ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving
  6. AI-Assisted Generation of Difficult Math Questions
  7. How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem
  8. Learning Elementary Cellular Automata with Transformers
  9. Math for AI: On the Generalization of Learning Mathematical Problem Solving
  10. Genetic Curriculum Learning for Distribution Generalization on the Travelling Salesman Problem
  11. Structure Based Dataset on SAT Solving with Graph Neural Networks
  12. A Hessian View of Grokking in Mathematical Reasoning
  13. Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning
  14. Generative Verifiers: Reward Modeling as Next-Token Prediction
  15. MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula
  16. Not All LLM Reasoners Are Created Equal
  17. Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
  18. Machine Learning meets Algebraic Combinatorics: A Suite of Datasets to Accelerate AI for Mathematics Research
  19. Repeated examples help learn arithmetic
  20. VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning
  21. Transformers to Predict the Applicability of Symbolic Integration Routines
  22. NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving
  23. DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images
  24. Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
  25. Library Learning Doesn't: The Curious Case of the Single-Use "Library"
  26. On Memorization of Large Language Models in Logical Reasoning
  27. Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
  28. MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis
  29. Transformers Can Do Arithmetic with the Right Embeddings
  30. miniCTX: Neural Theorem Proving with (Long-)Contexts
  31. Mining Math Conjectures from LLMs: A Pruning Approach
  32. The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines
  33. The Karp Dataset
  34. Towards Faster Quantum Circuit Simulation Using Graph Decompositions, GNNs and Reinforcement Learning
  35. Intermediate Fine-Tuning Improves Mathematical Reasoning in Smaller Models
  36. Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  37. Interleaving Text and Number Embeddings to Solve Mathematics Problems
  38. Looped Transformers for Length Generalization
  39. TurtleBench: A Visual Programming Benchmark in Turtle Geometry
  40. Wu's Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
  41. InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
  42. Models Can and Should Embrace the Communicative Nature of Human-Generated Math
  43. CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
  44. FEABench: Evaluating Language Models on Real World Physics Reasoning Ability
  45. Reasoning and Tools for Forecasting
  46. Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving
  47. CAFA: Coding as Auto-Formulation Can Boost Large Language Models in Solving Linear Programming Problem
  48. Synthesizing Verified Mathematical Problems
  49. LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
  50. HARDMATH: A Benchmark Dataset for Challenging Problems in Applied Mathematics
  51. SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation
  52. Give me a hint: Can LLMs take a hint to solve math problems?
  53. SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
  54. Lean-STaR: Learning to Interleave Thinking and Proving
  55. Math2Sym: A System for Solving Elementary Problems via Large Language Models and Symbolic Solvers
  56. Skywork-Math: Data Scaling Laws for Mathematical Reasoning in LLMs — The Story Goes On
  57. Proving Olympiad Algebraic Inequalities without Human Demonstrations
  58. Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
  59. STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing
  60. Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
  61. Attention Bias as an Inductive Bias: How to Teach Transformers Simple Arithmetic
  62. Learning Mathematical Rules with Large Language Models
  63. Formal Representation and Solution of Plane Geometric Problems
  64. OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
  65. WILT: A Multi-turn, Memorization-Robust Inductive Logic Benchmark for LLMs
  66. Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
  67. VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
  68. Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models
  69. DafnyBench: A Benchmark for Formal Software Verification

The list of accepted papers is also available on OpenReview.

Reviewers

We are grateful to our fantastic reviewers for making our workshop reviewing process run smoothly: