
Atlas Coarse Types for LLM Extraction
======================================

Use these 12 types when prompting a cheap/small LLM for entity type suggestion.
The suggested type is a hint to the entity resolver for candidate ranking, not
a final classification. Pass 2 (Wikidata QID lookup) promotes the coarse type
to its fine-grained subtype from the full ontology.
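
One way to treat the suggested type as a ranking hint rather than a filter is
sketched below. The Candidate fields, the additive boost, and the placeholder
QIDs are assumptions made for illustration; the actual resolver's scoring is
not described in this file.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    qid: str          # placeholder identifier, not a real Wikidata QID
    coarse_type: str  # coarse type already known for this candidate
    score: float      # base retrieval score from the resolver (assumed)


def rank_candidates(candidates, hinted_type, boost=0.2):
    """Sort candidates, nudging those whose coarse type matches the hint.

    Mismatching candidates are kept, so a wrong hint lowers a candidate's
    rank but never removes it from the list.
    """
    def adjusted(c):
        return c.score + (boost if c.coarse_type == hinted_type else 0.0)

    return sorted(candidates, key=adjusted, reverse=True)


candidates = [
    Candidate("Q_CITY", "Location", 0.70),
    Candidate("Q_FILM", "CreativeWork", 0.65),
]
ranked = rank_candidates(candidates, hinted_type="CreativeWork")
```

With a boost of 0.2 the CreativeWork candidate moves ahead of the Location
candidate, while both stay available to pass 2.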

COARSE TYPES
------------

Person
Organization
Location
CreativeWork
Event
Product
FinancialInstrument
Animal
Disease
Building
FictionalCharacter
Other
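
A minimal sketch of how the 12 types might be offered to a small model and how
its answer might be normalized. The prompt wording, the function names, and the
fallback to Other for unrecognized answers are all assumptions, not part of the
Atlas pipeline as documented here.

```python
COARSE_TYPES = [
    "Person", "Organization", "Location", "CreativeWork", "Event", "Product",
    "FinancialInstrument", "Animal", "Disease", "Building",
    "FictionalCharacter", "Other",
]

# Case-insensitive lookup from a cleaned answer to the canonical type name.
_BY_LOWER = {t.lower(): t for t in COARSE_TYPES}


def build_prompt(mention, context):
    """Build a closed-choice classification prompt (wording is illustrative)."""
    options = ", ".join(COARSE_TYPES)
    return (
        f"Classify the entity mention into exactly one of: {options}.\n"
        f"Mention: {mention}\n"
        f"Context: {context}\n"
        "Answer with the type name only."
    )


def normalize_type(raw_answer):
    """Map raw model output to a coarse type, falling back to Other."""
    cleaned = raw_answer.strip().strip(".\"'`").replace(" ", "").lower()
    return _BY_LOWER.get(cleaned, "Other")
```

Falling back to Other keeps the output inside the closed set, which matters
because the resolver only expects one of the 12 names.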

PASS 2 PROMOTION MAP
--------------------

Person              -> Person
Organization        -> Organization
                       PoliticalParty
                       MilitaryUnit
                       MediaOrganization
Location            -> Location
                       Continent
                       Country
                       Region
                       PopulatedPlace
                       Neighbourhood
                       NaturalFeature
                       AdministrativeArea
CreativeWork        -> CreativeWork
                       Film
                       Book
                       MusicAlbum
                       TVSeries
                       VideoGame
Event               -> Event
Product             -> Product
                       Drug
                       Food
FinancialInstrument -> FinancialInstrument
                       PublicCompany
                       StockIndex
                       Commodity
                       Cryptocurrency
                       Currency
Animal              -> Animal
Disease             -> Disease
Building            -> Building
FictionalCharacter  -> FictionalCharacter
Other               -> Other
                       Award
                       Sport
                       EthnicGroup
                       Concept
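
The promotion map can be held as a plain dictionary with a reverse index from
fine-grained type to coarse parent, as in the sketch below. The promote and
hint_agrees helpers are hypothetical. In particular, letting the looked-up type
win over a disagreeing hint is an assumption that follows from the hint not
being a final classification; this file does not specify how conflicts are
resolved.

```python
PROMOTION_MAP = {
    "Person": ["Person"],
    "Organization": ["Organization", "PoliticalParty", "MilitaryUnit",
                     "MediaOrganization"],
    "Location": ["Location", "Continent", "Country", "Region",
                 "PopulatedPlace", "Neighbourhood", "NaturalFeature",
                 "AdministrativeArea"],
    "CreativeWork": ["CreativeWork", "Film", "Book", "MusicAlbum",
                     "TVSeries", "VideoGame"],
    "Event": ["Event"],
    "Product": ["Product", "Drug", "Food"],
    "FinancialInstrument": ["FinancialInstrument", "PublicCompany",
                            "StockIndex", "Commodity", "Cryptocurrency",
                            "Currency"],
    "Animal": ["Animal"],
    "Disease": ["Disease"],
    "Building": ["Building"],
    "FictionalCharacter": ["FictionalCharacter"],
    "Other": ["Other", "Award", "Sport", "EthnicGroup", "Concept"],
}

# Reverse index: fine-grained type -> coarse parent.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in PROMOTION_MAP.items()
    for fine in fines
}


def promote(coarse_hint, looked_up_type):
    """Return the pass 2 type if the ontology knows it, else keep the hint."""
    if looked_up_type in FINE_TO_COARSE:
        return looked_up_type
    return coarse_hint


def hint_agrees(coarse_hint, fine_type):
    """True when the fine-grained type sits under the hinted coarse type."""
    return FINE_TO_COARSE.get(fine_type) == coarse_hint
```

For example, promote("Location", "Country") yields "Country", and a failed
lookup such as promote("Location", None) leaves the coarse type in place.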

NOTES
-----

- Animal and Disease are kept separate because confusing them with Product
  or Concept causes hard resolution failures.
- Building is kept separate because landmarks (Eiffel Tower, White House)
  resolve very differently from cities or countries.
- FictionalCharacter is kept separate because confusing a fictional entity
  with a real person is a hard failure, not a soft one.
- Award, Sport, EthnicGroup and Concept fall into Other at the coarse level.
  A small model will mis-classify these anyway; the QID lookup in pass 2
  recovers the correct fine-grained type reliably.
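
When evaluating a small model against these notes, it can help to separate the
confusions called out above as hard failures from everything else. The helper
below is a hypothetical sketch: it encodes only the pairs the notes name
explicitly, and treating every other mismatch as soft is an assumption.

```python
# Unordered pairs that the notes describe as hard failures.
HARD_CONFUSIONS = {
    frozenset({"Animal", "Product"}),
    frozenset({"Animal", "Concept"}),
    frozenset({"Disease", "Product"}),
    frozenset({"Disease", "Concept"}),
    frozenset({"FictionalCharacter", "Person"}),
}


def failure_severity(predicted, gold):
    """Classify a prediction as 'correct', 'hard', or 'soft' (assumed labels)."""
    if predicted == gold:
        return "correct"
    if frozenset({predicted, gold}) in HARD_CONFUSIONS:
        return "hard"
    return "soft"
```

Counting hard and soft failures separately gives a clearer picture than a
single accuracy figure, since pass 2 is only expected to recover from the soft
ones.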