Skip to content

GH-46299: [C++][Compute] Don't use static inline const for default options #46303

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 9, 2025

Conversation

kou
Copy link
Member

@kou kou commented May 3, 2025

Rationale for this change

We use static variables for compute function option types. For example:

static auto kFilterOptionsType = GetFunctionOptionsType<FilterOptions>(
DataMember("null_selection_behavior", &FilterOptions::null_selection_behavior));
static auto kTakeOptionsType = GetFunctionOptionsType<TakeOptions>(
DataMember("boundscheck", &TakeOptions::boundscheck));
static auto kDictionaryEncodeOptionsType =
GetFunctionOptionsType<DictionaryEncodeOptions>(DataMember(
"null_encoding_behavior", &DictionaryEncodeOptions::null_encoding_behavior));
static auto kRunEndEncodeOptionsType = GetFunctionOptionsType<RunEndEncodeOptions>(
DataMember("run_end_type", &RunEndEncodeOptions::run_end_type));
static auto kArraySortOptionsType = GetFunctionOptionsType<ArraySortOptions>(
DataMember("order", &ArraySortOptions::order),
DataMember("null_placement", &ArraySortOptions::null_placement));
static auto kSortOptionsType = GetFunctionOptionsType<SortOptions>(
DataMember("sort_keys", &SortOptions::sort_keys),
DataMember("null_placement", &SortOptions::null_placement));
static auto kPartitionNthOptionsType = GetFunctionOptionsType<PartitionNthOptions>(
DataMember("pivot", &PartitionNthOptions::pivot),
DataMember("null_placement", &PartitionNthOptions::null_placement));
static auto kWinsorizeOptionsType = GetFunctionOptionsType<WinsorizeOptions>(
DataMember("lower_limit", &WinsorizeOptions::lower_limit),
DataMember("upper_limit", &WinsorizeOptions::upper_limit));
static auto kSelectKOptionsType = GetFunctionOptionsType<SelectKOptions>(
DataMember("k", &SelectKOptions::k),
DataMember("sort_keys", &SelectKOptions::sort_keys));
static auto kCumulativeOptionsType = GetFunctionOptionsType<CumulativeOptions>(
DataMember("start", &CumulativeOptions::start),
DataMember("skip_nulls", &CumulativeOptions::skip_nulls));
static auto kRankOptionsType = GetFunctionOptionsType<RankOptions>(
DataMember("sort_keys", &RankOptions::sort_keys),
DataMember("null_placement", &RankOptions::null_placement),
DataMember("tiebreaker", &RankOptions::tiebreaker));
static auto kRankQuantileOptionsType = GetFunctionOptionsType<RankQuantileOptions>(
DataMember("sort_keys", &RankQuantileOptions::sort_keys),
DataMember("null_placement", &RankQuantileOptions::null_placement));
static auto kPairwiseOptionsType = GetFunctionOptionsType<PairwiseOptions>(
DataMember("periods", &PairwiseOptions::periods));
static auto kListFlattenOptionsType = GetFunctionOptionsType<ListFlattenOptions>(
DataMember("recursive", &ListFlattenOptions::recursive));
static auto kInversePermutationOptionsType =
GetFunctionOptionsType<InversePermutationOptions>(
DataMember("max_index", &InversePermutationOptions::max_index),
DataMember("output_type", &InversePermutationOptions::output_type));
static auto kScatterOptionsType = GetFunctionOptionsType<ScatterOptions>(
DataMember("max_index", &ScatterOptions::max_index));

These option types are used for compute function options. For example:

: FunctionOptions(internal::kRankOptionsType),

If we use static inline const for compute function options, these compute function options may be initialized BEFORE compute function option types.

We use static inline const only for rank, rank_quantile and rank_normal:

RankMetaFunction()
: RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, &kDefaultOptions) {}
static inline const auto kDefaultOptions = RankOptions::Defaults();

RankQuantileMetaFunction()
: RankMetaFunctionBase("rank_quantile", Arity::Unary(), rank_quantile_doc,
&kDefaultOptions) {}
static inline const auto kDefaultOptions = RankQuantileOptions::Defaults();

RankNormalMetaFunction()
: RankMetaFunctionBase("rank_normal", Arity::Unary(), rank_normal_doc,
&kDefaultOptions) {}
static inline const auto kDefaultOptions = RankQuantileOptions::Defaults();

We can avoid this initialization order problem by deferring default compute function options initialization.

What changes are included in this PR?

  • Use static function instead of static inline const for default compute function options
  • Add tests for default compute function options for rank, rank_quantile and rank_normal

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

…fault options

We use static variables for compute function option types. For
example:
https://github.com/apache/arrow/blob/068416bd411d6a8e2949f8ebcb2f80e2c302ef6b/cpp/src/arrow/compute/api_vector.cc#L127-L170

These option types are used for compute function options. For example:
https://github.com/apache/arrow/blob/068416bd411d6a8e2949f8ebcb2f80e2c302ef6b/cpp/src/arrow/compute/api_vector.cc#L237

If we use `static inline const` for compute function options, these
compute function options may be initialized BEFORE compute function
option types.

We use `static inline const` only for `rank`, `rank_quantile` and
`rank_normal`:

https://github.com/apache/arrow/blob/068416bd411d6a8e2949f8ebcb2f80e2c302ef6b/cpp/src/arrow/compute/kernels/vector_rank.cc#L382-L386

https://github.com/apache/arrow/blob/068416bd411d6a8e2949f8ebcb2f80e2c302ef6b/cpp/src/arrow/compute/kernels/vector_rank.cc#L399-L403

https://github.com/apache/arrow/blob/068416bd411d6a8e2949f8ebcb2f80e2c302ef6b/cpp/src/arrow/compute/kernels/vector_rank.cc#L416-L420

We can avoid this initialization order problem by deferring
default compute function options initialization.
@kou kou requested review from zanmato1984, bkietz and pitrou May 3, 2025 22:36
Copy link

github-actions bot commented May 3, 2025

⚠️ GitHub issue #46299 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This edit LGTM but I wonder is this initialize time long or it breaks some invariant?

@kou
Copy link
Member Author

kou commented May 4, 2025

Sorry, I couldn't understand your concerns... Could you explain them more?

@pitrou
Copy link
Member

pitrou commented May 5, 2025

Aha. I wish they was an easy one-liner for the static function pattern. @bkietz

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for this. I hadn't thought about initialization order.

@mapleFU
Copy link
Member

mapleFU commented May 5, 2025

Sorry, I couldn't understand your concerns... Could you explain them more?

Oh I've seen the issue here, so this is caused by initialization order in different translation units, and this patch changes this as call order. I've no problem now

@@ -381,9 +381,13 @@ class RankMetaFunction : public RankMetaFunctionBase<RankMetaFunction> {
}

RankMetaFunction()
: RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, &kDefaultOptions) {}
: RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, GetDefaultOptions()) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou well it's not one line, but...

Suggested change
: RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, GetDefaultOptions()) {}
: RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, [] {
static const auto kDefaultOptions = RankOptions::Defaults();
return &kDefaultOptions;
}()) {}

An IIFE can introduce a static lifetime object too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIFE?

Copy link
Member

@bkietz bkietz May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Immediately Invoked Function Expression. I don't believe it's an official C++ concept

The expression is (<<nullary lambda for scope>>) (/* called with no args */)

Copy link
Member

@pitrou pitrou May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would work to do something like:

// in .h file
template <typename OptionsType>
const FunctionOptions* StaticOptionsInit() {
  static const auto kDefaultOptions = OptionsType::Defaults();
  return &kDefaultOptions;
};

// in .cc file
class RankMetaFunction : public RankMetaFunctionBase<RankMetaFunction> {
 public:
  RankMetaFunction()
      : RankMetaFunctionBase("rank", Arity::Unary(), rank_doc, StaticOptionsInit<RankOptions>()) {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It seems to work on my machine but I really expected this to be an ODR violation. On my machine, the static local becomes a weak symbol so the linker ignores multiple definitions and just uses the first.

After reading through the provisions for multiple definitions, I tried using an inline variable and that seemed to work also:

template <typename OptionsType>
inline const auto kDefault = OptionsType::Defaults();

I'll keep thinking about this.

Copy link
Member

@bkietz bkietz May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry; that wasn't intended as a serious proposal for this issue- it was just another instance of me being surprised about ODR. I think your StaticOptionsInit is fine; I can't convince myself looking at the standard but here's some precedent. Given that the issue was raised in LLVM and nobody exclaimed about ODR, my intuition is just incorrect here. (Those issues discuss the section group in which the static is emitted, which shouldn't bother us since we're not planning constexpr construction of FunctionOptions)

Also StaticOptionsInit is neat, and maybe we should extend it to be

template <typename T>
const T &Default() {
  static const T kDefault;
  return kDefault;
};

Then it might replace multiple instances of the same pattern in accessors which must return const std::shared_ptr<T>& but don't have one, as in FileSource::path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also StaticOptionsInit is neat, and maybe we should extend it to be

Given the LLVM issue you linked to above, I'm now convinced we shouldn't do something like this :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, what approach should we use here? :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @kou. I think we can use either GetDefaultOptions or the one suggested in #46303 (comment) . At your preference :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's choose GetDefaultOptions() here. GetDefaultOptions() is more straightforward than #46303 (comment) because developers don't need to know IIFE and static variable lifetime.

@github-actions github-actions bot added awaiting merge Awaiting merge awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review awaiting merge Awaiting merge labels May 5, 2025
@kou kou merged commit cf7d7e2 into apache:main May 9, 2025
50 checks passed
@kou kou deleted the cpp-rank-options branch May 9, 2025 21:24
@kou kou removed the awaiting changes Awaiting changes label May 9, 2025
@github-actions github-actions bot added the awaiting changes Awaiting changes label May 9, 2025
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit cf7d7e2.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants